Crosstab

Creates a cross table (also referred to as a contingency table or crosstab). It can be used to analyze the relationship between two columns with categorical data and displays the frequency distribution of the categorical variables in a table.

This node provides chi-square test statistics and, for a 2x2 cross tabulation, Fisher's exact test. Both tests evaluate the null hypothesis of no association between the row variable and the column variable. The p-values are provided in the view and in the second output port.

Options

Row variable
The input column used as the row variable in the cross-tabulation.
Column variable
The input column used as the column variable in the cross-tabulation.
Weight column
Applies a numeric weight to each record in the input, causing the Crosstab node to treat each record as if it were repeated WEIGHT times.
Enable hiliting
If enabled, hiliting a cell in the crosstab will hilite all cells with the same categories in attached views. Depending on the number of rows, enabling this feature might consume a lot of memory.
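
The effect of the Weight column option can be illustrated with a small Python sketch. This is not the node's implementation; the function name `weighted_frequencies` and the sample data are hypothetical, and the sketch only assumes the behavior described above, namely that a record with weight w counts as w identical records.

```python
from collections import defaultdict

def weighted_frequencies(rows, cols, weights=None):
    """Sum the weights per (row category, column category) pair.

    Hypothetical helper: without weights, every record counts once.
    """
    if weights is None:
        weights = [1] * len(rows)
    freq = defaultdict(float)
    for r, c, w in zip(rows, cols, weights):
        freq[(r, c)] += w
    return dict(freq)

# Hypothetical input: three records with weights 2, 1 and 3.
rows = ["a", "a", "b"]
cols = ["x", "y", "x"]
weights = [2, 1, 3]
weighted = weighted_frequencies(rows, cols, weights)

# The same table built by physically repeating each record.
rep_rows = [r for r, w in zip(rows, weights) for _ in range(w)]
rep_cols = [c for c, w in zip(cols, weights) for _ in range(w)]
repeated = weighted_frequencies(rep_rows, rep_cols)
```

For integer weights both tables hold the same cell frequencies, which is the equivalence the option description states.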

Input Ports

Input table containing columns with categorical data.

Output Ports

The cross table in list form.
The table with the statistics.

Views

Cross tabulation
The following properties are displayed in the cross tabulation view:
Frequency: The cell frequency.
Expected: The expected frequency under the null hypothesis of no association, computed as (column total / total) * row total.
Deviation: The deviation is computed as Frequency - Expected.
Percent: The percent is the relative frequency computed as Frequency / total.
Row Percent: The row percent is computed as Frequency / row total.
Column Percent: The column percent is computed as Frequency / column total.
Cell Chi-Square: The contribution of this cell to the value of the Chi-Square statistic. The Cell Chi-Square values of all cells sum to the value of the Chi-Square statistic.
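
The formulas above can be sketched in a few lines of Python. This is an illustration, not the node's code: the function name `crosstab_properties` and the sample data are hypothetical, and the Cell Chi-Square is assumed to follow the standard Pearson form Deviation^2 / Expected, which the description does not spell out. Percent values are kept as fractions, exactly as the formulas are written.

```python
from collections import Counter

def crosstab_properties(rows, cols):
    """Compute the per-cell properties for two equally long category lists."""
    freq = Counter(zip(rows, cols))
    row_total = Counter(rows)
    col_total = Counter(cols)
    total = len(rows)
    cells = {}
    for r in row_total:
        for c in col_total:
            f = freq.get((r, c), 0)
            expected = (col_total[c] / total) * row_total[r]
            deviation = f - expected
            cells[(r, c)] = {
                "Frequency": f,
                "Expected": expected,
                "Deviation": deviation,
                "Percent": f / total,
                "Row Percent": f / row_total[r],
                "Column Percent": f / col_total[c],
                # Assumed Pearson contribution of this cell.
                "Cell Chi-Square": deviation ** 2 / expected,
            }
    return cells

# Hypothetical data: 8 records, two binary categorical columns.
smoker = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
disease = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]
cells = crosstab_properties(smoker, disease)

# Summing the cell contributions gives the Chi-Square statistic.
chi_square = sum(cell["Cell Chi-Square"] for cell in cells.values())
```

For this sample the cell ("yes", "yes") has Frequency 2 and Expected (3 / 8) * 3 = 1.125, so its Deviation is 0.875.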

For some properties the row totals and column totals are displayed beside the table and underneath the table, respectively.
You can control the size of the displayed table with the Max rows and the Max columns controls.

The statistics table provides chi-square test statistics and, for a 2x2 cross tabulation, Fisher's exact test. Both tests evaluate the null hypothesis of no association between the row variable and the column variable. You can reject the null hypothesis when the p-value (Prop) is less than the chosen significance level, which is typically 0.01 or 0.05. In this case the result is said to be statistically significant. Bear in mind that the Chi-Square test rests on assumptions, for example that the expected cell frequencies are not too small.
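
The decision rule for the 2x2 case can be sketched in plain Python as follows. The function names and the sample table are hypothetical, and several details are assumptions rather than documented behavior of the node: the chi-square statistic is computed without continuity correction, and Fisher's exact test is taken as two-sided, summing the probabilities of all tables that are no more likely than the observed one. The sketch does not handle tables with an empty row or column.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value (assumed variant, see lead-in)."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_observed = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

# Hypothetical 2x2 table of frequencies.
stat, p_chi = chi_square_2x2(2, 1, 1, 4)
p_fisher = fisher_exact_2x2(2, 1, 1, 4)

alpha = 0.05  # chosen significance level
reject_chi = p_chi < alpha
reject_fisher = p_fisher < alpha
```

With only 8 observations, neither test rejects the null hypothesis at the 0.05 level for this table. Tables this small are also a case where the expected frequencies are low, so the exact test is usually the more trustworthy of the two.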
