Mosaic plot between two categorical columns

Discover relationship between two categorical features
============================================
Draws a mosaic plot of two categorical variables. Category labels are abbreviated to four letters in the graph. Both columns must be of string type.

The output plot reports both the p-value of the chi-square test and the Pearson residuals. The null hypothesis is that the two features are independent, that is, there is no relationship between them. As a rule of thumb, a p-value below 0.05 indicates a relationship.

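Pearson residual = (obs - exp) / sqrt(exp)

where obs is the observed count in a cell and exp is the count expected in that cell if the two features were independent.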

A Pearson residual is calculated for each cell. As a rule of thumb, a cell with a Pearson residual of 3 or greater in absolute value contributes more to the relationship. The intensity of colour in each cell also indicates how far the observed count deviates from the expected count.
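The per-cell calculation can be illustrated with a small sketch. It is not the component's own code (the component relies on R's vcd package); the 2x2 table and the function name pearson_residuals are hypothetical and chosen only for illustration. It also shows that the uncorrected Pearson chi-square statistic is the sum of the squared residuals.

```python
import math

# Hypothetical observed counts: rows are levels of feature I,
# columns are levels of feature II.
observed = [
    [30, 10],
    [20, 40],
]


def pearson_residuals(table):
    """Return (obs - exp) / sqrt(exp) for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            # Expected count under independence of the two features.
            exp = row_totals[i] * col_totals[j] / grand_total
            res_row.append((obs - exp) / math.sqrt(exp))
        residuals.append(res_row)
    return residuals


residuals = pearson_residuals(observed)

# Pearson chi-square statistic (no continuity correction):
# the sum of squared residuals over all cells.
chi_square = sum(r * r for row in residuals for r in row)

for row in residuals:
    print([round(r, 3) for r in row])
print(round(chi_square, 3))
```

Positive residuals mark cells with more observations than independence would predict, negative residuals mark cells with fewer.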

Note that mosaic() splits the data in the order in which the variables are provided: first on Categorical feature--I, then on Categorical feature--II.
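The effect of the split order can be sketched as follows. This is a simplified illustration, not vcd's implementation: the records and the function name mosaic_shares are hypothetical, and spacing between tiles and the split direction are ignored. The first feature divides the whole plot by its marginal proportions; the second feature then divides each of those strips by its proportions within that level of the first feature.

```python
from collections import Counter

# Hypothetical records: (label of feature I, label of feature II).
records = [
    ("a", "x"), ("a", "x"), ("a", "y"),
    ("b", "x"), ("b", "y"), ("b", "y"), ("b", "y"), ("b", "y"),
]


def mosaic_shares(rows):
    """Return {(label_I, label_II): (outer_share, inner_share)}.

    outer_share: share of the whole plot given to the feature I level
                 (first split).
    inner_share: share of that strip given to the feature II level
                 (second split, conditional on feature I).
    """
    total = len(rows)
    first_counts = Counter(a for a, _ in rows)
    pair_counts = Counter(rows)
    shares = {}
    for (a, b), n in pair_counts.items():
        outer = first_counts[a] / total
        inner = n / first_counts[a]
        shares[(a, b)] = (outer, inner)
    return shares


shares = mosaic_shares(records)

# The area of each tile is outer * inner, i.e. the joint proportion of the cell.
areas = {cell: outer * inner for cell, (outer, inner) in shares.items()}
```

Swapping the two features changes which proportions are marginal and which are conditional, so the plot looks different even though each tile's area stays the same.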

For more about mosaic plots, please see Wikipedia: https://en.wikipedia.org/wiki/Mosaic_plot . For more about Pearson residuals, please see: https://www.statology.org/pearson-residuals/ . For a more thorough overview, please see: https://www.datavis.ca/courses/VCD/vcd-tutorial.pdf . The component requires R's vcd package.

Options

Categorical feature--I
Must be of string type
Categorical feature--II
Must be of string type

Input Ports

Input data may be a KNIME data frame

Output Ports

Mosaic plot of the two variables. A p-value below 0.05 may indicate a relationship between the two categorical features.
