Mosaic plot between three categorical columns

Discover relationships between three categorical features
=========================================================
Draws a mosaic plot of three categorical variables. Input variable values (labels) are abbreviated to four letters in the graph. All three columns must be of string type.

The output plot shows both the p-value of a chi-square test and the Pearson residuals. The null hypothesis is that there is no relationship between the features, i.e. all three are mutually independent. As a rule of thumb, a p-value below 0.05 indicates a relationship.
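
A minimal Python sketch of the test described above, assuming it is a Pearson chi-square test of mutual independence (the null hypothesis stated here). The names `mutual_independence_test` and `chi2_sf` are hypothetical and belong neither to the component nor to vcd, whose own computation may differ in detail. Under mutual independence the expected count of a cell is n_a * n_b * n_c / N^2, and a table with I, J and K levels has IJK - I - J - K + 2 degrees of freedom. The p-value comes from a series expansion of the regularised incomplete gamma function, so the sketch needs only the standard library.

```python
import math
from collections import Counter
from itertools import product


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with df degrees of freedom.

    Uses the series expansion of the lower regularised incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    # Sum z^n / ((a+1)(a+2)...(a+n)) until the terms become negligible
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def mutual_independence_test(rows):
    """Pearson chi-square test of mutual independence for three categorical columns.

    rows: list of (first, second, third) string tuples, one per data row.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    rows = list(rows)
    n = len(rows)
    observed = Counter(rows)
    # One-way margins (level counts) of each column
    margins = [Counter(r[i] for r in rows) for i in range(3)]
    levels = [sorted(m) for m in margins]

    stat = 0.0
    for a, b, c in product(*levels):
        # Expected count if the three columns are mutually independent
        expected = margins[0][a] * margins[1][b] * margins[2][c] / n**2
        stat += (observed[(a, b, c)] - expected) ** 2 / expected
    i, j, k = (len(lv) for lv in levels)
    df = i * j * k - i - j - k + 2
    return stat, df, chi2_sf(stat, df)
```

For example, with twenty rows in which all three columns always agree (ten `("x", "x", "x")` and ten `("y", "y", "y")`), the statistic is 60 on 4 degrees of freedom and the p-value is far below 0.05.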

Pearson residuals = (obs - exp) / sqrt(exp)
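
The formula above can be sketched in Python for a three-way table, again assuming the expected counts come from the mutual-independence model. `pearson_residuals` is a hypothetical helper, not part of the component. The squared residuals sum to the chi-square statistic, which links this formula to the p-value shown on the plot.

```python
import math
from collections import Counter
from itertools import product


def pearson_residuals(rows):
    """Pearson residual (obs - exp) / sqrt(exp) for every cell of a three-way table.

    rows: list of (first, second, third) string tuples, one per data row.
    Expected counts assume the three columns are mutually independent.
    Cells that never occur are included with an observed count of 0.
    """
    rows = list(rows)
    n = len(rows)
    observed = Counter(rows)
    margins = [Counter(r[i] for r in rows) for i in range(3)]
    residuals = {}
    for cell in product(*(sorted(m) for m in margins)):
        a, b, c = cell
        expected = margins[0][a] * margins[1][b] * margins[2][c] / n**2
        residuals[cell] = (observed[cell] - expected) / math.sqrt(expected)
    return residuals
```

With the twenty-row example where all three columns always agree, the (x, x, x) and (y, y, y) cells have residuals of about 4.74 and the other six cells about -1.58.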

Pearson residuals are calculated for each cell. As a rule of thumb, a cell whose Pearson residual is 3 or greater in absolute value contributes strongly to the relationship. The colour intensity of each cell also shows how far the observed count deviates from the expected count.
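
A small sketch of the rule of thumb above: given a mapping from cells to Pearson residuals, keep the cells whose residual is at least 3 in absolute value and note whether each holds more or fewer cases than expected. The helper `flag_cells` and any example residual values are hypothetical, and the sketch does not reproduce how vcd maps residuals to colours.

```python
def flag_cells(residuals, threshold=3.0):
    """Return cells whose |Pearson residual| meets the rule-of-thumb threshold.

    residuals: dict mapping a cell (e.g. a tuple of three labels) to its residual.
    Result: list of (cell, residual, direction), sorted by |residual|, largest first.
    """
    flagged = [
        (cell, r, "more than expected" if r > 0 else "fewer than expected")
        for cell, r in residuals.items()
        if abs(r) >= threshold
    ]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)
```

For instance, residuals of 4.7, -1.6 and -3.2 would flag the first cell (more than expected) and the third (fewer than expected), and leave the second unflagged.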

Note that mosaic() splits the data in the order in which the variables are provided: first on the Ist Categorical feature, then on the IInd, and finally on the IIIrd.
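
To illustrate why the order matters, here is a hedged Python sketch of the recursive splitting behind a mosaic plot: the unit square is divided in proportion to the levels of the first column, each piece in proportion to the second column within it, and each of those by the third. The function name `mosaic_tiles`, the alternating x/y directions and the omission of spacing and zero-count tiles are simplifications of this sketch; vcd chooses split directions through its own arguments (for example `split_vertical`) and its layout may differ. Each tile's area equals the share of rows in that combination, so reordering the features changes the picture but not the tile areas.

```python
from collections import Counter


def mosaic_tiles(rows):
    """Compute mosaic tile rectangles (x, y, width, height) in the unit square.

    rows: list of (first, second, third) label tuples.
    Splits on column 0 along x, then on column 1 along y within each tile,
    then on column 2 along x again. Spacing and empty cells are ignored.
    """
    rows = list(rows)
    tiles = {(): (0.0, 0.0, 1.0, 1.0)}
    for depth in range(3):
        along_x = depth % 2 == 0
        new_tiles = {}
        for prefix, (x, y, w, h) in tiles.items():
            # Counts of the next column among the rows that fall in this tile
            counts = Counter(r[depth] for r in rows if r[:depth] == prefix)
            total = sum(counts.values())
            offset = 0.0
            for level in sorted(counts):
                share = counts[level] / total
                if along_x:
                    new_tiles[prefix + (level,)] = (x + offset * w, y, share * w, h)
                else:
                    new_tiles[prefix + (level,)] = (x, y + offset * h, w, share * h)
                offset += share
        tiles = new_tiles
    return tiles
```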

For more about mosaic plots, see Wikipedia: https://en.wikipedia.org/wiki/Mosaic_plot. For more about Pearson residuals, see https://www.statology.org/pearson-residuals/. For a more thorough overview, see https://www.datavis.ca/courses/VCD/vcd-tutorial.pdf. The component requires the R package vcd.

Options

Ist Categorical feature
Select a column of string type.
IInd Categorical feature
Select a column of string type.
IIIrd Categorical feature
Select a column of string type.

Input Ports

Input data: a KNIME data table containing the three string columns.

Output Ports

Mosaic plot of the three variables. A p-value below 0.05 may indicate a relationship between the categorical features.