Calculates a correlation coefficient for each pair of selected columns, i.e. a measure of how strongly the two variables are related.
Which correlation measure is applied depends on the types of
the underlying variables:
numeric <-> numeric: Pearson's
product-moment coefficient. Missing values in a column are ignored in such a way that for the
computation of the correlation between two columns only complete records are taken into account. For
instance, if there are three columns A, B and C and a row contains a missing value in column A but not
in B and C, then the row will be ignored for computing the correlation between (A, B) and (A, C). It
will not be ignored for the correlation between (B, C). This corresponds to the function
cor(<data.frame>, use="pairwise.complete.obs") in R.
The
value of this measure ranges from -1 (strong negative correlation) to 1 (strong positive correlation). A
value of 0 represents no linear correlation (the columns might still be highly dependent on each other,
though).
The p-value for these columns indicates the probability that uncorrelated variables would
produce a correlation at least as extreme as the one observed. It assumes that the true correlation is
zero and that the test statistic follows a t-distribution with df degrees of freedom.
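The pairwise handling of missing values described above can be sketched in a few lines of plain Python. This is only an illustration, not the node's actual implementation: the function names `pairwise_pearson` and `t_statistic` are hypothetical, missing values are represented as `None`, and the degrees of freedom are assumed to be n - 2 (the usual choice for a test on Pearson's r, where n is the number of complete records). The p-value itself would be read off a t-distribution with that df, which is left out here to avoid a third-party dependency.

```python
import math


def pairwise_pearson(x, y):
    """Pearson's r using only rows where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy), n


def t_statistic(r, n):
    """Test statistic for H0 'true correlation is zero'; df = n - 2 is an assumption."""
    df = n - 2
    if df <= 0:
        return None, df
    denom = 1.0 - r * r
    if denom <= 0:
        return math.copysign(math.inf, r), df  # perfect correlation
    return r * math.sqrt(df / denom), df


# Column A has a missing value in the second row; B and C are complete.
A = [1.0, None, 3.0, 4.0, 5.0]
B = [2.0, 4.0, 6.0, 8.0, 10.0]
C = [5.0, 3.0, 4.0, 1.0, 2.0]

r_ab, n_ab = pairwise_pearson(A, B)  # second row is skipped
r_bc, n_bc = pairwise_pearson(B, C)  # all rows are used
t_bc, df_bc = t_statistic(r_bc, n_bc)
print(r_ab, n_ab)
print(r_bc, n_bc, t_bc, df_bc)
```

The row with the missing value in A is dropped only for pairs that involve A, so (A, B) is computed from four records while (B, C) uses all five.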
nominal <-> nominal: Pearson's chi-square test on the
contingency table. This value is then normalized to the range [0,1] using Cramér's V, where 0 represents no
correlation and 1 a strong correlation. Missing values in nominal columns are treated as if they were
a separate possible value of their own. If one of the two columns contains more possible values than specified
in the dialog (default 50), the correlation will not be computed.
The p-value for these columns
indicates the probability of independent variables showing a level of dependence at least as extreme. The value is
the same as for a chi-square test of independence of variables in a contingency table.
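As a rough sketch of the nominal case, the following plain Python computes the chi-square statistic of the contingency table and normalizes it with Cramér's V, V = sqrt(chi2 / (n * (k - 1))), where k is the smaller of the two category counts. The function name `cramers_v` and the parameter `max_values` are hypothetical and only mirror the description above; whether the node applies any further correction, or counts the missing value toward the limit, is not stated in this text and is an assumption here.

```python
import math
from collections import Counter


def cramers_v(x, y, max_values=50):
    """Cramér's V for two nominal columns; None (missing) counts as its own category."""
    if len(x) != len(y):
        raise ValueError("columns must have the same number of rows")
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    # Assumption: the limit applies to distinct values including the missing one.
    if len(row_totals) > max_values or len(col_totals) > max_values:
        return None
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return None  # a column with a single category carries no information
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Each colour (including the missing one) always occurs with the same size.
colour = ["red", "red", "blue", "blue", None, None]
size = ["S", "S", "L", "L", "M", "M"]
v_dependent = cramers_v(colour, size)

# Every combination occurs equally often, so the columns look independent.
v_independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(v_dependent, v_independent)
```

Because `None` is an ordinary dictionary key, the missing value ends up as one more row or column of the contingency table, which matches the treatment of missing values described above.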
Correlation measures for other pairs of columns are not available; they are represented by missing
values in the output table and crosses in the accompanying view.
numeric <-> nominal pairs will be excluded. If only pairs with a valid correlation are included, all pairs for which the correlation cannot be computed are excluded.
To use this node in KNIME, install the extension KNIME Base nodes.