Calculates a correlation coefficient for each pair of selected columns, i.e. a measure of how strongly the two variables are related.
Which correlation measure is applied depends on the types of the
underlying variables:
numeric <-> numeric:
Pearson's product-moment coefficient.
Missing values in a column are ignored in such a way that for the
computation of the correlation between two columns only complete
records are taken into account. For instance, if there are three
columns A, B and C and a row contains a missing value in column A
but not in B and C, then the row will be ignored for computing the
correlation between (A, B) and (A, C). It will not be ignored for
the correlation between (B, C). This corresponds to the function
cor(<data.frame>, use="pairwise.complete.obs")
in the R statistics package.
The value of this measure ranges from -1 (strong negative
correlation) to 1 (strong positive correlation). A value of 0
represents no linear correlation (the columns might still be
highly dependent on each other, though).
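The pairwise handling of missing values described above can be sketched in a few lines of Python. This is only an illustration, not the node's actual implementation: the function name pearson_pairwise, the use of None for missing cells and the sample columns A, B and C are all assumptions made for the example.

```python
import math

def pearson_pairwise(xs, ys):
    """Pearson's r over rows where both values are present (None = missing).

    Hypothetical helper for illustration; not taken from the node's source.
    """
    # Keep only complete records for this particular pair of columns.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete records
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)

# Made-up sample data: the second row is missing in A only.
A = [1.0, None, 3.0, 4.0]
B = [2.0, 4.0, 6.0, 8.0]
C = [8.0, 6.0, 5.0, 1.0]

r_ab = pearson_pairwise(A, B)  # second row is skipped
r_bc = pearson_pairwise(B, C)  # all four rows are used
```

Because the complete-record filter is applied per pair, the row with the missing value in A still contributes to the (B, C) coefficient.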
The p-value for these columns indicates the probability of an
uncorrelated system producing a correlation at least as extreme
as the one observed, assuming that the true correlation is zero
and that the test statistic follows a t-distribution with df
degrees of freedom.
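As a rough sketch of how such a p-value is usually obtained, the coefficient r can be converted into a t statistic. The description above does not say how df is chosen; the snippet assumes the textbook choice df = n - 2, where n is the number of complete records. The function name t_statistic and the sample numbers are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing zero correlation.

    Assumption: df = n - 2 (the standard choice; not confirmed by the
    node description). Returns (t, df); t is None when undefined.
    """
    df = n - 2
    if df <= 0 or abs(r) >= 1.0:
        return None, df
    return r * math.sqrt(df / (1.0 - r * r)), df

# Made-up example: r = 0.5 computed from 27 complete records.
t, df = t_statistic(0.5, 27)
```

The two-sided p-value is then the probability that a t-distributed variable with df degrees of freedom exceeds |t| in absolute value; a statistics library such as SciPy (scipy.stats.t.sf) can supply that tail probability.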
nominal <-> nominal:
Pearson's chi-square test on the contingency table.
The chi-square value is then normalized to the range [0,1] using
Cramér's V, where 0 represents no correlation and 1
a strong correlation. Missing values in nominal columns are
treated as if they were a separate possible value of their own.
If one of the two columns contains more possible values than
specified in the dialog (default 50), the correlation will not
be computed.
The p-value for these columns indicates the probability of
independent variables showing a level of dependence at least as
extreme as the one observed. The value is the same as for a
chi-square test of independence of variables in a contingency table.
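The nominal case can be illustrated with a small Python sketch that builds the contingency table, computes the chi-square statistic and normalizes it to Cramér's V. It is a sketch under assumptions rather than the node's code: the function name cramers_v, the max_values parameter (mirroring the dialog default of 50) and the use of None for missing cells are made up for the example, and whether the node counts the missing-value category towards that limit is not stated in the description.

```python
import math
from collections import Counter

def cramers_v(xs, ys, max_values=50):
    """Cramér's V for two nominal columns; None is its own category.

    Hypothetical helper for illustration; not taken from the node's source.
    """
    # Skip pairs where a column has more distinct values than allowed.
    if len(set(xs)) > max_values or len(set(ys)) > max_values:
        return None
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # contingency table as cell counts
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return None  # V is undefined for a single category
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

# Made-up data: the missing colour always goes with size "M".
colour = ["red", "red", "blue", "blue", None, None]
size = ["S", "S", "L", "L", "M", "M"]
v_dependent = cramers_v(colour, size)

# Made-up data: every combination occurs equally often.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

In the first data set each colour, including the missing one, determines the size exactly, so V reaches its maximum; in the second the observed counts equal the expected counts and V is zero.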
Correlation measures for other pairs of columns are not
available; they are represented by missing values in the output
table and by crosses in the accompanying view.
numeric <-> nominal pairs will be excluded.
If only pairs with a valid correlation are included, all pairs for which
the correlation cannot be computed are excluded.