KNIME Distance Matrix Extension version 3.7.0.v201809280949 by KNIME AG, Zurich, Switzerland
Assume that we have a data set in a 2-dimensional Euclidean space and want to estimate the probability that a point P1 (x, y) belongs to this set. The closer P1 is to the set's center of mass, the more likely it is to belong. We also have to consider the spread of the data. A data set with correlated variables forms an ellipse around the center of mass, so the probability that a test point belongs to the set also depends on the direction of the ellipse's axes (or of the ellipsoid's axes in an N-dimensional Euclidean space). The ellipsoid that best represents the set's probability distribution can be estimated from the covariance matrix of the samples, which is what the Mahalanobis distance uses.
If the covariance matrix is the identity matrix, the variables of the data set are uncorrelated with unit variance, and the Mahalanobis distance reduces to the Euclidean distance.
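The distance is usually written as D(p) = sqrt((p - m)^T S^-1 (p - m)), where m is the mean of the samples and S their covariance matrix. The following is a minimal sketch of that formula for the 2-dimensional case, in plain Python. It is not the node's implementation; the function names and the sample points are made up for illustration, and the sample covariance (divisor n - 1) is an assumption.

```python
import math

def mean_and_covariance(points):
    """Return the mean and the 2x2 sample covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Assumption: sample covariance, i.e. divide by n - 1.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(point, mean, cov):
    """sqrt((p - m)^T * S^-1 * (p - m)) for a 2x2 covariance matrix S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is 1/det * [[d, -b], [-c, a]].
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)

# With the identity matrix the result equals the Euclidean distance.
identity = ((1.0, 0.0), (0.0, 1.0))
d_identity = mahalanobis((3.0, 4.0), (0.0, 0.0), identity)

# Hypothetical correlated data, spread mostly along the diagonal y = x.
samples = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 2), (2, 1)]
mean, cov = mean_and_covariance(samples)

# Both test points have the same Euclidean distance to the mean,
# but one lies along the long axis of the ellipse and one across it.
d_along = mahalanobis((2.5, 2.5), mean, cov)
d_across = mahalanobis((2.5, 0.5), mean, cov)
```

In this sketch `d_along` comes out smaller than `d_across`, which is the effect of the ellipse's orientation on the distance.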
A typical use case is outlier detection. Intuitively, outliers are points whose Mahalanobis distance to the data set is very high compared with that of the points in the set.
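As a rough illustration of this use case, the sketch below estimates the mean and covariance from a reference set and flags candidate points whose distance exceeds a cutoff. The function names, the data, and the cutoff of 3.0 are assumptions chosen for the example; they are not taken from the node, and a suitable cutoff depends on the data.

```python
import math

def fit(points):
    """Estimate mean and 2x2 sample covariance from 2-D reference points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def distance(point, model):
    """Mahalanobis distance of a 2-D point to the fitted reference set."""
    (mx, my), (sxx, sxy, syy) = model
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse of the symmetric 2x2 covariance.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

def find_outliers(reference, candidates, threshold=3.0):
    """Return the candidates farther than `threshold` from the reference set."""
    model = fit(reference)
    return [p for p in candidates if distance(p, model) > threshold]

# Hypothetical reference data, spread mostly along the diagonal y = x.
reference = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 2), (2, 1)]
# (2, 2) follows the trend of the data; (3, 0) lies far off the diagonal.
candidates = [(2, 2), (3, 0)]
outliers = find_outliers(reference, candidates)
```

The covariance is estimated from the reference set only, so that a candidate being tested does not inflate the spread it is measured against.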
To use this node in KNIME, install KNIME Distance Matrix Extension from the following update site: