Explanation:
Assume that we have a data set in a 2-dimensional
Euclidean space and we want to estimate the probability that a point P1 (x,y) is part of this set.
Intuitively, the closer P1 is to the center of mass of the set, the more likely it is to belong to it.
We also have to consider the spread of the data. A data set with correlated variables forms an
ellipse around the center of mass in the 2-dimensional Euclidean space, so the probability that a test
point is contained in the set also depends on the direction of the axes of that ellipse (or ellipsoid
in an N-dimensional Euclidean space). The ellipsoid that best represents the set's probability
distribution can be estimated by building the covariance matrix of the samples, which is what the
Mahalanobis distance uses.
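For reference, the standard textbook definition of the Mahalanobis distance can be written as below. The symbols are notation introduced here, not taken from the node itself: x is the test point, \mu the center of mass (sample mean), and \Sigma the covariance matrix of the samples. How the node computes it internally is not stated in this description.

```latex
D_M(x) = \sqrt{(x - \mu)^{\mathsf{T}} \, \Sigma^{-1} \, (x - \mu)}
```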
If the covariance matrix is the identity matrix, the variables of the
data set are uncorrelated with unit variance, and the Mahalanobis distance reduces to the Euclidean distance.
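The idea can be sketched outside KNIME with NumPy. This is only a minimal illustration, not the node's implementation: the function name `mahalanobis`, the random sample data, and the chosen test points are all assumptions made for the example.

```python
import numpy as np

def mahalanobis(point, center, cov):
    """Mahalanobis distance of `point` from `center` under covariance `cov`."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical correlated 2-D samples: the point cloud is an ellipse
# stretched along the (1, 1) direction.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=500)
center = samples.mean(axis=0)           # center of mass
cov = np.cov(samples, rowvar=False)     # estimated covariance matrix

# Two test points at the same Euclidean distance from the center:
along = mahalanobis(center + [2, 2], center, cov)    # along the long axis
across = mahalanobis(center + [2, -2], center, cov)  # across the long axis

# With the identity matrix the result is the plain Euclidean distance.
identity_case = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2))
```

Here `along` comes out smaller than `across`, because the data spreads further along the long axis of the ellipse, and `identity_case` equals the Euclidean distance of (3, 4) from the origin.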
Use case:
A typical use case is outlier detection. Intuitively, outliers are points whose Mahalanobis
distance is very high compared with that of the other points in the data set.
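A rough sketch of this use case in NumPy follows. The helper `mahalanobis_distances`, the generated data, the planted outlier, and the cutoff value of 3.0 are all assumptions chosen for illustration; a suitable threshold depends on the data and is not prescribed by the node.

```python
import numpy as np

def mahalanobis_distances(samples):
    """Mahalanobis distance of every row from the sample mean."""
    samples = np.asarray(samples, dtype=float)
    diffs = samples - samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # Solve cov @ Z = diffs^T for all rows at once.
    solved = np.linalg.solve(cov, diffs.T).T
    # Row-wise dot product diffs[i] . solved[i], then square root.
    return np.sqrt(np.einsum("ij,ij->i", diffs, solved))

# Hypothetical data: 200 correlated points plus one planted outlier
# lying across the main direction of the point cloud.
rng = np.random.default_rng(1)
data = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)
data = np.vstack([data, [[6.0, -6.0]]])

distances = mahalanobis_distances(data)
threshold = 3.0  # assumed cutoff, tune for the data at hand
outlier_indices = np.flatnonzero(distances > threshold)
```

The planted point (index 200) receives the largest distance and is flagged. Points from the regular cloud may also exceed the cutoff if they sit far out in the tails.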
To use this node in KNIME, install the KNIME Distance Matrix extension.