**KNIME Distance Matrix Extension** version **4.4.0.v202104231044** by **KNIME AG, Zurich, Switzerland**

The Mahalanobis Distance is a metric that measures the distance between two data points with respect to
the variance and covariance of the selected variables.

It is defined as

$$d(x, y) = \left( (x - y)^{T} S^{-1} (x - y) \right)^{1/2}$$

where x and y are two random vectors from the same distribution with covariance matrix S.
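As an illustration of the definition, here is a minimal NumPy sketch. The function name `mahalanobis` and the example values are assumptions made for this illustration; this is not the node's implementation.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Return sqrt((x - y)^T S^{-1} (x - y)) for covariance matrix S = cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = diff rather than forming the inverse of S explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))

# Illustrative covariance: variance 4 in the first variable, 1 in the second.
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
d = mahalanobis([2.0, 0.0], [0.0, 0.0], S)
print(d)  # 1.0: two units along an axis with standard deviation 2
```

The Euclidean distance between the two example points is 2, but measured in units of the spread along that axis it is 1.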

Explanation:

Assume that we have a data set in a 2-dimensional Euclidean space and want to estimate the
probability that a point P1 (x,y) belongs to this set. Intuitively, the closer P1 is to the center
of mass of the set, the more likely it is to belong to it. We also have to consider the spread of the data. A
data set with correlated variables forms an ellipse around the center of mass in the 2-dimensional
Euclidean space, so the probability that a test point belongs to the set also depends on the
direction of the axes of that ellipse (or ellipsoid in an N-dimensional Euclidean space). The ellipsoid
that best represents the set's probability distribution can be estimated from the covariance
matrix of the samples, which is what the Mahalanobis distance uses.
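The effect of the ellipse's orientation can be sketched with synthetic data. Everything below (the random sample, the seed, the helper `distance_to_center`, and the two test points) is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic, strongly correlated 2-D sample (illustrative data only).
a = rng.normal(size=500)
samples = np.column_stack([a, a + 0.1 * rng.normal(size=500)])

center = samples.mean(axis=0)        # center of mass of the set
cov = np.cov(samples, rowvar=False)  # covariance matrix estimated from the samples

def distance_to_center(point):
    """Mahalanobis distance from a point to the center of mass."""
    diff = np.asarray(point, dtype=float) - center
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Two test points at the same Euclidean distance from the origin:
along_axis = distance_to_center([1.0, 1.0])    # along the long axis of the ellipse
across_axis = distance_to_center([1.0, -1.0])  # across it

print(along_axis, across_axis)
```

The point lying across the correlation direction gets a much larger distance than the one lying along it, even though both are equally far from the center in Euclidean terms.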

If the covariance matrix is the identity matrix, the variables of the data set are uncorrelated with unit
variance, and the Mahalanobis distance reduces to the Euclidean distance.
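A quick numerical check of this special case, as a sketch with arbitrarily chosen example vectors:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([0.0, 0.0])
diff = x - y

identity = np.eye(2)  # covariance matrix S = I

# Mahalanobis distance with S = I ...
d_mahalanobis = float(np.sqrt(diff @ np.linalg.solve(identity, diff)))
# ... and the ordinary Euclidean distance.
d_euclidean = float(np.linalg.norm(diff))

print(d_mahalanobis, d_euclidean)  # both 5.0
```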

Use case:

A typical use case is outlier detection. Outliers are intuitively points whose Mahalanobis distance is
very high compared to that of the other points in the data set.
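A minimal sketch of this use case, assuming the distance of each row is measured to the mean of the data and that rows above a cut-off are flagged. The synthetic data, the planted outlier, and the value of `THRESHOLD` are hypothetical choices for illustration; a suitable cut-off depends on the data.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# 200 synthetic rows with 3 variables, plus one planted outlier as the last row.
data = np.vstack([rng.normal(size=(200, 3)),
                  [[6.0, 6.0, 6.0]]])

center = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Row-wise sqrt(diff^T S^{-1} diff), solving S Z = diffs^T once for all rows.
diffs = data - center
solved = np.linalg.solve(cov, diffs.T).T
distances = np.sqrt(np.sum(diffs * solved, axis=1))

THRESHOLD = 3.0  # hypothetical cut-off, not a recommendation
outlier_rows = np.flatnonzero(distances > THRESHOLD)

print(outlier_rows)
print(int(np.argmax(distances)))  # row with the largest distance
```

Because the outlier itself inflates the estimated covariance, its distance is smaller than it would be against a covariance estimated from clean data. Supplying a covariance matrix computed on trusted data is one way to avoid that.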

- Column Selection: Choose the numeric columns for which the distance is defined.

- Input data.
- Optional covariance input table. The matrix must be square and have identical column/row pairs. If unconnected, the covariance matrix is computed on the selected input columns.
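The behavior of the optional input can be sketched as follows. The function `resolve_covariance` and its checks are hypothetical and only mirror the description above; they are not KNIME's code, and the matching of column and row names is not modeled here.

```python
import numpy as np

def resolve_covariance(data, cov=None):
    """Return the covariance matrix to use for the selected columns in `data`.

    data: 2-D array with one row per record and one column per selected variable.
    cov:  optional user-supplied covariance matrix (hypothetical parameter).
    """
    data = np.asarray(data, dtype=float)
    n_vars = data.shape[1]
    if cov is None:
        # No matrix supplied: estimate it from the selected columns.
        return np.atleast_2d(np.cov(data, rowvar=False))
    cov = np.asarray(cov, dtype=float)
    # A supplied matrix must be square and match the number of selected columns.
    if cov.ndim != 2 or cov.shape != (n_vars, n_vars):
        raise ValueError(
            f"expected a {n_vars}x{n_vars} covariance matrix, got shape {cov.shape}"
        )
    return cov

table = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 5.0]])
estimated = resolve_covariance(table)             # computed from the data
supplied = resolve_covariance(table, np.eye(2))   # taken from the optional input
```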

- ~~Linear Correlation~~ (11 %) Deprecated
- Column Filter (8 %) Streamable
- Partitioning (6 %)
- K Nearest Neighbor (Distance Function) (6 %)
- ~~K Nearest Neighbor (Distance Function)~~ (5 %) Deprecated

- ~~K Nearest Neighbor (Distance Function)~~ (18 %) Deprecated
- DBSCAN (15 %)
- Hierarchical Clustering (DistMatrix) (12 %)
- K Nearest Neighbor (Distance Function) (10 %)
- Distance Matrix Calculate (9 %)

- 03_NIR_Spectral_Data_Inhouse_Database_Search (KNIME Hub)
- 17887 (KNIME Hub)

To use this node in KNIME, install KNIME Distance Matrix from the following update site:

KNIME 4.4

A zipped version of the software site can be downloaded here.
