Silhouette Coefficient

This node computes the Silhouette Coefficient for the provided clustering result. The Silhouette Coefficient is a metric for evaluating clustering quality. For each row, it is computed as (b - a) / max(a, b), where a is the mean distance from the row to the other rows in its own cluster and b is the mean distance from the row to the rows of the closest other cluster. Additionally, a second table containing the mean over all individual Silhouette Coefficients is output. The score ranges from -1.0 to 1.0; higher is better. At least two clusters are required for the score to be computable.

By default, the Euclidean distance is used to calculate distances between rows. This can be changed by connecting an optional distance function. If a distance function is supplied, the data column selection in the dialog is ignored, because the columns used are configured by the connected distance function.
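The per-row formula and the swappable distance can be illustrated with a minimal Python sketch. This is not the node's implementation: the names `euclidean`, `silhouette_per_row`, and the `dist` parameter are hypothetical, and the conventions for singleton clusters and for the case max(a, b) = 0 (both scored 0.0 here) are assumptions the node may handle differently.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_per_row(rows, labels, dist=euclidean):
    """Return one Silhouette Coefficient per row (hypothetical helper)."""
    # Group row indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumption: a row alone in its cluster gets a score of 0.0.
            scores.append(0.0)
            continue
        # a: mean distance to the other rows of the same cluster.
        a = sum(dist(rows[i], rows[j]) for j in own) / len(own)
        # b: smallest mean distance to the rows of any other cluster.
        b = min(
            sum(dist(rows[i], rows[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Assumption: identical rows everywhere (denom == 0) score 0.0.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


rows = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = ["A", "A", "B", "B"]
scores = silhouette_per_row(rows, labels)
mean_score = sum(scores) / len(scores)  # analogous to the second output table
```

Passing a different callable as `dist`, for example a Manhattan distance, plays the role that the optional distance function input plays for the node.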

Computing the Silhouette Coefficient is computationally expensive, so subsampling is recommended when the original dataset is large.
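One simple way to subsample before scoring is a seeded uniform random draw of rows, sketched below in Python. The function name `subsample_rows` and its parameters are hypothetical, and uniform sampling is only one option: it can drop small clusters entirely, so a per-cluster (stratified) draw may be preferable when cluster sizes are very uneven.

```python
import random


def subsample_rows(rows, labels, max_rows, seed=0):
    """Keep at most max_rows rows, preserving each row's cluster label."""
    if max_rows >= len(rows):
        return list(rows), list(labels)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    # Draw distinct indices, then sort to keep the original row order.
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep], [labels[i] for i in keep]


all_rows = [(float(i), float(i % 7)) for i in range(1000)]
all_labels = ["A" if i < 500 else "B" for i in range(1000)]
sample_rows, sample_labels = subsample_rows(all_rows, all_labels, max_rows=100)
```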

Options

Data Column Selection
Columns to be used for distance computation.
Clustering Column Selection
Column containing the name of the cluster for each row.

Input Ports

The table with input data and a clustering column.
Optional distance function.

Output Ports

The original table with an appended Silhouette Coefficient column.
A table with one column and one row containing the mean Silhouette Coefficient of all samples.

Views

This node has no views.
