k-Medoids

Applies the k-Medoids algorithm to the input table. Starting from a random initialization of the medoids, it iteratively performs an exhaustive search over the input data, determining the cost of swapping any medoid with any input data row. It then replaces the medoid with the data row that reduces the cost the most. The algorithm terminates when no further cost reduction is possible, when the maximum number of iterations has been reached, or when the node is canceled from the view. The costs are computed either from a pre-computed distance matrix provided at Port 0 or with a distance measure connected at Port 1.
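The swap search described above can be sketched in Python. This is a simplified, PAM-style illustration that assumes a pre-computed, symmetric distance matrix stored as a list of lists; the functions `k_medoids` and `total_cost` are hypothetical names, and the sketch naively recomputes the full cost for every candidate swap rather than reflecting the node's actual implementation. The `max_iter` and `seed` parameters stand in for the "Constraint no. iterations" and "Use static seed" options, and `history` mirrors the per-iteration cost reductions shown in the Learn Progress View.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def total_cost(dist: Matrix, medoids: Sequence[int]) -> float:
    """Sum over all rows of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist: Matrix, k: int, max_iter: Optional[int] = None,
              seed: Optional[int] = None) -> Tuple[List[int], List[int], List[float]]:
    """Hypothetical swap-based k-medoids on a pre-computed distance matrix.

    Returns (medoids, assignment, history); history holds the cost
    reduction achieved in each iteration.
    """
    n = len(dist)
    if not 1 < k <= n:
        raise ValueError("k must be greater than 1 and at most the number of rows")
    rng = random.Random(seed)          # seed=None -> nondeterministic initialization
    medoids = rng.sample(range(n), k)  # random initialization of the medoids
    cost = total_cost(dist, medoids)
    history: List[float] = []
    iteration = 0
    while max_iter is None or iteration < max_iter:
        best_cost, best_swap = cost, None
        # Exhaustive search: try replacing every medoid with every non-medoid row.
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = list(medoids)
                trial[pos] = candidate
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (pos, candidate)
        if best_swap is None:  # no swap reduces the cost any further: converged
            break
        pos, candidate = best_swap
        medoids[pos] = candidate  # apply the single best swap of this iteration
        history.append(cost - best_cost)
        cost = best_cost
        iteration += 1
    # Assign each row to the partition of its nearest medoid.
    assignment = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, assignment, history
```

A small usage example with two well-separated groups of points on a line:

```python
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in points] for a in points]
medoids, assignment, history = k_medoids(dist, k=2, seed=42)
print(sorted(medoids))  # -> [1, 4]
```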

Options

Partition Count (k)
Enter the number of partitions (must be greater than 1).
Distance Column
Select the column containing the distance values. This option is disabled if a distance measure is connected (Port 1).
Chunk Size
How many rows to consider at once. This option has no effect on the output; it only influences the runtime (a larger chunk size consumes more memory but executes faster).
Constraint no. iterations
Allows limiting the number of iterations to run. If disabled, the node runs until no swap reduces the cost any further (no better solution available) or the calculation is finished from the view.
Use static seed
Seed used for the random initialization. In practice, the random initialization has no impact on the clustering result; it matters only in theoretical corner cases. If disabled, a random seed is used.
Output relative distances to medoids
If selected, additional columns are appended to the first output table, reflecting the relative distance to each of the medoids. The smaller the value, the stronger the membership in the respective partition. The values in the new columns sum to 1.
Choke on asymmetric distances
If selected, the node fails when the input contains distance vectors that are marked as (potentially) not symmetric. Asymmetric distances may lead to infinite loops (due to alternating minima). In most cases you should leave this option selected.
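The "Output relative distances to medoids" and "Choke on asymmetric distances" options can be illustrated with two small helpers. The normalization below (each distance divided by the sum of the row's distances to all medoids) is an assumption that satisfies the stated properties (smaller value means closer, values sum to 1); the node's exact formula is not documented here. Likewise, the node checks a flag on the distance vectors, whereas the hypothetical `is_symmetric` helper tests a matrix numerically.

```python
from typing import List, Sequence


def relative_distances(row_to_medoids: Sequence[float]) -> List[float]:
    """Hypothetical: normalize a row's distances to each medoid so they sum to 1.

    Assumption: plain division by the sum; a smaller value means the row is
    relatively closer to that medoid.
    """
    total = sum(row_to_medoids)
    if total == 0:
        # Degenerate case: every medoid is at distance 0, so split membership evenly.
        return [1.0 / len(row_to_medoids)] * len(row_to_medoids)
    return [d / total for d in row_to_medoids]


def is_symmetric(dist: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Hypothetical check: True if dist[i][j] equals dist[j][i] within tol for all pairs."""
    n = len(dist)
    return all(abs(dist[i][j] - dist[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))
```

For example, `relative_distances([1.0, 9.0])` returns `[0.1, 0.9]`, so the row belongs most strongly to the first medoid.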

Input Ports

Port 0
Table containing the optional distance matrix.
Port 1
Optional distance measure, which makes the distance matrix at Port 0 unnecessary.

Output Ports

Port 0
The input table with an additional column containing the partitioning information, i.e. the winning partition of each row.
Port 1
The medoid vectors (rows from the input table) along with the size of each partition.

Views

Learn Progress View
While the node is executing, the view shows the cost reduction achieved in each iteration. The reductions are usually large in the first iterations (due to the random initialization) and shrink as more iterations are run. The "Finish" button stops the calculation after the current iteration, for example when the reduction is considered sufficiently small.
