k-Medoids

Applies the k-Medoids algorithm to the input table. Starting from a random initialization of the medoids, it iteratively performs an exhaustive search over the input data, determining the cost of swapping each medoid with each input data row. In every iteration it replaces a medoid with the data row that reduces the cost the most. The node terminates when no further cost reduction is possible, when the maximum number of iterations has been run, or when the calculation is stopped in the view. The costs are determined either from a pre-computed distance matrix (Port 0) or with a connected distance measure (Port 1).
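The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the node's actual implementation: the function names `total_cost` and `k_medoids`, the list-of-lists distance matrix `dist`, and the returned values are all hypothetical, and chunking and the view are left out.

```python
import random


def total_cost(dist, medoids):
    """Sum of each row's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist, k, max_iterations=None, seed=None):
    """Greedy swap search over a precomputed, symmetric distance matrix.

    Hypothetical sketch: returns the medoid row indices, one partition label
    per row, the final cost, and the cost reduction of every iteration.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random initialization
    cost = total_cost(dist, medoids)
    reductions = []
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        best_cost, best_swap = cost, None
        # Exhaustive search: try swapping every medoid with every other row.
        for pos in range(k):
            for row in range(n):
                if row in medoids:
                    continue
                candidate = medoids[:pos] + [row] + medoids[pos + 1:]
                candidate_cost = total_cost(dist, candidate)
                if candidate_cost < best_cost:
                    best_cost, best_swap = candidate_cost, (pos, row)
        if best_swap is None:  # no swap reduces the cost any further
            break
        pos, row = best_swap
        medoids[pos] = row
        reductions.append(cost - best_cost)
        cost = best_cost
        iteration += 1
    # Assign every row to the partition of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost, reductions
```

Calling `k_medoids(dist, 2, seed=0)` on a small symmetric matrix returns the chosen medoid rows together with the per-iteration cost reductions, which correspond to the values the node plots in its progress view. Each iteration of this sketch recomputes the full cost for every candidate swap, which keeps the logic easy to follow but is slow on large inputs.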

Options

Distance matrix column
Select the column containing the distance values. This option is only visible if no distance measure is connected to Port 1.
Partition count (k)
Enter the number of partitions (must be greater than 1).
Chunk size
Specify the number of rows to process at once. This option does not affect the output; it only influences the runtime (a larger chunk size uses more memory but runs faster).
Constrain number of iterations
Limits the number of iterations to run. If disabled, the node runs until no swap reduces the cost any further (no better solution is available) or the calculation is finished in the view.
Use static seed
If selected, a fixed seed is used for the random initialization. The random initialization has no practical impact on the clustering result (it matters only in theoretical corner cases). If disabled, a randomly drawn seed is used.
Random seed
The seed value used for random initialization. Use the same seed to get identical results across multiple executions. You can enter a custom value or use the random seed generation button below.
Draw seed
Generate a random seed and set it in the Random seed input above for reproducible runs.
Output relative distances to medoids
If selected, appends additional columns to the first output table that reflect the relative distances to each of the medoids. The smaller the value, the higher the membership in the respective partition. The values in the new columns sum to 1.
Choke on asymmetric distances
If selected, the node will fail when the input contains distance vectors that are marked as (potentially) not symmetric. Asymmetric distances may lead to infinite loops (due to alternating minima). In most cases you should leave this box selected.
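Two of the options above can be illustrated with a short sketch. It rests on assumptions: the page does not state how the relative distances are computed, so dividing each distance by the row's total is only one scaling that makes the values sum to 1. Likewise, the node reacts to how the distance vectors are marked, whereas the hypothetical `is_symmetric` helper below compares the values themselves.

```python
def relative_distances(distances_to_medoids):
    """Scale one row's distances to the k medoids so that they sum to 1.

    Assumed normalization (distance divided by the row total); the node's
    own formula is not documented on this page.
    """
    total = sum(distances_to_medoids)
    if total == 0:
        # Degenerate case, all distances zero: assumed to split evenly.
        return [1.0 / len(distances_to_medoids)] * len(distances_to_medoids)
    return [d / total for d in distances_to_medoids]


def is_symmetric(dist, tol=1e-12):
    """Return True if dist[i][j] equals dist[j][i] for every pair of rows."""
    n = len(dist)
    return all(
        abs(dist[i][j] - dist[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )
```

With this scaling, a row at distances 1 and 3 from two medoids gets the values 0.25 and 0.75, so the smaller value marks the closer medoid. A value-based symmetry check like this is one way to inspect a distance matrix before clustering.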

Input Ports

Table containing the optional distance matrix.
Optional distance measure, which renders the distance matrix at Port 0 unnecessary.

Output Ports

Input table with additional column containing the partitioning information and the winner partition.
Medoid vectors (from input table) along with the partition size.

Views

Learn Progress View
While the node is executing, this view shows the cost reduction in each iteration. The cost reduction usually starts large (due to the random initialization) and decreases as more iterations are run. The "Finish" button allows the user to stop the calculation after the current iteration if the reduction is considered sufficiently small.
