k-Medoids

Applies the k-Medoids algorithm to the input table. Starting from a random initialization of the medoids, the node iteratively performs an exhaustive search over the input data, determining the cost of swapping each medoid with each input data row. In each iteration it replaces a medoid with the data row that reduces the cost the most. It terminates when no further cost reduction is possible, when the maximum number of iterations has been run, or when the node is canceled in the view. The costs are determined either from a pre-computed distance matrix (Port 0) or from a connected distance measure (Port 1).
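
The swap search described above can be sketched in plain Python. This is only an illustrative sketch, not the node's actual implementation: the function names `total_cost` and `k_medoids`, the parameters `max_iter` and `seed`, and the use of a full square distance matrix `dist` (a list of lists) are all assumptions made for the example.

```python
import random

def total_cost(dist, medoids):
    """Sum of each row's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def k_medoids(dist, k, max_iter=None, seed=None):
    """Best-swap k-medoids on a precomputed distance matrix (illustrative only)."""
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)          # random initialization
    cost = total_cost(dist, medoids)
    iteration = 0
    while max_iter is None or iteration < max_iter:
        best_cost, best_swap = cost, None
        # Exhaustive search: try replacing every medoid with every non-medoid row.
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (pos, cand)
        if best_swap is None:                  # no swap reduces the cost: stop
            break
        medoids[best_swap[0]] = best_swap[1]   # apply the single best swap
        cost = best_cost
        iteration += 1
    # Assign every row to its nearest medoid (the "winner" partition).
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels, cost

# Usage: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = k_medoids(dist, k=2, seed=0)
```

Recomputing the full cost for every candidate swap keeps the sketch short but is slow on larger inputs; it is meant to mirror the description, not to be efficient.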

Options

Partition Count (k): Enter the number of partitions (must be greater than 1)
Distance Column: Select the column containing the distance values. This option is disabled if a distance measure is connected (Port 1).
Chunk Size: The number of rows to consider at once. This option has no effect on the output and only influences the runtime: a larger chunk size consumes more memory but executes faster.
Constraint no. iterations: Limits the number of iterations to run. If disabled, the node runs until no further cost reduction is possible (no better solution is available) or the calculation is finished in the view.
Use static seed: Seed used for the random initialization. The random initialization has no practical impact on the clustering result (it matters only in theoretical corner cases). If disabled, a randomly chosen seed is used.
Output relative distances to medoids: If selected, additional columns are appended to the first output table, reflecting the relative distances to each of the medoids. The smaller the value, the higher the membership in the respective partition. The values in the new columns sum to 1.
Choke on asymmetric distances: If selected, the node fails when the input contains distance vectors that are marked as (potentially) not symmetric. Asymmetric distances may lead to infinite loops (due to alternating minima). In most cases you should leave this box selected.
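
Two of the options above can be illustrated with a short Python sketch. Both helpers are hypothetical. The description does not state how the node computes relative distances, so `relative_distances` assumes a plain normalization (each distance divided by the row's sum), which is one way to get values that sum to 1. The node checks a flag on the distance vectors rather than the values themselves, so `is_symmetric` is only a numeric stand-in for that check.

```python
def relative_distances(row_to_medoid):
    """Normalize one row's distances to the k medoids so they sum to 1.

    Assumed formula (not confirmed by the node description): d_j / sum(d).
    """
    total = sum(row_to_medoid)
    if total == 0:
        # Degenerate case: the row has zero distance to every medoid.
        return [1.0 / len(row_to_medoid)] * len(row_to_medoid)
    return [d / total for d in row_to_medoid]

def is_symmetric(dist, tol=1e-12):
    """Return True if dist[i][j] equals dist[j][i] (within tol) for all pairs."""
    n = len(dist)
    return all(
        abs(dist[i][j] - dist[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )

# Usage: a row at distance 1 from the first medoid and 3 from the second.
rel = relative_distances([1.0, 3.0])
```

Under this assumed normalization, the smaller value marks the closer medoid, which matches the statement that smaller values mean higher membership.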

Input Ports

Port 0: Table containing the optional distance matrix.
Port 1: Optional distance measure, which makes the distance matrix at Port 0 unnecessary.

Output Ports

Port 0: Input table with an additional column containing the partitioning information and the winner partition.
Port 1: Medoid vectors (from the input table) along with the partition size.

Views

Learn Progress View: While the node executes, this view shows the cost reduction in each iteration. The cost reduction is usually large at first (due to the random initialization) and decreases as more iterations are run. The "Finish" button lets the user stop the calculation after the current iteration if the reduction is considered sufficiently small.

Installation

To use this node in KNIME, install the extension KNIME Distance Matrix from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191418

On NodePit since: 2025-07-02

Last update: 2025-07-25

KNIME versions: Since v3.6
