Spark k-Means

KNIME Extension for Apache Spark core infrastructure version 4.2.0.v202007072005 by KNIME AG, Zurich, Switzerland

This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering, which assigns each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
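
The training and assignment steps described above can be illustrated with a minimal pure-Python sketch. This is not the Spark MLlib implementation that the node uses; it is a plain Lloyd-style k-means written only to show what "crisp" assignment and a fixed number of clusters mean. The function names (train_kmeans, assign) and the sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(point, centers):
    """Crisp assignment: return the index of the single nearest center."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def train_kmeans(points, k, max_iterations, seed):
    """Illustrative Lloyd-style k-means (an assumption, not Spark's code)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen with the given seed.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if empty.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers stopped moving before the iteration limit
        centers = new_centers
    return centers


# Hypothetical training data: two well-separated groups.
training = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
            (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers = train_kmeans(training, k=2, max_iterations=20, seed=42)

# Label the training rows, then apply the learned centers to an unseen row,
# which is the role the Spark Cluster Assigner node plays for this node's model.
labels = [assign(p, centers) for p in training]
unseen_label = assign((7.5, 8.1), centers)
```

In this sketch every row receives exactly one cluster index, and the unseen row is labeled using only the learned centers, without retraining.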


Number of clusters
The number of clusters (cluster centers) to be created.
Number of iterations
The maximum number of iterations after which the algorithm terminates, regardless of whether the cluster centers are still improving.
Initialization seed
Random seed for cluster initialization (requires Apache Spark 1.3 or later).
Feature Columns
The feature columns to learn the model from. Only numeric columns are supported.
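
Because the node does not normalize the selected feature columns, a column with a large value range can dominate the distance computation. The following sketch illustrates the effect with plain min-max scaling. It is only an illustration: the method shown here is an assumption and not necessarily what the Spark Normalizer node applies, and the column meanings (age, income) and values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_max_normalize(rows):
    """Scale every column to the range [0, 1] (constant columns become 0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_other(rows, index):
    """Index of the row closest to rows[index], excluding itself."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: squared_distance(rows[index], rows[i]))


# Hypothetical feature columns: (age in years, yearly income).
rows = [(25.0, 30000.0), (60.0, 31000.0), (26.0, 35000.0), (60.0, 90000.0)]

# On raw values the income column dominates the distance.
raw_nearest = nearest_other(rows, 0)

# After scaling, both columns contribute on a comparable scale.
scaled_nearest = nearest_other(min_max_normalize(rows), 0)
```

On the raw values, the row nearest to the first one is the 60-year-old with a similar income (index 1). After scaling, it is the 26-year-old (index 2). Whether that change is desirable depends on the data, which is why the preprocessing step is left to the user.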

Input Ports

Input data (JavaRDD)

Output Ports

The input data, with each row labeled with the cluster it is assigned to.
MLlib Cluster Model

To use this node in KNIME, install the KNIME Extension for Apache Spark from its update site.

