Spark k-Means

KNIME Extension for Apache Spark core infrastructure version 4.4.0.v202106241517 by KNIME AG, Zurich, Switzerland

This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering, assigning each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.
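Because k-means compares points by distance, a column with a large numeric range can dominate the clustering when the data is not normalized. The following is a minimal sketch of min-max scaling in plain Python, given only to illustrate the idea. It is an assumption that min-max scaling suits your data, the function name `min_max_normalize` and the sample values are hypothetical, and the code does not describe how the Spark Normalizer node works internally.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread, so map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical data: the second column is on a much larger scale
# than the first and would dominate Euclidean distances if left as-is.
rows = [[25.0, 40000.0], [35.0, 60000.0], [45.0, 80000.0]]
normalized_rows = min_max_normalize(rows)
```

After scaling, both columns contribute on a comparable scale to the distance between two rows.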

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
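Conceptually, applying a k-means model to unseen data means assigning each new point to its nearest cluster center. The sketch below shows that idea in plain Python under the assumption of squared Euclidean distance. The names `squared_distance` and `assign_cluster` and the sample centers are hypothetical, and the code is not the implementation of the Spark Cluster Assigner node.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(point, centers):
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


# Hypothetical learned cluster centers and unseen data points.
centers = [[0.0, 0.0], [10.0, 10.0]]
unseen = [[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]]
labels = [assign_cluster(point, centers) for point in unseen]
```

Each point receives exactly one label, which matches the crisp clustering described above.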


Number of clusters
The number of clusters (cluster centers) to be created.
Number of iterations
The maximum number of iterations after which the algorithm terminates, regardless of whether the cluster centers are still improving.
Initialization seed
Random seed for cluster initialization (requires Apache Spark 1.3 or later).
Feature Columns
The feature columns to learn the model from. Supports only numeric columns.
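To illustrate how the number of clusters, the iteration limit, and the seed interact, here is a minimal sketch of the textbook k-means procedure (Lloyd's algorithm) in plain Python. It is not the Apache Spark implementation, which may initialize and iterate differently. The function `k_means`, its parameter names, and the sample points are hypothetical and chosen only to mirror the options above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def k_means(points, k, max_iterations, seed):
    """Return (centers, labels) for numeric `points`."""
    rng = random.Random(seed)  # same seed, same initial centers
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iterations):
        # Assignment step: each point goes to exactly one cluster.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged before reaching the iteration limit
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


# Hypothetical numeric feature columns with two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels = k_means(points, k=2, max_iterations=20, seed=42)
```

The loop stops either when the centers no longer move or when `max_iterations` is reached, whichever comes first. Fixing `seed` makes the initial centers, and therefore the result, reproducible.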

Input Ports

Input data (JavaRDD)

Output Ports

The input data, with each row labeled with the cluster it belongs to.
MLlib Cluster Model

To use this node in KNIME, install the KNIME Extension for Apache Spark (legacy) from its update site.

