Spark k-Means

This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is fixed in advance and is not determined dynamically. K-means performs a crisp clustering, which assigns each data vector to exactly one cluster. The node does not normalize the data. If normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.
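
Because the node clusters the raw values, a column with a large numeric range can dominate the distance computation. The following is a minimal sketch in plain Python of min-max scaling, one common way to normalize. It is not the Spark Normalizer implementation; the function name `min_max_normalize` and the sample rows are hypothetical and only illustrate the idea.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1] (illustrative helper)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical data: the second column is on a much larger scale than the first.
rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
scaled = min_max_normalize(rows)
```

After scaling, both columns contribute on a comparable scale to the distance between two rows.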

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
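
Conceptually, applying a k-means model to unseen data means picking the nearest cluster center for each vector. The sketch below illustrates that step in plain Python, assuming squared Euclidean distance. It is not the code of the Spark Cluster Assigner node; `assign_cluster` and the example centers are hypothetical.

```python
def assign_cluster(vector, centers):
    """Return the index of the center closest to `vector` (illustrative helper)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Crisp assignment: exactly one index is returned per vector.
    return min(range(len(centers)), key=lambda i: squared_distance(vector, centers[i]))


# Hypothetical cluster centers and two unseen vectors.
centers = [[0.0, 0.0], [10.0, 10.0]]
labels = [assign_cluster(v, centers) for v in [[1.0, 2.0], [9.0, 8.0]]]
```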


Number of clusters
The number of clusters (cluster centers) to be created.
Number of iterations
The maximum number of iterations. The algorithm terminates once this number is reached, regardless of whether the cluster centers are still improving.
Initialization seed
Random seed for cluster initialization (requires Apache Spark 1.3 or later).
Feature columns
The columns to learn the model from. Only numeric columns are supported.
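
To show how the options above interact, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is an illustration only, not the Spark MLlib implementation: the function names, the sample data, and the initialization by random sampling of input rows are assumptions made for this example.

```python
import random


def nearest(row, centers):
    """Index of the center closest to `row` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(row, centers[i])),
    )


def k_means(rows, k, max_iterations, seed):
    """Illustrative k-means: k = number of clusters, seed = initialization seed."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct rows sampled at random.
    centers = [list(row) for row in rng.sample(rows, k)]
    for _ in range(max_iterations):
        labels = [nearest(row, centers) for row in rows]
        new_centers = []
        for index in range(k):
            members = [row for row, label in zip(rows, labels) if label == index]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the old center if no row was assigned to it.
                new_centers.append(centers[index])
        if new_centers == centers:
            break  # centers stopped moving before the iteration limit
        centers = new_centers
    # Label every row against the final centers.
    labels = [nearest(row, centers) for row in rows]
    return centers, labels


# Hypothetical numeric feature columns with two well-separated groups.
rows = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
centers, labels = k_means(rows, k=2, max_iterations=10, seed=42)
```

In this sketch, the same seed always yields the same initial centers and therefore the same result, and the iteration limit caps the work even if the centers are still moving.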

Input Ports

Input data (JavaRDD)

Output Ports

The input data, with each row labeled with the cluster it is assigned to.
MLlib Cluster Model


This node has no views



