Spark k-Means

This node applies the Apache Spark k-means clustering algorithm. It outputs the cluster centers for a predefined number of clusters; the number of clusters is not determined dynamically. K-means performs a crisp clustering, assigning each data vector to exactly one cluster. The node does not normalize the data; if normalization is required, consider using the "Spark Normalizer" node as a preprocessing step.
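Why normalization can matter for a distance-based method such as k-means can be illustrated with a small plain-Python sketch. It is not the code of this node or of the "Spark Normalizer"; min-max scaling is only one possible normalization, and the sample rows (an age column and an income column) and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_max_normalize(rows):
    """Scale every column to the range [0, 1] (constant columns become 0.0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_other(rows, index):
    """Index of the row closest to rows[index], excluding the row itself."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: squared_distance(rows[index], rows[i]))


# Hypothetical data: (age in years, income per year).
rows = [(25.0, 30000.0), (60.0, 30500.0), (27.0, 32000.0), (40.0, 90000.0)]

# On the raw data the income column dominates the distance.
raw_neighbor = nearest_other(rows, 0)

# After scaling, both columns contribute on a comparable scale.
scaled_neighbor = nearest_other(min_max_normalize(rows), 0)

print(raw_neighbor, scaled_neighbor)
```

In this sample, the first row's nearest neighbor changes once both columns are on the same scale, which in turn would change the cluster it is likely to join.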

Use the Spark Cluster Assigner node to apply the learned model to unseen data.
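Conceptually, applying a k-means model to unseen data means assigning each new vector to its closest cluster center. The following plain-Python sketch illustrates that idea only; it is not the implementation of the Spark Cluster Assigner node, and the function name, the centers, and the sample points are hypothetical.

```python
def assign_cluster(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(point, center))
        for center in centers
    ]
    return distances.index(min(distances))


# Hypothetical cluster centers, as a learned model might provide them.
centers = [(1.0, 1.0), (10.0, 10.0)]

# Unseen data: each vector receives exactly one cluster index (crisp assignment).
unseen = [(0.5, 1.5), (9.0, 12.0), (5.0, 5.0)]
labels = [assign_cluster(p, centers) for p in unseen]

print(labels)  # [0, 1, 0]
```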

Options

Number of clusters
The number of clusters (cluster centers) to be created.
Number of iterations
The maximum number of iterations after which the algorithm terminates, regardless of whether the cluster centers are still improving.
Initialization seed
Random seed for cluster initialization (requires Apache Spark 1.3 or later).
Feature Columns
The feature columns to learn the model from. Supports only numeric columns.
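How the options above interact can be sketched with a minimal plain-Python version of the classic k-means (Lloyd) iteration. This is an illustration under stated assumptions, not the Spark implementation: the function and parameter names are hypothetical, and the initial centers are drawn by simple random sampling, which may differ from the initialization strategy Spark uses.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations, seed):
    """Return k cluster centers for the given numeric feature vectors."""
    rng = random.Random(seed)  # the seed makes the initialization reproducible
    centers = [list(p) for p in rng.sample(points, k)]

    for _ in range(max_iterations):  # hard upper bound on the number of iterations
        # Assignment step: every point goes to exactly one (the nearest) center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]

        if new_centers == centers:  # converged before reaching the iteration limit
            break
        centers = new_centers

    return centers


# Hypothetical numeric feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans(points, k=2, max_iterations=20, seed=42)
print(sorted(centers))
```

Running the sketch twice with the same seed yields the same centers, while the iteration limit only matters when the centers have not stabilized earlier.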

Input Ports

Input data (JavaRDD)

Output Ports

The input data, with each row labeled with the cluster it is assigned to.
MLlib Cluster Model

Views

This node has no views.
