k-Means (distance)

This component runs the k-means algorithm and outputs the Euclidean distance between each point and the cluster centroids.
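As a rough illustration of what k-means does, here is a minimal pure-Python sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the component's internal implementation. The function name `kmeans`, the `max_iter` parameter, and the choice to seed the centroids with the first k points are assumptions made only to keep the example short and deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: seed with the first k points (for illustration only).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this toy data the two left-hand points end up in one cluster and the two right-hand points in the other.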

In the configuration dialog, you can choose whether to calculate the distance to every cluster's centroid or only to the centroid of the cluster each point belongs to.
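The two distance modes can be sketched in a few lines of Python. This is only an illustration of the idea, not the component's code. The function `centroid_distances` and its `all_clusters` flag are hypothetical names, and the points, centroids, and labels are made-up values.

```python
import math

def centroid_distances(points, centroids, labels, all_clusters=True):
    """Return one list of Euclidean distances per point.

    all_clusters=True  -> distance to every centroid (one value per cluster)
    all_clusters=False -> distance to the point's own centroid only
    """
    rows = []
    for point, label in zip(points, labels):
        if all_clusters:
            rows.append([math.dist(point, c) for c in centroids])
        else:
            rows.append([math.dist(point, centroids[label])])
    return rows

# Made-up example: two points, two centroids, labels = assigned cluster index.
points = [(0.0, 0.0), (3.0, 4.0)]
centroids = [(0.0, 0.0), (3.0, 0.0)]
labels = [0, 1]

to_all = centroid_distances(points, centroids, labels, all_clusters=True)
to_own = centroid_distances(points, centroids, labels, all_clusters=False)
print(to_all)
print(to_own)
```

With every centroid selected, each point gets one distance per cluster. With only the assigned cluster selected, each point gets a single distance.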

The clustering algorithm uses the Euclidean distance on the selected attributes. The node does not normalize the data; if your columns are on different scales, consider using the "Normalizer" node as a preprocessing step.
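To see why scaling matters, consider two columns on very different ranges: the Euclidean distance is then dominated by the larger one. The sketch below applies min-max scaling, one common normalization choice, in plain Python. The function `min_max_normalize` and the sample rows are illustrative assumptions, not part of the component.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` linearly to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            # A constant column (hi == lo) is mapped to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]

# Made-up rows: the second column is roughly 1000 times larger than the first.
rows = [(1.0, 1000.0), (2.0, 3000.0), (3.0, 2000.0)]
scaled = min_max_normalize(rows)
print(scaled)
```

After scaling, both columns span the same range, so each contributes comparably to the distance.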

This component is free to use and modify.
Author: Andrea De Mauro, aboutbigdata.net

Options

Columns selection (all numeric columns will be used for clustering)
Select the columns to keep. All numeric columns among them, and only those, are used for clustering.
Number of clusters
Number of clusters to be created.
Computes distance between each point and:
Select whether you would like to see the distance between each point and every cluster, or only the distance to the cluster the point belongs to.

Input Ports

Input data for the clustering. Only numeric columns are considered in the clustering.

Output Ports

The input data, with each row labeled with the cluster it belongs to.
The created clusters, i.e. the coordinates of the centroids.
PMML cluster model
