
03 Analyze Clustering location Data

<p><strong>Analyze Location Data by Clustering</strong></p><p>In this workflow, we perform a clustering task on location data where we have longitude and latitude information. We use the <strong>k-Means algorithm</strong> to cluster this data and then visualize the clustering results.</p>

URL: What is Clustering and How Does it Work? - KNIME Blog https://www.knime.com/blog/what-is-clustering-how-does-it-work
URL: Training Clustering Algorithms - KNIME TV - YouTube https://www.youtube.com/watch?v=7luMauX0KWM
URL: Clustering - KNIME TV - YouTube https://www.youtube.com/watch?v=H7Rmq_NpI8o
URL: KNIME Cheat Sheet : Building a KNIME Workflow for Beginners https://www.knime.com/sites/default/files/2021-07/CheatSheet_Beginner_A3.pdf
URL: KNIME Self Paced Course https://www.knime.com/knime-self-paced-courses

Model training

Train the algorithm using the k-Means node. The number of clusters (k) needs to be selected manually. Here, we set k=3.

Read data

The data contains various attributes of different houses, including their prices.

Pre-processing (data preparation)

Model evaluation

How to train a k-Means model?

Step 1: Drag the k-Means node into the workflow and double-click it to open its configuration dialog.

Step 2: Set the "Number of clusters" to 3. In the "Column selection", include the columns "Lat" and "Long".

Step 3: Click on the node and "Execute" it to perform clustering.
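For readers who want to see what the training step does, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the KNIME node's implementation. The function names, the sample coordinates, and the choice to seed the centers with the first k rows are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(point, centers[c]))


def k_means(points, k, max_iterations=100):
    # Assumption: seed the centers with the first k rows so the run is
    # deterministic; other initialization schemes are equally valid.
    centers = list(points[:k])
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        new_labels = [nearest_center(p, centers) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return labels, centers


# Made-up (Lat, Long) pairs, already scaled to [0, 1]; not the workflow's data.
points = [
    (0.10, 0.10), (0.50, 0.90), (0.90, 0.20),
    (0.12, 0.15), (0.15, 0.08),
    (0.48, 0.85), (0.55, 0.88),
    (0.88, 0.25), (0.92, 0.15),
]
labels, centers = k_means(points, k=3)
print(labels)
```

As in the node dialog, k is fixed in advance (here k=3). The returned labels play the role of the "Cluster" column that the later evaluation steps refer to.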

How to evaluate a k-Means model?

Step 1: To visualize the clusters, add the "Scatter Plot" and the "OSM Map View" nodes. These nodes should be connected to the Color Manager to visualize clusters with colors.

Step 2: To evaluate the clustering task, connect the clustering output to the "Silhouette Coefficient" node. Select "Cluster" as the clustering column.

Step 3: Execute the node to get Silhouette Coefficients for each instance, each cluster, and for the overall clustering task.
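The silhouette coefficient of a point compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). Values lie between -1 and 1, and higher is better. The sketch below computes it in plain Python for made-up points and labels. It is an illustration only, and averaging over all rows for the overall value is an assumption about how the node aggregates.

```python
import math
from collections import defaultdict


def silhouette_scores(points, labels):
    """Per-point silhouette coefficients using Euclidean distance."""
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Made-up scaled (Lat, Long) pairs and cluster labels; not the workflow's data.
points = [
    (0.10, 0.10), (0.50, 0.90), (0.90, 0.20),
    (0.12, 0.15), (0.15, 0.08),
    (0.48, 0.85), (0.55, 0.88),
    (0.88, 0.25), (0.92, 0.15),
]
labels = [0, 1, 2, 0, 0, 1, 1, 2, 2]

scores = silhouette_scores(points, labels)

# Aggregate per cluster and overall (assumed here to be plain means).
by_cluster = defaultdict(list)
for score, label in zip(scores, labels):
    by_cluster[label].append(score)
per_cluster = {label: sum(v) / len(v) for label, v in by_cluster.items()}
overall = sum(scores) / len(scores)
print(per_cluster, overall)
```

Because the three made-up groups are compact and far apart, every score comes out close to 1. Overlapping clusters would pull the scores toward 0 or below.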

Filter data

Keep only the data for houses in California.

Normalize data

Apply min-max normalization to the latitude and longitude columns.
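Min-max normalization rescales a column to the range [0, 1] via (x - min) / (max - min), and denormalization inverts that mapping using the same min and max. A minimal Python sketch with made-up latitude values follows. The function names are hypothetical, and this is not the Normalizer or Denormalizer node's code.

```python
def min_max_fit(values):
    """Return the (min, max) pair; the inverse step needs it later."""
    return min(values), max(values)


def min_max_normalize(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    if span == 0:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def min_max_denormalize(scaled, lo, hi):
    """Invert min_max_normalize using the stored min and max."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up latitude values for illustration only.
lats = [32.7, 34.05, 36.6, 37.77, 38.58]
lo, hi = min_max_fit(lats)
scaled = min_max_normalize(lats, lo, hi)
restored = min_max_denormalize(scaled, lo, hi)
print(scaled)
print(restored)
```

Scaling both coordinates to a common range keeps either one from dominating the Euclidean distances that k-means relies on. Mapping back afterwards lets the clusters be plotted at their real positions.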

Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Color data by cluster assignment
Color Manager
Read location_data.table
Table Reader
Visualize clusters on world map
OSM Map View
Evaluate cluster performance (higher value is preferred)
Silhouette Coefficient
Denormalize data: Lat & Long back to original values
Denormalizer
Standardize lat & long
Normalizer
Cluster data: k=3
k-Means
In California
Row Filter
Visualize clusters
Scatter Plot
