03 Analyze Data by Clustering Location Data

<p><strong>Analyze Data: Clustering of Location Data</strong></p><p>In this workflow, we perform a <strong>clustering</strong> task on location data given as latitude and longitude coordinates. We use the <strong>k-Means algorithm</strong> to cluster this data and then visualize the clustering results.</p>

URL: KNIME Self Paced Course https://www.knime.com/knime-self-paced-courses
URL: KNIME Cheat Sheet: Building a KNIME workflow for beginners https://www.knime.com/cheat-sheets/building-knime-workflow-beginners
URL: KNIME Cheat Sheet: Machine learning with KNIME Analytics Platform https://www.knime.com/files/machine-learning-with-knime.pdf
URL: YouTube: Clustering https://youtu.be/C-YAdPOg9BM?si=qRoBRTApXxqAiEAA
URL: YouTube: Training Clustering Algorithms https://youtu.be/i47dBwK8KfQ?si=szK-ax8Vlgc5eYRo
URL: KNIME Blog: Cluster analysis: What it is, types & how to apply the technique without code https://www.knime.com/blog/what-is-clustering-how-does-it-work
URL: Webinar: KNIME101: Machine Learning for Beginners with KNIME https://www.knime.com/events/knime101-machine-learning-beginners-knime

Model training

Train the algorithm using the k-Means node. The number of clusters (k) needs to be selected manually. Here, we set k=3.

Read data

The data contains various attributes of different houses, including their prices.

Pre-processing (data preparation)

Model evaluation

How to train a k-Means model?

Step 1: Drag the k-Means node into the workflow and click on it to open the configuration window.

Step 2: Set the "Number of clusters" to 3. In the "Column selection", include the columns "Lat" and "Long".

Step 3: Click on "Apply and Execute" to perform the clustering.
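
For readers who want to see what happens behind the configuration dialog, the following is a minimal sketch of the k-Means idea in plain Python, not the KNIME node's actual implementation. It alternates between assigning each point to its nearest center and moving each center to the mean of its members. The sample coordinates are made up, and the farthest-point initialization in `init_centers` is an assumption chosen only to make the example deterministic; the k-Means node may initialize its centers differently.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from all centers chosen so far (an assumption for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(farthest)
    return centers


def k_means(points, k, max_iter=100):
    """Return (labels, centers) after alternating assignment and update steps."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist2(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Made-up, already normalized (Lat, Long) pairs forming three loose groups.
points = [
    (0.10, 0.10), (0.12, 0.08), (0.08, 0.12),
    (0.50, 0.90), (0.52, 0.88), (0.48, 0.92),
    (0.90, 0.20), (0.88, 0.22), (0.92, 0.18),
]
labels, centers = k_means(points, k=3)
print(labels)
print(centers)
```

As in the node configuration, k is passed in explicitly; the algorithm does not choose it.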

How to evaluate a k-Means model?

Step 1: To visualize the clusters, add the "Scatter Plot" and "OSM Map View" nodes. Connect both to the output of the "Color Manager" node so that each cluster is shown in its own color.

Step 2: To evaluate the clustering task, connect the clustering output to the "Silhouette Coefficient" node. Select "Cluster" as the clustering column.

Step 3: Execute the node to get the Silhouette Coefficient for each instance, for each cluster, and for the overall clustering. Values range from -1 to 1, and higher values indicate better-separated clusters.
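
To make the three levels of output concrete, here is a rough sketch of the standard silhouette definition in plain Python. It is not the code of the "Silhouette Coefficient" node. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the score is `(b - a) / max(a, b)`. The points and labels are made up, the sketch assumes at least two clusters, and giving single-member clusters a score of 0 is a common convention adopted here as an assumption.

```python
import math
from collections import defaultdict


def silhouette_scores(points, labels):
    """Return one silhouette score per point (requires at least two clusters)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for single-member clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Made-up normalized (Lat, Long) pairs with hypothetical cluster labels.
points = [
    (0.10, 0.10), (0.12, 0.08), (0.08, 0.12),
    (0.50, 0.90), (0.52, 0.88), (0.48, 0.92),
    (0.90, 0.20), (0.88, 0.22), (0.92, 0.18),
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

scores = silhouette_scores(points, labels)  # per instance

per_cluster = {}  # per cluster: mean score of the cluster's members
for lab in sorted(set(labels)):
    member_scores = [s for s, l in zip(scores, labels) if l == lab]
    per_cluster[lab] = sum(member_scores) / len(member_scores)

overall = sum(scores) / len(scores)  # overall clustering

print(per_cluster)
print(overall)
```

Scores near 1 mean a point sits well inside its cluster, scores near 0 mean it lies between clusters, and negative scores suggest it may belong to a different cluster.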

Filter data

Keep only data for houses in California

Normalize data

Apply min-max normalization to latitude and longitude
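
As an illustration of what min-max normalization does, and of how the original values can be recovered afterwards, here is a small plain-Python sketch. The latitude values are made up, and the function names are hypothetical and do not correspond to any KNIME API. Each value is rescaled to the range 0 to 1 with `(v - min) / (max - min)`, and the inverse mapping uses the same stored minimum and maximum.

```python
def fit_min_max(values):
    """Return (min, max) of a column; both are needed later to denormalize."""
    return min(values), max(values)


def normalize(values, lo, hi):
    """Scale values to the range [0, 1]; a constant column maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def denormalize(scaled, lo, hi):
    """Invert normalize() using the stored minimum and maximum."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up latitude values.
lat = [32.5, 34.0, 37.5, 42.0]

lo, hi = fit_min_max(lat)
lat_scaled = normalize(lat, lo, hi)
lat_back = denormalize(lat_scaled, lo, hi)

print(lat_scaled)
print(lat_back)
```

Rescaling both coordinates to the same range keeps either one from dominating the distance calculation during clustering only because it spans a wider numeric range.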

Workflow complete!

Keep the momentum going by exploring Just KNIME It! on the Hub to challenge yourself and see how these nodes can be integrated into more complex workflows and use cases.

Color data by cluster assignment
Color Manager
Read location_data.table
Table Reader
Visualize clusters on world map
OSM Map View
Evaluate cluster performance (higher value is preferred)
Silhouette Coefficient
Denormalize data: Lat & Long back to original values
Denormalizer
Standardize lat & long
Normalizer
Cluster data: k=3
k-Means
In California
Row Filter
Visualize clusters
Scatter Plot