Icon

Customer Segmentation with Snowflake

<p><strong>Customer Segmentation with Snowflake</strong></p><p>This workflow connects to a Snowflake database, loads customer transaction data, and implements basic customer segmentation through a clustering algorithm, k-Means. Customer segmentation can help Sales and Marketing departments identify hidden patterns in customer behavior/preferences and define better expansion and retention strategies:</p><ul><li><p>The sample data provided represents customer transaction and spending behaviors, as well as customer demographics. Raw data is joined, partitioned (existing vs. new customers) and preprocessed (missing value handling, outlier detection and normalization).</p></li><li><p>Using the Elbow method, it's possible to visually estimate the best number (k) of clusters for the algorithm. Next, the k-Means algorithm is used to segment customers and the obtained clusters are assigned to new customer.</p></li><li><p>Obtained clusters can be visualized and inspected in an interactive view and their quality assessed with a scoring metric. Clustered new data can be further exported for further processing or reporting.</p></li></ul>

URL: Snowflake Extension Guide | KNIME Documentation https://docs.knime.com/ap/latest/snowflake_extension_guide/#quickstart-with-snowflake-in-knime

Clustering

  1. k-Means

  2. Assign clusters to new customers

Data Pre-Processing

  1. Data partitioning (current vs. new customers to cluster)

  2. Missing value handling

  3. Outlier detection

  4. Normalization (z-score)

  5. Dimensionality reduction (PCA) for cluster viz

  6. Elbow method to determine the value of k

Cluster Visualization & Evaluation

  1. Denormalization

  2. Cluster Visualization

  3. Cluster quality evaluation (Silhouette coefficient)

  4. Save results


Customer Segmentation with Snowflake


This workflow connects to a Snowflake database, loads customer transaction data, and implements basic customer segmentation through a clustering algorithm, k-Means. Customer segmentation can help Sales and Marketing departments identify hidden patterns in customer behavior/preferences and define better expansion and retention strategies:

  • The sample data provided represents customer transaction and spending behaviors, as well as customer demographics. Raw data is joined, partitioned (existing vs. new customers) and preprocessed (missing value handling, outlier detection and normalization).

  • Using the Elbow method, it's possible to visually estimate the best number (k) of clusters for the algorithm. Next, the k-Means algorithm is used to segment customers and the obtained clusters are assigned to new customer.

  • Obtained clusters can be visualized and inspected in an interactive view and their quality assessed with a scoring metric. Clustered new data can be further exported for further processing or reporting.

Connect to Snowflake and write Customer data to database
  1. Customer transaction and spending behavior

  2. Customer demographics

📍Technical Note:

If you don't have a Snowflake account, you can sign up for a free 30-day trial account (when signing up, select Enterprise edition and choose any Snowflake cloud/region - preferably AWS or Azure).

Configure the Snowflake Connector node:

After signing up, navigate to your Snowflake Account details. There, you'll find key information to configure the node:

  • Account identifier (input it in Full account name field)

  • Login name (input it in Authentication > Username & password field). The password field is the password used to log in to your Snowflake account.

  • Role (input it in Default access control role). Make sure the role is ACCOUNTADMIN.

  • Stay in your Account details and navigate to the Config File tab. Select the warehouse you prefer (e.g., SNOWFLAKE_LEARNING_WH, if you're using a free trial account) and input the warehouse name in the Virtual warehouse field.

📍Technical Note:

[For users with a free 30-day trial account] In the Write Customer data to DB component, keep the default configurations and simply execute the component. The Table names are defined automatically.

[For users with a regular account] Double-click on the Write Customer data to DB component and provide your preferred Database and Schema names (they must already exist). The Table names are defined automatically.

💡Pro tip: If your working with very large datasets, we recommend replacing the DB Writer node with the DB Table Structure Creator + DB Loader nodes for faster, bulk data loading to database.

Missing value handlingOutlier detectionNormalizationWrite cleaned data to DB
Apply data processing
Connect todatabase
Snowflake Connector
H2O Local Context
DB Connection Closer
Table to H2O
Double-click andprovide Database nameand/or Schema nameNote: if database table already exists, it will be overwritten
Write Customer data to DB
To new customer data
Snowflake H2O MOJO Predictor (Cluster Assigner)
FormattingDenormalization
Post-processing
H2O Model to MOJO
k = 4
H2O k-Means
Save clustereddata back to DB
DB Writer (DB Data)
Visualize clusters
Top: Existing CustomersBottom: New Customers
Join and partition
Find number of clusters(k) by visual inspection
Elbow Method
Missing value handling,Outlier detection,Normalization
Data processing
H2O to Table
Check cluster quality
Silhouette Coefficient

Nodes

Extensions

Links