Icon

Fraud Detection with Snowflake

<p><strong>Fraud Detection of Credit Card Transactions with Snowflake</strong></p><p>This workflow shows an overview of different outlier detection techniques for identifying fraudulent credit card transactions. After connecting to a Snowflake database and loading credit card transaction data, the workflow partitions the data (train set, validation set and test set) and normalizes it. For each technique, both performance metrics and predictions are output. The six different techniques are:</p><ul><li><p>Quartiles, Distribution and Clustering (DBSCAN)</p></li><li><p>Isolation Forest</p></li><li><p>Logistic Regression and Random Forest</p></li></ul><p><strong>Important:</strong> The performance of the techniques is evaluated on the same test set and, given the heavily imbalanced dataset, this is reported in terms of <em>Recall </em>and <em>Precision</em>.</p>

URL: Snowflake Extension Guide | KNIME Documentation https://docs.knime.com/ap/latest/snowflake_extension_guide/#quickstart-with-snowflake-in-knime

Fraud Detection with different techniques

Detecting fraudulent transactions with quartile-, distribution-, and cluster-based techniques

Detecting fraudulent transactions by isolation forest

Detecting fraudulent transactions with machine learning-based techniques

Fraud Detection of Credit Card Transactions with Snowflake


This workflow shows an overview of different outlier detection techniques for identifying fraudulent credit card transactions. After connecting to a Snowflake database and loading credit card transaction data, the workflow partitions the data (train set, validation set and test set) and normalizes it. For each technique, both performance metrics and predictions are output. The six different techniques are:

  • Quartiles, Distribution and Clustering (DBSCAN)

  • Isolation Forest

  • Logistic Regression and Random Forest

Important: The performance of the techniques is evaluated on the same test set and, given the heavily imbalanced dataset, this is reported in terms of Recall and Precision.

Connect to Snowflake and write Credit Card data to database

Data Pre-Processing

  1. Partitioning the data into test & training (80/20)

  2. Normalize data (z-score normalization)

  3. Creating training & validation set

Note: The dataset is heavily unbalanced. Instead of downsampling the majority class or upsampling the minority class, we opted to use class statistics (Precision/Recall) to assess the goodness of the models.

📍Technical Note:

[For users with a free 30-day trial account] In the Write Credit Card data to DB component, keep the default configurations and simply execute the component. The Table name is defined automatically.

[For users with a regular account] Double-click on the Write Credit Card data to DB component and provide your preferred Database and Schema names (they must already exist). The Table name is defined automatically.

💡Pro tip: If your working with very large datasets, we recommend replacing the DB Writer node with the DB Table Structure Creator + DB Loader nodes for faster, bulk data loading to database.

📍Technical Note:

If you don't have a Snowflake account, you can sign up for a free 30-day trial account (when signing up, select Enterprise edition and choose any Snowflake cloud/region - preferably AWS or Azure).

Configure the Snowflake Connector node:

After signing up, navigate to your Snowflake Account details. There, you'll find key information to configure the node:

  • Account identifier (input it in Full account name field)

  • Login name (input it in Authentication > Username & password field). The password field is the password used to log in to your Snowflake account.

  • Role (input it in Default access control role). Make sure the role is ACCOUNTADMIN.

  • Stay in your Account details and navigate to the Config File tab. Select the warehouse you prefer (e.g., SNOWFLAKE_LEARNING_WH, if you're using a free trial account) and input the warehouse name in the Virtual warehouse field.

Compare Recall/Precisionacross different techniquesto detect fraud
Bar Chart
Training and validation sets
DBSCAN
Clustering based method
Normalize
Double-click andprovide Database nameand/or Schema nameNote: if database table already exists, it will be overwritten
Write Credit Card data to DB
DB Connection Closer
Top: Training (80%)Bottom: Test (20%)
DB Table Partitioner
Normalize
Connect todatabase
Snowflake Connector
Random forest
Trainingto KNIME table
DB Reader
Test toKNIME table
DB Reader
Isolation forest
Combine scores
Distribution based
Quartile based
Logistic regression

Nodes

Extensions

Links