
Fraud Detection


URL: Four Techniques for Outlier Detection https://www.knime.com/blog/four-techniques-for-outlier-detection
URL: Fraud Detection using Random Forest, Neural Autoencoder, and Isolation Forest Techniques https://www.knime.com/blog/fraud-detection-using-random-forest
URL: Credit Card Fraud Detection dataset on Kaggle https://www.kaggle.com/mlg-ulb/creditcardfraud
URL: Overview of Credit Card Fraud Detection Techniques https://youtu.be/-S5f87k8LXI

Fraud Detection with different techniques

Detecting fraudulent transactions with quartile-, distribution-, and cluster-based techniques

Detecting fraudulent transactions with autoencoder and isolation forest techniques

Detecting fraudulent transactions with machine learning-based techniques

Fraud Detection of Credit Card Transactions


This workflow shows an overview of different outlier detection techniques for identifying fraudulent credit card transactions. After accessing the credit card fraud detection dataset, the data is partitioned (train set, validation set and test set) and normalized. For each technique, both performance metrics and predictions are output. The seven different techniques are:

  • Quartiles, Distribution and Clustering (DBSCAN)

  • Isolation Forest and Autoencoder

    • For the Autoencoder, make sure to select the proper Conda environment for Keras under "Preferences > Python Deep Learning". For more info and installation guidance, check the pertinent docs.

  • Logistic Regression and Random Forest

Important: All techniques are evaluated on the same test set and, given the heavily imbalanced dataset, their performance is reported in terms of Recall and Precision.
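
A minimal sketch of this Recall/Precision comparison step, assuming scikit-learn and toy label/prediction arrays in place of the actual workflow outputs:

from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0]   # toy labels: 1 = fraud, 0 = legitimate
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]   # toy predictions from one technique

precision = precision_score(y_true, y_pred)   # TP / (TP + FP): share of flagged transactions that are fraud
recall = recall_score(y_true, y_pred)         # TP / (TP + FN): share of frauds that are caught
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}")

With so few fraud cases, overall accuracy would look near-perfect even for a model that flags nothing, which is why Precision and Recall are reported instead.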

Data Reading

Credit card fraud detection dataset.

Download full dataset from Kaggle.
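
A hedged sketch of reading the file with pandas; the local path "creditcard.csv" and the column names are assumptions based on the public Kaggle dataset, not part of this workflow:

import pandas as pd

df = pd.read_csv("creditcard.csv")      # assumed local copy of the Kaggle file
print(df.shape)                         # roughly 284,807 rows x 31 columns (Time, V1-V28, Amount, Class)
print(df["Class"].value_counts())       # Class 1 (fraud) is only about 0.17% of all transactions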

Data Pre-Processing

  1. Create the target column

  2. Partition the data into training & test sets (80/20)

  3. Normalize the data (z-score normalization)

  4. Create the training & validation sets (see the sketch after the note below)

Note: The dataset is heavily imbalanced. Instead of downsampling the majority class or upsampling the minority class, we opted to use class statistics (Precision/Recall) to assess the quality of the models.
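
A minimal sketch of these four pre-processing steps, assuming pandas and scikit-learn in place of the KNIME nodes and the column names of the public Kaggle file:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("creditcard.csv")

# 1. Create the target column (good/bad), mirroring the workflow annotation
df["target"] = df["Class"].map({0: "good", 1: "bad"})

# 2. Partition into training & test sets (80/20), stratified to preserve the fraud ratio
train_df, test_df = train_test_split(df, test_size=0.2, stratify=df["target"], random_state=42)
train_df, test_df = train_df.copy(), test_df.copy()

# 3. Normalize (z-score): fit the statistics on the training partition only
feature_cols = [c for c in df.columns if c not in ("Class", "target")]
scaler = StandardScaler().fit(train_df[feature_cols])
train_df[feature_cols] = scaler.transform(train_df[feature_cols])
test_df[feature_cols] = scaler.transform(test_df[feature_cols])

# 4. Create training & validation sets from the training partition
train_df, valid_df = train_test_split(train_df, test_size=0.2, stratify=train_df["target"], random_state=42)

No resampling is applied here either; as noted above, the class imbalance is handled at evaluation time through Precision and Recall.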

Workflow canvas (node labels and annotations): a CSV Reader reads the credit card data (sample); an Expression node creates the target column (good/bad); a Table Partitioner splits the data 80/20; the data is normalized and the training and validation sets are created. Seven detection branches follow: quartile based, distribution based, clustering based (DBSCAN), isolation forest, autoencoder, logistic regression, and random forest. Their outputs are combined with Joiner nodes, and Recall/Precision are compared across the techniques in a Bar Chart.
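
To illustrate one branch, here is a hedged sketch of quartile-based (IQR) outlier flagging on synthetic data; the 1.5 * IQR whisker rule and the feature names are illustrative assumptions, not the exact configuration of the KNIME nodes:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["V1", "V2", "Amount"])  # synthetic stand-in
test = pd.DataFrame(rng.normal(size=(200, 3)), columns=["V1", "V2", "Amount"])

# Per-feature whisker bounds computed on the training partition
q1, q3 = train.quantile(0.25), train.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# A transaction is flagged as an outlier if any feature leaves its whisker range
outlier_mask = ((test < lower) | (test > upper)).any(axis=1)
print("Flagged transactions:", int(outlier_mask.sum()), "of", len(test))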
