This workflow uses the DBSCAN clustering algorithm to detect fraud by identifying outliers in credit card transaction data. Density-based spatial clustering of applications with noise (DBSCAN) is an unsupervised clustering algorithm that works best when the density of the data is roughly uniform across the dataset. We normalize the training data and sample a subset for analysis; points that DBSCAN labels as noise are tagged as potential fraud. Metrics are extracted at the end and can be viewed through the 'Scorer' node.
Steps taken for training:
1. Read Training Data
2. Data Preprocessing: Normalize the data into the range [0,1] or [good,fraud] and save the Normalizer model
3. Train DBSCAN using Euclidean Distance
4. Mark Outliers and Evaluate Model Results
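The steps above can be sketched outside KNIME in plain Python. This is a minimal illustration, not the workflow's actual implementation: the toy rows, the `eps` and `min_pts` values, and all function names are assumptions chosen for the example, and the neighbor search is a simple O(n²) scan.

```python
import math
from collections import Counter, deque

NOISE = -1  # label given to points that belong to no cluster


def fit_min_max(rows):
    """Learn per-column minima and maxima (stands in for the saved normalizer)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, model):
    """Scale every column into [0, 1]; constant columns map to 0."""
    lows, highs = model
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point, using Euclidean distance."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Includes the point itself, since its distance to itself is 0.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical toy data: two numeric features per transaction.
rows = [
    [10.0, 1.0], [11.0, 1.1], [10.5, 0.9], [9.8, 1.0],
    [10.2, 1.2], [10.8, 0.95], [500.0, 9.0], [300.0, 23.0],
]
truth = ["good"] * 6 + ["fraud"] * 2

model = fit_min_max(rows)                          # step 2: fit the normalizer
scaled = apply_min_max(rows, model)                # step 2: normalize to [0, 1]
labels = dbscan(scaled, eps=0.1, min_pts=3)        # step 3: cluster
predicted = ["fraud" if lab == NOISE else "good" for lab in labels]  # step 4
confusion = Counter(zip(truth, predicted))         # (actual, predicted) counts
print(confusion)
```

On real data, `eps` and `min_pts` need tuning, because every point that falls outside a dense region is flagged, whether or not it is fraudulent.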
URL: Kaggle Dataset https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
To use this workflow, download it and open it in KNIME.