Fraud_Detection_Distribution_Training

Fraud Detection: Distribution Method Training

This workflow uses the Distribution Method to check for fraud. As a classification approach, the Distribution Method is particularly useful when the majority of the data is expected to follow a certain pattern or distribution, so that records falling far outside it stand out. For credit card transactions, it can help determine whether a transaction is potentially fraudulent. We start by reading the training data from a sample dataset. The table is preprocessed to convert the class labels "0" and "1" to "good" and "fraud", respectively. Next, the data undergoes Z-score normalization, which standardizes each column to a mean of zero and a standard deviation of one, making columns with different scales easier to compare. The Z-score normalization model is exported for later use in deployment. We then analyze the data to check its distributions and apply filters to isolate a single column (V5) and to exclude outliers beyond the 95% confidence interval. In the last step, we mark the outliers and score the model on correctly and incorrectly identified transactions. The model score can be viewed using the 'Scorer' node.
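For readers who want to see the normalization step outside of the workflow, here is a minimal Python sketch of Z-score normalization with an exportable model. It is only an illustration and not the workflow's own implementation: the function names `fit_zscore` and `apply_zscore`, the toy values, the use of the population standard deviation, and the JSON export are all assumptions made for this example.

```python
import json
import statistics


def fit_zscore(values):
    """Compute the parameters (mean, standard deviation) of a Z-score model."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; a tool may use the sample one.
    std = statistics.pstdev(values)
    return {"mean": mean, "std": std}


def apply_zscore(values, model):
    """Normalize values using previously fitted parameters."""
    return [(v - model["mean"]) / model["std"] for v in values]


# Toy values standing in for one column of the training table (not real data).
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

model = fit_zscore(column)
normalized = apply_zscore(column, model)

# Export the fitted parameters so the same scaling can be reused at deployment.
exported = json.dumps(model)
restored = json.loads(exported)
```

The point of exporting the model is that deployment data should be scaled with the mean and standard deviation learned from the training data, rather than with statistics recomputed on the new data.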

The steps we perform are shown below:
1. Read Training Data
2. Data Preprocessing
3. Normalize Data
4. Save Model
5. Filter and Isolate
6. Mark Outliers and Score
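The last two steps can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the workflow's actual node configuration: it assumes the column has already been Z-score normalized, uses |z| > 1.96 as the bound of a two-sided 95% interval for a standard normal distribution, and treats every value outside that bound as "fraud". The helper names, the threshold, and the toy values are hypothetical.

```python
def mark_outliers(z_values, threshold=1.96):
    """Label a value 'fraud' if it lies outside the assumed 95% interval."""
    return ["fraud" if abs(z) > threshold else "good" for z in z_values]


def confusion_counts(actual, predicted, positive="fraud"):
    """Count true/false positives and negatives, as a scorer would."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of transactions that were labeled correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Toy class labels and normalized V5 values (illustrative, not from the dataset).
labels_raw = [0, 0, 0, 1, 0, 1]
actual = ["fraud" if v == 1 else "good" for v in labels_raw]
v5_z = [0.1, -0.4, 2.3, -3.1, 1.2, 0.5]

predicted = mark_outliers(v5_z)
counts = confusion_counts(actual, predicted)
print(counts, accuracy(counts))
```

Because fraud is rare in this kind of data, accuracy alone can look good even when most fraud is missed, so the individual counts of correctly and incorrectly identified transactions are worth inspecting alongside it.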

URL: Kaggle Dataset https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
