In this workflow, the Distribution Method is used to detect potential fraud. Distribution-based classification is particularly useful when the majority of the data is expected to follow a certain pattern or distribution, so that records deviating strongly from it stand out. For credit card transactions, this helps flag transactions that may be fraudulent.

We start by reading in the training data from a sample dataset. The table is preprocessed to convert the class labels "0" and "1" to "good" and "fraud", respectively. Next, the data undergoes Z-score normalization, which rescales each column to a mean of zero and a standard deviation of one so that columns on different scales can be compared. The Z-score normalization model is exported for later use in deployment. We then analyze the data to check its distributions and apply filters to isolate a single column (V5) and to exclude outliers beyond the 95% confidence interval. In the last step, we mark the outliers and score the model on correctly and incorrectly identified transactions. The model score can be viewed using the 'Scorer' node.
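The normalization and outlier-marking idea can be sketched in plain Python. This is a minimal illustration, not the KNIME nodes' implementation. It assumes that "beyond the 95% confidence interval" means a normalized value outside ±1.96, which is the two-sided 95% bound of a standard normal distribution. It also uses the population standard deviation. The toy V5 values, the JSON export, and the function names are hypothetical.

```python
import json
import statistics

# Hypothetical toy V5 values; the last one lies far from the rest.
V5 = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1, -0.4, 6.0]

Z_95 = 1.96  # assumed bound: two-sided 95% interval of a standard normal


def fit_zscore(values):
    """Learn the normalization parameters (the 'model'): mean and std."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def apply_zscore(values, model):
    """Rescale values with previously learned parameters."""
    return [(v - model["mean"]) / model["std"] for v in values]


def mark_outliers(z_values, bound=Z_95):
    """True where a normalized value falls outside [-bound, bound]."""
    return [abs(z) > bound for z in z_values]


model = fit_zscore(V5)
exported = json.dumps(model)  # stand-in for exporting the model for deployment
z = apply_zscore(V5, json.loads(exported))
flags = mark_outliers(z)
print(flags)  # only the last value is flagged
```

The normalization parameters are kept separate from the data so they can be reused at deployment. New transactions are then rescaled with the training mean and standard deviation rather than their own.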
The steps we perform are shown below:
1. Read Training Data
2. Data Preprocessing
3. Normalize Data
4. Save Model
5. Filter and Isolate
6. Mark Outliers and Score
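The scoring step compares the actual class with the prediction implied by the outlier marking. A minimal Python sketch follows, assuming that a row marked as an outlier is predicted as "fraud" and every other row as "good". The toy results are hypothetical. The counts play the role of a confusion matrix, similar in spirit to what the 'Scorer' node reports.

```python
from collections import Counter


def score(actual, predicted):
    """Count (actual, predicted) pairs and compute the share of correct predictions."""
    matrix = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return matrix, correct / len(actual)


# Hypothetical toy results: actual class and whether each row was marked as an outlier.
actual = ["good", "good", "good", "fraud", "good", "fraud"]
outlier = [False, True, False, True, False, False]

# Assumed decision rule: outlier -> "fraud", otherwise "good".
predicted = ["fraud" if o else "good" for o in outlier]

matrix, accuracy = score(actual, predicted)
print(dict(matrix))
print(round(accuracy, 3))  # 4 of 6 rows are correct -> 0.667
```

Accuracy alone can be misleading on this kind of data, where fraud is rare. The individual cells (for example, fraud predicted as good) usually say more.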
URL: Kaggle Dataset https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
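Reading the data and relabeling the classes can be sketched with Python's `csv` module. This sketch assumes the Kaggle file is named `creditcard.csv`, has columns named `V5` and `Class`, and stores the class as "0" or "1". The helper name and the tiny in-memory sample are hypothetical.

```python
import csv
import io

# Assumed mapping of the raw class values to readable labels.
LABELS = {"0": "good", "1": "fraud"}


def read_v5_and_labels(handle):
    """Return (V5 values, 'good'/'fraud' labels) from a creditcard.csv-style file handle."""
    values, labels = [], []
    for row in csv.DictReader(handle):
        values.append(float(row["V5"]))
        labels.append(LABELS[row["Class"]])
    return values, labels


# Tiny in-memory stand-in; with the real file: open("creditcard.csv", newline="")
sample = io.StringIO('Time,V5,Amount,Class\n0,-0.34,149.62,"0"\n1,0.06,2.69,"1"\n')
v5, labels = read_v5_and_labels(sample)
print(v5, labels)
```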
To use this workflow in KNIME, download it from the link below and open it in KNIME:
Download Workflow