Building a Credit Scoring Model

This KNIME workflow builds a credit scoring model from historical data. As with all data mining modeling activities, it is unclear in advance which analytic method is most suitable. The workflow therefore trains three different methods in parallel (Decision Tree, Neural Network, and SVM), automatically determines which model is most accurate, and writes that model out for further use.

The workflow converts nominal columns to numeric ones so that the data suits a variety of modeling techniques. The data has been enhanced so that understandable labels are used. Metanodes “package” each technique for reuse. Each model is trained and tested with cross-validation to estimate its accuracy. The workflow writes the best model out in the PMML format so that other applications can use it.

Credit Scoring

Credit scoring is a technique used to determine whether or not to extend credit (and if so, how much) to a borrower. This workflow illustrates how to create and choose a credit scoring model based on both historical data and on the application of different machine learning algorithms.

Data Reading

The data are German Credit data, including credit status, demographic data, and customer history. The file is located in TheData/Credit.

Pre-processing

Learners such as neural networks and SVMs (Support Vector Machines) can only handle numeric attributes.

Nominal columns are therefore converted into numerical columns.
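
The idea behind this conversion can be sketched in a few lines of plain Python. This is a minimal illustration of mapping nominal values to integer codes, not the implementation of the Category to Number node; the function name, the sample rows, and the column names are hypothetical.

```python
def category_to_number(rows, column):
    """Replace each distinct nominal value in `column` with an integer code.

    Codes are assigned in order of first appearance, starting at 0.
    Returns the converted rows and the value-to-code mapping.
    """
    mapping = {}
    converted = []
    for row in rows:
        value = row[column]
        if value not in mapping:
            mapping[value] = len(mapping)
        new_row = dict(row)  # copy, so the input rows stay untouched
        new_row[column] = mapping[value]
        converted.append(new_row)
    return converted, mapping


# Hypothetical sample rows; the column names are illustrative only.
sample = [
    {"purpose": "car", "duration": 24},
    {"purpose": "education", "duration": 12},
    {"purpose": "car", "duration": 36},
]
converted, mapping = category_to_number(sample, "purpose")
```

Keeping the mapping alongside the converted rows means the same codes can be applied to new data later. Integer codes imply an ordering that nominal values do not have, so one-hot encoding is a common alternative.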

Model Training and Evaluation

The following algorithms are trained and evaluated with cross-validation:

  1. Neural Network

  2. SVM (Support Vector Machines)

  3. Decision Tree

Double-click a metanode to see its sub-workflow.
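
As a rough illustration of what cross-validation does, the sketch below splits the rows into k folds, trains on k-1 of them, tests on the held-out fold, and averages the accuracies. It is an assumption-laden sketch: a toy majority-class learner stands in for the neural network, SVM, and decision tree, and all function names and data are hypothetical rather than taken from the workflow.

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(rows, labels):
    """Toy learner (assumption): the 'model' is the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    """Toy predictor: always answer with the stored majority label."""
    return model


def cross_validate(rows, labels, train, predict, k=5, seed=0):
    """Return the mean test accuracy over k train/test splits."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical data: 10 applicants, one numeric feature each.
rows = [[n] for n in range(10)]
labels = ["good"] * 7 + ["bad"] * 3
mean_accuracy = cross_validate(rows, labels, train_majority, predict_majority)
```

Because `cross_validate` takes the `train` and `predict` functions as arguments, the same loop can score any learner, which mirrors how each metanode wraps one technique in the same evaluation procedure.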

Model Selection

All results, i.e. accuracies and respective models, are combined in one single table.

Rows are then sorted by descending accuracy, and only the first row (the best-performing model) is kept.
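
The selection step amounts to a sort followed by taking the first row. A minimal sketch, assuming each result row holds a model name, an accuracy, and the model itself; the field names, accuracy values, and model placeholders below are hypothetical and are not results from the workflow.

```python
def pick_best_model(results):
    """Sort result rows by descending accuracy and keep only the first."""
    ranked = sorted(results, key=lambda row: row["accuracy"], reverse=True)
    return ranked[0]


# Placeholder rows: the accuracies are made-up values for illustration.
results = [
    {"model_name": "Neural Network", "accuracy": 0.50, "model": "<model A>"},
    {"model_name": "SVM", "accuracy": 0.75, "model": "<model B>"},
    {"model_name": "Decision Tree", "accuracy": 0.60, "model": "<model C>"},
]
best = pick_best_model(results)
```

Python's `sorted` is stable, so if two models tie on accuracy, the one listed first in the combined table is the one kept.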

Save the Model

  • Convert the "model" cell back to PMML

  • Save the model

KNIME Analytics Platform writes out the model in the official PMML format, so that other applications can use the model.

Visualize

Compare the accuracy scores of the three models.

Task

Create a credit scoring model based on historical data. Select the best machine learning algorithm to be applied. Use cross-validation to evaluate model performance.

[Workflow diagram: CSV Reader (Reading credit scoring dataset); Category to Number (Create dataset only containing numbers); metanodes Train and Cross Validate a Neural Network, Train and Cross Validate a SVM, Train and Cross Validate a Decision Tree; Concatenate; Sorter (Sort by accuracy); Row Filter (Pick best model); Cell to PMML; PMML Writer; Bar Chart]
