
B) Exercise - Credit scoring

Evaluating the Performance of a Regression Model

This workflow trains a linear regression model that predicts the amount of a credit. The performance of the model is evaluated with the Numeric Scorer node. The assumptions of the linear regression model are checked in a parallel workflow branch, which looks at the linear correlation between the independent (predictor) columns, heteroscedasticity of the residuals, and normality of the residuals.
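For readers who want to see what such an evaluation computes outside KNIME, here is a minimal Python sketch of error statistics of the kind the Numeric Scorer node reports (R², mean absolute error, mean squared error, root mean squared error), plus a Pearson correlation helper that could be used to inspect linear correlation between two predictor columns. The function names `numeric_scores` and `pearson` and the toy values are illustrative assumptions, not part of the workflow.

```python
import math


def numeric_scores(actual, predicted):
    """Return common regression error statistics for paired values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


def pearson(x, y):
    """Pearson linear correlation coefficient between two numeric columns."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values (hypothetical, not taken from the credit scoring data)
    actual = [1200.0, 2500.0, 4100.0, 3300.0]
    predicted = [1500.0, 2300.0, 3900.0, 3600.0]
    print(numeric_scores(actual, predicted))
    print(pearson(actual, predicted))
```

A correlation close to +1 or -1 between two predictor columns would hint at collinearity. The same helper applied to the absolute residuals and the predicted values gives only a crude indication of heteroscedasticity and is no substitute for a residual plot.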


EXERCISE: Consider the table "B_credit_scoring" (path ../data/C_credit_scoring.csv) containing 1,000 customers of a bank, collected for credit assessment. Complete the following steps:

1) Perform a quick bivariate preliminary analysis aimed at detecting the relationships between the input variables (with particular focus on the variables "Credit Amount" and "Score").
2) Split the input table into a training set and a test set (80% train, 20% test, with random seed=444444).
3) Create a linear regression model to estimate the customer's credit amount as a function of all other variables. Focus on the assessment of model assumptions and predictive performance on the test dataset.
4) Create a logistic regression model to predict the probability that a customer is a bad payer ("Score" = bad) as a function of the other variables in the dataset and evaluate its performance.

Workflow annotations: EDA, Import Data, Partitioning, Modeling. Data is loaded with the CSV Reader node ("Load data").
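The 80/20 partitioning step of the exercise can be sketched in plain Python as a seeded shuffle followed by a cut. This only illustrates the idea of a reproducible random split. The helper name `partition` is hypothetical, and the same seed will not reproduce the rows selected by KNIME's own partitioning, because the random number generators differ.

```python
import random


def partition(rows, train_fraction=0.8, seed=444444):
    """Split rows into (train, test) using a reproducible random shuffle."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # a fixed seed gives a repeatable split
    cut = round(len(rows) * train_fraction)   # number of rows in the training set
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


if __name__ == "__main__":
    # Stand-in for the 1,000 customer rows (hypothetical row ids only)
    customers = list(range(1000))
    train, test = partition(customers)
    print(len(train), len(test))
```

Fitting the model on `train` only and scoring it on `test` keeps the performance estimate independent of the rows used for fitting.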
