Group 22_Day 1

Feature preparation
Pre-processing
Data partitioning and sampling
Random forest training
Gradient boosting validation and important metrics
Decision tree training
Logistic regression training
Gradient boosting training
Random forest validation and important metrics
Decision tree validation and important metrics
Logistic regression validation and important metrics
Comparing the performance metrics of the models
Champion model testing and important metrics
Instructions on how to use the nodes in the four "[Model name] validation and important metrics" boxes.
1. These boxes contain only the nodes needed to evaluate the performance of a model, so you still need to do all the work of preparation, training, and so on.
2. Make sure the "[Model name] Predictor" nodes are set up as follows (so that the metanode that computes the F2 score can read them):
Gradient Boosted Trees Predictor:
- Tick all boxes in the Prediction Settings menu.
- The prediction column name is "Prediction (GB)".
- The suffix for probability columns is "GB".
Decision Tree Predictor:
- Maximum number of stored patterns is 20,000.
- Tick all boxes in the Options menu.
- The prediction column name is "Prediction (DT)".
- The suffix for probability columns is "DT".
Random Forest Predictor:
- Tick all boxes in the Prediction settings menu except "Use soft voting".
- The prediction column name is "Prediction (RF)".
- The suffix for probability columns is "RF".
Logistic Regression Predictor:
- Tick all boxes in the Settings menu.
- The prediction column name is "Prediction (LR)".
- The suffix for probability columns is "LR".
If you encounter a problem with the metanode for the F2 score, open it and follow the instructions. Do not take for granted that the Lift chart, ROC curve, and Scorer nodes are properly configured.
If you encounter a problem with the metanode for the F2 score, open it and follow the instructions. Do not take for granted that the Binary Classification Inspector and Line Plot (JavaScript) nodes are properly configured.
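The internals of the F2 metanode are not shown on this page. As a rough reference for checking its output by hand, the sketch below uses the standard F-beta definition with beta = 2, which weights recall more heavily than precision. The function name and the example counts are hypothetical, not taken from the workflow.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts (beta=2 gives the F2 score)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: 80 true positives, 40 false positives, 20 false negatives.
print(round(f_beta(tp=80, fp=40, fn=20), 3))
```

Comparing a hand-computed value like this with the metanode's result is one way to confirm that the predictor column names and suffixes were set as required.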
Upload the score data set to predict the missing target variable with the champion model.
Do not train the model again: use the trained champion model to predict new data.
Make sure to prepare the data in exactly the same way as the other partitions. Do not sample or partition this data set.
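Preparing the score data "in the same way" generally means reusing the parameters learned on the training partition, which is what the Numeric Outliers (Apply) and Missing Value (Apply) nodes in this workflow are for. The sketch below illustrates that fit-once, apply-everywhere pattern in plain Python. The function names and sample values are hypothetical, and the quartile method is an assumption that may differ from KNIME's.

```python
from statistics import mean, quantiles


def fit_prep(train_values):
    """Learn the 1.5*IQR outlier bounds and the imputation mean from training data only."""
    present = [v for v in train_values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Outliers are treated as missing, so the mean is taken over inliers only.
    inliers = [v for v in present if low <= v <= high]
    return {"low": low, "high": high, "mean": mean(inliers)}


def apply_prep(values, params):
    """Apply stored parameters to any partition without re-fitting."""
    out = []
    for v in values:
        if v is None or not (params["low"] <= v <= params["high"]):
            v = params["mean"]  # missing or outlier: replace with the training mean
        out.append(v)
    return out


train = [10, 12, 11, 13, 12, 100, None]  # hypothetical training column
score = [11, None, 250]                  # hypothetical score column
params = fit_prep(train)
print(apply_prep(score, params))
```

The score set never influences `params`, which mirrors the instruction not to re-train or re-fit anything on new data.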
Make sure that the Binary Classification Inspector node only includes:
- "Gradient boosting"
- "Random forest"
- "Decision tree"
- "Logistic regression"

Feature preparation (Validation & Testing)

95% Winsorization (upper & lower bounds)

BEFORE CHANGES:

GB F2: 0.799

RF F2: 0.792

DT F2: 0.792

LR F2: 0.775

Feature preparation

95% Winsorization (upper & lower bounds)
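Winsorization caps extreme values at chosen percentile bounds instead of dropping the rows. The annotation here says "95%", while the GroupBy node labels in this workflow refer to the 1st and 99th percentiles of annual_income, so the sketch below takes the percentiles as parameters rather than assuming either. Function names and data are hypothetical.

```python
from statistics import quantiles


def winsor_bounds(train_values, lower_pct=1, upper_pct=99):
    """Return (low, high) percentile bounds computed on the training data."""
    cuts = quantiles(train_values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def winsorize(values, low, high):
    """Clamp every value into [low, high]."""
    return [min(max(v, low), high) for v in values]


train_income = list(range(1, 102))     # hypothetical training values 1..101
low, high = winsor_bounds(train_income)
print(low, high)
print(winsorize([0, 50, 500], low, high))
```

Computing the bounds once on the training partition and reusing them for validation, testing, and scoring keeps all partitions on the same scale.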

Exploration
Removed 1st and 99th
Column Filter
Data Explorer
Statistics View
Extra columns removed
Column Filter
Adjust prediction based on cutoff value of your champion AI model
Column Expressions (legacy)
70/30
Table Partitioner
(20/10)
Table Partitioner
Statistics
Statistics
50-50 downsampling
Equal Size Sampling
Trimming (age > 18)
Row Filter
Data Explorer
Treat outliers (<>1.5 IQR) as missing
Numeric Outliers
Grand Mean Imputation
Missing Value
Data Explorer
Data Explorer
auto_claims.csv (Data set for training, validation, and testing)
CSV Reader
auto_claims_score.csv (Data set for scoring)
CSV Reader
Data Explorer
Create an Excel file with the model's outputs
Excel Writer
Lift Chart (JavaScript) (legacy)
Validation
Logistic Regression Predictor
ROC Curve (JavaScript) (legacy)
Lift & Gain table
RowID
Validation
Gradient Boosted Trees Predictor
Validation
Random Forest Predictor
Statistics View
Lift chart
Line Plot (JavaScript) (legacy)
Generating F2
Precision & Recall
Lift Chart (JavaScript) (legacy)
Selecting best F2 cut-off
Top k Row Filter
ROC Curve (JavaScript) (legacy)
Generating F2
Precision & Recall
Extract Header & Transpose
Sert Color
Binary Classification Inspector
Generating F2
Precision & Recall
Selecting best F2 cut-off
Top k Row Filter
Scorer (JavaScript)
Validation
Gradient Boosted Trees Predictor
Lift Meta node
Precision & Recall
Lift Chart (JavaScript) (legacy)
Validation
Decision Tree Predictor
ROC Curve (JavaScript) (legacy)
Selecting best F2 cut-off
Top k Row Filter
Grand Mean Imputation
Missing Value
Lift Chart (JavaScript) (legacy)
Statistics
Removed 1st and 99th
Column Filter
ROC Curve (JavaScript) (legacy)
95% winsorization
Column Expressions (legacy)
Generating F2
Precision & Recall
Get 1st & 99th % for annual_income
GroupBy
Training
Logistic Regression Learner
Removed 1st and 99th
Column Filter
Joined % aggregations with original table
Cross Joiner
Training
Random Forest Learner
Joined % aggregations with original table
Cross Joiner
Training
Decision Tree Learner
95% winsorization
Column Expressions (legacy)
Treat outliers (<>1.5 IQR) as missing
Numeric Outliers
Training
Gradient Boosted Trees Learner
Get 1st & 99th % for annual_income
GroupBy
Trimming (age > 18)
Row Filter
Get 1st & 99th % for annual_income
GroupBy
Grand Mean Imputation
Missing Value
Confusion matrix and ROC
Binary Classification Inspector
95% winsorization
Column Expressions (legacy)
Joined % aggregations with original table
Cross Joiner
Scorer (JavaScript)
Validation
Gradient Boosted Trees Predictor
Scorer (JavaScript)
Removed 1st and 99th
Column Filter
Scorer (JavaScript)
annual_income
Histogram
Select top 3 models based on F2 score
Top k Row Filter
Scorer (JavaScript)
vehicle_price
Histogram
Data Explorer
age_of_driver
Histogram
Selecting best F2 cut-off
Top k Row Filter
age_of_driver
Box Plot
vehicle_price
Box Plot
annual_income
Box Plot
Linear Correlation
String to Number
Statistics
Table View
Trimming (age > 18)
Row Filter
Treat outliers (<>1.5 IQR) as missing
Numeric Outliers
Treat outliers (<>1.5 IQR) as missing
Numeric Outliers (Apply)
Grand Mean Imputation
Missing Value (Apply)
Trimming (age > 18)
Row Filter
Get 1st & 99th % for annual_income
GroupBy
Joins 2 models
Joiner
Replace P (fraud =1) with model name
Column Renamer
Joins 2 models
Joiner
Joins 4 models
Joiner
95% winsorization
Column Expressions (legacy)
Joined % aggregations with original table
Cross Joiner
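
The "Selecting best F2 cut-off" and "Adjust prediction based on cutoff value" steps in the node list can be pictured as a threshold sweep: compute F2 at each candidate cut-off on the validation data, keep the best one, then re-label predictions with it. This is a minimal sketch of that idea, not the workflow's actual configuration. The probabilities, labels, and candidate cut-offs are hypothetical.

```python
def f2_at_cutoff(probs, labels, cutoff):
    """F2 score when rows with P(fraud = 1) >= cutoff are predicted positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def best_cutoff(probs, labels, cutoffs):
    """Return the candidate cut-off with the highest F2 (first one wins on ties)."""
    return max(cutoffs, key=lambda c: f2_at_cutoff(probs, labels, c))


probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]   # hypothetical validation probabilities
labels = [1, 1, 0, 1, 0, 0]              # hypothetical true classes
cutoffs = [i / 10 for i in range(1, 10)]

cut = best_cutoff(probs, labels, cutoffs)
preds = [1 if p >= cut else 0 for p in probs]
print(cut, preds)
```

The cut-off should be chosen on the validation partition and then held fixed when the champion model is applied to the test and score data.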
