
modelling_B_ollist_final

Prepare the Dataset for Modeling

This section loads the CSV data, keeps only the relevant columns, and creates a new target/class column using rules. It then removes no-longer-needed fields, tidies a column name, converts numeric values to text categories where needed, and recalculates the domain metadata so later nodes can correctly recognize the possible values in each column. In short, it turns the raw file into a cleaner, model-ready table.
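The preparation steps above could look roughly like the following plain-Python sketch. The column names `lead_id`, `status_code`, and `won_date`, the toy rows, and the rule that derives the target are all hypothetical; the workflow's actual rules are not shown in this description. Only the `activated` and `lead_type` names are taken from the workflow's node labels.

```python
# Toy rows standing in for the loaded CSV; every value here is made up.
rows = [
    {"lead_id": 1, "lead_type": "online", "status_code": 1, "won_date": "2018-03-01"},
    {"lead_id": 2, "lead_type": "offline", "status_code": 0, "won_date": ""},
    {"lead_id": 3, "lead_type": "online", "status_code": 1, "won_date": ""},
]


def prepare(rows):
    """Keep relevant columns, derive the target, and convert numbers to text."""
    prepared = []
    for row in rows:
        prepared.append({
            # Column Filter analogue: lead_id and won_date are not carried over.
            "lead_type": row["lead_type"],
            # Number to String analogue: the numeric code becomes a category.
            "status": str(row["status_code"]),
            # Rule Engine analogue (assumed rule): a non-empty won_date means "yes".
            "activated": "yes" if row["won_date"] else "no",
        })
    return prepared


def domain(rows):
    """Domain Calculator analogue: sorted distinct values per column."""
    values = {}
    for row in rows:
        for column, value in row.items():
            values.setdefault(column, set()).add(value)
    return {column: sorted(found) for column, found in values.items()}


prepared = prepare(rows)
print(domain(prepared))
```

Recomputing the domain after the conversion matters because downstream steps that treat a column as categorical need the full list of values it can take.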

Train and Compare Models with Cross Validation

This section runs a cross-validation loop: the data is repeatedly split into training and test portions, two different models are trained (Random Forest and Logistic Regression), and both make predictions on the same test rows. Their prediction outputs are then combined side by side so each fold keeps both models’ results together. Finally, the loop aggregates all folds into one full prediction table and an error summary, giving an overall view of model performance across the entire dataset.
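As a rough illustration of the loop structure, here is a minimal plain-Python sketch. The two "learners" are deliberately trivial stand-ins, not Random Forest or Logistic Regression, and the fold count, function names, and toy data are assumptions. What it is meant to show is that both models predict the same test rows, their outputs sit side by side, and every row ends up in the aggregated table exactly once.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k disjoint test folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_model(train_labels):
    """Stand-in learner A: always predicts the most common training label."""
    winner = max(sorted(set(train_labels)), key=train_labels.count)
    return lambda row: winner


def sign_model(train_labels):
    """Stand-in learner B: predicts 'yes' when the first feature is positive."""
    return lambda row: "yes" if row[0] > 0 else "no"


def cross_validate(features, labels, k=5):
    results = []      # one combined row per test row (aggregated prediction table)
    fold_errors = []  # one row per fold (error summary)
    for fold_id, test_idx in enumerate(k_fold_indices(len(labels), k)):
        held_out = set(test_idx)
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
        model_a = majority_model(train_labels)
        model_b = sign_model(train_labels)
        wrong_a = wrong_b = 0
        for i in test_idx:
            pred_a, pred_b = model_a(features[i]), model_b(features[i])
            # Both predictions are appended to the same row, side by side.
            results.append({"row": i, "fold": fold_id, "actual": labels[i],
                            "pred_a": pred_a, "pred_b": pred_b})
            wrong_a += pred_a != labels[i]
            wrong_b += pred_b != labels[i]
        fold_errors.append({"fold": fold_id,
                            "error_a": wrong_a / len(test_idx),
                            "error_b": wrong_b / len(test_idx)})
    return results, fold_errors


features = [[x] for x in (-2, -1, 1, 2, 3, -3, 4, -4, 5, 6)]
labels = ["yes" if f[0] > 0 else "no" for f in features]
results, fold_errors = cross_validate(features, labels, k=5)
```

Because the folds are disjoint and cover all rows, the aggregated table holds one out-of-fold prediction per model for every row in the dataset.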

Evaluate and Visualize Model Performance

This section uses the combined cross-validation predictions to assess how well the models classify the target. The Scorer nodes produce the confusion matrix and accuracy statistics, while the ROC Curve views show how well each model separates the two classes across different decision thresholds. In short, this block turns predictions into performance metrics and comparison visuals.
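To make the ROC idea concrete, the sketch below computes ROC points and the area under the curve in plain Python. It is an illustration under assumptions, not the ROC Curve node's implementation: labels are encoded as 1 (positive) and 0 (negative), both classes are assumed to be present, and the scores are made-up values.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    The threshold is lowered step by step from the highest score to the
    lowest; rows that share a score are added together.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every row tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points, auc(points))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the area is a convenient single number for comparing two models on the same rows.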

Convert Probabilities into Final Class Labels

This step turns model output into a final yes/no prediction. First, rules are used to map prediction scores or probabilities into a clear predicted class. Then the results are evaluated with a confusion matrix and accuracy statistics so you can see how well those final classifications match the true target values.
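A minimal sketch of this step in plain Python follows. The 0.5 cutoff, the `>=` comparison, the "yes"/"no" label names, and the sample probabilities are assumptions for illustration; the rules configured in the workflow may differ.

```python
from collections import Counter


def to_label(prob_yes, cutoff=0.5):
    """Rule: map the probability of the positive class to a final label."""
    return "yes" if prob_yes >= cutoff else "no"


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of rows whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


probs = [0.91, 0.62, 0.48, 0.30, 0.55, 0.12]
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = [to_label(p) for p in probs]
matrix = confusion_matrix(actual, predicted)
print(matrix, accuracy(matrix))
```

The off-diagonal cells, `("yes", "no")` and `("no", "yes")`, are the missed positives and the false alarms respectively.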

Create Final Class Labels and Score Them

This step converts the model’s output into a clear final predicted class using rules, then checks those predictions against the true target values. The result is a confusion matrix and accuracy statistics, which show how well the final yes/no classifications perform.

Test Multiple Classification Thresholds

This section takes the Logistic Regression prediction scores, converts them into final yes/no class labels using several different probability cutoffs, and then scores each version. The goal is to compare how changing the decision threshold (such as 0.30, 0.35, 0.42, and 0.45) affects the confusion matrix and overall accuracy statistics.
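The comparison can be sketched in plain Python as below. The four cutoffs are the ones named above; the probabilities, the true labels, and the choice of `>=` for the comparison are made-up assumptions, so the resulting numbers say nothing about the real workflow's results.

```python
CUTOFFS = [0.30, 0.35, 0.42, 0.45]


def score_at_cutoff(actual, probs, cutoff):
    """Label each row at the given cutoff and count the four outcomes."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(actual, probs):
        predicted = prob >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "cutoff": cutoff,
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(actual),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def sweep(actual, probs, cutoffs=CUTOFFS):
    """Score the same predictions once per cutoff."""
    return [score_at_cutoff(actual, probs, c) for c in cutoffs]


actual = [True, True, True, False, False, False, True, False]
probs = [0.80, 0.44, 0.33, 0.40, 0.31, 0.10, 0.50, 0.36]
for row in sweep(actual, probs):
    print(row)
```

Raising the cutoff can only turn predicted positives into predicted negatives, so recall never increases as the cutoff goes up, while accuracy can move in either direction depending on how many false positives are removed along the way.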

Compare Category Distributions Against the Target

This section builds several cross-tab summaries that compare how three categorical fields (business segment, lead type, and business type) are distributed across the activated outcome. This shows at a glance whether some groups are more or less associated with activation, which is useful for early pattern finding before or alongside modeling.
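A cross-tab of one categorical field against the target can be sketched in plain Python as follows. The field names `business_type` and `activated` come from the workflow's node labels; the category values, the "yes"/"no" encoding, and the rows are hypothetical.

```python
from collections import Counter


def crosstab(rows, row_field, col_field="activated"):
    """Count rows for each (category, outcome) combination."""
    counts = Counter((r[row_field], r[col_field]) for r in rows)
    table = {}
    for (row_value, col_value), n in counts.items():
        table.setdefault(row_value, {})[col_value] = n
    return table


def activation_rate(table, positive="yes"):
    """Share of activated rows within each category."""
    return {category: cells.get(positive, 0) / sum(cells.values())
            for category, cells in table.items()}


rows = [
    {"business_type": "reseller", "activated": "yes"},
    {"business_type": "reseller", "activated": "no"},
    {"business_type": "reseller", "activated": "no"},
    {"business_type": "manufacturer", "activated": "yes"},
]
table = crosstab(rows, "business_type")
print(table)
print(activation_rate(table))
```

The same function can be pointed at `business_segment` or `lead_type` by changing the `row_field` argument. Rates computed from very small groups, like the single manufacturer row here, should be read with caution.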

CSV Reader
Column Filter
Rule Engine
Rule Engine
Column Filter
Column Renamer
Number to String
Domain Calculator
Scorer
Rule Engine
Pred_LR_035
Rule Engine
Scorer
Pred_LR_030
Rule Engine
Scorer
Scorer
Pred_LR_042
Rule Engine
Pred_LR_045
Rule Engine
Scorer
Random Forest Predictor
Scorer
X-Partitioner
Logistic Regression Predictor
Crosstab business_type vs activated
Crosstab
Random Forest Learner
Column Appender
X-Aggregator
Crosstab business_segment vs activated
Crosstab
Logistic Regression Learner
Crosstab lead_type vs activated
Crosstab
LR
ROC Curve
LR
Scorer
RF
ROC Curve
RF
Scorer
