modelling_ollist_final

Prepare Data and Create a Train/Test Split

The workflow starts by loading the raw data, converting the timestamp column into a date-time type, and extracting time-based parts from it to use as extra predictors. It then calculates the conversion rate per day and joins that summary back onto each row, so every record carries both its original details and a day-level performance signal. Finally, the enriched dataset is split into training and test sets, so models can be built on one part and evaluated on unseen data.
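
The steps above can be sketched outside KNIME in plain Python. This is a minimal illustration, not the workflow's actual configuration: the field names (`timestamp`, `origin`, `converted`), the sample rows, the extracted parts, and the 75/25 split are all assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import datetime

# Hypothetical rows; the field names are assumptions, not the workflow's columns.
rows = [
    {"timestamp": "2018-01-01 09:15:00", "origin": "organic", "converted": 1},
    {"timestamp": "2018-01-01 14:40:00", "origin": "paid", "converted": 0},
    {"timestamp": "2018-01-02 11:05:00", "origin": "organic", "converted": 0},
    {"timestamp": "2018-01-02 18:30:00", "origin": "social", "converted": 0},
]

def add_time_features(row):
    """Parse the timestamp string and extract time-based predictors."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {**row, "date": ts.date(), "month": ts.month,
            "weekday": ts.weekday(), "hour": ts.hour}

def conversion_rate_by(rows, key, target="converted"):
    """Group rows by `key` and return the mean of `target` per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [conversions, row count]
    for r in rows:
        totals[r[key]][0] += r[target]
        totals[r[key]][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

enriched = [add_time_features(r) for r in rows]
daily_rate = conversion_rate_by(enriched, "date")
# Join the day-level rate back onto every row.
enriched = [{**r, "daily_conv_rate": daily_rate[r["date"]]} for r in enriched]
train_rows, test_rows = train_test_split(enriched, train_fraction=0.75)
```

The fixed seed only makes the split reproducible; the workflow's own partitioning settings are not stated in the description.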

Build Training Features and Train Two Models

This section works on the training data. It first calculates two aggregated features, conversion rate by origin and conversion rate by landing page, and joins those summary rates back onto each training row so every record includes both its original values and these group-level performance signals. After that, it fills in missing values, keeps only the columns needed for modeling, converts the numeric target into a categorical (string) class label, and uses SMOTE to balance the classes by creating synthetic examples of the minority class. Finally, it recalculates the table's domain metadata and trains two classification models in parallel, a Random Forest and a Logistic Regression, so their performance can be compared later.
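
To make the class-balancing step concrete, here is a simplified sketch of the idea behind SMOTE: each synthetic row is placed at a random position on the line between a minority-class row and one of its nearest minority-class neighbours. It is an illustration only, not the KNIME SMOTE node's implementation; the function name `smote`, the choice of `k`, and the sample feature values are assumptions.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical numeric features, e.g. (origin_conv_rate, landing_page_conv_rate).
majority = [(0.05, 0.10), (0.07, 0.12), (0.04, 0.08), (0.06, 0.09), (0.05, 0.11)]
minority = [(0.20, 0.30), (0.25, 0.35), (0.22, 0.28)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because the synthetic rows are interpolated from training rows, this step belongs on the training data only, which matches its position in the workflow.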

Prepare Test Data and Score Both Models

This section applies the same engineered conversion-rate features to the test set by joining in the rates learned from the training data, then fills missing values and keeps only the columns needed for prediction. It also converts the target into a categorical format so that predicted and actual classes can be compared. Finally, the prepared test data is scored by both the Random Forest and Logistic Regression models, producing predictions for model evaluation on unseen data.
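
The key point here is that the rates are learned on the training rows and only looked up for the test rows. A minimal sketch of that pattern follows; the helper names, the `origin` and `converted` fields, and the use of the overall training rate as the fill value for unseen categories are assumptions, since the description does not say how the Missing Value node is configured.

```python
from collections import defaultdict

def fit_group_rates(train_rows, key, target="converted"):
    """Learn the conversion rate per category from training rows only."""
    counts = defaultdict(lambda: [0, 0])  # category -> [conversions, row count]
    for r in train_rows:
        counts[r[key]][0] += r[target]
        counts[r[key]][1] += 1
    return {k: c / n for k, (c, n) in counts.items()}

def apply_group_rates(rows, key, rates, fallback, new_col):
    """Look up the learned rate for each row; unseen categories get `fallback`."""
    return [{**r, new_col: rates.get(r[key], fallback)} for r in rows]

# Hypothetical data for illustration.
train_rows = [
    {"origin": "organic", "converted": 1},
    {"origin": "organic", "converted": 0},
    {"origin": "paid", "converted": 0},
    {"origin": "paid", "converted": 0},
]
test_rows = [
    {"origin": "organic", "converted": 0},
    {"origin": "email", "converted": 1},  # origin never seen in training
]

origin_rates = fit_group_rates(train_rows, "origin")
# Assumed fill value: the overall conversion rate of the training set.
overall_rate = sum(r["converted"] for r in train_rows) / len(train_rows)
test_prepared = apply_group_rates(
    test_rows, "origin", origin_rates, overall_rate, "origin_conv_rate"
)
```

Note that the test rows' own `converted` values never enter the rate calculation, so the feature does not leak the test target.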

Evaluate and Compare Model Performance

This section assesses how well the models perform on the test data. It creates a confusion matrix and overall accuracy metrics for the Random Forest, draws ROC curves to show how well each model separates the two classes, and then combines both models’ prediction scores into one table so their ROC performance can be compared side by side.
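
The metrics mentioned above can be sketched in a few lines of plain Python. The labels, the score values, the column names `rf_score` and `lr_score`, and the 0.5 decision threshold are hypothetical; the functions only illustrate what a confusion matrix, accuracy, and the area under a ROC curve measure.

```python
def confusion_matrix(actual, predicted, positive="1"):
    """Count true/false positives and negatives for a binary target."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

def roc_points(actual, scores, positive="1"):
    """Sweep the threshold from high to low, collecting (FPR, TPR) points."""
    n_pos = sum(a == positive for a in actual)
    n_neg = len(actual) - n_pos
    ranked = sorted(zip(scores, actual), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last row of a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical combined table: one row per test record, one score per model.
combined = [
    {"actual": "1", "rf_score": 0.9, "lr_score": 0.6},
    {"actual": "0", "rf_score": 0.2, "lr_score": 0.5},
    {"actual": "1", "rf_score": 0.8, "lr_score": 0.7},
    {"actual": "0", "rf_score": 0.6, "lr_score": 0.8},
    {"actual": "0", "rf_score": 0.1, "lr_score": 0.2},
    {"actual": "1", "rf_score": 0.7, "lr_score": 0.4},
]
actual = [r["actual"] for r in combined]

# Confusion matrix and accuracy for the Random Forest at an assumed 0.5 cut-off.
rf_pred = ["1" if r["rf_score"] >= 0.5 else "0" for r in combined]
rf_cm = confusion_matrix(actual, rf_pred)
rf_accuracy = (rf_cm["tp"] + rf_cm["tn"]) / len(actual)

# Side-by-side ROC comparison from the combined table.
rf_auc = auc(roc_points(actual, [r["rf_score"] for r in combined]))
lr_auc = auc(roc_points(actual, [r["lr_score"] for r in combined]))
```

Accuracy depends on the chosen threshold, while the ROC curve summarises ranking quality across all thresholds, which is why the two views can disagree for the same model.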

CSV Reader
String to Date&Time
Date&Time Part Extractor
Column Renamer
GroupBy
Joiner
Column Renamer
Random Forest Learner
Logistic Regression Learner
Column Filter
Joiner
Missing Value
Domain Calculator
Joiner
Number to String
output origin_conv_rate
Column Renamer
Random Forest Predictor
Column Filter
Table Partitioner
output landing_page_conv_rate
Column Renamer
Scorer
GroupBy
Joiner
ROC Curve
Joiner
Logistic Regression Predictor
GroupBy
ROC Curve
Number to String
SMOTE
Missing Value
Joiner
Column Filter
ROC Curve
