
Workflow knime Marketing Analytics ordinato

Dataset preparation

The Joiner merges the outcome variable with the Italian store dataset, which naturally introduces missing values for stores with unknown promotion results. Filtering these rows out with the Missing Value node leaves a clean dataset of only activated stores, ready for the analysis phase.
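The join-then-filter step can be sketched outside KNIME in a few lines of Python. This is only an illustration: the column names (`store_id`, `distance_to_dentist`, `outcome`) and the rows are invented, and the workflow itself uses the Joiner and Missing Value nodes rather than code.

```python
# Hypothetical rows; column names are illustrative, not taken from the workflow.
italian_stores = [
    {"store_id": 1, "distance_to_dentist": 0.4},
    {"store_id": 2, "distance_to_dentist": 2.1},
    {"store_id": 3, "distance_to_dentist": 1.3},
]
activated_2025 = {1: "success", 3: "failure"}  # store_id -> promotion outcome


def left_join_outcome(stores, outcomes):
    """Attach the outcome to every store; stores without one get None."""
    return [{**row, "outcome": outcomes.get(row["store_id"])} for row in stores]


def drop_missing_outcome(rows):
    """Keep only rows whose outcome is known (the activated stores)."""
    return [row for row in rows if row["outcome"] is not None]


joined = left_join_outcome(italian_stores, activated_2025)
activated = drop_missing_outcome(joined)
print([row["store_id"] for row in activated])
```

Store 2 has no recorded outcome, so it survives the join with a missing value and is then dropped by the filter.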

Activated store analysis

Analyzing the decision tree shows that distance from a dentist is the primary splitting criterion: it yields the highest node purity (the largest reduction in Gini impurity) and acts as the key determinant in the data distribution. Next, the random forest algorithm evaluates multiple trees to build a model based on the aggregation of the factors that define the different splits across the trees. The ROC curve shows that the model predicts well compared with a random classifier and also achieves a higher Area Under the Curve (0.945 AUC) than the logistic regression model (0.836 AUC).
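To make the purity criterion concrete, here is a minimal sketch of how a candidate split can be scored by its reduction in Gini impurity. The distances and outcomes are invented toy data, and the helper names are hypothetical; the Decision Tree Learner performs this search internally.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(values, labels, threshold):
    """Reduction in Gini impurity when splitting at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Toy data (invented): near stores succeed, far stores fail.
distance = [0.2, 0.5, 0.9, 1.8, 2.4, 3.0]
outcome = [1, 1, 1, 0, 0, 0]
print(impurity_decrease(distance, outcome, threshold=1.0))
print(impurity_decrease(distance, outcome, threshold=0.3))
```

The split at 1.0 separates the two classes perfectly, so it removes all of the parent impurity, while the split at 0.3 removes only a small part of it. A tree learner picks the feature and threshold with the largest decrease.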

Importance of factors

Analyzing the split criteria across the Random Forest allows us to quantify Feature Importance based on cumulative split contribution. The analysis ranks the key drivers of store performance in the following order: Distance to Dentist, Water Hardness IDX, Online Consumption IYA, Nr of Schools, Household avg components, and Average Income per Household. Distance to Dentist, Water Hardness, Online Consumption, and Household avg components exhibit a negative relationship with store success (inverse correlation), whereas Nr of Schools and Average Income per Household act as positive performance drivers.
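One way to turn split statistics into a ranking is sketched below. It assumes the importance is the sum, over the first tree levels, of the ratio between how often a feature was chosen for a split and how often it was offered as a candidate. That formula is an assumption about what the level-by-level Math Formula nodes compute, and all counts are invented.

```python
# Invented attribute statistics: per tree level, how often a feature was
# chosen for a split versus how often it was offered as a candidate.
stats = {
    "Distance to Dentist": {"splits": [60, 40, 30], "candidates": [70, 80, 90]},
    "Nr of Schools": {"splits": [5, 20, 25], "candidates": [65, 85, 95]},
}


def importance(feature_stats):
    """Sum of split/candidate ratios over the tree levels (assumed formula)."""
    return sum(
        s / c
        for s, c in zip(feature_stats["splits"], feature_stats["candidates"])
        if c > 0
    )


ranking = sorted(stats, key=lambda name: importance(stats[name]), reverse=True)
print(ranking)
```

A ratio like this ranks features by how often they are used for splitting; it does not by itself give the direction of each relationship.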

Stores to invest in for 2026

The provided panel displays the comprehensive classification of all stores. Specifically, the 'Final Outcome' column explicitly delineates the stores recommended for investment from those excluded from the strategy

Logistic regression

Insight: Our logistic regression identified distance to the dentist as the sole significant predictor, revealing that success probability decreases as this distance increases. Although the ROC curve confirms the model is robust and outperforms random chance, we are moving to alternative models because this single-factor explanation lacks sufficient depth
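As a reminder of what "outperforms random chance" means for a ROC curve, the sketch below computes the AUC as the probability that a randomly chosen successful store receives a higher score than a randomly chosen unsuccessful one. The scores and labels are invented; a random classifier scores 0.5 on this measure.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. This equals the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented scores for six held-out stores (1 = successful activation).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auc(scores, labels))
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9.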

Estimated size price

Comparing the final 2026 numbers with 2025

Creating a row count
Math Formula
Removing the stores not selected for investment
Row Filter
With an AUC of 0.945, the model clearly outperforms a random classifier
ROC Curve
Creating the summary table
Metanode
Final list of stores to invest in (315 rows)
Excel Writer
Final table of estimated price
Row Filter
Excluding the stores already activated in 2025
Reference Row Filter
Verifying that multicollinearity is low
Linear Correlation
Estimating the profit at the propensity cutoff (0.65)
Rule Engine
Using the random forest model built on the observations with a known outcome, it is possible to estimate the propensity of the new stores to succeed if activated
Random Forest Predictor
Calculating the estimated cumulative profit
Moving Aggregator
Using z-score normalization to prepare the data for the logistic regression
Normalizer
With an AUC of 0.836, the model is robust and better than a random classifier
ROC Curve
Sorting by descending propensity
Sorter
Creating a list of only the new stores to invest in for the first time in 2026
Row Filter
OralCare_activated_Stores_2025
Excel Reader
Italian_Stores
Excel Reader
Attaching the outcome value to the Italian stores dataset
Joiner
OralCare_activated_Stores_2025 complete
Missing Value
Moving the outcome to the last column
Column Resorter
Calculating the importance (level 1)
Math Formula
Guarding against overfitting by partitioning the table with stratified sampling on the outcome
Table Partitioner
Guarding against overfitting by partitioning the table with stratified sampling on the outcome
Table Partitioner
Observing that the first split is on Distance from Dentist, using the 70% training partition
Decision Tree Learner
Ordering the factors table by significance
Sorter
Building multiple trees on 70% of the observations to understand which factors matter most
Random Forest Learner
Using the model from the Random Forest Learner to obtain each store's propensity to succeed if activated
Random Forest Predictor
Using the 30% test set
Decision Tree Predictor
Evaluating the model. Misclassified: 4 of 52
Scorer
Sorting by importance
Sorter
Complete activated stores
Excel Reader
Evaluating the model. Misclassified: 5 of 52
Scorer
Complete activated stores
Excel Reader
Sorting by descending propensity
Sorter
Italian stores
Excel Reader
Creating the full dataset with an additional column, Final Outcome, indicating whether the store should be activated in 2026
Concatenate
Placing related columns next to each other for readability
Column Resorter
Removing the useless column
Column Filter
Transforming the outcome into strings
Rule Engine
Creating an uncorrelated index
Dataset manipulation
Creating the profit column
Rule Engine
Calculating the cumulative profit. The highest value of the cumulative profit identifies the propensity cutoff (greater than or equal to 0.65) that maximizes profit
Moving Aggregator
Plotting the cumulative profit
Line Plot (legacy)
Calculating the importance (level 2)
Math Formula
Transforming the outcome into numbers (0; 1)
Rule Engine
Sorting by importance
Sorter
Calculating the importance (all levels)
Math Formula
Deciding which stores to invest or reinvest in, using the propensity cutoff (0.65)
Rule Engine
Calculating the importance (level 0)
Math Formula
Training the logistic regression model on 70% of the observations
Logistic Regression Learner
Evaluating the model. Misclassified: 9 of 52
Scorer
Deciding which existing stores to reinvest in for 2026
Rule Engine
Using the model from the learner to predict the remaining 30% of the dataset
Logistic Regression Predictor
Exporting the file
Excel Writer
Sorting by importance
Sorter
Exporting the complete activated stores
Excel Writer
Sorting by importance
Sorter
Creating a column indicating whether a store should be activated, without distinguishing between old and new stores
Rule Engine
Checking for high multicollinearity
Linear Correlation
Exporting the complete list of Italian stores
Excel Writer
Calculating the estimated profit
Rule Engine
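
The cutoff search annotated above (sort by descending propensity, accumulate profit, take the maximum) can be sketched as follows. The per-store profit rule, the payoff figures, and the propensities are all assumptions made for illustration; the workflow's own Rule Engine defines the actual profit column.

```python
def best_cutoff(propensities, profit_success, cost_failure):
    """Return (cutoff, cumulative profit) at the peak of cumulative profit.

    Stores are sorted by descending propensity; each store's expected profit
    is p * profit_success - (1 - p) * cost_failure (an assumed profit rule).
    """
    ranked = sorted(propensities, reverse=True)
    best = (None, 0.0)  # investing in no store yields zero profit
    running = 0.0
    for p in ranked:
        running += p * profit_success - (1 - p) * cost_failure
        if running > best[1]:
            best = (p, running)
    return best


# Invented propensities and payoffs, not the workflow's figures.
cutoff, profit = best_cutoff(
    [0.30, 0.95, 0.50, 0.80, 0.70], profit_success=100, cost_failure=150
)
print(cutoff, profit)
```

In this toy case the cumulative profit peaks after the third store, so every store with propensity greater than or equal to 0.70 would be selected; stores below that point lower the total.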
