
ME&A_2025&26_group_AP_KNIME (1)

These nodes convert categorical variables into numeric format by replacing text values with numerical codes.
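As a rough illustration of this recoding step outside KNIME, the sketch below maps text categories to numeric codes with a lookup table. The F=1 and M=0 codes follow the String Replacer labels in this workflow; the function name `encode` and the choice to return `None` for unknown labels are assumptions made for the example.

```python
# Lookup table mirroring the F=1 / M=0 String Replacer labels in the workflow.
GENDER_CODES = {"F": 1, "M": 0}


def encode(values, codes):
    """Replace each text category with its numeric code.

    Labels missing from the table become None (an assumption of this sketch),
    so they can be handled later by a missing-value step.
    """
    return [codes.get(value) for value in values]


encoded = encode(["F", "M", "F"], GENDER_CODES)
print(encoded)
```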

This node converts values stored as text (string) into numeric formats (integer/double), enabling their use in calculations and data analysis.

This node identifies extreme values using the IQR method and caps them at acceptable limits, reducing their impact without removing observations.
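The IQR capping idea can be sketched in plain Python as follows. The multiplier k = 1.5 and the "inclusive" quartile method are conventional choices assumed here; the settings actually used in the Numeric Outliers node are not stated in this workflow description.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those limits.

    k=1.5 and the inclusive quartile method are assumptions of this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are returned unchanged; no row is dropped.
    return [min(max(v, lower), upper) for v in values]


capped = cap_outliers_iqr([10, 12, 13, 14, 15, 16, 18, 200])
print(capped)
```

Because the extreme value is replaced by the upper fence rather than deleted, the number of observations stays the same.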

Missing values were handled to ensure data completeness and avoid errors during analysis.

Numeric Outliers: Treated to reduce their impact on clustering and predictive models.

Missing Values: Replaced to ensure a complete dataset.

Income → Median: Used because it is more robust to outliers.

Other Numeric Variables → Mean: Used to preserve the overall distribution.

String Variables → Most Frequent Value: Used to maintain consistency in categorical data.
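The three replacement rules above can be sketched as a single helper. This is only an illustration: missing values are represented as `None`, and the sample numbers and the `impute` function are hypothetical, not taken from the dataset.

```python
import statistics


def impute(column, strategy):
    """Fill None entries using the median, mean, or most frequent value."""
    present = [v for v in column if v is not None]
    if strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "most_frequent":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


# Hypothetical sample columns for illustration only.
income = impute([30000, None, 45000, 1000000], "median")
numeric = impute([2, None, 4], "mean")
status = impute(["Married", None, "Single", "Married"], "most_frequent")
print(income, numeric, status)
```

In the income example the very large value barely affects the median, which is the robustness argument made above.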

This node applies logical rules to transform and recode variables, preparing the data for analysis.

This node selects the variables relevant for the analysis by removing unnecessary columns.

This node scales all variables to a common range, enabling fair comparisons during clustering.
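A minimal sketch of scaling to a common range, assuming min-max normalization to [0, 1]. The workflow description does not state which normalization method was configured, so the method and the function name are assumptions.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


scaled = min_max_normalize([20, 30, 40])
print(scaled)
```

Without this step, a variable measured in large units (such as income) would dominate the distance calculations used in clustering.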

Table Manipulator: This node reorganizes the table structure by renaming, reordering, or modifying columns as required for the analysis.

RowID: This node creates or modifies row identifiers, ensuring that each observation has a unique ID.

Statistics: This node generates descriptive statistics to summarize the main characteristics of the data.

Histograms: This node visualizes the distribution of variables, helping to identify patterns, skewness, and potential outliers.

K-Means: This node groups customers into clusters based on similarities in their characteristics.
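The grouping logic can be sketched with a basic Lloyd-style loop. This is a simplified illustration, not the KNIME implementation: the starting centroids are passed in explicitly, the iteration count is fixed, and the sample points are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical, already-normalized customer points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = k_means(points, centroids=[(0, 0), (10, 10)])
print(labels, centers)
```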

Silhouette Coefficient: This node evaluates the quality of the clustering by measuring how well each observation fits within its assigned cluster.
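For reference, the silhouette value of one observation is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. The sketch below computes it with Euclidean distance; it assumes at least two clusters, and the sample points are hypothetical.

```python
import math
import statistics


def silhouette_scores(points, labels):
    """Return one silhouette value per point (requires at least two clusters)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            math.dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = statistics.mean(same)
        b = min(
            statistics.mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in set(labels) - {own}
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


sil = silhouette_scores([(0, 0), (0, 1), (10, 10), (10, 11)], [0, 0, 1, 1])
print(statistics.mean(sil))
```

Values close to 1 indicate well-separated clusters, values near 0 indicate overlap, and negative values suggest a point may sit in the wrong cluster.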

GroupBy: This node calculates summary statistics for each cluster, enabling cluster profiling and comparison.

Color Manager: This node assigns colours to clusters to improve visual interpretation.

Scatter Plot: This node visualizes the clusters, making it easier to identify patterns and separation between groups.

Denormalizer: This node restores the variables to their original scale, making the results easier to interpret.
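Assuming the variables were scaled with min-max normalization (an assumption, since the method is not stated here), restoring the original scale only requires the original minimum and maximum of each column. The function name and sample values are hypothetical.

```python
def denormalize(scaled, low, high):
    """Invert min-max scaling: map values in [0, 1] back to [low, high]."""
    return [s * (high - low) + low for s in scaled]


# Hypothetical cluster centres for a column whose original range was 20 to 40.
restored = denormalize([0.0, 0.5, 1.0], low=20, high=40)
print(restored)
```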

Table Partitioner: Splits the dataset into training and testing sets for model validation.
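A simple random split can be sketched as below. The 80/20 ratio and the fixed seed are assumptions for the example; the proportions configured in the workflow are not given in this description.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = partition(range(10))
print(len(train), len(test))
```

Fixing the seed keeps the split identical between runs, which makes model comparisons fair.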

Decision Tree Learner: Builds a decision tree model using the training data.

Decision Tree Predictor: Applies the trained model to generate predictions.

Scorer: Evaluates the predictive performance of the model using classification metrics.
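The kind of classification metrics such an evaluation reports can be sketched for a binary target as follows. The function name, the label coding (1 = positive), and the sample predictions are hypothetical.

```python
def score(actual, predicted, positive=1):
    """Compute confusion-matrix counts plus accuracy, precision, and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = score(actual=[1, 0, 1, 1, 0, 0], predicted=[1, 0, 0, 1, 1, 0])
print(metrics)
```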

ROC Curve: Assesses the model's ability to distinguish between classes.
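The area under the ROC curve summarizes this ability in one number: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise definition; the function name and sample scores are hypothetical, and it assumes both classes are present.

```python
def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


auc = roc_auc(actual=[1, 1, 0, 0], scores=[0.9, 0.4, 0.6, 0.1])
print(auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.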

Lift Chart: Measures the improvement of the model over random selection.
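Lift at a given depth is the response rate among the top-scored cases divided by the overall response rate. A minimal sketch, with a hypothetical function name and sample data, and assuming at least one positive case exists:

```python
def lift_at(actual, scores, fraction=0.5):
    """Lift of the top `fraction` of cases, ranked by descending model score."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


lift = lift_at(
    actual=[1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    scores=[0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.05, 0.6, 0.5],
)
print(lift)
```

A lift above 1 means that targeting the top-scored customers finds more positives than selecting the same number at random.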

Linear Correlation: Identifies relationships between numerical variables.

Correlation Filter: Removes highly correlated variables to reduce redundancy and improve model performance.
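One simple way to illustrate the idea is a greedy pass that keeps a column only if its absolute Pearson correlation with every column kept so far stays below a threshold. This is a sketch of the general technique and may differ from how the KNIME node selects columns; the 0.9 threshold, the function names, and the sample columns are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_filter(columns, threshold=0.9):
    """Greedily keep columns whose |r| with all kept columns is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: "b" is an exact multiple of "a" and is therefore dropped.
kept = correlation_filter({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]})
print(kept)
```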

RProp MLP Learner: Builds a Multi-Layer Perceptron (Neural Network) model using the training data.

MultiLayerPerceptron Predictor: Applies the trained neural network to generate predictions on new data.

Excel Writer: Exports the results and model outputs to an Excel file for further analysis and reporting.

Linear Correlation: Identifies relationships between numerical variables.

Heatmap: Visualizes correlations between variables through a colour-coded matrix.

Statistics: Provides descriptive statistics to summarize the dataset.

Histogram: Displays the distribution of a variable and helps identify patterns and outliers.

Scatter Plot: Visualizes the relationship between two variables and helps detect trends or clusters.

Bar Chart: Displays the frequency or distribution of categories, making it easier to compare values across groups.

Workflow canvas (node labels and annotations):

String Replacer: F=1, M=0; Married=1, Together=1, single=0, Divorced=0, widow=0, other=0; PhD=1, MSc=1, BSc=1, High School=0, primary=0.

Rule Engine: Beverages, Perishables, Canned, Frozen, Others, Income; treat internet > 100%.

Math Formula: PerishableShare, Avg Basket Value, OnlineEngagement, Category_Diversity, RecencyFrequencyScore, InternetSpent, PhysicalSpent; create debt variable.

PerishableShare: Measures the percentage of spending on perishable products. This helps identify customers who make frequent, routine purchases and exhibit a more grocery-focused shopping behaviour.

Avg Basket Value: Measures the average amount spent per purchase or transaction.

OnlineEngagement: Measures the frequency of online purchases and the intensity of digital channel usage.

Category_Diversity: Measures consumption diversity by identifying customers who purchase across multiple product categories versus those who focus on a single type of product.

RecencyFrequencyScore: Measures how active and engaged the customer currently is.

InternetSpent: Estimates each customer's online spending.

PhysicalSpent: Estimates the customer's spending in physical stores.

Column Filter: Prepares the dataset for data cleaning, encoding, and clustering analysis.

Column Filter: Selects exactly which variables will be included in the clustering and predictive models.

Missing Value: Removes missing values that were created during the feature engineering process.

k-Means: Run with k=2, k=3, k=4, k=5, and k=6.

Selected model: Best balance between interpretability and segmentation quality.

Tested for comparison: More fragmented clusters.

Tested for comparison: Over-segmentation observed.

Silhouette Coefficient: Measures cluster quality.

Histogram: Avg_Basket_Value, InternetSpent, OnlineEngagement, Category_Diversity, PhysicalSpent, PerishableShare.

Other nodes on the canvas: Excel Reader, String to Number, Number to String, String Manipulation, Numeric Outliers, Normalizer, Denormalizer, Table Manipulator, RowID, Statistics, Bar Chart, Scatter Plot, Heatmap, Linear Correlation, Correlation Filter, Distance Matrix Calculate, GroupBy, Color Manager, Table Partitioner, Decision Tree Learner, Decision Tree Predictor, RProp MLP Learner, MultiLayerPerceptron Predictor, Scorer, ROC Curve, Lift Chart (JavaScript) (legacy), Excel Writer.
