Data Strategy and Balancing
Objective: Optimize the dataset for machine learning modeling through feature normalization, class balancing, and data partitioning.
Internal Nodes and Parameters:
Normalizer: Applied Min-Max scaling to the range [0, 1] on all numerical features so that no feature dominates model training purely because of its original scale.
SMOTE: Oversampled the minority class of the target variable "Biopsy" using k = 5 nearest neighbors to reach a 50/50 class balance, resulting in 803 records per class.
Table Partitioner: Executed a 70% Training / 30% Test split using stratified sampling to maintain class proportions across both datasets.
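The three steps above can be sketched outside KNIME as follows. This is a minimal illustration in plain Python, not the implementation of the KNIME nodes: the function names, the toy data, and the random seeds are assumptions made for the example, and the SMOTE variant shown is the basic interpolation scheme (a synthetic row is placed on the segment between a minority row and one of its k nearest minority neighbours).

```python
import math
import random
from collections import defaultdict


def min_max_normalize(rows):
    """Rescale each numeric column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its (up to) k nearest minority neighbours.
    Requires at least two minority rows."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (r for r in minority if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def stratified_split(rows, labels, train_frac=0.7, rng=None):
    """Split each class separately so both parts keep the class proportions."""
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train += [(r, label) for r in members[:cut]]
        test += [(r, label) for r in members[cut:]]
    return train, test


# Toy data (made up): 10 majority rows and 4 minority rows, two features each.
rng = random.Random(42)
majority = [[rng.uniform(0, 10), rng.uniform(100, 200)] for _ in range(10)]
minority = [[rng.uniform(0, 10), rng.uniform(100, 200)] for _ in range(4)]

scaled = min_max_normalize(majority + minority)
maj, mino = scaled[:10], scaled[10:]

# Oversample the minority class until both classes have the same size.
mino_balanced = mino + smote(mino, n_new=len(maj) - len(mino), k=5, rng=rng)

rows = maj + mino_balanced
labels = ["0"] * len(maj) + ["1"] * len(mino_balanced)  # string class labels
train, test = stratified_split(rows, labels, train_frac=0.7, rng=rng)
```

The sketch keeps the order described above (normalize, oversample, then partition). Because the split is done per class, a balanced input stays balanced in both the training and the test part.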
Assumptions and Missingness:
The input data is assumed to be free of missing values, as imputation was finalized during the previous phase.
The target variable (Biopsy) was converted to String format because the SMOTE node expects a nominal class column.
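Both assumptions can be checked before oversampling. The following is a small sketch in plain Python under stated assumptions: records are represented as dictionaries, the helper names are hypothetical, and "Age" is a made-up feature column used only for illustration. Only the target name "Biopsy" comes from the workflow description.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_for_smote(records, target="Biopsy"):
    """Fail fast on missing values, then cast the target to a string label."""
    for i, rec in enumerate(records):
        missing = [key for key, value in rec.items() if is_missing(value)]
        if missing:
            raise ValueError(f"row {i} has missing values in: {missing}")
    return [{**rec, target: str(rec[target])} for rec in records]


# Made-up rows for illustration only.
records = [{"Age": 34, "Biopsy": 0}, {"Age": 52, "Biopsy": 1}]
prepared = prepare_for_smote(records)
```

Raising an error instead of imputing here reflects the stated assumption that imputation was already completed in the previous phase.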