
KNIME_ML_Project

<p><strong>Data Strategy and Balancing</strong></p><ul><li><p><strong>Objective</strong>: Prepare the dataset for machine learning modeling through feature normalization, class balancing, and data partitioning.</p></li><li><p><strong>Internal Nodes and Parameters</strong>:</p><ul><li><p><strong>Normalizer</strong>: Applied <strong>Min-Max (0-1)</strong> scaling to all numerical features so that features on larger scales do not dominate model training.</p></li><li><p><strong>SMOTE</strong>: Oversampled the minority class of the target column "<strong>Biopsy</strong>" (k = 5 nearest neighbors) until the classes reached a <strong>50/50 balance</strong>, resulting in <strong>803 records per class</strong>.</p></li><li><p><strong>Table Partitioner</strong>: Split the data <strong>70% Training / 30% Test</strong> using <strong>stratified sampling</strong> so that both partitions keep the same class proportions.</p></li></ul></li><li><p><strong>Assumptions and Missingness</strong>:</p><ul><li><p>The input data is assumed to contain no missing values, because imputation was completed in the previous phase.</p></li><li><p>The target variable (Biopsy) was converted to <strong>String</strong> format to satisfy the SMOTE node's requirement for a nominal class column.</p></li></ul></li></ul>
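The Normalizer and SMOTE steps described above can be illustrated outside KNIME with a minimal standard-library Python sketch. This is not the implementation used by the KNIME nodes: the function names `min_max_normalize` and `smote`, the toy feature values, and the toy class sizes are all assumptions made for illustration. It only shows the two ideas, rescaling each column to the range 0 to 1 and creating synthetic minority rows by interpolating between a row and one of its k nearest minority neighbors.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale every column of `rows` (equal-length numeric lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic rows by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Up to k nearest minority rows to `base`, excluding `base` itself.
        neighbors = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position on the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data: two numeric features per row (hypothetical values, not from the workflow).
features = [[18, 0.0], [25, 2.0], [34, 5.0], [52, 10.0]]
scaled = min_max_normalize(features)

# Assume the last three scaled rows form the minority class and the majority has 8 rows.
minority = scaled[1:]
synthetic = smote(minority, n_new=8 - len(minority), k=5)
```

In the workflow, the number of synthetic rows would be whatever brings the minority class up to the 803 records of the majority class. Because each synthetic row lies on a segment between two already-normalized rows, its values stay within the 0 to 1 range.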

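The stratified 70/30 split described above can be sketched in plain Python as follows. This is an illustrative sketch and not the Table Partitioner's own logic: the function name `stratified_split`, the fixed seed, the rounding rule, and the String labels "0" and "1" for the two Biopsy classes are assumptions. Only the class size of 803 per class is taken from the description.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train, test) index lists that keep each class's share in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Split each class separately so its proportion is preserved.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Balanced labels after oversampling: 803 records per class (labels are hypothetical).
labels = ["0"] * 803 + ["1"] * 803
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
```

Under this sketch's rounding rule, each class contributes 562 rows to training and 241 to testing. The exact counts produced in KNIME may differ by a row depending on how the node rounds.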

File Reader
String to Number
Statistics
Column Filter
Missing Value
Data Strategy and Balancing
