Sampling Strategies Comparison

Experiment with:
- simple random sampling
- stratified random sampling (Partitioning node)
- undersampling (Equal Size Sampling node)
- oversampling (Bootstrap Sampling node and SMOTE node)

The workflow uses the Kaggle Stroke Prediction Dataset, which contains 5110 rows and 11 clinical features such as body mass index, smoking status, age, gender, and glucose level. The task is to predict stroke (yes/no), which is a binary classification problem. We chose to build a Random Forest model.
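
For readers who want to see what the sampling strategies listed above do to class counts, here is a minimal sketch in plain Python. It is not how the KNIME nodes are implemented: the function names, the toy rows, and the 80/20 split are all assumptions made for illustration, and the SMOTE-style function is simplified to use only the single nearest neighbour. The toy data is randomly generated and is not the Kaggle dataset.

```python
import random
from collections import defaultdict


def group_by_label(rows, label_key):
    """Group rows (dicts) by the value of their class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return groups


def stratified_split(rows, label_key, train_fraction, rng):
    """Split so that each class keeps the same share in both partitions."""
    train, test = [], []
    for members in group_by_label(rows, label_key).values():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def undersample(rows, label_key, rng):
    """Equal-size sampling: shrink every class to the smallest class size."""
    groups = group_by_label(rows, label_key)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))  # without replacement
    return balanced


def oversample_bootstrap(rows, label_key, rng):
    """Grow every class to the largest class size by drawing with replacement."""
    groups = group_by_label(rows, label_key)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def smote_like(minority, feature_keys, label_key, n_new, rng):
    """Simplified SMOTE: interpolate between a row and its nearest neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbour = min(
            others,
            key=lambda row: sum((row[k] - base[k]) ** 2 for k in feature_keys),
        )
        gap = rng.random()  # position on the segment between the two rows
        new_row = {k: base[k] + gap * (neighbour[k] - base[k]) for k in feature_keys}
        new_row[label_key] = base[label_key]
        synthetic.append(new_row)
    return synthetic


# Toy data (made up): 10 positive and 90 negative rows with two numeric features.
rng = random.Random(0)
rows = [
    {"age": rng.uniform(20, 90), "glucose": rng.uniform(60, 250),
     "stroke": 1 if i < 10 else 0}
    for i in range(100)
]

train, test = stratified_split(rows, "stroke", 0.8, rng)
balanced_down = undersample(train, "stroke", rng)
balanced_up = oversample_bootstrap(train, "stroke", rng)

minority = [row for row in train if row["stroke"] == 1]
majority_size = len(train) - len(minority)
synthetic = smote_like(minority, ["age", "glucose"], "stroke",
                       majority_size - len(minority), rng)
balanced_smote = train + synthetic

for name, data in [("train", train), ("undersampled", balanced_down),
                   ("bootstrap", balanced_up), ("smote-like", balanced_smote)]:
    counts = {label: len(m) for label, m in group_by_label(data, "stroke").items()}
    print(name, counts)
```

In this sketch only the training partition is resampled, while the test partition keeps the original class ratio, so any evaluation still reflects the distribution of the raw data. Whether the workflow itself is wired this way should be checked in the workflow.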

Nodes

- Partitioning (5 ×)
- Random Forest Learner (5 ×)
- Random Forest Predictor (5 ×)
- Binary Classification Inspector (1 ×)
- Bootstrap Sampling (1 ×)
- CSV Reader (1 ×)
- Column Appender (1 ×)
- Column Auto Type Cast (1 ×)
- Concatenate (1 ×)
- Equal Size Sampling (1 ×)
- Row Splitter (1 ×)
- SMOTE (1 ×)
- Shuffle (1 ×)

Extensions

- KNIME Base nodes
- KNIME Ensemble Learning Wrappers
- KNIME Machine Learning Interpretability Extension
