Sampling Strategies Comparison

Experiment with:
- simple random sampling
- stratified random sampling (Partitioning node)
- undersampling (Equal Size Sampling node)
- oversampling (Bootstrap Sampling node and SMOTE node)

The workflow uses the Kaggle Stroke Prediction Dataset, which contains 5110 rows and 11 clinical features such as body mass index, smoking status, age, gender, and glucose level. The task is to predict stroke (yes/no), a binary classification problem, for which we build a Random Forest model.
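
For readers who want to see the idea behind each sampling strategy outside of KNIME, the following is a minimal Python sketch using only the standard library. It is not how the Partitioning, Equal Size Sampling, Bootstrap Sampling, or SMOTE nodes are implemented. All function names are hypothetical, the toy data is made up (95 rows of class 0, 5 rows of class 1, two numeric features) and is not the stroke dataset, and the SMOTE-style function is a simplified interpolation that assumes purely numeric features.

```python
import math
import random
from collections import Counter, defaultdict

def group_by_label(rows):
    """Group (features, label) rows by their class label."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append((features, label))
    return groups

def simple_random_sample(rows, fraction, rng):
    """Draw a fraction of the rows uniformly at random, ignoring the label."""
    return rng.sample(rows, round(len(rows) * fraction))

def stratified_sample(rows, fraction, rng):
    """Draw the same fraction from every class, preserving class proportions."""
    sample = []
    for members in group_by_label(rows).values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def undersample(rows, rng):
    """Shrink every class to the size of the smallest class."""
    groups = group_by_label(rows)
    n_min = min(len(members) for members in groups.values())
    out = []
    for members in groups.values():
        out.extend(rng.sample(members, n_min))
    return out

def bootstrap_oversample(rows, rng):
    """Grow every class to the size of the largest class by drawing with replacement."""
    groups = group_by_label(rows)
    n_max = max(len(members) for members in groups.values())
    out = []
    for members in groups.values():
        out.extend(members)
        out.extend(rng.choices(members, k=n_max - len(members)))
    return out

def smote_like_oversample(rows, minority_label, n_new, k, rng):
    """Add n_new synthetic minority rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (simplified SMOTE idea)."""
    minority = [features for features, label in rows if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        point = tuple(b + gap * (o - b) for b, o in zip(base, other))
        synthetic.append((point, minority_label))
    return rows + synthetic

# Made-up, imbalanced toy data: 95 rows of class 0 and 5 rows of class 1.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(95)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(5)]

def label_counts(rows):
    """Count rows per class label."""
    return Counter(label for _, label in rows)

simple = simple_random_sample(data, 0.8, rng)
stratified = stratified_sample(data, 0.8, rng)
under = undersample(data, rng)
boot = bootstrap_oversample(data, rng)
smote = smote_like_oversample(data, minority_label=1, n_new=90, k=3, rng=rng)

for name, rows in [("simple", simple), ("stratified", stratified),
                   ("undersampled", under), ("bootstrap", boot), ("smote-like", smote)]:
    print(name, dict(label_counts(rows)))
```

Comparing the printed class counts shows the trade-off: simple random sampling can change the minority share by chance, stratified sampling keeps it fixed, undersampling balances the classes by discarding majority rows, and both oversampling variants balance them by adding minority rows (repeated rows for bootstrap, interpolated rows for the SMOTE-style variant).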
