Sampling Strategies Comparison

Experiment with:
- simple random sampling
- stratified random sampling (Partitioning node)
- undersampling (Equal Size Sampling node)
- oversampling (Bootstrap Sampling node and SMOTE node)
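The strategies listed above can be sketched outside KNIME in a few lines of Python. This is only an illustration under assumptions: the function names (`stratified_split`, `undersample`, `bootstrap_oversample`), the toy rows, and the 80/20 split are made up for the example, and the code does not reproduce the exact behavior of the Partitioning, Equal Size Sampling, or Bootstrap Sampling nodes.

```python
import random
from collections import Counter, defaultdict


def group_by_class(rows, label_of):
    """Collect rows into one list per class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def stratified_split(rows, label_of, train_fraction=0.8, seed=0):
    """Split each class separately so train and test keep the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for members in group_by_class(rows, label_of).values():
        members = members[:]          # copy so the caller's data is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def undersample(rows, label_of, seed=0):
    """Shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, target))   # without replacement
    rng.shuffle(balanced)
    return balanced


def bootstrap_oversample(rows, label_of, seed=0):
    """Grow every class to the size of the largest class by resampling."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        extra = rng.choices(members, k=target - len(members))  # with replacement
        balanced.extend(members + extra)
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 90 rows of class 0 and 10 rows of class 1.
rows = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]
label_of = lambda row: row[1]

train, test = stratified_split(rows, label_of)
train_counts = Counter(label_of(r) for r in train)
under_counts = Counter(label_of(r) for r in undersample(train, label_of))
over_counts = Counter(label_of(r) for r in bootstrap_oversample(train, label_of))
```

In this sketch, resampling is applied only to the training partition, so the test partition keeps the original class ratio.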

The workflow uses the Kaggle Stroke Prediction Dataset, which contains 5110 rows and 11 clinical features such as body mass index, smoking status, age, gender, and glucose level. The task is to predict stroke (yes/no), which is a binary classification problem. We chose to build a Random Forest model.
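Several of the features mentioned above are numeric, which is what SMOTE-style oversampling relies on: it creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours. The following is a minimal sketch of that idea, not the SMOTE node's implementation. The function name `smote_like`, the choice of `k`, and the toy (age, glucose) pairs are assumptions made for illustration and are not taken from the dataset.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority rows (hypothetical values): (age, glucose level)
minority = [
    (50.0, 100.0),
    (60.0, 150.0),
    (70.0, 200.0),
    (65.0, 120.0),
    (80.0, 180.0),
]
new_rows = smote_like(minority, n_new=4)
```

In practice the features would usually be scaled first, because unscaled Euclidean distance lets the feature with the largest range dominate the neighbour search.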

Workflow: Signs of a Stroke with SMOTE and Other Sampling Techniques

In this workflow we use the Binary Classification Inspector node to compare results for the stroke.csv dataset.

Branches:
- Simple Random Sampling
- Stratified Sampling Only
- SMOTE
- Undersampling
- Bootstrapping

Steps and nodes:
- Read Stroke Data (CSV Reader)
- Automatically detect data type (Column Auto Type Cast)
- Split data into train and test set (Partitioning, once per branch)
- Oversample minority class (SMOTE; Bootstrap Sampling)
- Downsample majority class (Equal Size Sampling)
- Separate stroke from non-stroke data (Row Splitter)
- Put data back together (Concatenate)
- Mix old and new samples (Shuffle)
- Train the model (Random Forest Learner)
- Test the model / make predictions (Random Forest Predictor)
- Join all predictions together (Column Appender)
- Compare all models (Binary Classification Inspector)
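Comparing models trained on differently sampled data calls for more than accuracy, since a model that never predicts the minority class can still score well. The sketch below computes a few standard binary classification metrics from actual and predicted labels. It is a hand-rolled illustration and not the Binary Classification Inspector itself; the function name `binary_metrics` and the toy label vectors are assumptions.

```python
def binary_metrics(actual, predicted, positive=1):
    """Compute confusion-matrix based metrics for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(num, den):
        # Report 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)       # recall on the positive class
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels (hypothetical): 2 positives out of 10 rows.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                     # never predicts the positive class
finds_one = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]     # one hit, one miss, one false alarm

baseline = binary_metrics(actual, always_negative)
candidate = binary_metrics(actual, finds_one)
```

Both toy models reach the same accuracy, but only the second has non-zero sensitivity, which is the kind of difference a side-by-side comparison of the sampling branches is meant to surface.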
