
JKISeasor2-19_tomljh_ver4


Challenge 19: Dealing with Diabetes

Level: Easy or Medium

Description: In this challenge you will take the role of a clinician and check if machine learning can help you predict diabetes. You should create a solution that beats a baseline accuracy of 65%, and also works very well for both classes (having diabetes vs not having diabetes). We got an accuracy of 77% with a minimal workflow. If you'd like to take this challenge from easy to medium, try implementing:

* sampling techniques
* feature importance calculation
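As a rough illustration of the sampling idea, the sketch below performs a stratified 70/30 split in plain Python, so that both classes keep about the same share in the training and test partitions. The function name `stratified_split`, the toy rows, and the default seed are assumptions made for this example; in a KNIME workflow this step would normally be configured in the Partitioning node rather than coded by hand.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.7, seed=1234):
    """Split rows into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # random order within the class
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Toy rows standing in for diabetes.csv (hypothetical): (feature, outcome) pairs.
rows = [(i, "0") for i in range(130)] + [(i, "1") for i in range(70)]
train, test = stratified_split(rows, label_of=lambda r: r[1])
```

Because the split is done per class, an imbalanced target such as the diabetes outcome ends up with about the same class ratio in both partitions, which makes per-class metrics on the test set easier to interpret.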

Workflow annotations:

1. Sampling techniques: Stratified sampling, SMOTE
2. Forward Feature Selection

acc = 79.22%

Ref:

* Knime Hub: https://hub.knime.com/-/spaces/-/latest/~9cBJzpQEeZyMtSNa/
* https://hub.knime.com/-/spaces/-/latest/~gohv_JOBUNKgit_t/
* ersy: https://hub.knime.com/-/spaces/-/latest/~XFkhViuhGsGn27gS/
* github: https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Machine%20Learning%20for%20Diabetes.ipynb

My original method: Due to my habit, I set seed to 1234 and accidentally obtained acc=79.2

Find Seed. Tip: Due to the small dataset and fast calculation using the method, I generated 50 seeds. best seed: 509909

Final Simplification

Step annotations:

* read data - diabetes.csv
* Set the target variable to string type (outcome int -> str)
* EDA
* Stratified sampling 70/30
* Oversample "outcome" class at each training sample
* select features
* tree deep = 8
* apply to the test set
* seed = 509909

Nodes used in the workflow:

* CSV Reader
* Number To String
* Data Explorer
* Partitioning
* SMOTE
* Forward Feature Selection
* Reference Column Filter
* Gradient Boosted Trees Learner
* Gradient Boosted Trees Predictor
* Random Forest Learner
* Random Forest Predictor
* Scorer (JavaScript)
* ROC Curve (JavaScript)
* Data Generator
* Math Formula
* Table Row To Variable
* Table Row To Variable Loop Start
* Variable Loop End
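The oversampling step can be sketched in plain Python as follows. This is a simplified SMOTE-style interpolation, not the implementation used by the KNIME SMOTE node: each synthetic row is placed on the line between a minority-class row and one of its nearest minority-class neighbours. The function name `smote_like`, the parameter `k`, and the toy feature values are assumptions for illustration only.

```python
import random


def smote_like(minority, n_new, k=3, seed=1234):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (needs >= 2 minority rows)."""
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        partner = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> partner
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, partner)))
    return synthetic


# Toy training partition (hypothetical values): two numeric features per row.
majority = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0)]
minority = [(8.0, 9.0), (9.0, 8.0), (10.0, 10.0)]

# Add just enough synthetic rows to match the majority class size.
synthetic = smote_like(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + synthetic
```

Only the training partition is oversampled here, in line with the annotation about oversampling at each training sample; the test partition keeps its original class distribution so the reported accuracy reflects real data.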
