
20230809 Pikairos JustKNIMEIt Season 2 Challenge 19 Dealing with Diabetes

In this challenge you will take the role of a clinician and check if machine learning can help you predict diabetes. You should create a solution that beats a baseline accuracy of 65%, and also works very well for both classes (having diabetes vs not having diabetes). We got an accuracy of 77% with a minimal workflow. If you'd like to take this challenge from easy to medium, try implementing:

sampling techniques
feature importance calculation
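As one possible reading of "sampling techniques", the sketch below shows random oversampling: rows of the smaller class are duplicated at random until both classes have the same size. This is only an illustration and not necessarily the technique used in the workflow; the function name `random_oversample` and the toy rows are invented for the example. Resampling is usually applied to the training data only, so that the test set keeps its original class balance.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest one."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, invented for illustration: four rows of class "0", two of class "1".
rows = [[85, 26.6], [89, 28.1], [116, 25.6], [110, 37.6], [148, 33.6], [183, 23.3]]
labels = ["0", "0", "0", "0", "1", "1"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```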

Challenge 19: Dealing with Diabetes

Workflow annotations:

Diabetes data
Read Outcome column as a string
Replace 0 values of certain columns with missing values
Remove missing value rows
Top = 80% for training and internal test set; bottom = 20% for external test set
Remove columns with >25% missing values
Filter out variables determined as unimportant based on findings from the decision tree model
Use a decision tree model with k-fold cross validation with variables and their shuffled counterparts. Determine the occurrence of each variable and calculate the difference between shuffled and non-shuffled. This can be used to see the importance of each variable for the model.
Apply different models to external test set: Random Forest, XGBoost, Logistic Regression (annotation appears twice)
View statistics per outcome
View overall statistics
View sensitivity and specificity statistics (annotation appears twice)
View accuracy and Cohen's kappa statistics (annotation appears twice)
View statistics
View statistics of data

Nodes shown on the canvas:

CSV Reader
Math Formula (Multi Column)
Missing Value
Partitioning
Missing Value Column Filter
Column Filter
Determine Variable Importance using a Decision Tree
Train Different Models on the Internal Set and Predict the Outcome of the External Test Set (two instances)
Interactive Table (local) (seven instances)
Statistics
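The data preparation described in the workflow annotations (replace 0 values with missing values, remove columns with more than 25% missing values, remove rows with missing values, then split top 80% / bottom 20%) can be sketched in plain Python. This is a minimal sketch and not the workflow itself: the column names `Glucose`, `Insulin` and `BMI`, the toy rows, the helper names and the order in which the steps are chained are all assumptions made for the example.

```python
def zeros_to_missing(rows, columns):
    """Replace 0 with None (missing) in the given columns only."""
    return [
        {key: (None if key in columns and value == 0 else value) for key, value in row.items()}
        for row in rows
    ]


def drop_sparse_columns(rows, max_missing_fraction=0.25):
    """Remove columns whose share of missing values exceeds the threshold."""
    keep = [
        key for key in rows[0]
        if sum(row[key] is None for row in rows) / len(rows) <= max_missing_fraction
    ]
    return [{key: row[key] for key in keep} for row in rows]


def drop_incomplete_rows(rows):
    """Remove rows that still contain a missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def split_top_bottom(rows, top_fraction=0.8):
    """Take the first rows as the top partition and the rest as the bottom partition."""
    cut = int(len(rows) * top_fraction)
    return rows[:cut], rows[cut:]


# Toy rows, invented for illustration. Outcome is kept as a string, as in the workflow.
raw = [
    {"Glucose": 148, "Insulin": 0, "BMI": 33.6, "Outcome": "1"},
    {"Glucose": 85, "Insulin": 0, "BMI": 26.6, "Outcome": "0"},
    {"Glucose": 183, "Insulin": 0, "BMI": 23.3, "Outcome": "1"},
    {"Glucose": 89, "Insulin": 94, "BMI": 28.1, "Outcome": "0"},
    {"Glucose": 0, "Insulin": 168, "BMI": 43.1, "Outcome": "1"},
    {"Glucose": 116, "Insulin": 0, "BMI": 25.6, "Outcome": "0"},
    {"Glucose": 78, "Insulin": 88, "BMI": 0, "Outcome": "1"},
    {"Glucose": 115, "Insulin": 120, "BMI": 35.3, "Outcome": "0"},
    {"Glucose": 197, "Insulin": 543, "BMI": 30.5, "Outcome": "1"},
    {"Glucose": 125, "Insulin": 100, "BMI": 31.0, "Outcome": "1"},
]

# Assumed order: mark zeros as missing, drop sparse columns, drop incomplete rows, split.
cleaned = zeros_to_missing(raw, columns={"Glucose", "Insulin", "BMI"})
cleaned = drop_incomplete_rows(drop_sparse_columns(cleaned))
internal, external = split_top_bottom(cleaned)
print(len(internal), len(external), sorted(cleaned[0]))
```

Dropping sparse columns before incomplete rows is a design choice in this sketch: it avoids discarding many rows only because a single column is mostly empty.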
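The statistics the workflow inspects (sensitivity, specificity, accuracy and Cohen's kappa) all follow from the four cells of a binary confusion matrix, which is why they are useful for checking that a model works for both classes and not only overall. The sketch below computes them for a pair of label lists; the function name `binary_metrics`, the choice of `"1"` as the positive class and the toy predictions are assumptions for illustration.

```python
def binary_metrics(actual, predicted, positive="1"):
    """Compute sensitivity, specificity, accuracy and Cohen's kappa for two label lists.

    Assumes both classes occur in `actual`; otherwise a division by zero is raised.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    n = len(pairs)

    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals of the confusion matrix.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)

    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives that were found
        "specificity": tn / (tn + fp),  # share of actual negatives that were found
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Toy labels, invented for illustration.
actual = ["1", "1", "1", "1", "0", "0", "0", "0", "0", "0"]
predicted = ["1", "1", "1", "0", "0", "0", "0", "0", "1", "1"]

metrics = binary_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```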
