
Training a Diabetes predictor

Diabetes Prediction - Training - Gradient Boosted

This workflow is an example of how to train a basic machine learning model for a diabetes prediction task, using a Gradient Boosted algorithm.
Notice the three basic data prep steps: missing value imputation, type conversion, and outlier removal.
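The three data prep steps can be sketched outside KNIME in plain Python. This is only an illustration, not the workflow's actual node configuration: the records are made up, and the choice of median imputation, the 1.5 × IQR outlier rule, and the columns each step is applied to are assumptions. Only the column names come from the workflow description.

```python
from statistics import median, quantiles

# Made-up records for illustration only; not taken from the workflow's dataset.
rows = [
    {"Age": 4, "MentHlth": 0, "PhysHlth": 2, "Smoker": 1},
    {"Age": 9, "MentHlth": None, "PhysHlth": 0, "Smoker": 0},
    {"Age": 7, "MentHlth": 3, "PhysHlth": 1, "Smoker": 1},
    {"Age": 11, "MentHlth": 2, "PhysHlth": 30, "Smoker": 0},
    {"Age": 6, "MentHlth": 1, "PhysHlth": 3, "Smoker": 0},
    {"Age": 8, "MentHlth": 0, "PhysHlth": 1, "Smoker": 1},
    {"Age": 5, "MentHlth": 2, "PhysHlth": 0, "Smoker": 0},
    {"Age": 10, "MentHlth": 4, "PhysHlth": 2, "Smoker": 1},
]


def impute_median(rows, column):
    """Replace missing values (None) in `column` with the column median.
    Median imputation is an assumption; the workflow may use another strategy."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def to_string(rows, column):
    """Convert a numeric category code to a string so it is treated as nominal."""
    return [{**r, column: str(r[column])} for r in rows]


def drop_iqr_outliers(rows, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].
    The IQR rule is an assumed outlier definition, not the workflow's setting."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]


clean = impute_median(rows, "MentHlth")       # 1. missing value imputation
clean = to_string(clean, "Smoker")            # 2. type conversion
clean = drop_iqr_outliers(clean, "PhysHlth")  # 3. outlier removal
```

In this sketch the statistics are computed on the rows passed in. For a train/test setup, they would normally be derived from the training partition and then reused on the test partition.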

The metric used is the LogLoss.

Dataset and Partitioning: Dataset of patients with or without diabetes. The data is split into Train = 70% and Test = 30%.

Data Preprocessing: It is common practice in statistical analysis to use data pre-processing techniques to optimize the performance of machine learning models. Among these techniques, converting numeric variables to string variables, checking for missing values, removing outliers for specific variables, feature selection using techniques like Boruta, and checking for imbalances in the dataset can be effective in improving data quality and analysis accuracy.

Input attributes type rules:
String: Sex, HighChol, CholCheck, BMI, Smoker, HeartDiseaseorAttack, PhysActivity, Fruits, Veggies, HyAlcoholConsump, DiffWalk, Hypertension, Stroke, Diabetes
Number (Integer): Age, GenHlth, MentHlth, PhysHlth

Inducer: Node used to create a prediction model from training data. The resulting model can then be used to make predictions on new data.

Classifier: In this analysis, the training phase is a fundamental step, as it produces the model that estimates the likelihood of an individual being affected by diabetes. The model was tuned with a random search for the best-performing parameters. Additionally, to enhance the model's robustness and performance, 10-fold cross-validation was employed.

LogLoss: Metric used to evaluate model performance:
LogLoss = -(y * ln(p) + (1-y) * ln(1-p))
where y is the true label (0 or 1) and p is the predicted probability of the positive class.

Save Model: To enable its reuse at any time, the trained model has been saved.

Result: In the results phase, the log-loss score obtained by our model and the ROC curve for the best model are presented. The data is arranged beforehand to obtain the ROC curve.

Node annotations: Read new data. Train = 70%, Test = 30%. Graphic image of the ROC curve. Generate predictions. Performance scoring LogLoss. Cleaning train data to achieve better performance. Cleaning test data to achieve better performance. Predict Diabetes with best parameter. Arrange the data to perform the ROC. BarChart to evaluate the balance of the target variable.

Node names: Excel Reader, Partitioning, ROC Curve (local), Gradient Boosted Trees Predictor, LogLoss Computation, Interactive Table (local), Preprocessing_Train, Variable Transformation, GB Learner, Model Writer, Data for ROC, BarChart
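The LogLoss formula given in the annotations is stated for a single record. A minimal sketch of scoring a whole test set follows, assuming the per-record losses are averaged (the usual convention) and that probabilities are clipped away from 0 and 1 to keep the logarithm finite. The function name, the `eps` parameter, and the sample labels and probabilities are made up for illustration and do not come from the workflow.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss.

    y_true: true labels (0 or 1)
    p_pred: predicted probabilities of the positive class
    eps:    clipping bound (an assumption) so that ln(0) is never evaluated
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to [eps, 1 - eps]
        # Per-record loss, as in the annotation: -(y*ln(p) + (1-y)*ln(1-p))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]
score = log_loss(y_true, p_pred)
print(f"LogLoss: {score:.4f}")
```

Lower values are better: a model that always predicts 0.5 scores ln(2) ≈ 0.693, and confident wrong predictions are penalized heavily.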

Nodes

Extensions

Links