Challenge 6 - Heart Failure Prediction

Challenge 6: Heart Failure Prediction


Level: Medium

Description: You are a medical researcher working with a hospital to uncover key risk factors behind heart failure. Using an unbalanced dataset of patient records, your task is to build a predictive model to identify potential heart disease cases. But accuracy alone isn’t enough: clinicians want to understand why the model makes the predictions it does. Train your model, then apply explainable AI techniques to reveal the top three features influencing its decisions. Can your insights help doctors detect heart failure earlier and more effectively?

Beginner-friendly objective(s): 1. Load and preprocess the heart disease dataset, ensuring that the data is clean and ready for analysis. 2. Perform a train-test split on the dataset, maintaining the class distribution for accurate model evaluation.
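
The beginner objectives can be sketched in Python with pandas and scikit-learn. The tiny inline table below is a stand-in for heart.csv (a real run would load the full file); the column names and the "HeartDisease" target follow the dataset's schema, and the zero-cholesterol cleanup step is one plausible preprocessing choice, not the only one.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for heart.csv (real workflow: df = pd.read_csv("heart.csv")).
df = pd.DataFrame({
    "Age":            [40, 49, 37, 48, 54, 39, 45, 58, 60, 42],
    "Cholesterol":    [289, 180, 283, 214, 195, 339, 237, 208, 248, 0],
    "ExerciseAngina": ["N", "N", "N", "Y", "N", "N", "N", "Y", "Y", "N"],
    "HeartDisease":   [0, 1, 0, 1, 0, 0, 1, 1, 1, 0],
})

# Basic cleaning: the dataset encodes some missing cholesterol values as 0,
# so replace them with the column median.
df.loc[df["Cholesterol"] == 0, "Cholesterol"] = df["Cholesterol"].median()

# Stratified split keeps the class ratio identical in train and test,
# which matters because the dataset is unbalanced.
X = df.drop(columns="HeartDisease")
y = df["HeartDisease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(y_train.mean(), y_test.mean())  # same positive-class ratio in both
```

The `stratify=y` argument is what mirrors the "maintaining the class distribution" requirement; without it, a small test partition of an unbalanced dataset can end up with a very different positive rate than the training data.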

Intermediate-friendly objective(s): 1. Implement a parameter optimization loop to fine-tune the model's hyperparameters for improved performance. 2. Within the Parameter Optimization Loop, conduct cross-validation to assess the model's robustness and generalization (default: Naïve Bayes, but feel free to experiment with other models).
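
A minimal sketch of a parameter optimization loop with cross-validation inside it, using synthetic data as a stand-in for the preprocessed heart table. Gaussian Naïve Bayes has essentially one tunable hyperparameter, `var_smoothing`, so the grid here is over that; the candidate range and fold count are illustrative choices, not values from the workflow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data (real workflow: the preprocessed heart table), mildly unbalanced.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.6, 0.4], random_state=0)

# Parameter optimization loop: try each candidate hyperparameter value and
# keep the one with the best cross-validated score.
best_score, best_param = -np.inf, None
for var_smoothing in np.logspace(-12, -3, 10):
    model = GaussianNB(var_smoothing=var_smoothing)
    # Cross-validation inside the loop assesses robustness and generalization.
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    if score > best_score:
        best_score, best_param = score, var_smoothing

print(f"best var_smoothing={best_param:.1e}, CV accuracy={best_score:.3f}")
```

The explicit `for` loop mirrors KNIME's Parameter Optimization Loop nodes; `sklearn.model_selection.GridSearchCV` does the same thing in one call.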

Advanced objective(s): 1. Integrate multiple data science techniques, including one-hot encoding and normalization, to enhance the model's predictive power. 2. Evaluate the model's performance using advanced metrics and visualization techniques to gain insights into its accuracy and reliability. 3. Use the Surrogate Random Forest model from Global Feature Importance to determine the top 3 most important features driving predictions.

What are the top 3 features responsible for the model's predictions?


The top 3 features responsible for the model's predictions are:

  1. ST_Slope

  2. ExerciseAngina

  3. ChestPainType

These features play the most significant role in determining the likelihood of heart failure based on the trained model.

Read the heart.csv
CSV Reader
Capture Workflow Start
Optimise Default Probability parameter
Parameter Optimization
Capture Workflow End
Convert the target to string
Number to String
Normalise numeric data
Normalizer
Model training
Naive Bayes Learner
Model Prediction
Naive Bayes Predictor
Train-test split
Table Partitioner
Evaluating classification model
Scorer (JavaScript)
Filter out string columns which are one-hot encoded
Column Filter
Apply Normaliser on test dataset
Normalizer (Apply)
Input: 0: Model as a Workflow Object, 1: Data from Model Test Partition. Output: 0: Global Feature Importance
Global Feature Importance
Apply one-hot encoding to the categorical features and remove original categorical features
One to Many (PMML)
Apply one hot encoding to the categorical features in the test set
PMML Transformation Apply
Separate target
Column Splitter (deprecated)
Combine the target
Column Appender
