Heart_Disease_Classification Final

Data Preparation

Dataset inspection and preparation for machine learning.

Clustering Analysis

K-Means used to discover groups in the dataset.

Classification Model

Random Forest used to predict heart disease.

Hyperparameter Optimization

Testing different numbers of trees for the Random Forest.

Model Evaluation

Accuracy and Cohen's Kappa used for evaluation.
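As a rough illustration of the two metrics named above, the following Python sketch computes accuracy and Cohen's Kappa from a list of actual and predicted labels. The function name `accuracy_and_kappa` and the sample labels are hypothetical and only serve the example; the workflow itself relies on the Scorer node.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: share of rows where the prediction matches the label.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: per class, product of the two marginal frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    # Kappa is undefined when chance agreement is 1; returning 0.0 here is a
    # convention chosen for this sketch.
    kappa = 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Hypothetical labels, not taken from the dataset.
actual = ["Disease", "Disease", "No disease", "No disease"]
predicted = ["Disease", "No disease", "No disease", "No disease"]
acc, kappa = accuracy_and_kappa(actual, predicted)
```

Kappa corrects accuracy for the agreement expected by chance, which makes it more informative than accuracy alone when the classes are imbalanced.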

Input dataset containing clinical and demographic attributes for heart disease classification.
Excel Reader
Definition of the target variable (Disease / No disease) based on clinical diagnosis.
Rule Engine
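A minimal Python sketch of such a rule, assuming the diagnosis is stored in a numeric column `num` where 0 means no disease and any positive value means disease. That threshold is an assumption for illustration; the actual Rule Engine expression is not shown on this page.

```python
def label_target(num):
    """Map a numeric diagnosis code to a nominal class label."""
    # Assumption: 0 = no disease, any value above 0 = disease.
    return "Disease" if num > 0 else "No disease"


# Hypothetical diagnosis codes, not taken from the dataset.
codes = [0, 2, 1, 0]
labels = [label_target(code) for code in codes]
```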
GroupBy
Selection of relevant features and removal of non-informative or identifier columns.
Column Filter
Splitting the dataset into training and test sets to enable unbiased model evaluation.
Table Partitioner
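The partitioning step could be sketched in Python as follows. Stratified sampling, the 80/20 ratio, the fixed seed, and the column name `target` are all assumptions for this example; the settings of the Table Partitioner node are not shown on this page.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.8, seed=42):
    """Split rows into train and test sets, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical rows: 10 per class.
rows = [{"id": i, "target": "Disease"} for i in range(10)]
rows += [{"id": i, "target": "No disease"} for i in range(10, 20)]
train, test = stratified_split(rows, "target")
```

Splitting per class keeps the share of each class roughly equal in both sets, so the test-set metrics are not distorted by a skewed sample.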
Training a decision tree classifier as an interpretable baseline model.
Decision Tree Learner
Applying the trained decision tree model to the test dataset.
Decision Tree Predictor
Evaluation of decision tree performance using a confusion matrix and accuracy metrics.
Scorer
H2O to Table
H2O Local Context
Rule Engine
Scorer
Column Renamer
k-Means
Shows how many patients fall into each chest pain category, helping visualize the categorical distribution of cp.
Bar Chart
Parameter Optimization Loop Start
GroupBy
Domain Calculator
Parameter Optimization Result
Best Random Forest configuration: ntrees = 200, Cohen’s Kappa = 0.314
Model Evaluation: This configuration achieved the best performance during optimization.
Parameter Optimization Loop End
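The optimization loop can be thought of as a search over candidate values that keeps the best-scoring one. The Python sketch below shows that idea only. The helper `optimize` and the toy objective are hypothetical; in the workflow, the score would come from training a Random Forest with the given number of trees and measuring Cohen's Kappa.

```python
def optimize(candidates, score_fn):
    """Return (best_value, best_score) over the candidate parameter values."""
    if not candidates:
        raise ValueError("at least one candidate value is required")
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = score_fn(value)
        # Keep the first candidate that reaches the highest score.
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


# Toy objective standing in for "train the model and return its score".
def toy_score(n):
    return -abs(n - 100)


best_value, best_score = optimize([50, 100, 200], toy_score)
```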
Missing Value
Normalizer
Number of rows
Extract Table Dimension
Correlation analysis
Linear Correlation
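For numeric column pairs, linear correlation is commonly measured with Pearson's coefficient. The sketch below computes it in plain Python. The function name `pearson` and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return Pearson's correlation coefficient for two numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no spread, so the coefficient is undefined.
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


# Hypothetical values, not taken from the dataset.
r_up = pearson([1, 2, 3], [2, 4, 6])
r_down = pearson([1, 2, 3], [3, 2, 1])
```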
Shows the number of samples from each dataset source to illustrate how the dataset is distributed across different collection locations.
Bar Chart
This scatter plot visualizes the relationship between two numeric features by plotting each record as a point on a 2D graph, helping to inspect whether the variables show any pattern, trend, or association.
Scatter Plot
Convert num to a Nominal Target Column
Number to String
Column Filter
H2O Random Forest Learner
Table to H2O
H2O Local Context
Line Plot
Histogram
Handling missing values using statistical imputation to ensure model compatibility.
Missing Value
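One common form of statistical imputation replaces missing numeric values with the column mean and missing nominal values with the most frequent value. The sketch below assumes that strategy and represents missing cells as `None`. The actual settings of the Missing Value node are not shown on this page.

```python
from collections import Counter


def impute_column(values):
    """Fill None entries with the mean (numeric) or the mode (nominal)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical columns, not taken from the dataset.
numeric = impute_column([1.0, None, 3.0])
nominal = impute_column(["a", None, "a", "b"])
```

To avoid leaking information, fill values are usually derived from the training set and then reused for the test set.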
Table to H2O
H2O Predictor (Classification)
Training a logistic regression model for probabilistic classification and comparison with the decision tree.
Logistic Regression Learner
Receiver Operating Characteristic (ROC) curve for the logistic regression classifier based on predicted probabilities for the positive class “Disease”. The curve illustrates the trade-off between sensitivity and specificity and provides a threshold-independent evaluation of model performance.
ROC Curve
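To make the construction of the curve concrete, the sketch below sweeps a threshold over the predicted probabilities for the positive class and records the false positive rate and true positive rate at each step, then integrates the area under the curve with the trapezoidal rule. The function names and sample scores are hypothetical.

```python
def roc_points(labels, scores, positive="Disease"):
    """Return ROC points as (false positive rate, true positive rate) pairs."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for label in labels if label == positive)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Rows with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and predicted probabilities for "Disease".
labels = ["Disease", "No disease", "Disease", "No disease"]
scores = [0.9, 0.8, 0.7, 0.1]
points = roc_points(labels, scores)
area = auc(points)
```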
Generating class predictions and predicted probabilities for the test dataset.
Logistic Regression Predictor
Data Explorer
Handling missing values using statistical imputation to ensure model compatibility.
Missing Value
Table Partitioner
Performance evaluation of the logistic regression model and comparison with the decision tree.
Scorer
GroupBy
Shows the count of male vs female participants, making the distribution of the categorical variable “sex” clear and easy to compare.
Bar Chart
Shows how age values are spread across the dataset, revealing the age distribution and the most common age ranges.
Histogram
Visualizes the distribution of cholesterol levels, indicating how frequently different cholesterol values occur in the dataset.
Histogram