
Heart Disease - Machine Learning Case - Comparing Algorithms

Binary Classification: use the Python XGBoost package and other nodes to build a model and deploy it through KNIME Python nodes.

Prepare the data with the vtreat package. In the subfolder /data/notebooks/ there is a Jupyter notebook to experiment with and build XGBoost models ("kn_example_python_xgboost.ipynb").

You can also further explore the H2O.ai AutoML model with the notebook "h2o_inspect_model_automl_existing.ipynb".

------------
Heart Failure Prediction Dataset (Kaggle)
11 clinical features for predicting heart disease events.

https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction?resource=download

URL: How to Develop Your First XGBoost Model in Python https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/
URL: A Beginner’s guide to XGBoost https://towardsdatascience.com/a-beginners-guide-to-xgboost-87f5d4c30ed7
URL: XGBoost Parameters https://xgboost.readthedocs.io/en/stable/parameter.html
URL: forum entry (45057) https://forum.knime.com/t/saving-xgboost-model-to-pmml-possible-now/45057/4?u=mlauber71
URL: Meta Collection about KNIME and Python https://kni.me/w/AvjrddXKOIoZYLV3
URL: Medium: Data preparation for Machine Learning with KNIME and the Python “vtreat” package https://medium.com/p/efcaf58fa783
URL: H2O.ai AutoML (wrapped with Python) in KNIME for classification problems https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923?u=mlauber71
URL: HUB: Binary Classification - Heart Disease - Machine Learning Case - Comparing Algorithms https://hub.knime.com/-/spaces/-/~0dXvsD0vMrv_w6Fw/current-state/
URL: forum entry (77228) https://forum.knime.com/t/how-do-know-if-your-dataset-is-good-before-using-ml-algorithms/77228/7?u=mlauber71
URL: Medium Blog: KNIME — Machine Learning and Artificial Intelligence — A Collection https://medium.com/p/12e0f7d83b50
URL: Medium Blog: About Machine-Learning — How it Fails and Succeeds https://medium.com/p/9f3ab7cb9b00
URL: Medium Blog: KNIME, XGBoost and Optuna for Hyper Parameter Optimization https://medium.com/p/dcf0efdc8ddf

Binary Classification: use the Python XGBoost package and other nodes to build a model and deploy it through KNIME Python nodes.

Also: prepare the data with the vtreat package.
In the subfolder /data/ there is a Jupyter notebook to experiment with and build XGBoost models ("kn_example_python_xgboost.ipynb"), as well as a notebook to extract H2O.ai parameters from a model ("h2o_inspect_model_automl_existing.ipynb").
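To illustrate the kind of preparation vtreat performs: categorical clinical columns become numeric indicator columns before modeling. The workflow uses the Python "vtreat" package itself; this pandas `get_dummies` version is a simplified stand-in with a column taken from the Kaggle dataset:

```python
# Simplified stand-in for vtreat-style preparation: one-hot encode a
# categorical clinical feature. The actual workflow uses the "vtreat"
# package, which additionally prunes rare levels (cf. the flow variable
# v_vtreat_indicator_min_fraction) and builds further codings.
import pandas as pd

df = pd.DataFrame({
    "Age": [52, 61, 45],
    "ChestPainType": ["ATA", "NAP", "ASY"],  # categorical feature from the Kaggle set
    "HeartDisease": [0, 1, 1],               # binary target
})

prepared = pd.get_dummies(df, columns=["ChestPainType"], prefix="ChestPainType")
print(prepared.columns.tolist())
```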

This workflow has been adapted for KNIME 5.2+ from the original one for 4.7 (https://hub.knime.com/-/spaces/-/~In0Rxt7EhzQfycx3/current-state/)


This is what deployment would look like on new data, using the model from inside Python.

Python Conda environment propagation. Please read this article for more details:


KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539

About Machine-Learning — How it Fails and Succeeds

https://medium.com/p/9f3ab7cb9b00

  

KNIME — Machine Learning and Artificial Intelligence — A Collection

https://medium.com/p/12e0f7d83b50

MEDIUM Blog: Hyperparameter optimization for LightGBM — wrapped in KNIME nodes

https://medium.com/p/ddb7ae1d7e2


GitHub Repository

https://github.com/ml-score/knime_meets_python/tree/main/machine_learning/binary

train.parquet: Target 0/1 = target variable, row_id = ID column
Parquet Reader
sort by AUC (pr), descending
Sorter
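The collected per-model metrics are sorted by PR AUC, descending, to rank the algorithms. A minimal pandas sketch of that ranking step, with illustrative scores (not the workflow's actual results):

```python
# Rank competing models by PR AUC, best first, as the Sorter node does
# after the Concatenate node collects each model's metrics.
import pandas as pd

results = pd.DataFrame({
    "model": ["Py_XGBoost", "h2o_glm", "knime_ranfor"],
    "AUC (pr)": [0.91, 0.87, 0.89],  # made-up scores for illustration
})

ranked = results.sort_values("AUC (pr)", ascending=False).reset_index(drop=True)
print(ranked)
```

The ranked table is what ends up in model_results.xlsx and in the Binary Classification Inspector.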
collect the results
Concatenate
$$ROWID$$ = "eclipse.home.location" => TRUE
Rule-based Row Filter
Table Row to Variable
inspect the model
Binary Classification Inspector
Py_XGBoost
Prepare Heart Failure Prediction Dataset (Kaggle) https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction/
Component
knime_model_gbm
also variable importance
h2o_glm
also variable importance
knime_xgboost_model
also variable importance
h2o_glm_vtreat
$$ROWID$$ = "java.home" => TRUE
Rule-based Row Filter
drop Target, as if this were completely new data
Column Filter
without DL
h2o_automl
SVM using the normalized data
knime_libsvm
Apply the XGBoost model from "knime_xgboost_model.json"
Python Script
without DL
h2o_automl_vtreat
SVM using the normalized data
knime_svm
knime_model_gbm_vtreat
test_normalized.table
Table Writer
knime_xgboost_model_vtreat
train_normalized.table
Table Writer
v_model_json_file = "knime_xgboost_model.json"
Java Edit Variable (simple)
osgi.syspath => find where the KNIME installation is located on Windows or macOS; also java.home
Extract System Properties
model_results.xlsx
Excel Writer
RowID
also preparing data with normalization
BINARY_RPROP_MLP
locate and create /data/ folder with absolute paths
Collect Local Metadata
export Flow Variables from KNIME ^(?!knime.workspace).*$
Variable to Table Row
Py_XGBoost_vtreat
determine package versions
Python Script
Merge Variables
v_rounds_to_run => edit! return 2500;
Java Edit Variable (simple)
Table Transposer
also variable importance
knime_ranfor
knime_ranfor_vtreat
just Deep Learning
h2o_automl_dl_vtreat
Binary Classification Inspector
just Deep Learning
h2o_automl_dl
Medium: Data preparation for Machine Learning with KNIME and the Python “vtreat” package https://medium.com/p/efcaf58fa783
vtreat prepare binary data
Activate Conda Environment based on Operating System (Windows or macOS)
conda_environment_kaggle
jupyter_test_prediction.parquet
Parquet Reader
test.parquet: Target 0/1 = target variable, row_id = ID column
Parquet Reader
v_vtreat_indicator_min_fraction => edit! return 0.025; https://github.com/WinVector/pyvtreat/blob/main/Examples/Classification/Classification.md
Java Edit Variable (simple)
edit: v_runtime_automl => set the maximum runtime of H2O.ai AutoML in SECONDS (600)
Integer Input (legacy)
