
kn_example_ml_binary_python_xgboost

Binary Classification - use the Python XGBoost package and other nodes to build a model and deploy it through KNIME Python nodes


The data is prepared with the vtreat package.
The subfolder /data/ contains a Jupyter notebook for experimenting with and building XGBoost models ("kn_example_python_xgboost.ipynb").

Dataset: Census Income Data Set
Abstract: Predict whether income exceeds $50K/yr based on census data. Also known as "Adult" dataset.

https://archive.ics.uci.edu/ml/datasets/census+income

Workflow annotations

Deployment: one part of the workflow shows what deployment would look like on new data. It applies the XGBoost model from "knime_xgboost_model.json" and drops Target, as if this were completely new data.

Files:
train.parquet (Target 0/1 = target variable, row_id = ID column)
test.parquet (Target 0/1 = target variable, row_id = ID column)
jupyter_test_prediction.parquet
model_results.xlsx

Flow variables to edit:
v_model_json_file: "knime_xgboost_model.json"
v_runtime_automl: maximum runtime of H2O.ai AutoML in seconds (600)
v_vtreat_indicator_min_fraction: 0.025, see https://github.com/WinVector/pyvtreat/blob/main/Examples/Classification/Classification.md
v_rounds_to_run: 1000

Results: collect the results and sort by AUC (pr), descending.

Python Conda environment propagation: there is one propagation node each for KNIME on MacOSX, on Windows, and on MacOSX (Apple Silicon), all with Miniforge / Miniconda. Configure how to handle the environment; default = just check the names. Please read this article for more details:
KNIME and Python - Setting up and managing Conda environments
https://medium.com/p/2ac217792539

Other steps:
export FlowVariables from KNIME: ^(?!knime.workspace).*$
determine package versions
locate and create the /data/ folder with absolute paths

Medium: Data preparation for Machine Learning with KNIME and the Python "vtreat" package
https://medium.com/p/efcaf58fa783
https://forum.knime.com/t/data-preparation-for-machine-learning-with-knime-and-the-python-vtreat-package/58679?u=mlauber71

Nodes and components in the workflow: Parquet Reader, Binary Classification Inspector, Java Edit Variable (simple), Integer Input (legacy), Concatenate, Sorter, RowID, conda_environment_kaggle_macosx, conda_environment_kaggle_windows, conda_environment_kaggle_apple_silicon, Excel Writer, BINARY_RPROP_MLP, Merge Variables, Python Script, Column Filter, Variable to Table Row, vtreat prepare binary data, H2O Local Context, knime_xgboost_model, knime_xgboost_model_vtreat, knime_model_gbm, knime_model_gbm_vtreat, Py_XGBoost, Py_XGBoost_vtreat, h2o_automl, h2o_automl_vtreat, knime_ranfor, knime_ranfor_vtreat, Collect Local Metadata, Transpose
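The flow-variable export pattern noted in the workflow, ^(?!knime.workspace).*$, uses a negative lookahead: it matches every variable name except those that start with "knime.workspace". A small sketch of that filter follows; the variable names and values in the dictionary are hypothetical and only illustrate the matching.

```python
import re

# Pattern copied from the workflow annotation.
pattern = re.compile(r"^(?!knime.workspace).*$")

# Hypothetical flow variables; names and values are made up for illustration.
flow_variables = {
    "knime.workspace": "/path/to/workspace",
    "v_rounds_to_run": 1000,
    "v_model_json_file": "knime_xgboost_model.json",
}

# Keep only the variables whose names pass the negative lookahead.
exported = {name: value for name, value in flow_variables.items() if pattern.match(name)}
print(sorted(exported))
```

Note that the dot in the pattern is unescaped, so it stands for any single character; a name such as "knime_workspace" would be excluded as well.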
