kn_example_ml_binary_lightgbm_hyper_parameter_opt

Use KNIME, Python and LightGBM to build a binary classification model. Hyperparameters are tuned with BayesSearchCV and Optuna, and the data is also prepared with vtreat.

Some of the parameters have been discussed with ChatGPT.

MEDIUM Blog: Hyperparameter optimization for LightGBM — wrapped in KNIME nodes
https://medium.com/p/ddb7ae1d7e2

GitHub Repository - binary classification
https://github.com/ml-score/knime_meets_python/tree/main/machine_learning/binary

-------------------------------
The data used has been adapted from:
Census Income Data Set
Abstract: Predict whether income exceeds $50K/yr based on census data. Also known as "Adult" dataset.
Extract and prepare the Census Income Files for usage in KNIME
https://archive.ics.uci.edu/ml/datasets/census+income
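
A minimal sketch of how raw Census Income lines could be turned into rows with a binary target, assuming the comma-separated layout of the UCI files (no header row, income label in the last column). The column names follow the UCI documentation; the "row_id" scheme and the function name are hypothetical and only illustrate the idea, they are not taken from the workflow.

```python
import csv
import io

# Column names as documented for the UCI "Adult" data; the raw files have no header row.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def prepare_rows(raw_text):
    """Parse raw census lines into dicts with a 0/1 'target' and a 'row_id'."""
    rows = []
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    for record in reader:
        if len(record) != len(COLUMNS):
            continue  # skip blank lines and non-data lines
        row = dict(zip(COLUMNS, (value.strip() for value in record)))
        # The test file writes labels with a trailing period (">50K.").
        label = row.pop("income").rstrip(".")
        row["target"] = 1 if label == ">50K" else 0
        row["row_id"] = f"row_{len(rows)}"  # hypothetical id scheme
        rows.append(row)
    return rows


# Two illustrative lines in the format of the raw files.
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K.\n"
)
prepared = prepare_rows(sample)
```

All values stay strings here; converting numeric columns and writing Parquet files is left out to keep the sketch free of third-party packages.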

Workflow annotations

BayesSearchCV LightGBM - there is a Jupyter notebook in the /data/notebook/ subfolder to toy around with:
"kn_example_python_lightgbm_hyper_parameter_bayes_search_cv.ipynb"

OPTUNA LightGBM - here is a Jupyter notebook in the /data/notebook/ subfolder to toy around with:
"kn_example_python_lightgbm_hyper_parameter_optuna.ipynb"

OPTUNA XGBoost - here is a Jupyter notebook in the /data/notebook/ subfolder to toy around with:
"kn_example_python_xgboost_hyper_parameter_optuna.ipynb"

H2O.ai AutoML - here is a Jupyter notebook in the /data/notebook/ subfolder to toy around with:
"kn_example_python_h2o_automl.ipynb"

Collect the performance statistics from the experiments in the Jupyter notebooks for the four approaches, from JSON files collecting the measurements.
The Jupyter notebooks are in the sub-folder ../data/notebooks/
You will also find them individually on GitHub:
https://github.com/ml-score/knime_meets_python/tree/main/machine_learning/binary/notebooks

Medium Blog: KNIME - Machine Learning and Artificial Intelligence - A Collection
https://medium.com/p/12e0f7d83b50
Medium Blog: About Machine-Learning - How it Fails and Succeeds
https://medium.com/p/9f3ab7cb9b00

Conda environment py3_knime_lightgbm (YAML in node description, for Apple Silicon and Windows)

# conda env create -f="/Users/m_lauber/Dropbox/knime-workspace/Machine_Learning/ml_binary/kn_example_ml_binary_lightgbm_hyper_parameter_opt/data/py3_knime_lightgbm.yml"
# conda env create -f="C:\\Users\\x123456\\knime-workspace\\Machine_learning\\ml_binary\\kn_example_ml_binary_lightgbm_hyper_parameter_opt\\data\\py3_knime_lightgbm.yml"
# conda activate py3_knime_lightgbm
# conda update -n py3_knime_lightgbm --all
# conda env update --name py3_knime_lightgbm --file "/Users/m_lauber/Dropbox/knime-workspace/Machine_Learning/ml_binary/kn_example_ml_binary_lightgbm_hyper_parameter_opt/data/py3_knime_lightgbm.yml" --prune
# conda env update --name py3_knime_lightgbm --file "C:\\Users\\x123456\\knime-workspace\\Machine_learning\\ml_binary\\kn_example_ml_binary_lightgbm_hyper_parameter_opt\\data\\py3_knime_lightgbm.yml" --prune
# conda env update --name py3_knime_lightgbm --file "/Users/m_lauber/Dropbox/knime-workspace/Machine_Learning/ml_binary/kn_example_ml_binary_lightgbm_hyper_parameter_opt/data/py3_knime_lightgbm.yml"
# conda env update --name py3_knime_lightgbm --file "C:\\Users\\x123456\\knime-workspace\\Machine_learning\\ml_binary\\kn_example_ml_binary_lightgbm_hyper_parameter_opt\\data\\py3_knime_lightgbm.yml"
# conda update -n base conda

# KNIME official Python integration guide
# https://docs.knime.com/latest/python_installation_guide/index.html#_introduction
# KNIME and Python - Setting up and managing Conda environments
# https://medium.com/p/2ac217792539
# Hyperparameter optimization for LightGBM - wrapped in KNIME nodes
# https://medium.com/p/ddb7ae1d7e2

# conda activate py3_knime_lightgbm
# file: py3_knime_lightgbm.yml with some modifications
# THX Carsten Haubold (https://hub.knime.com/carstenhaubold) for hints
name: py3_knime_lightgbm # Name of the created environment
channels: # Repositories to search for packages
# - defaults # edit: removed to just use conda-forge
# - anaconda # edit: removed to just use conda-forge
  - conda-forge
# https://anaconda.org/knime
  - knime # conda search knime-python-base -c knime --info # to see what is in the package
dependencies: # List of packages that should be installed
  - python=3.9 # Python
  - knime-python-base # dependencies of KNIME - Python integration
# - knime-python-scripting # everything you need to also build Python packages for KNIME
  - cairo # SVG support
  - pillow # Image inputs/outputs
  - matplotlib # Plotting
  - IPython # Notebook support
  - nbformat # Notebook support
  - scipy # Notebook support
  - jpype1 # A Python to Java bridge
# Jupyter Notebook support
  - jupyter # Jupyter Notebook
  - pandas-profiling # create overview of your data
  - sweetviz # In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!
  - plotly # An interactive, browser-based graphing library for Python
  - python-kaleido # Fast static image export for web-based visualization libraries
# Machine Learning Modules
  - lightgbm
  - xgboost
  - hyperopt
  - scikit-optimize # skopt
  - optuna # A hyperparameter optimization framework
  - pip # Python installer
  - pip:
#   - JPype1 # Databases
    - vtreat # https://medium.com/low-code-for-advanced-data-science/data-preparation-for-machine-learning-with-knime-and-the-python-vtreat-package-efcaf58fa783
    - h2o>=3.38
    - boruta # Python Implementation of Boruta Feature Selection

Node labels in the workflow

Data:
locate and create /data/ folder with absolute paths
train.parquet (Target is target, "row_id" will not be used)
test.parquet (remove row_id)
vtreat data preparation: https://medium.com/p/efcaf58fa783

Models trained in KNIME Python Script nodes:
Py_LightGBM: lightgbm_feature_importance.parquet, lightgbm_model_parameters.txt
Py_vtreat_LightGBM: lightgbm_vtreat_feature_importance.parquet, lightgbm_vtreat_model_parameters.txt
Results: model_results.xlsx, sorted by Pr (AUC) DESCENDING

Models produced in the Jupyter notebooks (each applied with a Python Predictor that also applies some transformations):
Jupyter_LightGBM: ml_model_lightgbm_jupyter.pkl => the model produced in the Jupyter Notebook /data/notebooks/kn_example_python_lightgbm_hyper_parameter_bayes_search_cv.ipynb, with ml_model_lightgbm_jupyter_variable_list.json and ml_model_lightgbm_jupyter_feature_importance.parquet (Pr (AUC) 0.823500 iterations)
Jupyter_LightGBM_optuna: ml_model_lightgbm_optuna_jupyter.pkl => the model produced in the Jupyter Notebook /data/notebooks/kn_example_python_lightgbm_hyper_parameter_optuna.ipynb, with ml_model_lightgbm_optuna_jupyter_variable_list.json and ml_model_lightgbm_optuna_jupyter_feature_importance.parquet
Jupyter_XGBoost_optuna: ml_model_xgboost_optuna_jupyter.json => the model produced in the Jupyter Notebook /data/notebooks/kn_example_python_xgboost_hyper_parameter_optuna.ipynb, with ml_model_xgboost_optuna_jupyter_variable_list.json and ml_model_xgboost_optuna_jupyter_feature_importance.parquet (to deal with negative score values)
H2O_AutoML_Classification: load the winning H2O.ai model (var_h2o_mojo_file)

Collecting the results:
.json *variable_list* => collect all models run
extract the information about the model runs
select the models with the best AUCPR (LightGBM_BayesSearchCV_Classification, LightGBM_Optuna_Classification, XGBoost_Optuna_Classification, H2O_AutoML_Classification; feature importance via var_importance_path)
Results: model_results_jupyter.xlsx (right click to open), sorted by Pr (AUC) DESCENDING

Nodes used in the workflow: Collect Local Metadata, Excel Writer, Parquet Reader, Parquet Writer, CSV Writer, vtreat prepare binary data, Merge Variables, Table to H2O, H2O Binomial Scorer, H2O Local Context, H2O MOJO Reader, H2O MOJO Predictor (Classification), Constant Value Column, Column Filter, Reference Column Filter, Python Script, Py_LightGBM, Py_vtreat_LightGBM, Concatenate, RowID, Sorter, Conda Environment Propagation, ROC Curve, Binary Classification Inspector, Normalizer (PMML), JSON Reader, JSON Path, Duplicate Row Filter, Row Filter, Table Row to Variable, Java Edit Variable (simple), String to Path (Variable), Select Parameters for Models
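
The models are ranked by "Pr (AUC)", the area under the precision-recall curve. As a rough illustration of what that number measures, the sketch below computes average precision, one common estimate of that area, in plain Python. This is an assumption for illustration only: the H2O Binomial Scorer used in the workflow may integrate the curve differently, and the function name is hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at the rank of each positive.

    Assumes labels are 0/1 and that the scores contain no ties.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank the examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` examples
    return total / positives


# Positives are ranked 1st and 3rd: (1/1 + 2/3) / 2
example_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Unlike ROC AUC, this value depends on the share of positives in the data, which is why it is often preferred for imbalanced targets such as the income label used here.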
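
The collection step (read the *variable_list* JSON files, extract the information about the model runs, select the models with the best AUCPR) can be sketched in plain Python as follows. The keys "model_name" and "aucpr", the file names and the values in the demo are assumptions made up for illustration; the JSON files written by the notebooks may be structured differently.

```python
import json
import tempfile
from pathlib import Path


def collect_model_runs(folder, pattern="*variable_list*.json"):
    """Read every matching JSON file and return one dict per model run."""
    runs = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        record["source_file"] = path.name  # remember where the run came from
        runs.append(record)
    return runs


def best_by_aucpr(runs, top_n=1):
    """Sort runs by the (assumed) 'aucpr' key, best first, and keep top_n."""
    scored = [run for run in runs if "aucpr" in run]
    return sorted(scored, key=lambda run: run["aucpr"], reverse=True)[:top_n]


# Demo with made-up values written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = {
        "model_a_variable_list.json": {"model_name": "model_a", "aucpr": 0.71},
        "model_b_variable_list.json": {"model_name": "model_b", "aucpr": 0.78},
    }
    for name, content in demo.items():
        (Path(tmp) / name).write_text(json.dumps(content), encoding="utf-8")
    all_runs = collect_model_runs(tmp)
    winner = best_by_aucpr(all_runs)[0]
```

In the workflow itself this is done with the JSON Reader, JSON Path, Row Filter and Sorter nodes; the sketch only mirrors the logic.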
