
kn_forum_62317_brazil_credit_risk

Forum question (62317): Brazil Credit Risk, binary classification. Use the Python XGBoost package and other nodes to build a model and deploy it through KNIME Python nodes.


Prepare the data with the vtreat package.
The subfolder /data/ contains a Jupyter notebook for experimenting with and building XGBoost models ("kn_example_python_xgboost.ipynb").
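As a rough illustration of how the /data/ subfolder could be located with absolute paths from Python, here is a minimal sketch. The function name `data_paths` and the `workflow_dir` argument are hypothetical and not part of the workflow; the file names are the ones mentioned on this page (train.parquet, test.parquet, "knime_xgboost_model.json"). The workflow itself does this with KNIME nodes, so treat this only as an equivalent idea in plain Python.

```python
import tempfile
from pathlib import Path


def data_paths(workflow_dir):
    """Return absolute paths for the /data/ subfolder of a workflow folder.

    `workflow_dir` is a hypothetical stand-in for the KNIME workflow location.
    The folder is created if it does not exist yet.
    """
    data_dir = Path(workflow_dir).resolve() / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return {
        "data_dir": data_dir,
        "train": data_dir / "train.parquet",
        "test": data_dir / "test.parquet",
        "model": data_dir / "knime_xgboost_model.json",
    }


# Usage with a temporary folder so nothing is written to a real workflow.
with tempfile.TemporaryDirectory() as tmp:
    paths = data_paths(tmp)
    data_dir_was_created = paths["data_dir"].is_dir()
    for name, path in paths.items():
        print(name, path)
```

Using `pathlib` avoids handling the file separator by hand, which the workflow otherwise solves with a flow variable.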



Workflow annotations

This is what deployment would look like on new data.

Python Conda environment propagation. Please read this article for more details:
KNIME and Python - Setting up and managing Conda environments
https://medium.com/p/2ac217792539

Medium: Data preparation for Machine Learning with KNIME and the Python "vtreat" package
https://medium.com/p/efcaf58fa783
https://forum.knime.com/t/data-preparation-for-machine-learning-with-knime-and-the-python-vtreat-package/58679?u=mlauber71

H2O.ai AutoML: there is a Jupyter notebook in the /data/notebook/ subfolder to toy around with ("kn_example_python_h2o_automl.ipynb").

Node comments

Locate and create the /data/ folder with absolute paths.
file.separator: return "/";
PAKDD2010_Training_Data.parquet: split 70 / 30 into train.parquet and test.parquet.
Prepare test and training data once (train.parquet, test.parquet).
v_model_json_file: "knime_xgboost_model.json"
Edit v_runtime_automl: set the maximum runtime of H2O.ai AutoML in SECONDS.
v_vtreat_indicator_min_fraction => edit! return 0.025;
https://github.com/WinVector/pyvtreat/blob/main/Examples/Classification/Classification.md
Export FlowVariables from KNIME: ^(?!knime.workspace).*$
Determine package versions.
py3_knime_lightgbm: yaml in node description! (Apple Silicon)
py3_knime_lightgbm: yaml in node description! (Windows)
Collect the results, sort by AUC (pr) DESCENDING, write to model_results.xlsx.
.json *variable_list* => collect all models run.
Extract the information about the model runs.
Select the models with the best AUCPR.
Load the winning H2O.ai model (H2O_AutoML_Classification, var_h2o_mojo_file).
Apply the XGBoost model from "knime_xgboost_model.json".
Drop Target, as if this were completely new data.

Node and component labels

Parquet Reader, Parquet Writer, Partitioning, Java Edit Variable (simple), Extract Context Properties, Extract System Properties, Table Column to Variable, Table Row to Variable, Variable to Table Row, Integer Input (legacy), Merge Variables, String to Path (Variable), Concatenate, Sorter, RowID, Column Filter, Row Filter, Duplicate Row Filter, Constant Value Column, Excel Writer, JSON Reader, JSON Path, Python Script, Conda Environment Propagation, H2O Local Context, Table to H2O, H2O MOJO Reader, H2O MOJO Predictor (Classification), H2O Binomial Scorer, Collect Local Metadata

Model runs: BINARY_RPROP_MLP, vtreat prepare binary data, knime_xgboost_model, knime_model_gbm, Py_XGBoost, knime_xgboost_model_vtreat, knime_model_gbm_vtreat, h2o_automl, h2o_automl_vtreat, Py_XGBoost_vtreat
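The "export FlowVariables from KNIME" step filters variable names with the pattern ^(?!knime.workspace).*$. The following minimal sketch shows what that pattern does, using Python's `re` module as a stand-in for the KNIME regex filter (an assumption: KNIME evaluates the pattern with Java regex, which behaves the same way for this expression). The helper name `keep_flow_variables` is hypothetical; the sample names are flow variables mentioned on this page.

```python
import re

# Negative lookahead: reject any name that starts with "knime.workspace".
# The dot is unescaped in the original pattern, so it matches any character.
FLOW_VAR_FILTER = re.compile(r"^(?!knime.workspace).*$")


def keep_flow_variables(names):
    """Return only the names that pass the filter pattern."""
    return [name for name in names if FLOW_VAR_FILTER.match(name)]


names = [
    "knime.workspace",
    "v_model_json_file",
    "v_runtime_automl",
    "var_h2o_mojo_file",
]
kept = keep_flow_variables(names)
print(kept)
```

Because the dot is not escaped, a name such as "knime_workspace" would be excluded as well; writing `knime\.workspace` would restrict the exclusion to the literal name.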
