
kn_example_python_iris_2022

Simple example of building a random forest model with the new Python Script node in KNIME 4.6 using the iris dataset, then saving and reusing the model with Pickle.

The workflow also creates some graphics and exports them to disk, either via Python code or via KNIME ports. It is not really necessary to do all this with the colourful ports; they are used here just to check how they work.

The workflow uses the bundled Python version (no additional Python installation necessary).
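The description mentions saving and reusing the model with Pickle. A minimal sketch of that round trip outside KNIME might look like the following. It assumes a plain dictionary as a stand-in for the fitted model and a temporary directory as a stand-in for the workflow data folder; the helper names `save_object` and `load_object` are hypothetical and not part of the workflow.

```python
import os
import pickle
import tempfile


def save_object(obj, folder, filename):
    """Pickle obj into folder/filename and return the full path."""
    # os.path.join works whether or not the folder ends with a separator
    path = os.path.join(folder, filename)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, pickle.HIGHEST_PROTOCOL)
    return path


def load_object(path):
    """Load a pickled object back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted model (assumption): any picklable object behaves the same way.
stand_in_model = {"n_trees": 500, "max_depth": 7}

# Temporary folder used in place of the workflow data folder (assumption).
with tempfile.TemporaryDirectory() as data_folder:
    pkl_path = save_object(stand_in_model, data_folder, "random_forest.pkl")
    restored = load_object(pkl_path)

print(restored == stand_in_model)
```

Opening the file in a `with` block closes the handle even if pickling fails, and joining the path with `os.path.join` avoids depending on a trailing separator in the folder name.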

Python Script (Python Learner): train the random forest and output the tree settings as a KNIME table

    # Python Random Forest Learner
    # import the KNIME module
    import knime.scripting.io as knio

    # import libraries
    from sklearn.ensemble import RandomForestClassifier as RF
    from math import sqrt
    import numpy as np
    import pandas as pd

    # recreate the classic input_table from the new Python import since KNIME 4.5
    input_table = knio.input_tables[0].to_pandas()

    x_train = input_table.loc[:, input_table.columns != 'Target']
    # 1-D target; a one-column DataFrame triggers a scikit-learn column-vector warning
    y_train = input_table['Target']

    # settings for Random Forests
    # https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
    # https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
    n_trees = 500
    max_features = int( round( sqrt( x_train.shape[1] ) * 2 )) # try more features at each split
    # overrides the value above; 'auto' meant 'sqrt' for classifiers and was removed in newer scikit-learn versions
    max_features = 'sqrt'
    max_depth = 7
    verbose = 1
    n_jobs = 1

    # use the settings from the variables
    output_model = RF( n_estimators = n_trees, max_features = max_features, max_depth = max_depth, verbose = verbose, n_jobs = n_jobs )
    output_model.fit(x_train, y_train)

    # clean the memory
    del x_train
    del y_train

    # garbage collection
    import gc
    gc.collect()

    # export the settings into a pandas data frame (to bring it back to KNIME)
    output_table = pd.DataFrame( { "n_trees" : [n_trees], "max_features" : [max_features], "max_depth" : [max_depth], "verbose" : [verbose], "n_jobs" : [n_jobs] } )

    # These are the node's outputs that need to be populated:
    knio.output_tables[0] = knio.Table.from_pandas(output_table)
    knio.output_objects[0] = output_model

Python Object Writer (legacy): save the model as random_forest.pkl

    import pickle
    import os

    # set the path for the pickle file
    path = flow_variables['context.workflow.data-path'] + 'random_forest.pkl'

    # Save object as pickle file
    # the input object is the 'old' style to import objects into the Python object writer
    with open(path, 'wb') as pickle_file:
        pickle.dump(input_object, pickle_file, pickle.HIGHEST_PROTOCOL)

Python Script (Python Predictor): apply the model to the test partition

    import knime.scripting.io as knio

    # import libraries
    from sklearn.ensemble import RandomForestClassifier as RF
    from math import sqrt
    import numpy as np
    import pandas as pd

    input_table = knio.input_tables[0].to_pandas()
    input_model = knio.input_objects[0]

    x_test = input_table.loc[:, input_table.columns != 'Target']
    y_test = input_table.loc[:, input_table.columns == 'Target']
    v_indices = input_table.index.values

    # https://stackoverflow.com/questions/48947194/add-randomforestclassifier-predict-proba-results-to-original-dataframe
    prediction_of_probability = input_model.predict_proba( x_test )

    # determine the original classes of the model
    # https://stackoverflow.com/questions/16858652/how-to-find-the-corresponding-class-in-clf-predict-proba
    v_classes = input_model.classes_

    # store the prediction in a data frame
    df_prediction = pd.DataFrame(data=prediction_of_probability, # values
                                 index=v_indices,                # row index of the input table
                                 columns=v_classes)              # the classes as column names

    # convert the predictions to float64, this is necessary because
    # sometimes there were problems with pandas and KNIME
    df_prediction[v_classes] = df_prediction[v_classes].astype('float64')

    # put prefix pred_ to the variables to indicate the prediction
    df_prediction = df_prediction.add_prefix('pred_')
    # print(df_prediction)

    # merge the original table and the prediction
    output_table = input_table.copy()
    output_table = pd.merge(output_table, df_prediction, left_index=True, right_index=True)

    # examples of how to keep some ID columns with a prediction if you do not want to keep all columns
    # output_table = pd.DataFrame(data=np_array, columns=['save_id_txt', 'solution', 'submission'])
    # output_table = pd.DataFrame(data=np_array, columns=['solution', 'submission'])

    del x_test
    del y_test
    del df_prediction
    del v_classes

    import gc
    gc.collect()

    # These are the node's outputs that need to be populated:
    knio.output_tables[0] = knio.Table.from_pandas(output_table)

Python Object Reader (legacy): load random_forest.pkl

    import pickle
    import os

    # set the path for the pickle file
    path = flow_variables['context.workflow.data-path'] + 'random_forest.pkl'

    # Load object from pickle file
    with open(path, 'rb') as pickle_file:
        output_object = pickle.load(pickle_file)

Python Script: kde plot using the seaborn package, exported to ../data/iris_from_python_script.png and ../data/iris_from_python_script.pdf

    import knime.scripting.io as knio

    # import libraries
    from io import BytesIO
    import seaborn as sns
    import os

    input_table = knio.input_tables[0].to_pandas()

    sns_plot = sns.jointplot(x=input_table['Petal.Length'], y=input_table['Sepal.Width'], fill=True, kind="kde")

    # Create buffer to write into
    buffer = BytesIO()
    # Write the plot into the buffer as SVG
    sns_plot.savefig(buffer, format='svg')
    # The output is the content of the buffer
    output_image = buffer.getvalue()

    # define paths for PNG and PDF files
    var_path_png = knio.flow_variables['context.workflow.data-path'] + 'iris_from_python_script.png'
    var_path_pdf = knio.flow_variables['context.workflow.data-path'] + 'iris_from_python_script.pdf'

    # export the PNG and PDF files directly to disk
    sns_plot.savefig(var_path_png, dpi=900, format='png')
    sns_plot.savefig(var_path_pdf, dpi=900, format='pdf')

    knio.output_images[0] = output_image

Python Script: load random_forest2.pkl with the new Python Script node

    import knime.scripting.io as knio
    import pickle
    import os

    # path of the pickle file in the workflow data folder
    path = knio.flow_variables['context.workflow.data-path'] + 'random_forest2.pkl'

    # Load object from pickle file
    with open(path, 'rb') as pickle_file:
        output_object = pickle.load(pickle_file)

    # bring the imported object to the new KNIME Python script style
    knio.output_objects[0] = output_object

Python Script (node annotation, var_*): collect version information as flow variables

    knio.flow_variables['var_py_version_pandas'] = pd.__version__
    knio.flow_variables['var_py_version_numpy'] = np.__version__
    knio.flow_variables['var_py_version'] = sys.version_info
    knio.flow_variables['var_sys_path'] = sys.path

Node annotations in the workflow: locate and create /data/ folder with absolute paths; random_forest.pkl; => Target; split; iris.table; Python Learner; output the tree settings as knime table; Python Predictor; random_forest2.pkl; 1.024x768 PNG file; ../data/iris_from_knime.png; iris.parquet

Nodes used: Collect Local Metadata, Python Object Writer (legacy), Column Rename, Partitioning, Python Object Reader (legacy), Table Reader, Python Script, Image To Table, Renderer to Image, Table To Image, Image Writer (Port), Parquet Writer, Variable to Table Row
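The predictor script in this workflow turns the class probabilities into columns prefixed with pred_ and joins them to the input table by index. The sketch below reproduces only that table-handling step outside KNIME, under stated assumptions: the input rows, the class names and the probability values are made up for illustration and stand in for `input_model.classes_` and `input_model.predict_proba(x_test)`. The final `pred_class` column is an addition that is not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Illustrative input rows; the values and the non-default index are made up.
input_table = pd.DataFrame(
    {
        "Petal.Length": [1.4, 4.7, 6.0],
        "Target": ["setosa", "versicolor", "virginica"],
    },
    index=[10, 11, 12],
)

# Stand-ins (assumptions) for input_model.classes_ and input_model.predict_proba(x_test).
classes = np.array(["setosa", "versicolor", "virginica"])
probabilities = np.array(
    [
        [0.98, 0.02, 0.00],
        [0.01, 0.90, 0.09],
        [0.00, 0.05, 0.95],
    ]
)

# One column per class, reusing the input index so the rows line up on merge.
df_prediction = pd.DataFrame(probabilities, index=input_table.index, columns=classes)
df_prediction = df_prediction.astype("float64").add_prefix("pred_")

# Join the predictions to the original rows by index.
output_table = pd.merge(input_table, df_prediction, left_index=True, right_index=True)

# Addition (not in the original workflow): the class with the highest probability per row.
output_table["pred_class"] = classes[probabilities.argmax(axis=1)]

print(output_table)
```

Reusing the input table's index for the prediction frame is what makes the index-based merge safe; a fresh 0..n-1 index on the predictions would only match when the input index happens to be the default one.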
