
kn_example_python_iris

Use Python and KNIME to make a random forest
A simple example of building a random forest model with Python in KNIME on the iris dataset, saving the model with pickle, and reusing it for prediction.

Python Learner node script (trains the random forest on the training partition):

# Import libraries
from sklearn.ensemble import RandomForestClassifier as RF
from math import sqrt
import numpy as np
import pandas as pd

x_train = input_table.loc[:, input_table.columns != 'Target']
y_train = input_table.loc[:, input_table.columns == 'Target']

# model parameters
n_trees = 100
max_features = int(round(sqrt(x_train.shape[1]) * 2))  # try more features at each split
max_features = 'auto'  # overrides the line above; newer scikit-learn releases removed 'auto', use 'sqrt' there
max_depth = 7
verbose = 1
n_jobs = 1

output_model = RF(n_estimators=n_trees,
                  max_features=max_features,
                  max_depth=max_depth,
                  verbose=verbose,
                  n_jobs=n_jobs)
output_model.fit(x_train, y_train.values.ravel())  # ravel() passes a 1-D target and avoids a shape warning

del x_train
del y_train

import gc
gc.collect()

Python Object Writer node script (saves the trained model with pickle):

import pickle
import os

# Path is <workflow folder>/data/random_forest.pkl
path = flow_variables['context.workflow.absolute-path'] + os.sep + 'data' + os.sep + 'random_forest.pkl'

# Save the object as a pickle file
with open(path, 'wb') as f:
    pickle.dump(input_object, f, pickle.HIGHEST_PROTOCOL)

Python Predictor node script (scores the test partition with the loaded model and appends the class probabilities):

import pandas as pd

x_test = input_table.loc[:, input_table.columns != 'Target']
y_test = input_table.loc[:, input_table.columns == 'Target']
v_indices = input_table.index.values

# https://stackoverflow.com/questions/48947194/add-randomforestclassifier-predict-proba-results-to-original-dataframe
prediction_of_probability = input_model.predict_proba(x_test)

# determine the original classes of the model
# https://stackoverflow.com/questions/16858652/how-to-find-the-corresponding-class-in-clf-predict-proba
v_classes = input_model.classes_

# store the prediction in a DataFrame
df_prediction = pd.DataFrame(data=prediction_of_probability,  # probability values
                             index=v_indices,                 # original row index
                             columns=v_classes)               # the classes as column names

# convert the predictions to float64; this avoids occasional type problems between pandas and KNIME
df_prediction[v_classes] = df_prediction[v_classes].astype('float64')

# prefix the columns with pred_ to mark them as predictions
df_prediction = df_prediction.add_prefix('pred_')

# merge the original table and the predictions
output_table = input_table.copy()
output_table = pd.merge(output_table, df_prediction, left_index=True, right_index=True)

del x_test
del y_test
del df_prediction
del v_classes

import gc
gc.collect()
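The Python Object Writer script above assumes the data folder already exists inside the workflow directory. A minimal, hypothetical variant that builds the path with os.path.join and creates the folder if it is missing (same input_object and flow variable as above):

import os
import pickle

# Build the target path and create the data folder if it does not exist yet
workflow_dir = flow_variables['context.workflow.absolute-path']
data_dir = os.path.join(workflow_dir, 'data')
os.makedirs(data_dir, exist_ok=True)

# Save the model exactly as the original writer script does
path = os.path.join(data_dir, 'random_forest.pkl')
with open(path, 'wb') as f:
    pickle.dump(input_object, f, pickle.HIGHEST_PROTOCOL)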
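The script of the Python Object Reader node is not included in the text above. A minimal sketch, assuming the node exposes the loaded object as output_object (matching input_object and output_model in the other legacy Python scripting nodes) and that the same context.workflow.absolute-path flow variable is connected:

import pickle
import os

# Rebuild the same path the Python Object Writer used
path = flow_variables['context.workflow.absolute-path'] + os.sep + 'data' + os.sep + 'random_forest.pkl'

# Load the pickled random forest and hand it on to the Python Predictor
with open(path, 'rb') as f:
    output_object = pickle.load(f)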

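To try the same train / pickle / predict_proba round trip outside of KNIME (for example in the py3_knime conda environment), a self-contained sketch using scikit-learn's bundled iris data could look like this; the Target column name and the pred_ prefix mirror the node scripts above:

import pickle
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris data as a DataFrame with a 'Target' column, like the renamed KNIME table
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={'target': 'Target'})

# Train/test split, similar to the Partitioning node
train, test = train_test_split(df, test_size=0.3, random_state=42, stratify=df['Target'])
x_train = train.drop(columns='Target')
y_train = train['Target']

# Train the random forest (parameters roughly as in the Python Learner script)
model = RandomForestClassifier(n_estimators=100, max_depth=7, n_jobs=1, random_state=42)
model.fit(x_train, y_train)

# Pickle round trip, as the Python Object Writer / Reader nodes do
with open('random_forest.pkl', 'wb') as f:
    pickle.dump(model, f, pickle.HIGHEST_PROTOCOL)
with open('random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# Class probabilities for the test partition, prefixed with pred_
x_test = test.drop(columns='Target')
proba = pd.DataFrame(model.predict_proba(x_test),
                     index=x_test.index,
                     columns=model.classes_).add_prefix('pred_')
print(test.join(proba).head())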
Nodes

Table Reader (deprecated): reads iris.table
Column Rename: renames the class column to Target
Partitioning: splits the data into training and test partitions
Extract Context Properties (deprecated) / Merge Variables: provide the workflow path as the flow variable context.workflow.absolute-path
Conda Environment Propagation: provides the py3_knime conda environment for the Python nodes
Python Learner: trains the random forest
Python Object Writer: saves the model as data/random_forest.pkl
Python Object Reader: loads the pickled model again
Python Predictor: appends the class probabilities to the test data
