Parameter Optimization (Table)

Use this component to optimize any number of parameters of any binary or multiclass classification model. The component optionally offers an interactive view that visualizes the parameter search it performs.

This component requires three inputs: a table listing the parameter ranges, the training data partition, and the workflow object containing the learner and predictor nodes of the classification model you are optimizing.

The output of the component is a flow variable with the optimized parameter values. Connect the flow variable to the learner node and select those values in its flow variable panel to use the optimized parameter combination when training the final model.

Various settings are available: for example, you can define the performance metric to be maximized (e.g. accuracy) or the optimization strategy (e.g. brute force, also known as grid search). Inside the component, cross validation is performed for each combination of parameters to avoid overfitting.
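The search described above can be sketched in plain Python. This is only an illustration of the idea (score every parameter combination with k-fold cross validation and keep the best one), not the workflow inside the component. All function names are hypothetical, and the toy threshold classifier merely stands in for the captured learner and predictor.

```python
import itertools
import random
from statistics import mean


def k_fold_indices(n_rows, n_folds, seed):
    """Shuffle the row indices once and deal them into n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validated_score(params, rows, n_folds, seed, train_and_score):
    """Average the validation score of one parameter combination over all folds."""
    scores = []
    for fold in k_fold_indices(len(rows), n_folds, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(rows) if i not in held_out]
        valid = [rows[i] for i in fold]
        scores.append(train_and_score(params, train, valid))
    return mean(scores)


def brute_force_search(grid, rows, n_folds, seed, train_and_score):
    """Evaluate every combination in the grid and keep the best mean score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validated_score(params, rows, n_folds, seed, train_and_score)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for the learner and predictor: a 1-D threshold
# classifier whose only parameter is the threshold, so `train` is unused.
def threshold_accuracy(params, train, valid):
    hits = sum((x >= params["threshold"]) == label for x, label in valid)
    return hits / len(valid)


rows = [(x, x >= 5) for x in range(10)]
best_params, best_score = brute_force_search(
    {"threshold": [3, 5, 7]}, rows, n_folds=5, seed=42,
    train_and_score=threshold_accuracy,
)
print(best_params, best_score)
```

The same seed is reused for every combination so that all candidates are compared on identical folds; whether the component does the same is an assumption of this sketch.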

The former version of this component, “Parameter Optimization” (kni.me/c/A_91QC387NtvJ6g8), was hardcoded for Random Forest and two of its parameters. To understand how to use this new version with any classification model, data, and set of parameters, without editing the workflow inside, inspect the example workflow referenced at the bottom of this page.

Options

Activate Interactive View:
If selected, the component creates an interactive view to browse the parameter combinations, each with its iteration number and achieved performance.
Target Column:
Select the target column. Only columns with nominal data can be selected.
Stop Column:
Select the column containing the stop value for parameter optimization. Only columns with numerical data can be selected.
Start Column:
Select the column containing the start value for parameter optimization. Only columns with numerical data can be selected.
Step Column:
Select the column containing the step value for parameter optimization. This column defines the granularity of the search: the smaller the step, the slower the optimization but the more precise the final results. Only columns with numerical data can be selected.
Parameter Column:
Select the column containing the parameter names for parameter optimization. Only columns with nominal data can be selected. The names in this column should match the names of the flow variables configured in the captured workflow object.
Datatype Column:
Select the column containing the datatype of each parameter, either Number (integer) or Number (double). Only columns with nominal data can be selected.
Seed:
A seed is used to get reproducible results. The results may vary for different seeds.
Number of Folds in Cross Validation:
A k-fold cross validation takes place in the various parameter optimization phases. Insert the number of folds here.
Parameter Optimization Strategy:
Select the search strategy that should be used. There are four different strategies to choose from:
- Random Search: Hyperparameter combinations are randomly sampled.
- Bayesian Optimization (TPE): Tree-structured Parzen Estimators are used to learn which hyperparameter combinations are likely to improve the model’s performance.
- Brute Force: All possible hyperparameter combinations are evaluated. This strategy is also called "Grid Search".
- Hillclimbing: A random start combination is created and the direct neighbors are evaluated. The best combination among the neighbors is the start point for the next iteration. If no neighbor improves the model's performance, the loop terminates.
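To make the Hillclimbing strategy more concrete, here is a minimal Python sketch of the loop described above. It assumes that a "direct neighbor" differs from the current combination by one step in exactly one parameter; the component may define neighbors differently. The function names, parameter names, and the toy scoring function are hypothetical.

```python
import random


def hill_climb(grid, score, seed):
    """Hill climbing over a discrete grid; `grid` maps names to ordered value lists."""
    rng = random.Random(seed)
    names = list(grid)

    def to_params(position):
        # Translate one index per parameter into the actual parameter values.
        return {name: grid[name][position[name]] for name in names}

    # Random start combination, stored as one index per parameter.
    position = {name: rng.randrange(len(grid[name])) for name in names}
    best_score = score(to_params(position))
    while True:
        # Assumed neighborhood: move a single parameter one step up or down.
        neighbors = []
        for name in names:
            for delta in (-1, 1):
                index = position[name] + delta
                if 0 <= index < len(grid[name]):
                    neighbors.append({**position, name: index})
        scored = [(score(to_params(n)), n) for n in neighbors]
        if not scored:
            return to_params(position), best_score
        top_score, top_position = max(scored, key=lambda pair: pair[0])
        # Stop as soon as no neighbor improves on the current combination.
        if top_score <= best_score:
            return to_params(position), best_score
        position, best_score = top_position, top_score


# Toy objective with a single peak at max_depth=6, n_trees=30 (made-up numbers).
def toy_score(params):
    return -((params["max_depth"] - 6) ** 2) - ((params["n_trees"] - 30) / 10) ** 2


grid = {"max_depth": [2, 4, 6, 8, 10], "n_trees": [10, 20, 30, 40, 50]}
best_params, best_score = hill_climb(grid, toy_score, seed=7)
print(best_params, best_score)
```

On an objective with several peaks the loop can stop at a local optimum, which is why the result of this strategy depends on the random start and therefore on the seed.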
Performance Metric:
Select the metric to be optimized as the objective function. The available metrics are derived from the computed confusion matrix. When the target is binary, some of those metrics rely on the automatically guessed positive class (the rarest class). When performing multiclass classification, make sure to select either “Accuracy” or “Cohen's kappa”.
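As an illustration of the performance metrics, the following Python sketch computes accuracy and Cohen's kappa, which are defined for any number of classes, and recall, which needs a positive class. The positive class is guessed as the rarest class, as described above. The function names and the example labels are hypothetical, and the tie-breaking for equally rare classes is an assumption.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of rows whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohens_kappa(actual, predicted):
    """Agreement corrected for chance; undefined when chance agreement is 1."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def rarest_class(actual):
    """Guess the positive class as the least frequent one (first one wins ties)."""
    return min(Counter(actual).items(), key=lambda item: item[1])[0]


def recall(actual, predicted, positive):
    """True positives divided by all actual positives."""
    true_positives = sum(
        a == positive and p == positive for a, p in zip(actual, predicted)
    )
    return true_positives / sum(a == positive for a in actual)


# Made-up binary example: six "no" rows and two "yes" rows.
actual = ["no"] * 6 + ["yes"] * 2
predicted = ["no", "no", "no", "no", "no", "yes", "yes", "no"]

positive = rarest_class(actual)
print(accuracy(actual, predicted))
print(cohens_kappa(actual, predicted))
print(positive, recall(actual, predicted, positive))
```

Recall depends on which class is treated as positive, whereas accuracy and Cohen's kappa do not, which is why the latter two remain meaningful for multiclass targets.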

Input Ports

The parameter table should list one row for each parameter to be optimized and five columns in total: 2 string columns with the name of the parameter and its numerical type, either Number (integer) or Number (double); 3 numerical columns with the start, stop and step values of the parameter search.
The training data with the target column to be classified and the feature columns to be learned.
The workflow object captured with KNIME Integrated Deployment Capture nodes. The workflow object should have 3 inputs: parameter combination flow variable, train partition table, validation partition table. The captured workflow segment should contain the learner and the predictor node. The learner node should have one flow variable controlling each of the parameters.
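The parameter table expected at the first input port could look like the rows in the Python sketch below, which also expands each row into its candidate values. The column names and parameter names are hypothetical, since the columns are selected in the component options. Treating the stop value as inclusive is an assumption of this sketch.

```python
import math

# Hypothetical parameter table: one row per parameter, five columns.
parameter_table = [
    {"Parameter": "max_depth", "Datatype": "Number (integer)",
     "Start": 2, "Stop": 10, "Step": 2},
    {"Parameter": "min_gain", "Datatype": "Number (double)",
     "Start": 0.0, "Stop": 0.5, "Step": 0.25},
]


def candidate_values(row):
    """Expand one table row into its candidate values (stop assumed inclusive)."""
    start, stop, step = row["Start"], row["Stop"], row["Step"]
    if step <= 0 or stop < start:
        raise ValueError(f"invalid range for {row['Parameter']}")
    # The small tolerance guards against floating-point error in the division.
    count = math.floor((stop - start) / step + 1e-9) + 1
    values = [start + i * step for i in range(count)]
    if row["Datatype"] == "Number (integer)":
        return [int(round(v)) for v in values]
    return [float(v) for v in values]


grid = {row["Parameter"]: candidate_values(row) for row in parameter_table}
n_combinations = math.prod(len(values) for values in grid.values())
print(grid, n_combinations)
```

The number of combinations is the product of the candidate counts per parameter, so halving a step roughly doubles the cost of a brute-force search for that parameter.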

Output Ports

A flow variable containing the best parameter values found during the optimization process and the corresponding performance on the selected metric.
