Feature Selection Loop Start (1:1)

KNIME Base Nodes version 4.0.1.v201908131444 by KNIME AG, Zurich, Switzerland

This node is the start of the feature selection loop. The feature selection loop allows you to select, from all the features in the input data set, the subset of features that is best for model construction. With this node you determine (i) which features/columns are held fixed during the selection process (these constant or "static" features/columns are included in every loop iteration and are exempt from elimination); (ii) which selection strategy is applied to the remaining (variable) features/columns; and (iii) the specific settings of the selected strategy.

Options

Static and Variable Features
Columns can be selected manually or by means of regular expressions. The columns in the left pane are the static columns, those in the right pane the variable columns. If you want to learn a supervised model (e.g. classification or regression), at least one static column and more than one variable column are needed. For an unsupervised model (e.g. clustering), no static column is needed, only variable columns. Columns can be moved from one pane to the other by clicking the appropriate button in the middle.
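The regular-expression split into static and variable columns can be illustrated outside KNIME with a minimal Python sketch. The function name split_columns, the example column names and the whole-name matching rule are assumptions made for illustration; they are not taken from the node's implementation.

```python
import re


def split_columns(columns, static_pattern):
    """Partition column names into (static, variable) lists.

    Assumption: a column is static when its whole name matches the pattern.
    """
    regex = re.compile(static_pattern)
    static = [name for name in columns if regex.fullmatch(name)]
    variable = [name for name in columns if not regex.fullmatch(name)]
    return static, variable


# Hypothetical table: "id" and "target" are held fixed, the rest may be eliminated.
columns = ["id", "age", "income", "target"]
static, variable = split_columns(columns, r"id|target")
```

For a supervised model, the target column would typically end up in the static list, so that it is present in every loop iteration.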
Feature selection strategy
Here you can choose between the selection strategies: Forward Feature Selection, Backward Feature Elimination, Genetic Algorithm and Random.
Use threshold for number of features
Check this option if you want to set a bound for the number of selected features. Since Forward Feature Selection adds features while Backward Feature Elimination subtracts them, this will be an upper bound for Forward Feature Selection and a lower bound for Backward Feature Elimination.
Select threshold for number of features
Set the upper or lower bound for the number of selected features.
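To make the role of the threshold concrete, here is a minimal sketch of greedy forward selection that stops once an upper bound on the number of features is reached. It is a generic illustration, not the node's code: the function forward_selection, the score callback and the toy weights are hypothetical, and a higher score is assumed to be better.

```python
def forward_selection(variable, score, max_features=None):
    """Greedily add the feature that gives the best score in each round.

    Returns the list of (subset, score) pairs, one per round.
    max_features plays the role of the upper bound (threshold).
    """
    selected = []
    remaining = list(variable)
    limit = len(remaining) if max_features is None else min(max_features, len(remaining))
    history = []
    while len(selected) < limit:
        # Try every remaining feature and keep the one with the best score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


# Toy score (assumption): each feature contributes a fixed integer weight.
weights = {"a": 5, "b": 3, "c": 1}
history = forward_selection(["c", "a", "b"], lambda subset: sum(weights[f] for f in subset), max_features=2)
```

Backward elimination would run the same idea in reverse, starting from all variable features and removing one per round until the lower bound is reached.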
Use lower bound for number of features
Check this option if you want to set a lower bound for the number of selected features.
Use upper bound for number of features
Check this option if you want to set an upper bound for the number of selected features.
Population size
Set the number of individuals in each population. Changing this value directly influences the maximum number of loop iterations, which is Population size * (Number of generations + 1). This is only an upper bound; usually fewer iterations are necessary.
Max. number of generations
Set the number of generations. Changing this value directly influences the maximum number of loop iterations, which is Population size * (Number of generations + 1). This is only an upper bound; usually fewer iterations are necessary.
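The upper bound on loop iterations for the genetic algorithm follows directly from the formula above. A small worked example, with a hypothetical helper name and example values:

```python
def max_loop_iterations(population_size, max_generations):
    """Upper bound on loop iterations: Population size * (Number of generations + 1)."""
    return population_size * (max_generations + 1)


# Example values (assumed): 20 individuals, at most 10 generations.
bound = max_loop_iterations(20, 10)
print(bound)  # 220
```

The "+ 1" adds one full population on top of the configured number of generations; the actual number of iterations is usually lower.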
Max. number of iterations
Set the number of iterations. This is an upper bound. If the same feature subset is randomly generated a second time, it won't be processed again but will still be counted as an iteration. Furthermore, if early stopping is enabled, the algorithm may stop before the max. number of iterations is reached.
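The way duplicates consume the iteration budget in the Random strategy can be sketched as follows. How the node actually draws a subset is not described here, so the coin-flip sampling below is an assumption, as are the function name random_search and the feature names; bounds on the number of features are not modelled.

```python
import random


def random_search(variable, max_iterations, seed=None):
    """Draw random feature subsets; a repeated subset uses up an iteration
    but is not evaluated again."""
    rng = random.Random(seed)  # a fixed seed gives reproducible draws
    seen = set()
    evaluated = []
    for _ in range(max_iterations):
        # Assumption: each feature is included independently with probability 0.5.
        subset = frozenset(f for f in variable if rng.random() < 0.5)
        if subset in seen:
            continue  # counted as an iteration, but skipped
        seen.add(subset)
        evaluated.append(sorted(subset))
    return evaluated


subsets = random_search(["a", "b", "c"], max_iterations=50, seed=1)
```

With three features there are only eight possible subsets, so most of the 50 iterations in this example are spent on duplicates, which is why the configured value is only an upper bound on the work actually done.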
Use static random seed
Choose a seed to get reproducible results.

Advanced Options

Selection strategy
Choose the strategy to use for the selection of offspring.
Fraction of survivors
Set the fraction of survivors during evaluation of the next generation. The remainder, 1 - fraction of survivors, is the fraction of offspring that is evaluated for the next generation.
Elitism rate
Set the fraction of the best individuals within a generation that are transferred to the next generation without alteration.
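As a rough illustration of elitism, the sketch below copies the top fraction of a generation unchanged. The helper carry_over_elites, the rounding down of the elite count and the assumption that a higher fitness is better are all illustrative choices, not documented behavior of the node.

```python
def carry_over_elites(population, fitness, elitism_rate):
    """Return the best individuals that move to the next generation unchanged."""
    n_elites = int(len(population) * elitism_rate)  # assumption: round down
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:n_elites]


# Hypothetical generation of four feature subsets with made-up fitness values.
scores = {"ab": 0.81, "a": 0.70, "bc": 0.78, "c": 0.55}
elites = carry_over_elites(list(scores), scores.get, elitism_rate=0.5)
```

The remaining slots of the next generation would then be filled by individuals produced through selection, crossover and mutation.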
Crossover strategy
Choose the strategy to use for crossover.
Crossover rate
Set the crossover rate used to alter offspring.
Mutation rate
Set the mutation rate used to alter offspring.
Enable early stopping
Check this option if you want to enable early stopping, which means that the algorithm stops after a specified number of generations/iterations without improvement. For the random strategy, this is based on a moving average whose window size equals the specified number of iterations. If the ratio of improvement is lower than a specified tolerance, the search stops.
Number of generations/iterations without improvement
Set the number of generations/iterations without improvement (or, for the random strategy, with less improvement than the specified tolerance) used for early stopping. For the random strategy, this value also defines the size of the moving window.
Tolerance
The tolerance used for early stopping, which defines the threshold for the ratio of improvement. If the ratio is lower than this threshold, the strategy stops.
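The generation-based form of early stopping can be sketched as a simple counter of generations without improvement. This is a generic illustration with hypothetical names and made-up scores; it assumes a higher score is better. It does not model the moving-average and tolerance variant used for the random strategy, because the exact ratio formula is not given in this description.

```python
def stopped_generation(best_scores, patience):
    """Return the index of the generation at which the search stops,
    or None if it runs through all generations.

    patience: number of generations without improvement that triggers the stop.
    """
    best = float("-inf")
    stale = 0
    for generation, score in enumerate(best_scores):
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return generation
    return None


# Best score per generation (made-up values); no improvement after generation 1.
stop_at = stopped_generation([0.70, 0.75, 0.75, 0.74, 0.75, 0.80], patience=3)
```

In this example the search stops at generation 4, so the better score in generation 5 is never reached; a larger patience value trades runtime for a lower risk of stopping too early.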

Input Ports

A data table containing all features and static columns needed for the feature selection.

Output Ports

The input table with some columns filtered out.

Installation

To use this node in KNIME, install KNIME Core from the following update site:

KNIME 4.0