Automated Feature Selection

This component applies an embedded feature selection algorithm. By default, a random forest is used as the model; this can be changed inside the component.

Options

Columns
Only the included columns are considered for feature selection. Excluded columns are not evaluated and are always passed to the output.
Target Column
Select the target column. Only columns with nominal data can be selected.
Max. number of iterations
The maximum number of iterations to run, which limits the runtime of the algorithm.
Seed
A seed is used to get reproducible results. The results may vary for different seeds.
Cross Validation Folds
The number of folds used for cross validation.
Feature Selection Strategy
Select the strategy that is applied.
- Forward Feature Selection is an iterative approach. It starts with no features selected. In each iteration, the feature that improves the model the most is added to the feature set.
- Backward Feature Elimination is an iterative approach. It starts with all features selected. In each iteration, the feature whose removal has the least impact on the model's performance is removed.
- Genetic Algorithm is a stochastic approach that bases its optimization on the mechanics of biological evolution and genetics. Similar to natural selection, different solutions (individuals) are carried over and mutated from generation to generation based on their performance (fitness). This approach converges to a local optimum, so enabling early stopping might be recommended. See, e.g., this article for more insights.
- Random is a simple approach that selects feature combinations randomly. It does not converge, and (one of) the best feature combinations may be drawn by chance in an early iteration, so enabling early stopping might be recommended.
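
The iterative strategies above can be illustrated with a minimal Python sketch. This is not the component's implementation: the function names, the `toy_score` callback, and the choice to return the best-scoring subset seen across all iterations are assumptions made for illustration. In practice, the score would come from the model, for example a cross-validated random forest.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical scoring callback: maps a feature subset to a model score
# (for example, mean cross-validated accuracy). Higher is better.
ScoreFn = Callable[[FrozenSet[str]], float]
Result = Tuple[float, FrozenSet[str]]


def forward_selection(features: Sequence[str], score: ScoreFn, max_iterations: int) -> Result:
    """Start with no features; in each iteration add the feature that helps most."""
    selected: FrozenSet[str] = frozenset()
    best: Result = (score(selected), selected)  # the toy score accepts an empty set
    for _ in range(min(max_iterations, len(features))):
        candidates = [selected | {f} for f in features if f not in selected]
        selected = max(candidates, key=score)
        best = max(best, (score(selected), selected), key=lambda r: r[0])
    return best


def backward_elimination(features: Sequence[str], score: ScoreFn, max_iterations: int) -> Result:
    """Start with all features; in each iteration drop the feature whose removal hurts least."""
    selected: FrozenSet[str] = frozenset(features)
    best: Result = (score(selected), selected)
    for _ in range(min(max_iterations, len(features) - 1)):
        candidates = [selected - {f} for f in features if f in selected]
        selected = max(candidates, key=score)
        best = max(best, (score(selected), selected), key=lambda r: r[0])
    return best


def random_search(features: Sequence[str], score: ScoreFn, max_iterations: int, seed: int) -> Result:
    """Draw random feature subsets; the seed makes the draws reproducible."""
    rng = random.Random(seed)
    empty: FrozenSet[str] = frozenset()
    best: Result = (score(empty), empty)
    for _ in range(max_iterations):
        subset = frozenset(f for f in features if rng.random() < 0.5)
        result = (score(subset), subset)
        if result[0] > best[0]:
            best = result
    return best


# Toy stand-in for a model score (assumption): features "a" and "b" are useful,
# and every other feature slightly lowers the score.
USEFUL = frozenset({"a", "b"})


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


if __name__ == "__main__":
    columns = ["a", "b", "c", "d"]
    print(forward_selection(columns, toy_score, max_iterations=10))
    print(backward_elimination(columns, toy_score, max_iterations=10))
    print(random_search(columns, toy_score, max_iterations=20, seed=42))
```

In this sketch, the maximum number of iterations caps the loop in every strategy, and the seed only affects the random strategy. A genetic algorithm is omitted for brevity.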

Input Ports

The training data. The algorithm is performed on this data.
The test data. The columns that are filtered out of the training data are also filtered out of the test data.

Output Ports

The selected, i.e., filtered, training data.
The selected, i.e., filtered, test data.
Performance metrics of the selected feature set.
A workflow port object that allows the performed actions to be deployed.
