Automated Statistical Feature Selection

This component runs a set of statistical tests to calculate a relevance score for each of the selected columns. Columns (features) whose relevance falls below a configurable threshold are filtered out.

Options

Columns
Only the included columns are tested and considered for filtering. Excluded columns are not tested and always remain in the output.
Relevance threshold
Included columns with a relevance below this threshold are filtered out.
Tests
Select the statistical tests that should be performed.
- ID/Noise Test: tests how many distinct values a column has, i.e., the column's "id-ness".
- Constant Value Test: tests how many of the values are the same.
- Missing Value Test: tests how many of the values are missing.
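
As a rough illustration of what the three tests measure, the following Python sketch computes one raw ratio per test for a single column. It is not the component's implementation: the helper name `column_metrics`, the use of `None` for missing values, and the exact ratios are assumptions, and the way the component combines the tests into one relevance value is not described here.

```python
from collections import Counter


def column_metrics(values):
    """Return three raw ratios for one column (hypothetical helper).

    `values` is a list of cell values; None marks a missing value.
    """
    n_rows = len(values)
    if n_rows == 0:
        return {"id_ratio": 0.0, "constant_ratio": 0.0, "missing_ratio": 0.0}

    present = [v for v in values if v is not None]
    counts = Counter(present)

    # ID/Noise: share of distinct values (1.0 means every row is unique).
    id_ratio = len(counts) / n_rows
    # Constant value: share of rows taken by the most frequent value.
    constant_ratio = max(counts.values()) / n_rows if counts else 0.0
    # Missing value: share of rows without a value.
    missing_ratio = (n_rows - len(present)) / n_rows

    return {
        "id_ratio": id_ratio,
        "constant_ratio": constant_ratio,
        "missing_ratio": missing_ratio,
    }


# Made-up example table, stored column by column.
table = {
    "customer_id": [101, 102, 103, 104],
    "country": ["DE", "DE", "DE", "DE"],
    "discount": [0.1, None, None, None],
}
metrics = {name: column_metrics(col) for name, col in table.items()}
```

In this made-up table, `customer_id` would stand out in the ID/Noise test, `country` in the Constant Value test, and `discount` in the Missing Value test.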

Input Ports

The statistical tests will be performed on the train data.
The same columns that are filtered out for the train data will be filtered out for the test data.
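
The relationship between the two inputs can be sketched in Python as follows: the set of columns to drop is decided once from relevance values obtained on the train data, and that same set is then removed from both tables. The helper names `columns_to_drop` and `drop_columns`, the column-oriented dictionaries, and the relevance values are all made up for illustration and are not part of the component.

```python
def columns_to_drop(relevance, threshold):
    """Names of tested columns whose relevance is below the threshold."""
    return {name for name, score in relevance.items() if score < threshold}


def drop_columns(table, dropped):
    """Copy of a column-oriented table without the dropped columns."""
    return {name: values for name, values in table.items() if name not in dropped}


# Hypothetical relevance values, derived from the train data only.
# "label" is not listed, which mimics an excluded column that is never tested.
relevance = {"age": 0.9, "customer_id": 0.05, "country": 0.2}
dropped = columns_to_drop(relevance, threshold=0.5)

train = {"age": [31, 45], "customer_id": [1, 2], "country": ["DE", "DE"], "label": [0, 1]}
test = {"age": [28], "customer_id": [3], "country": ["DE"], "label": [1]}

# The same decision is applied to both tables, so their columns stay aligned.
train_selected = drop_columns(train, dropped)
test_selected = drop_columns(test, dropped)
```

Deciding on the train data alone keeps the test data from influencing which features are kept.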

Output Ports

The selected, i.e., filtered, train data.
The selected, i.e., filtered, test data.
A table listing all the columns ranked by their overall relevance and containing the metrics of the individual tests.
A workflow port object that can be used to deploy the performed actions.
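
To picture the ranking table among the outputs, the sketch below sorts columns by relevance, highest first, and attaches per-test metrics to each row. The function name `rank_columns`, the field names, and all numbers are hypothetical; the actual table layout produced by the component may differ.

```python
def rank_columns(relevance, metrics):
    """Rows sorted from most to least relevant, each with its test metrics."""
    ordered = sorted(relevance, key=relevance.get, reverse=True)
    return [
        {"column": name, "relevance": relevance[name], **metrics.get(name, {})}
        for name in ordered
    ]


# Made-up values for illustration only.
relevance = {"age": 0.9, "customer_id": 0.05, "country": 0.2}
metrics = {
    "age": {"missing_ratio": 0.0},
    "customer_id": {"missing_ratio": 0.0},
    "country": {"missing_ratio": 0.1},
}
ranking = rank_columns(relevance, metrics)
```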
