Autofeat Generator

This component uses the Python library 'autofeat' to generate new features. The generated features are intended for building linear models, whose performance can then be comparable to that of non-linear models. Linear models have the additional benefit of being transparent and easy to explain and interpret.

The inputs to the component are train and test DataFrames. Missing values must be filled in before the data is fed in. The component builds a model on the train data and then applies the built model to the test data. The model itself is saved to disk in pickle format under the name 'autofeat_model.pkl'. Feature engineering can only be performed on numeric features, and the target column should also be numeric.
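The train-then-apply flow described above can be sketched in Python roughly as follows. This is a minimal sketch, not the component's actual code: it assumes the component wraps autofeat's AutoFeatRegressor (the description only says the target is numeric), and the function name fit_and_apply and its parameters are made up for illustration. The autofeat import sits inside the function so that the snippet loads even where the library is not installed.

```python
import pickle

# File name taken from the description above.
MODEL_PATH = "autofeat_model.pkl"


def fit_and_apply(train, test, target, steps=2, runs=5, model_path=MODEL_PATH):
    """Fit autofeat on the train DataFrame, apply it to test, pickle the model.

    Hypothetical helper: assumes numeric columns without missing values and
    a numeric target column present in both DataFrames.
    """
    # Lazy import: autofeat is only needed when the function is called.
    from autofeat import AutoFeatRegressor

    model = AutoFeatRegressor(feateng_steps=steps, featsel_runs=runs)

    # Learn the new features (and select among them) on the train data only.
    train_new = model.fit_transform(
        train.drop(columns=[target]), train[target].to_numpy()
    )
    # Re-create the same features on the test data with the fitted model.
    test_new = model.transform(test.drop(columns=[target]))

    # Persist the fitted model so it can be reused later.
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)

    return train_new, test_new, model
```

Fitting on the train data only and merely transforming the test data keeps information from the test set out of the feature selection.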

Feature generation takes time because a feature selection process is also involved. The number of feature generation steps is an important parameter that determines how many features are created: the more steps, the more features, and the greater the risk of overfitting. The outputs of the component are the train and test data with the newly created features, plus the autofeat model built on the train data.
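To get a feeling for why the number of steps matters so much, the following toy calculation counts candidate features under a deliberately simplified model. It is an illustration only, not autofeat's real bookkeeping: it assumes that odd steps apply a fixed number of unary transformations to every existing feature and that even steps combine every pair of existing features with a fixed number of binary operators. The function name toy_candidate_count and the defaults n_unary=7 and n_binary=3 are assumptions made for this sketch.

```python
from math import comb


def toy_candidate_count(n_inputs, steps, n_unary=7, n_binary=3):
    """Count candidate features in a simplified generation scheme.

    Toy model (an assumption, not autofeat's exact procedure):
    - odd steps add n_unary transformed copies of every existing feature,
    - even steps add n_binary combinations for every pair of features.
    """
    total = n_inputs
    for step in range(1, steps + 1):
        if step % 2 == 1:
            total += total * n_unary
        else:
            total += n_binary * comb(total, 2)
    return total


if __name__ == "__main__":
    for s in range(4):
        print(s, toy_candidate_count(3, s))
```

Under these assumptions, 3 input columns grow to 24, 852 and 6816 candidates after 1, 2 and 3 steps. The real library discards many candidates and then runs feature selection, so the number of features in the output is far smaller, but the growth of the search space is what drives both the run time and the overfitting risk.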

Given the model output, you can also use the component 'Autofeat Apply' for feature generation on test data.
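Outside of the workflow, reusing the saved model on new data might look like the sketch below. It assumes the pickled object exposes autofeat's transform method and that the new data has the same feature columns the model was fitted on. The helper name apply_saved_model is hypothetical, and how 'Autofeat Apply' works internally is not documented here.

```python
import pickle


def apply_saved_model(data, model_path="autofeat_model.pkl"):
    """Load a pickled autofeat model and generate its features for `data`.

    Hypothetical helper: `data` is expected to be a DataFrame with the same
    feature columns that were used when the model was fitted.
    """
    # Only unpickle files from a trusted source: pickle can run arbitrary code.
    with open(model_path, "rb") as fh:
        model = pickle.load(fh)
    # No fitting happens here; the stored feature definitions are re-applied.
    return model.transform(data)
```

Unpickling requires the autofeat package to be installed in the environment, even though the snippet does not import it explicitly.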

The component uses the Python autofeat library along with numpy and pandas. For more about the 'autofeat' library, see the paper at https://arxiv.org/pdf/1901.07329.pdf or the GitHub site at https://github.com/cod3licious/autofeat .

The autofeat project is Copyright (c) 2016 by its authors and released under MIT License (https://github.com/cod3licious/autofeat/blob/master/LICENSE).

Options

Select those numeric columns that you would like to use for feature engineering. Keep the target column in the left panel.
Select all numeric columns to which you would like feature transformations to be applied. Keep the others, including the target column, in the left panel.
Select target column (should be numeric):
Name of target variable in the input dataset. Target column must not be nominal.
Select transformations to be carried out. Selecting 'default' will override all other selections:
Select which transformations are to be applied to the features. If you check 'default', all other selections will be overridden. Making no selection implies 'default'.
Number of steps for feature generation:
Specify the number of steps for generating new features. Recommended: 2, or at most 3. More steps may lead to overfitting and to a very large number of features.
Number of iterations for feature selection process:
Specify the number of runs for feature selection. The recommended value is 5.
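The options above map naturally onto constructor arguments of the autofeat models. The sketch below shows one plausible mapping, including the rule that 'default' (or an empty selection) overrides everything else. The helper name build_autofeat_kwargs is hypothetical, and the assumption that the component's 'default' corresponds to autofeat's own default transformation set is a guess, not something stated in this description.

```python
# autofeat's documented default transformations; assumed here to be what the
# component's 'default' option stands for.
DEFAULT_TRANSFORMATIONS = ("1/", "exp", "log", "abs", "sqrt", "^2", "^3")


def build_autofeat_kwargs(feature_cols, selected=(), steps=2, runs=5):
    """Translate the dialog options into autofeat keyword arguments.

    Hypothetical helper:
    - feature_cols: numeric columns chosen for feature engineering
    - selected: ticked transformations, possibly containing 'default'
    - steps: number of feature generation steps (2, at most 3, recommended)
    - runs: number of feature selection runs (5 recommended)
    """
    # 'default' wins over any other tick; no tick at all also means default.
    if not selected or "default" in selected:
        transformations = DEFAULT_TRANSFORMATIONS
    else:
        transformations = tuple(selected)

    return {
        "feateng_cols": list(feature_cols),
        "transformations": transformations,
        "feateng_steps": steps,
        "featsel_runs": runs,
    }
```

The resulting dictionary could then be passed on as AutoFeatRegressor(**kwargs), since feateng_cols, transformations, feateng_steps and featsel_runs are parameter names of the autofeat models.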

Input Ports

train data: Feed in the data that will be used for training the feature generator. Normalized data is preferable. Missing values must be filled in before the data is fed in. The data should also include the target column.
test data: Feed in the test data. Normalized data is preferable. Missing values must be filled in before the data is fed in. The data should also include the target column.
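One way to meet these input requirements before the data reaches the component is sketched below with pandas. The concrete choices are assumptions made for illustration and are not prescribed by the component: missing values are filled with the train medians, features are standardized with the train mean and standard deviation, and the target column is left untouched. The helper name prepare_inputs is hypothetical.

```python
import pandas as pd


def prepare_inputs(train, test, target):
    """Fill missing values and standardize the feature columns.

    Hypothetical helper: all statistics come from the train data only and
    are re-used for the test data; the target column is not modified.
    """
    feature_cols = [c for c in train.columns if c != target]

    # Feature engineering works on numeric columns only.
    non_numeric = (
        train[feature_cols + [target]]
        .select_dtypes(exclude="number")
        .columns.tolist()
    )
    if non_numeric:
        raise ValueError(f"Non-numeric columns found: {non_numeric}")

    train_out, test_out = train.copy(), test.copy()

    # Fill gaps with the train medians (one of many possible strategies).
    medians = train[feature_cols].median()
    train_out[feature_cols] = train[feature_cols].fillna(medians)
    test_out[feature_cols] = test[feature_cols].fillna(medians)

    # Standardize; constant columns get a divisor of 1 to avoid dividing by 0.
    mean = train_out[feature_cols].mean()
    std = train_out[feature_cols].std(ddof=0).replace(0, 1.0)
    train_out[feature_cols] = (train_out[feature_cols] - mean) / std
    test_out[feature_cols] = (test_out[feature_cols] - mean) / std

    return train_out, test_out
```

Taking the medians, means and standard deviations from the train data alone means the test data is treated exactly as unseen data would be.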

Output Ports

Train data with the generated features in addition to the features already present in the DataFrame.
The autofeat model built on the train data.
Test data with the generated features in addition to the features already present in the DataFrame.
