
AutoML (Regression)

Options

Enable One Hot Encoding of String Columns:
By checking this box, all columns of domain "String" (that is, categorical features) are one-hot encoded. The resulting Double columns replace all String columns during training. DISCLAIMER: For Deep Learning (Keras) and Polynomial Regression models, this setting is required if you provide only String columns.
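As an illustration of what one-hot encoding does to a String column, here is a minimal Python sketch, not the component's implementation. It assumes rows are plain dictionaries and uses a hypothetical `column=value` naming scheme for the generated Double columns; the component's actual column names may differ.

```python
def one_hot_encode(rows, string_columns):
    """Replace each String column with one 0.0/1.0 Double column per distinct value.

    `rows` is a list of dicts (column name -> value). The "column=value" naming
    of the new columns is a hypothetical choice made for this sketch.
    """
    # Distinct values of each String column, in first-seen order
    categories = {
        col: list(dict.fromkeys(row[col] for row in rows)) for col in string_columns
    }
    encoded = []
    for row in rows:
        # Non-String columns are copied unchanged
        new_row = {k: v for k, v in row.items() if k not in string_columns}
        for col in string_columns:
            for value in categories[col]:
                new_row[f"{col}={value}"] = 1.0 if row[col] == value else 0.0
        encoded.append(new_row)
    return encoded


rows = [
    {"color": "red", "size": 3.0},
    {"color": "blue", "size": 5.0},
    {"color": "red", "size": 4.0},
]
for encoded_row in one_hot_encode(rows, ["color"]):
    print(encoded_row)
```

Each distinct string becomes its own column, which is why a String column with many unique values can produce a very wide table.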
Activate Interactive View:
If selected, the component creates an interactive view to browse the trained models ranked by the selected metric.
Remove Extreme Predictions:
Models performing a regression task can output values that are unrealistic given the domain of the target column: either too large or too small. Keeping even a handful of such 'extreme predictions' can distort the measured performance of the model. When this option is enabled, the component produces a model that automatically removes extreme predictions based on the distribution of the target column in the training partition. To evaluate all models on the same test set, the rows on which at least one model makes an extreme prediction are removed for all models before performance is computed. Please note that the component does not automatically remove outliers in the input data. When the output workflow object is applied to new data, extreme predictions are replaced with missing values. See "Extreme Predictions Range" to understand how extreme predictions are detected.
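The rule that keeps the model comparison fair can be sketched in Python as follows. This is a hypothetical illustration: it assumes predictions are held in plain lists, uses RMSE as the example metric, and takes the extreme-prediction test as a caller-supplied predicate (here a fixed interval) instead of the range described under "Extreme Predictions Range".

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error (the example metric for this sketch)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def score_on_common_rows(y_true, predictions_by_model, is_extreme):
    """Drop every test row on which at least one model's prediction is extreme,
    then score all models on the same remaining rows."""
    keep = [
        i
        for i in range(len(y_true))
        if not any(is_extreme(preds[i]) for preds in predictions_by_model.values())
    ]
    if not keep:
        raise ValueError("every test row contains an extreme prediction")
    kept_true = [y_true[i] for i in keep]
    scores = {
        name: rmse(kept_true, [preds[i] for i in keep])
        for name, preds in predictions_by_model.items()
    }
    return scores, keep


y_true = [10.0, 12.0, 11.0, 13.0]
predictions = {
    "model_a": [10.0, 12.0, 500.0, 13.0],   # extreme on row 2
    "model_b": [11.0, 12.0, 11.0, -400.0],  # extreme on row 3
}
# Hypothetical predicate: anything outside [0, 100] counts as extreme
scores, kept_rows = score_on_common_rows(
    y_true, predictions, lambda p: not 0 <= p <= 100
)
print(kept_rows, scores)
```

Rows 2 and 3 are dropped for both models, so `model_a` is not penalized for its extreme value on row 2 while `model_b` is still scored there, and vice versa; both models are compared on rows 0 and 1 only.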
Feature Column Selection:
Select the columns that the model should use as input features during training. Excluded columns are discarded and are not used anywhere in the workflow. Accepted domains: Number (integer), Number (double), Number (long), and String.
Target Column:
Select the numeric column you want to predict.
Extreme Predictions Range:
The non-negative constant k used as a parameter to detect and remove extreme values among the predictions. Extreme predictions are detected using the distribution of the target column in the training partition and are replaced by missing values. The range of normal predictions is mean ± k * sd, where mean and sd are the mean and standard deviation of the target column in the training partition. Setting k to 0 replaces all predictions. Setting k to 1.5 removes many predictions, not only the extreme ones. Setting k >= 3 should remove only the most extreme cases. To deactivate the removal of extreme predictions, use the "Remove Extreme Predictions" setting.
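The range check can be sketched in Python as below. This is a minimal illustration, not the component's code: it assumes the sample standard deviation (`statistics.stdev`) and treats the bounds as exclusive, so that k = 0 replaces every prediction; missing values are represented by `None`.

```python
import statistics


def make_extreme_filter(train_target, k):
    """Build a function that replaces predictions outside mean +- k * sd with None.

    mean and sd come from the target column of the training partition.
    Using the sample standard deviation and exclusive bounds are assumptions.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    mean = statistics.mean(train_target)
    sd = statistics.stdev(train_target)
    low, high = mean - k * sd, mean + k * sd

    def apply(prediction):
        # Exclusive bounds: with k = 0 no prediction survives
        return prediction if low < prediction < high else None

    return apply


train_target = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sample sd about 2.14
keep_normal = make_extreme_filter(train_target, k=3)
print([keep_normal(p) for p in [5.5, 11.0, 12.0, -2.0]])  # [5.5, 11.0, None, None]
```

With k = 3 the accepted interval is roughly (-1.41, 11.41), so 12.0 and -2.0 are replaced by missing values.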
Number of Folds in Cross Validation:
A k-fold cross-validation is performed in the various hyperparameter optimization phases. Enter the number of folds here.
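To show how the folds rotate, here is a minimal Python sketch of k-fold index splitting. It is an illustration under assumptions (shuffled rows, round-robin assignment to folds, a fixed seed), not the component's actual partitioning: each row is used for validation exactly once, and a model's cross-validated score would typically be the average over the validation folds.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=0):
    """Yield (train_indices, validation_indices) for each of the n_folds folds."""
    if not 2 <= n_folds <= n_rows:
        raise ValueError("need 2 <= n_folds <= n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment gives folds whose sizes differ by at most one
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for train, validation in k_fold_indices(n_rows=10, n_folds=3):
    print(len(train), len(validation))
```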
Size of Training Set Partition (%):
Enter the size of the training set as a percentage (%) of the input rows. The test set is made up of the remaining rows (100% minus the entered value). Rows are assigned to the two partitions by random sampling.
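A random percentage split of this kind can be sketched in Python as follows. `train_test_split` here is a local, hypothetical helper (not scikit-learn's function of the same name); the seed and the rounding of the training-set size are assumptions, and the component may round or sample differently.

```python
import random


def train_test_split(rows, train_percent, seed=42):
    """Randomly split rows into a training set (train_percent %) and a test set (the rest)."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100 (exclusive)")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


train, test = train_test_split(list(range(100)), train_percent=80)
print(len(train), len(test))  # 80 20
```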
Maximum Amount of Unique Values in a Categorical Column:
Categorical columns with more unique values than this limit are removed. This setting ensures you do not start an endless training process because you forgot to remove columns such as RowIDs.
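The cardinality check can be sketched in Python as below. This is a hypothetical illustration with rows as plain dictionaries: a String column is dropped when its number of distinct values exceeds `max_unique`, which is exactly what happens to an identifier column where every value is different.

```python
def drop_high_cardinality(rows, string_columns, max_unique):
    """Remove String columns with more than max_unique distinct values.

    Returns the cleaned rows and the names of the dropped columns.
    """
    dropped = [
        col for col in string_columns if len({row[col] for row in rows}) > max_unique
    ]
    cleaned = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    return cleaned, dropped


rows = [
    {"id": "Row0", "city": "Berlin", "price": 10.0},
    {"id": "Row1", "city": "Zurich", "price": 12.5},
    {"id": "Row2", "city": "Berlin", "price": 9.0},
]
cleaned, dropped = drop_high_cardinality(rows, ["id", "city"], max_unique=2)
print(dropped)  # ['id']
```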
Models to Train:
Select which machine learning algorithms should be used in the AutoML process. H2O AutoML trains additional model types and ensembles; if it is selected, your machine might become slow for up to 2 minutes.
Metric for Auto Selection:
Select the performance metric used to automatically select the best model and to tune the hyperparameters.
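Selecting the best model boils down to scoring every candidate on the same data and sorting by the chosen metric, taking into account whether higher or lower values are better. The Python sketch below assumes three common regression metrics (MAE, RMSE, R²) under hypothetical dictionary keys; it does not claim to reproduce the exact set of metrics offered by the component.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical metric keys -> (function, higher_is_better)
METRICS = {
    "MAE": (mae, False),
    "RMSE": (rmse, False),
    "R2": (r_squared, True),
}


def rank_models(y_true, predictions_by_model, metric):
    """Return model names ordered from best to worst, plus their scores."""
    func, higher_is_better = METRICS[metric]
    scores = {name: func(y_true, preds) for name, preds in predictions_by_model.items()}
    ranking = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ranking, scores


y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_b": [2.0, 3.0, 4.0, 5.0],  # off by 1 on every row
    "model_a": [1.0, 2.0, 3.0, 5.0],  # off by 1 on one row
}
for metric in METRICS:
    print(metric, rank_models(y_true, predictions, metric)[0])
```

Storing the optimization direction next to each metric keeps the sorting logic identical for error metrics (lower is better) and goodness-of-fit metrics such as R² (higher is better).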
Output Settings:
Select the output format of the captured workflow created by the component. By "features" we mean the columns selected in the component configuration under "Feature Column Selection". By "prepared" we mean features processed from their raw format into the format required by the model or the user. Any extra column not recognized as a feature, such as an additional label or identifier, can still be provided to the captured workflow and is kept in its output regardless of what you select here.

Input Ports

This node has no input ports

Output Ports

This node has no output ports
