H2O AutoML Learner

Trains models of the specified types using H2O AutoML and returns the leading model among them. As part of the learning process, H2O automatically optimizes hyperparameters using a random grid search.

Options

General Settings

Target Column
Select the target column. The column must contain nominal values.
Column selection
Select the columns to use for model training.
Max. runtime in seconds
Select to specify the maximum runtime for AutoML learning in seconds. Note that this setting can affect AutoML reproducibility (max_runtime_secs).
Max number of models
Select to specify the maximum number of models that should be trained, excluding Stacked Ensemble models (max_models).
Use static random seed
Select to use a static seed for randomization.
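
The parenthesized names in the option descriptions are H2O AutoML parameter names. As a minimal sketch of how the general settings might be collected into keyword arguments, the helper below builds a plain dictionary. The function name `general_settings_to_kwargs` is hypothetical, and `seed` is assumed here to be the H2O parameter behind the static random seed option (the node description does not name it).

```python
from typing import Any, Dict, Optional


def general_settings_to_kwargs(
    max_runtime_secs: Optional[int] = None,
    max_models: Optional[int] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect only the options that were actually selected (hypothetical helper)."""
    if max_runtime_secs is not None and max_runtime_secs <= 0:
        raise ValueError("max_runtime_secs must be positive")
    if max_models is not None and max_models <= 0:
        raise ValueError("max_models must be positive")
    candidates = {
        "max_runtime_secs": max_runtime_secs,
        "max_models": max_models,
        "seed": seed,  # assumed parameter name for the static random seed
    }
    # Unselected options are left out so that H2O's own defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = general_settings_to_kwargs(max_models=20, seed=42)
print(kwargs)
```

With the `h2o` Python package installed, a dictionary like this could be passed on as `H2OAutoML(**kwargs)` from `h2o.automl`. The example limits the run by model count rather than by runtime, since the runtime limit is the setting noted above as affecting reproducibility.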

Algorithm Settings

Scoring metric used to select best model
Select the metric used to sort the leaderboard at the end of an AutoML run. The leading model according to the metric will be returned (sort_metric).
Include algorithms
Select the algorithms that should be included in the AutoML run. Note that Deep Learning means a multi-layer feedforward artificial neural network. If Stacked Ensemble is checked, a second-level model is learned that stacks/combines the learned and optimized models. Hence, Stacked Ensemble can only be included if at least one other model type is included (include_algos).
Number of folds
The number of folds that should be used for k-fold cross-validation of the models in the AutoML run (nfolds).
Use fold column
Select to specify a column with cross-validation fold index assignment per observation. The column must not be the same column as the target column and must contain either integer or nominal values (fold_column).
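
The two constraints on the fold column can be illustrated with a small check. This is only a sketch: the function `check_fold_column` and the column names are hypothetical, and integers and strings stand in for integer and nominal values.

```python
from typing import Sequence


def check_fold_column(target: str, fold_column: str, values: Sequence[object]) -> int:
    """Validate a fold column and return the number of distinct folds."""
    if fold_column == target:
        raise ValueError("fold column must differ from the target column")
    for value in values:
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"fold value {value!r} is neither integer nor nominal")
    return len(set(values))


# Round-robin assignment of 7 rows to 3 folds, one simple way to build such a column.
folds = [row % 3 for row in range(7)]
n_folds = check_fold_column("label", "fold", folds)
print(folds, n_folds)
```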

Advanced Settings

Early Stopping
Select to activate early stopping.
Stopping metric
Specify the metric to use for early stopping. The metric is calculated on the cross-validation folds (stopping_metric).
Stopping tolerance
Specify the relative tolerance for metric-based stopping. Training stops if the improvement is less than this value (stopping_tolerance).
Number of last seen rows for moving average
Stops training when the selected stopping metric doesn’t improve for the specified number of training rounds, based on a simple moving average. The metric is calculated on the cross-validation folds (stopping_rounds).
Max. runtime in seconds per model
Specify the maximum amount of time dedicated to the training of each individual model in the AutoML run. This setting can affect AutoML reproducibility (max_runtime_secs_per_model).
Weight column (optional)
Select a column to use for the observation weights, which are used for bias correction. Note that this setting can affect AutoML reproducibility slightly (weights_column).
Balance classes
Oversample the minority classes to balance the class distribution. This option is not enabled by default and can increase the data frame size (balance_classes).
Define max relative number of rows after balancing
Specify the maximum relative size of the training data after balancing class counts (max_after_balance_size).
Class specific sampling factors
Specify the per-class (in lexicographical order) over/under-sampling ratios. By default, these ratios are automatically computed during training to obtain class balance (class_sampling_factors).
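
The interaction of the three early stopping options above (stopping_metric, stopping_tolerance, stopping_rounds) can be sketched in plain Python. This is a simplified illustration of a moving-average stopping rule, not H2O's actual implementation, which may differ in detail. The function names and the `lower_is_better` flag are assumptions made for this example.

```python
from typing import List, Sequence


def moving_averages(scores: Sequence[float], window: int) -> List[float]:
    """Simple moving averages over every full window of `window` scores."""
    return [
        sum(scores[i : i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def should_stop(
    scores: Sequence[float],
    stopping_rounds: int,
    stopping_tolerance: float,
    lower_is_better: bool = True,
) -> bool:
    """True if the latest moving average fails to beat the best of the
    preceding `stopping_rounds` moving averages by the relative tolerance."""
    if stopping_rounds <= 0:
        return False  # early stopping disabled
    averages = moving_averages(scores, stopping_rounds)
    if len(averages) <= stopping_rounds:
        return False  # not enough scoring history yet
    latest = averages[-1]
    earlier = averages[-(stopping_rounds + 1) : -1]
    best_earlier = min(earlier) if lower_is_better else max(earlier)
    if best_earlier == 0:
        return False  # relative improvement is undefined
    if lower_is_better:
        improvement = (best_earlier - latest) / abs(best_earlier)
    else:
        improvement = (latest - best_earlier) / abs(best_earlier)
    return improvement < stopping_tolerance


# Illustrative (made-up) loss values per scoring round.
plateau = [0.70, 0.60, 0.55, 0.549, 0.5489, 0.5488]
improving = [0.70, 0.60, 0.50, 0.40, 0.30]
print(should_stop(plateau, stopping_rounds=2, stopping_tolerance=0.01))
print(should_stop(improving, stopping_rounds=2, stopping_tolerance=0.01))
```

Averaging over a window rather than comparing single scores makes the rule less sensitive to noise in individual scoring rounds.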

Input Ports

H2O Frame with training data.

Output Ports

The best H2O model trained in the AutoML process based on the selected scoring metric. The leading model corresponds with the first row of the leaderboard table.
A leaderboard of models trained in the AutoML process. The models are ranked by the selected scoring metric, i.e., the model in the first row is the one that is output.
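
The relationship between the two outputs can be sketched as follows: rank the models by the scoring metric, then take the first row as the leader. The model names and metric values are made up for illustration, and the set of metrics treated as "higher is better" is an assumption of this sketch.

```python
from typing import Any, Dict, List

# Assumption for this sketch: these metrics are maximized, all others minimized.
HIGHER_IS_BETTER = {"auc", "aucpr"}


def rank_models(models: List[Dict[str, Any]], sort_metric: str) -> List[Dict[str, Any]]:
    """Return the models sorted so that the best one comes first."""
    descending = sort_metric.lower() in HIGHER_IS_BETTER
    return sorted(models, key=lambda model: model[sort_metric], reverse=descending)


# Hypothetical models with illustrative metric values.
models = [
    {"model_id": "glm_1", "auc": 0.81, "logloss": 0.52},
    {"model_id": "gbm_1", "auc": 0.88, "logloss": 0.41},
    {"model_id": "drf_1", "auc": 0.86, "logloss": 0.39},
]

leaderboard = rank_models(models, "auc")
leader = leaderboard[0]  # the first row corresponds to the model that is output
print(leader["model_id"])
```

Note that the leader depends on the metric: ranking the same models by `logloss` puts a different model in the first row.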

Views

This node has no views.
