H2O Generalized Linear Model Learner (Regression)

Learns a Generalized Linear Model (GLM) regression model using H2O.

Options

General Settings

Target Column
Select target column. Must be numeric for regression problems.
Column selection
Select columns used for model training.
Ignore constant columns
Select to ignore constant columns.
Use static random seed
Select to use a static seed for randomization.

Algorithm Settings

Family
Specify the model type (family).
Link
Specify a link function (Family_Default, Identity, Logit, Log, Inverse, and Tweedie). The available link functions depend on the selected family (link).
Solver
Specify the solver to use (AUTO, IRLSM, L_BFGS, COORDINATE_DESCENT_NAIVE, or COORDINATE_DESCENT). The available solvers depend on the selected family. IRLSM is fast on problems with a small number of predictors and for lambda search with L1 penalty, while L_BFGS scales better for datasets with many columns. COORDINATE_DESCENT is IRLSM with the covariance updates version of cyclical coordinate descent in the innermost loop. COORDINATE_DESCENT_NAIVE is IRLSM with the naive updates version of cyclical coordinate descent in the innermost loop (solver).
Set alpha
If enabled, specify the regularization distribution between L1 and L2. If disabled, H2O determines a default value using a heuristic (alpha).
Set lambda
If enabled, specify the regularization strength. If disabled, H2O determines a default value using a heuristic (lambda).
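To illustrate how alpha and lambda interact, here is a minimal Python sketch of an elastic net penalty in the commonly used form lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2). The helper name elastic_net_penalty is hypothetical and the snippet is not part of this node or of H2O; it only shows that alpha = 1 gives a pure L1 penalty and alpha = 0 a pure L2 penalty.

```python
def elastic_net_penalty(beta, alpha, lam):
    """Hypothetical helper: elastic net penalty for a coefficient list.

    alpha distributes the penalty between L1 (alpha=1) and L2 (alpha=0);
    lam scales the overall regularization strength.
    """
    l1 = sum(abs(b) for b in beta)      # L1 norm of the coefficients
    l2_sq = sum(b * b for b in beta)    # squared L2 norm of the coefficients
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


beta = [1.0, -2.0, 0.0]
lasso = elastic_net_penalty(beta, alpha=1.0, lam=0.5)  # L1 only
ridge = elastic_net_penalty(beta, alpha=0.0, lam=0.5)  # L2 only
mixed = elastic_net_penalty(beta, alpha=0.5, lam=0.5)  # even mix
```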
Enable lambda search
Specify whether to enable lambda search. The search starts at the highest meaningful lambda value (that is, the lowest value that drives all coefficients to zero) and decreases it on a log scale at each step until the minimum lambda is reached. If no lambda minimum ratio is defined, the minimum lambda is calculated automatically. If the number of lambdas is undefined, it is also determined by a heuristic. The resulting model uses the "best" lambda value, as evaluated on the validation set, whose size can be defined in the Advanced Settings tab (lambda_search).
More detailed information about the process can also be found here.
Set number of lambdas
(Applicable only if lambda_search is enabled) If enabled, specify the number of lambdas to use in the search. If disabled, H2O determines a default value using a heuristic (nlambdas).
Set lambda minimum ratio
(Applicable only if lambda_search is enabled) If enabled, specify the minimum lambda to use for lambda search (specified as a ratio of lambda_max). If disabled, H2O determines a default value using a heuristic (lambda_min_ratio).
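The lambda search options can be pictured as a grid of candidate values. The sketch below builds such a grid, assuming the values are spaced evenly on a log scale between lambda_max and lambda_max * lambda_min_ratio. The helper lambda_path is hypothetical, and the exact grid that H2O builds internally may differ.

```python
import math


def lambda_path(lambda_max, lambda_min_ratio, nlambdas):
    """Hypothetical helper: decreasing lambda values, log-spaced.

    Starts at lambda_max and ends at lambda_max * lambda_min_ratio.
    """
    if nlambdas < 2:
        return [lambda_max]
    log_max = math.log(lambda_max)
    log_min = math.log(lambda_max * lambda_min_ratio)
    step = (log_max - log_min) / (nlambdas - 1)  # constant step in log space
    return [math.exp(log_max - i * step) for i in range(nlambdas)]


# Five candidates from 1.0 down to 1e-4, each a constant factor below the last.
path = lambda_path(lambda_max=1.0, lambda_min_ratio=1e-4, nlambdas=5)
```

In a search of this kind, a model would be fitted for each value in the path and the one that scores best on the validation set would be kept.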
Set beta epsilon
If enabled, specify the beta epsilon value for convergence. If the L1 norm of the current beta change is below this threshold, the model is converged. If disabled, H2O determines a default value using a heuristic (beta_epsilon).
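As a rough illustration of the beta epsilon criterion, the following sketch compares the L1 norm of the coefficient change between two iterations against a threshold. The function name beta_converged is hypothetical, and the check is a simplification of whatever the solver does internally.

```python
def beta_converged(beta_old, beta_new, beta_epsilon):
    """Hypothetical check: True if the L1 norm of the beta change is below the threshold."""
    change = sum(abs(new - old) for old, new in zip(beta_old, beta_new))
    return change < beta_epsilon


small_step = beta_converged([1.0, 2.0], [1.00001, 2.00002], beta_epsilon=1e-4)
large_step = beta_converged([1.0, 2.0], [1.1, 2.0], beta_epsilon=1e-4)
```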
Set objective epsilon
If enabled, specify a threshold for convergence. If the objective value is less than this threshold, the model is converged. If disabled, H2O determines a default value using a heuristic (objective_epsilon).
Set gradient epsilon
(For L-BFGS only) If enabled, specify a threshold for convergence. If the objective value (using the L-infinity norm) is less than this threshold, the model is converged. If disabled, H2O determines a default value using a heuristic (gradient_epsilon).
Tweedie variance power
(Only applicable if Tweedie is specified for Family) Specify the Tweedie variance power (tweedie_variance_power).
Tweedie link power
(Only applicable if Tweedie is specified for Family) Specify the Tweedie link power (tweedie_link_power).
Non negative coefficients
Specify whether to force coefficients to have non-negative values (non_negative).
Set maximum iterations
If enabled, specify the number of training iterations. If disabled, the number of iterations is not limited (max_iterations).
Include a constant term in the model
Specify whether to include a constant term in the model. This option is enabled by default (intercept).
Set maximum active predictors
If enabled, specify the maximum number of active predictors during computation. This value is used as a stopping criterion to prevent expensive model building with many predictors. If disabled, H2O determines a default value using a heuristic (max_active_predictors).
Remove collinear columns
(Only applicable if IRLSM is specified for Solver and lambda=0) Specify whether to automatically remove collinear columns during model building. When enabled, collinear columns are dropped from the model and have a coefficient of 0 in the returned model (remove_collinear_columns).
Standardize numeric columns
Specify whether to standardize the numeric columns to have a mean of zero and unit variance (recommended) (standardize).
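For reference, standardizing a numeric column to zero mean and unit variance can be sketched as follows. The helper standardize is hypothetical, and the use of the population standard deviation is an assumption of this sketch; H2O may compute the scale differently.

```python
import statistics


def standardize(values):
    """Hypothetical helper: rescale values to mean 0 and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    if sd == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


scaled = standardize([2.0, 4.0, 6.0])
```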
Missing values handling
Specify how to handle missing values (Skip or MeanImputation) (missing_values_handling).
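The two missing value strategies can be contrasted with a small sketch. Both helpers are hypothetical and use None to mark a missing value; they illustrate the general idea of dropping incomplete rows versus filling gaps with the column mean, not the node's actual implementation.

```python
def skip_rows(rows):
    """Hypothetical 'Skip' strategy: drop every row that contains a missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(column):
    """Hypothetical 'MeanImputation' strategy: replace missing values with the column mean."""
    present = [v for v in column if v is not None]
    if not present:
        # Nothing to average; return the column unchanged.
        return list(column)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


kept = skip_rows([[1.0, 2.0], [None, 3.0]])
filled = mean_impute([1.0, None, 3.0])
```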

Advanced Settings

Size of validation set (in %)
Specify the size of the validation dataset used to evaluate early stopping and lambda search. The option can only be specified if either early stopping or lambda search is enabled.
Early Stopping
Select to activate early stopping. The defined validation set will be used to evaluate criteria for early stopping (early_stopping).
Max runtime in seconds
Maximum allowed runtime in seconds for model training (max_runtime_secs).
Weights column (optional)
Select a column to use for the observation weights, which are used for bias correction (weights_column).
Offset column (optional)
Specify a column to use as the offset. Note: Offsets are per-row “bias values” that are used during model training (offset_column).
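To make the roles of the weights and offset columns more concrete, the sketch below assumes a Gaussian family with an identity link, where the offset is added to the linear predictor of each row and the weight scales that row's contribution to the squared error. The function names and example data are hypothetical.

```python
def linear_predictor(x, beta, intercept, offset=0.0):
    """Hypothetical helper: intercept + x . beta + per-row offset."""
    return intercept + sum(b * v for b, v in zip(beta, x)) + offset


def weighted_squared_error(y_true, y_pred, weights):
    """Hypothetical helper: squared error with each row scaled by its weight."""
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))


rows = [[1.0, 2.0], [3.0, 4.0]]
offsets = [0.5, -0.5]   # per-row offsets
beta = [2.0, 1.0]
preds = [
    linear_predictor(x, beta, intercept=1.0, offset=o)
    for x, o in zip(rows, offsets)
]
# The second row counts twice as much as the first.
loss = weighted_squared_error([5.0, 10.0], preds, weights=[1.0, 2.0])
```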

Input Ports

H2O Frame with training data.

Output Ports

H2O Generalized Linear Model regression model.
Coefficients of the resulting model.

Views

This node has no views
