H2O Generalized Linear Model Learner (Regression)

This Node Is Deprecated: This version of the node has been replaced with a new and improved version. The old version is kept for backwards compatibility, but for all new workflows we suggest using the version linked below.
Suggested replacement: H2O Generalized Linear Model Learner (Regression)

Learns a Generalized Linear Model (GLM) regression model using H2O.

Options

General Settings

Target column selection
Select the target column. It must be numeric for regression problems.
Column selection
Select the columns used for model training.
Ignore constant columns
Select to ignore constant columns.
Use static random seed
Select to use a static seed for randomization.

Algorithm Settings

Solver
Specify the solver to use (AUTO, IRLSM, L_BFGS, COORDINATE_DESCENT_NAIVE, or COORDINATE_DESCENT). IRLSM is fast on problems with a small number of predictors and for lambda search with an L1 penalty, while L_BFGS scales better for datasets with many columns. COORDINATE_DESCENT is IRLSM with the covariance-updates version of cyclical coordinate descent in the innermost loop. COORDINATE_DESCENT_NAIVE is IRLSM with the naive-updates version of cyclical coordinate descent in the innermost loop. COORDINATE_DESCENT_NAIVE and COORDINATE_DESCENT are currently experimental (solver).
Family
Specify the model type (family).
Link
Specify a link function (Identity, Family_Default, Logit, Log, Inverse, or Tweedie) (link).
Alpha
Specify the regularization distribution between L1 and L2. A value of 1 gives a pure L1 (lasso) penalty and a value of 0 gives a pure L2 (ridge) penalty (alpha).
Lambda
Specify the regularization strength (lambda).
Enable Lambda search
Specify whether to enable lambda search, starting with lambda max. If you also specify a value for lambda_min_ratio, then this value is interpreted as lambda min. If you do not specify a value for lambda_min_ratio, then GLM will calculate the minimum lambda (lambda_search).
Number of Lambdas
(Applicable only if lambda_search is enabled) Specify the number of lambdas to use in the search. The default is 100 (nlambdas).
Lambda minimum ratio
Specify the minimum lambda to use for lambda search, given as a ratio of lambda_max (lambda_min_ratio).
Beta epsilon
Specify the beta epsilon value. If the L1 norm of the change in beta between iterations is below this threshold, the model is considered converged (beta_epsilon).
Objective epsilon
Specify a threshold for convergence. If the objective value changes by less than this threshold, the model is considered converged (objective_epsilon).
Gradient epsilon
(For L-BFGS only) Specify a threshold for convergence. If the L-infinity norm of the gradient is less than this threshold, the model is considered converged (gradient_epsilon).
Tweedie variance power
(Only applicable if Tweedie is specified for Family) Specify the Tweedie variance power (tweedie_variance_power).
Tweedie link power
(Only applicable if Tweedie is specified for Family) Specify the Tweedie link power (tweedie_link_power).
Non negative?
Specify whether to force coefficients to have non-negative values (non_negative).
Max iterations
Specify the number of training iterations (max_iterations).
Include a constant term in the model
Specify whether to include a constant term in the model. This option is enabled by default (intercept).
Maximum active predictors
Specify the maximum number of active predictors during computation. This value is used as a stopping criterion to prevent expensive model building with many predictors (max_active_predictors).
Compute P values
Request computation of p-values. Only applicable with no penalty (lambda = 0 and no beta constraints). Enabling remove_collinear_columns is recommended: H2O returns an error if p-values are requested, the data contains collinear columns, and remove_collinear_columns is not enabled (compute_p_values).
Remove collinear columns
Specify whether to automatically remove collinear columns during model building. When enabled, collinear columns are dropped from the model and have a coefficient of 0 in the returned model. This can only be set if there is no regularization (lambda = 0) (remove_collinear_columns).
Standardize numeric columns
Specify whether to standardize the numeric columns to have a mean of zero and unit variance (recommended) (standardize).
Missing values handling
Specify how to handle missing values (Skip or MeanImputation) (missing_values_handling).
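
To illustrate how Alpha, Lambda, Number of Lambdas and Lambda minimum ratio relate to each other, here is a minimal Python sketch. It computes the elastic net penalty in the form lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2) and builds a decreasing sequence of lambda values from lambda max down to lambda max times the minimum ratio. The function names elastic_net_penalty and lambda_path are hypothetical, and the log spacing of the sequence is an assumption. This is not the code that H2O or this node executes.

```python
import math


def elastic_net_penalty(beta, alpha, lam):
    """Penalty term: lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def lambda_path(lambda_max, lambda_min_ratio, nlambdas):
    """Log-spaced values from lambda_max down to lambda_max * lambda_min_ratio.

    Assumption: the grid is evenly spaced on a log scale.
    """
    if nlambdas < 2:
        return [lambda_max]
    step = math.log(lambda_min_ratio) / (nlambdas - 1)
    return [lambda_max * math.exp(i * step) for i in range(nlambdas)]


if __name__ == "__main__":
    coefficients = [1.0, -2.0]
    # alpha = 1.0 is a pure L1 penalty, alpha = 0.0 a pure L2 penalty.
    print(elastic_net_penalty(coefficients, alpha=1.0, lam=0.5))
    print(elastic_net_penalty(coefficients, alpha=0.0, lam=0.5))
    print(lambda_path(lambda_max=1.0, lambda_min_ratio=0.01, nlambdas=3))
```

With Alpha between 0 and 1 the penalty mixes both norms, and a smaller Lambda minimum ratio extends the search toward weaker regularization.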

Advanced Settings

Weight column selection
Select a column to use for the observation weights, which are used for bias correction (weights_column).
Offset column selection
Specify a column to use as the offset. Note: Offsets are per-row “bias values” that are used during model training (offset_column).
Max Runtime?
Maximum allowed runtime in seconds for model training (max_runtime_secs).
Early Stopping
Select to activate early stopping.
Stopping metric
Specify the metric to use for early stopping (stopping_metric).
Stopping tolerance
Specify the relative tolerance for metric-based stopping. Training stops if the improvement is less than this value (stopping_tolerance).
Number of last seen rows for moving average
Stops training when the option selected for stopping_metric doesn’t improve for the specified number of training rounds, based on a simple moving average. To disable this feature, specify 0. The metric is computed on the validation data (if provided); otherwise, training data is used (stopping_rounds).
Size of validation set (in %)
Specify the size of the validation dataset used to evaluate early stopping criteria.
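
The interaction between the stopping metric, the stopping tolerance and the moving-average window can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not H2O's implementation: the function name should_stop is hypothetical, the rule waits for twice the window length of scores before deciding, and it compares the best recent moving average against the oldest one in that span.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, lower_is_better=True):
    """Return True if the metric has stopped improving.

    history: metric values, one per scoring round, oldest first.
    stopping_rounds: length k of the simple moving average; 0 disables the check.
    stopping_tolerance: minimum relative improvement required to keep training.
    """
    k = stopping_rounds
    # Assumption: wait for at least 2 * k scores before making a decision.
    if k == 0 or len(history) < 2 * k:
        return False

    # Simple moving averages of length k over the last 2 * k scores (k + 1 windows).
    recent = history[-2 * k:]
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]

    reference = averages[0]  # oldest window
    later = averages[1:]     # the k most recent windows
    best = min(later) if lower_is_better else max(later)

    scale = abs(reference) if reference != 0 else 1.0  # avoid dividing by zero
    if lower_is_better:
        improvement = (reference - best) / scale
    else:
        improvement = (best - reference) / scale
    return improvement < stopping_tolerance


if __name__ == "__main__":
    still_improving = [10, 8, 6, 5]
    plateau = [10, 8, 6, 5, 4.99, 4.99, 4.99, 4.99]
    print(should_stop(still_improving, stopping_rounds=2, stopping_tolerance=0.01))
    print(should_stop(plateau, stopping_rounds=2, stopping_tolerance=0.01))
```

A larger window smooths out noisy scores but delays the stopping decision, while a larger tolerance stops training sooner.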

Input Ports

H2O Frame with training data.

Output Ports

Coefficients of the resulting model.
H2O Generalized Linear Model regression model.

Views

This node has no views.
