Learns a Distributed Random Forest (DRF) regression model, which is a special version of the random forest* algorithm provided by H2O.

(*) RANDOM FORESTS is a registered trademark of Minitab, LLC and is used with Minitab’s permission.

- Target Column
- Select the target column. It must be numeric for regression problems.
- Column selection
- Select columns used for model training.
- Ignore constant columns
- Select to ignore constant columns.
- Number of levels (tree depth)
- Specify the maximum tree depth (max_depth).
- Number of models
- Specify the number of trees (ntrees).
- Use static random seed
- Select to use a static seed for randomization.
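
The names in parentheses are H2O DRF parameter names. As a minimal sketch, the basic options above could be collected under those names as follows. The helper name `basic_drf_params`, the example values, and the mapping of the two checkboxes to `ignore_const_cols` and `seed` are assumptions made for illustration, not taken from the node's implementation:

```python
def basic_drf_params(max_depth, ntrees, ignore_constant_columns=True, static_seed=None):
    """Collect the basic node options under H2O parameter names (hypothetical helper)."""
    params = {
        "max_depth": max_depth,  # Number of levels (tree depth)
        "ntrees": ntrees,        # Number of models
        # Assumed mapping: "Ignore constant columns" -> ignore_const_cols
        "ignore_const_cols": ignore_constant_columns,
    }
    # Assumed mapping: "Use static random seed" -> seed; omitted when no seed is fixed
    if static_seed is not None:
        params["seed"] = static_seed
    return params


params = basic_drf_params(max_depth=20, ntrees=50, static_seed=42)
print(params)
```

With the `h2o` Python package installed, a dictionary like this could presumably be passed as keyword arguments to `H2ORandomForestEstimator`; the node itself is configured through its dialog.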

- Min (weighted) observations
- Specify the minimum number of observations for a leaf (min_rows).
- Min relative split improvement rate
- Specify the minimum relative improvement in squared error reduction required for a split to happen. When properly tuned, this option can help reduce overfitting. Optimal values are typically in the 1e-10 to 1e-3 range (min_split_improvement).
- Row sample rate (per tree)
- Specify the row sampling rate (x-axis). The range is 0.0 to 1.0. Higher values may improve training accuracy. Test accuracy improves when either columns or rows are sampled. For details, refer to “Stochastic Gradient Boosting” (sample_rate).
- Column sample rate (per tree)
- Specify the column sample rate per tree. This can be a value from 0.0 to 1.0 (col_sample_rate_per_tree).
- Relative change of column sample rate per level
- Specify how the column sampling rate changes as a function of the depth in the tree (col_sample_rate_change_per_level).
- Histogram type
- By default (AUTO), DRF bins from min to max in steps of (max-min)/N. Random split points or quantile-based split points can be selected as well (histogram_type).
- Number of histogram bins (numerical)
- Specify the number of bins for the histogram to build, then split at the best point (nbins).
- Number of histogram bins (categorical)
- Specify the number of bins for the histogram to build, then split at the best point. Higher values can lead to more overfitting. The levels are ordered alphabetically; if there are more levels than bins, adjacent levels share bins. This value has a more significant impact on model fitness than nbins. Larger values may increase runtime, especially for deep trees and large clusters, so tuning may be required to find the optimal value for your configuration (nbins_cats).
- Number of root histogram bins (numerical)
- Specify the number of bins to use at the root level to build the histogram. This number is then halved at each level, and nbins controls when to stop dividing (nbins_top_level).
- M Tries
- Specify the number of columns to randomly select at each level. If disabled, the number of variables is the square root of p for classification and p/3 for regression (where p is the number of columns). The range is 1 to p (mtries).
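
Three of the rules described above are simple arithmetic and can be sketched in a few lines: the default number of columns tried when M Tries is disabled, the halving of the root bin count per level, and the equal-width binning used by the AUTO histogram type. The function names are hypothetical, and rounding down with a minimum of 1 is an assumption of this sketch rather than a documented detail:

```python
import math


def default_mtries(p, classification=False):
    """Columns tried when mtries is disabled: sqrt(p) or p/3.

    Rounding down and a minimum of 1 are assumptions of this sketch.
    """
    value = math.isqrt(p) if classification else p // 3
    return max(1, value)


def bins_at_depth(nbins_top_level, nbins, depth):
    """Halve the root bin count once per level, never going below nbins."""
    return max(nbins, nbins_top_level >> depth)


def uniform_bin_edges(lo, hi, n):
    """Equal-width bin edges from lo to hi in steps of (hi - lo) / n."""
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]


print(default_mtries(12))                               # regression, p = 12
print(default_mtries(10, classification=True))          # classification, p = 10
print([bins_at_depth(1024, 20, d) for d in range(8)])   # bins per tree level
print(uniform_bin_edges(0.0, 10.0, 5))                  # AUTO-style bin edges
```

The values 1024 and 20 are example settings for the root and per-node bin counts. Note how the bin count stops shrinking once it reaches nbins.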

- Select categorical encoding
- Specify the encoding scheme to use for handling categorical features (categorical_encoding).
- Early Stopping
- Select to activate early stopping.
- Stopping metric
- Specify the metric to use for early stopping (stopping_metric).
- Stopping tolerance
- Specify the relative tolerance for metric-based stopping. Training stops if the improvement is less than this value (stopping_tolerance).
- Number of last seen rows for moving average
- Stops training when the metric selected for stopping_metric doesn’t improve for the specified number of training rounds, based on a simple moving average. To disable this feature, specify 0. The metric is computed on the validation data (if provided); otherwise, training data is used (stopping_rounds).
- Size of validation set (in %)
- Specify the size of the validation dataset used to evaluate early stopping criteria.
- Max runtime in seconds
- Maximum allowed runtime in seconds for model training (max_runtime_secs).
- Weights column (optional)
- Select a column to use for the observation weights, which are used for bias correction (weights_column).
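
How the stopping rounds and the stopping tolerance interact can be illustrated with a simplified sketch. It assumes a metric where lower is better (such as RMSE) and compares moving averages over the most recent scores. The function name `should_stop` and the exact comparison rule are assumptions for illustration and may differ from what H2O does internally:

```python
def should_stop(scores, stopping_rounds, stopping_tolerance):
    """Simplified moving-average early stopping for a lower-is-better metric."""
    k = stopping_rounds
    # 0 disables early stopping; 2k scores are needed to form k + 1 averages
    if k == 0 or len(scores) < 2 * k:
        return False
    recent = scores[-2 * k:]
    # Simple moving averages with window k over the most recent scores
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]          # oldest moving average
    best_recent = min(averages[1:])  # best of the k newer moving averages
    if reference == 0:
        return True  # already at zero error, no relative improvement possible
    improvement = (reference - best_recent) / abs(reference)
    return improvement < stopping_tolerance


still_improving = [10.0, 8.0, 6.0, 5.0]
plateaued = [10.0, 8.0, 6.0, 5.0, 5.0, 5.0, 5.0]
print(should_stop(still_improving, stopping_rounds=2, stopping_tolerance=0.01))
print(should_stop(plateaued, stopping_rounds=2, stopping_tolerance=0.01))
```

Averaging over several rounds keeps a single noisy score from triggering or preventing a stop, which is the motivation for basing the decision on a moving average.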

- This node has no views

- 07_Customer_prediction_with_H2O (KNIME Hub)
- 07_KNIMEでのH2Oによる顧客予測 (KNIME Hub; Japanese, "Customer prediction with H2O in KNIME")
- K-means LIME (KNIME Hub)
- kn_example_ml_regression_housing_prices (KNIME Hub)


To use this node in KNIME, install the extension KNIME H2O Machine Learning Integration from the below update site following our NodePit Product and Node Installation Guide:

v4.7

