Learns a tree-based XGBoost model for classification. XGBoost is a popular machine learning library that is based on the ideas of boosting. Check out the official documentation for some tutorials on how XGBoost works. Since XGBoost requires its features to be single precision floats, we automatically cast double precision values to float, which can cause problems for extreme values.

- Objective
- For binary classification tasks there exists the option to use the binary logistic or the softprob objective function, while for more than two classes only softprob is available.
- Target column
- The column containing the class variable. Note that the column domain must contain the possible values. Please use the Domain Calculator node to calculate the possible values if they are not assigned yet.
- Weight column
- The column containing the row weights (also called sample weights or instance weights). Note that the selected column must not contain missing values.
- Feature columns
- Selects the columns that should be used as features in training. Note that the domain of nominal features must contain the possible values, otherwise the node can't be executed. Use the Domain Calculator node to calculate any missing possible value sets.
- Boosting rounds
- The number of models to train in the boosting ensemble.
- Base score
- The initial prediction score of all instances; this global bias will have little effect for a sufficiently large number of iterations.
- Use static random seed
- If checked, the seed displayed in the text field is used as seed for randomized operations such as sampling. Otherwise a new seed is generated for each node execution.
- Manual number of threads
- Specifies the number of threads to use for training. If the checkbox is not selected, the number of available cores is used by default.
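
Why the base score matters less as the number of boosting rounds grows can be seen in a heavily simplified sketch. It assumes logistic loss and replaces every tree with a single leaf, so each round is just one shrunken Newton step on a constant. The function `boost_constant` and its parameters `eta` and `reg_lambda` are hypothetical names chosen for this illustration; they mirror the Eta and Lambda options but are not the node's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def boost_constant(labels, base_score, rounds, eta=0.3, reg_lambda=1.0):
    """Boost single-leaf 'trees' under logistic loss and return the final probability."""
    # The initial margin is the logit of the base score.
    margin = math.log(base_score / (1.0 - base_score))
    for _ in range(rounds):
        p = sigmoid(margin)
        grad = sum(p - y for y in labels)       # sum of first-order gradients
        hess = len(labels) * p * (1.0 - p)      # sum of second-order gradients
        margin += eta * (-grad / (hess + reg_lambda))  # shrunken Newton step
    return sigmoid(margin)


labels = [1] * 20 + [0] * 80  # 20 % positive instances

for base in (0.1, 0.5, 0.9):
    few = boost_constant(labels, base, rounds=2)
    many = boost_constant(labels, base, rounds=200)
    print(f"base_score={base}: 2 rounds -> {few:.3f}, 200 rounds -> {many:.3f}")
```

With only two rounds the predictions still depend clearly on the starting point, while after many rounds all three runs settle at the observed positive rate.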

- Eta
- Also known as learning rate. Step size shrinkage used in updates in order to prevent overfitting. A smaller Eta value results in a more conservative boosting process.
- Lambda
- L2 regularization term on leaf weights. Increasing this value will make the model more conservative.
- Alpha
- L1 regularization term on leaf weights. Increasing this value will make the model more conservative.
- Gamma
- Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger Gamma is, the more conservative the algorithm will be.
- Maximum delta step
- Maximum delta step we allow each leaf output to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value between 1 and 10 might help control the update.
- Booster
- Select either the default tree booster or the DART booster.
- Maximum depth
- Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 indicates no limit. Note that a limit is required when the grow policy is set to depthwise.
- Minimum child weight
- Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. The larger min_child_weight is, the more conservative the algorithm will be.
- Tree method
- The tree construction algorithm used in XGBoost.
Can be one of
- Auto: Use a heuristic to choose the fastest method.
- Exact: Exact greedy algorithm.
- Approx: Approximate greedy algorithm using quantile sketch and gradient histogram.
- Hist: Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bin caching.
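
How Lambda, Alpha, Gamma, Maximum delta step and Minimum child weight interact can be sketched with the leaf weight and split gain formulas from the XGBoost paper. This is a simplified illustration: the function names are hypothetical, the L1 term is applied as soft thresholding of the gradient sum, the effect of the delta step on the gain is ignored, and the library's implementation may differ in constant factors and details.

```python
def soft_threshold(g, alpha):
    """Shrink the gradient sum towards zero by alpha (L1 regularization)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(g, h, reg_lambda=1.0, alpha=0.0, max_delta_step=0.0):
    """Leaf output for gradient sum g and hessian sum h."""
    w = -soft_threshold(g, alpha) / (h + reg_lambda)
    if max_delta_step > 0.0:
        # A positive delta step caps the absolute leaf output.
        w = max(-max_delta_step, min(max_delta_step, w))
    return w


def leaf_score(g, h, reg_lambda, alpha):
    return soft_threshold(g, alpha) ** 2 / (h + reg_lambda)


def split_gain(gl, hl, gr, hr, reg_lambda=1.0, alpha=0.0, gamma=0.0,
               min_child_weight=1.0):
    """Gain of a candidate split, or None if a child is too light."""
    if hl < min_child_weight or hr < min_child_weight:
        return None
    children = leaf_score(gl, hl, reg_lambda, alpha) + leaf_score(gr, hr, reg_lambda, alpha)
    parent = leaf_score(gl + gr, hl + hr, reg_lambda, alpha)
    return 0.5 * (children - parent) - gamma


print(leaf_weight(-4.0, 2.0))                      # default regularization
print(leaf_weight(-4.0, 2.0, reg_lambda=3.0))      # larger Lambda, smaller weight
print(leaf_weight(-4.0, 2.0, alpha=1.0))           # Alpha shrinks the gradient sum
print(leaf_weight(-4.0, 2.0, max_delta_step=0.5))  # output capped at 0.5
print(split_gain(-4.0, 2.0, 3.0, 2.0))             # positive gain, split is kept
print(split_gain(-4.0, 2.0, 3.0, 2.0, gamma=5.0))  # negative gain, split not worthwhile
```

In this sketch every one of the options either shrinks the leaf outputs or rejects splits, which is what "more conservative" means in the descriptions above.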

- Sketch Epsilon
- Only used for the approximate tree method. Usually does not have to be set manually, but consider setting it to a lower value for a more accurate enumeration of split candidates.
- Scale positive weight
- Controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative instances) / sum(positive instances).
- Grow policy
- Controls the way new nodes are added to the trees.
Currently only supported for tree method hist.
One of
- Depthwise: Split at nodes closest to the root.
- Lossguide: Split at nodes with highest loss change.
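
The typical value suggested for Scale positive weight above can be computed directly from the training labels. The following is a minimal sketch; `suggested_scale_pos_weight` is a hypothetical helper, and treating the label `1` as the positive class is an assumption made for the example.

```python
def suggested_scale_pos_weight(labels, weights=None, positive=1):
    """Return sum(negative instances) / sum(positive instances)."""
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted: every row counts once
    pos = sum(w for y, w in zip(labels, weights) if y == positive)
    neg = sum(w for y, w in zip(labels, weights) if y != positive)
    if pos == 0:
        raise ValueError("no positive instances in the training data")
    return neg / pos


labels = [1] * 50 + [0] * 950  # 5 % positive instances
print(suggested_scale_pos_weight(labels))  # 19.0
```

If a weight column is used, passing the row weights gives the weighted version of the same ratio.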

- Maximum number of leaves
- Maximum number of nodes to be added. Only relevant for grow policy lossguide.
- Maximum number of bins
- Only used for tree method hist. Maximum number of discrete bins to bucket continuous features. Increasing this number improves the optimality of splits at the cost of higher computation time.
- Sample type
- Only relevant for DART booster. Uniform will drop trees uniformly while weighted will drop trees in proportion to weight.
- Normalize type
- Only relevant for DART booster.
- Tree: New trees have the same weight as each of the dropped trees. Weights of new trees are 1 / (k + eta). Dropped trees are scaled by a factor of k / (k + eta).
- Forest: New trees have the same weight as the sum of the dropped trees. Weights of new trees are 1 / (1 + eta). Dropped trees are scaled by a factor of 1 / (1 + eta).
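
The two normalization rules can be written out as a small sketch, where `k` is the number of dropped trees and `eta` the learning rate. The function `dart_scaling` is a hypothetical helper that only restates the formulas above.

```python
def dart_scaling(normalize_type, k, eta):
    """Return (weight of the new tree, scale factor for the dropped trees)."""
    if normalize_type == "tree":
        return 1.0 / (k + eta), k / (k + eta)
    if normalize_type == "forest":
        return 1.0 / (1.0 + eta), 1.0 / (1.0 + eta)
    raise ValueError(f"unknown normalize type: {normalize_type}")


for kind in ("tree", "forest"):
    new_weight, dropped_scale = dart_scaling(kind, k=3, eta=0.3)
    print(f"{kind}: new tree weight {new_weight:.3f}, dropped trees scaled by {dropped_scale:.3f}")
```

With three dropped trees, the tree rule gives the new tree a much smaller weight than the forest rule, because it is treated as one more tree among the dropped ones rather than as a replacement for their sum.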

- Dropout rate
- Only relevant for DART booster. Fraction of previous trees to drop during the dropout.
- Drop at least one tree
- Only relevant for DART booster. When this flag is enabled, at least one tree is always dropped during the dropout.
- Skip dropout rate
- Only relevant for DART booster. Probability of skipping the dropout procedure during a boosting iteration. If a dropout is skipped, new trees are added in the same manner as for the vanilla tree booster. Note that a non-zero skip rate has a higher priority than the "drop at least one tree" flag.
- Subsampling rate
- Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost would randomly sample half of the training data prior to growing trees. This is equivalent to bagging and can help to reduce overfitting. Subsampling will occur once in every boosting iteration.
- Column sampling rate by tree
- Subsample ratio of columns/features when constructing each tree. Subsampling will occur once in every boosting iteration.
- Column sampling rate by level
- Subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.
- Column sampling rate by node
- Subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
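
Because each column sampling stage draws from the result of the previous one, the three rates act cumulatively. The sketch below illustrates this with 64 features and a rate of 0.5 at every stage. `sample_columns` is a hypothetical helper, and rounding the subset size down with a minimum of one column is an assumption of this example.

```python
import random


def sample_columns(columns, rate, rng):
    """Pick a random subset holding `rate` of the given columns (at least one)."""
    n = max(1, int(rate * len(columns)))
    return rng.sample(columns, n)


rng = random.Random(42)  # fixed seed, comparable to a static random seed
features = [f"f{i}" for i in range(64)]

per_tree = sample_columns(features, 0.5, rng)    # drawn once per boosting iteration
per_level = sample_columns(per_tree, 0.5, rng)   # drawn once per depth level
per_node = sample_columns(per_level, 0.5, rng)   # drawn once per split

print(len(per_tree), len(per_level), len(per_node))  # 32 16 8
```

Only 8 of the 64 features remain as candidates for a single split, so combining several low rates restricts the model more strongly than any single rate suggests.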

- This node has no views


To use this node in KNIME, install the extension KNIME XGBoost Integration from the below update site following our NodePit Product and Node Installation Guide:

v4.7
