Learns a tree-based XGBoost model for classification. XGBoost is a popular machine learning library based on the idea of boosting. Check out the official documentation for tutorials on how XGBoost works. Because XGBoost requires its features to be single-precision floats, we automatically cast double-precision values to float, which can cause problems for extreme numbers.
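
The effect of the cast to single precision can be sketched in plain Python. This is only an illustration of float32 rounding in general, not the node's actual conversion code; the helper name `to_float32` is hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    try:
        # Packing with format "f" stores the value as a 4-byte IEEE 754 float.
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # The magnitude exceeds the float32 range; treat it as infinity here.
        return float("inf") if x > 0 else float("-inf")


large_integer = to_float32(16777217.0)  # 2**24 + 1 has no exact float32 representation
tiny_value = to_float32(1e-50)          # far below the smallest positive float32
huge_value = to_float32(1e39)           # above the largest finite float32

print(large_integer, tiny_value, huge_value)
```

Values that differ only beyond roughly seven significant digits, or that lie outside the float32 range, may therefore become indistinguishable after the cast.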

- Objective
- For binary classification tasks, either the binary logistic or the softprob objective function can be used; for more than two classes, only softprob is available.
- Target column
- The column containing the class variable. Note that the column domain must contain the possible values. Please use the Domain Calculator node to calculate the possible values if they are not assigned yet.
- Weight column
- The column containing the row weights (also called sample weights or instance weights). Note that the selected column must not contain missing values.
- Feature columns
- Selects which columns are used as features in training. Note that the domain of nominal features must contain the possible values; otherwise the node can't be executed. Use the Domain Calculator node to calculate any missing possible value sets.
- Boosting rounds
- The number of models to train in the boosting ensemble.
- Base score
- The initial prediction score of all instances; this global bias will have little effect for a sufficiently large number of iterations.
- Use static random seed
- If checked, the seed displayed in the text field is used as seed for randomized operations such as sampling. Otherwise a new seed is generated for each node execution.
- Manual number of threads
- Specifies the number of threads to use for training. If the checkbox is not selected, the number of available cores is used.
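
The two objectives listed under Objective differ in how raw scores are turned into probabilities: binary logistic applies the logistic (sigmoid) function to a single score, while softprob applies a softmax over one score per class. The sketch below illustrates only these two transformations; the function names are hypothetical and it is not the node's code.

```python
import math


def binary_logistic(score: float) -> float:
    """Probability of the positive class from a single raw score."""
    return 1.0 / (1.0 + math.exp(-score))


def softprob(scores: list) -> list:
    """One probability per class from one raw score per class (softmax)."""
    largest = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


p_positive = binary_logistic(0.0)
p_classes = softprob([1.0, 2.0, 3.0])
print(p_positive, p_classes)
```

With softprob the probabilities of all classes sum to one, which is why it also works for more than two classes.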

- Eta
- Also known as learning rate. Step size shrinkage used in updates in order to prevent overfitting. A smaller Eta value results in a more conservative boosting process.
- Lambda
- L2 regularization term on leaf weights. Increasing this value will make the model more conservative.
- Alpha
- L1 regularization term on leaf weights. Increasing this value will make the model more conservative.
- Gamma
- Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger Gamma is, the more conservative the algorithm will be.
- Maximum delta step
- Maximum delta step allowed for each leaf output. A value of 0 means there is no constraint. A positive value can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value between 1 and 10 might help control the update.
- Booster
- Select either the default tree booster or the DART booster.
- Maximum depth
- Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. 0 indicates no limit. Note that a limit is required when the grow policy is set to depthwise.
- Minimum child weight
- Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. The larger min_child_weight is, the more conservative the algorithm will be.
- Tree method
- The tree construction algorithm used in XGBoost.
Can be one of
- Auto: Use a heuristic to choose the fastest method.
- Exact: Exact greedy algorithm.
- Approx: Approximate greedy algorithm using quantile sketch and gradient histogram.
- Hist: Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bin caching.
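
How Eta, Lambda, Alpha, Gamma, Maximum delta step and Minimum child weight interact can be sketched with the leaf weight and split gain formulas from the XGBoost paper. This is a simplified illustration under that formulation, with hypothetical function names; the library's implementation may differ in constants and details. Here `g` and `h` are the sums of gradients and hessians of the instances in a node.

```python
def soft_threshold(g, alpha):
    """Shrink the gradient sum towards zero by alpha (L1 regularization)."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(g, h, lam=1.0, alpha=0.0, eta=0.3, max_delta_step=0.0):
    """Leaf output: regularized Newton step, optionally clipped, then shrunk by eta."""
    w = -soft_threshold(g, alpha) / (h + lam)
    if max_delta_step > 0:  # 0 means no constraint
        w = max(-max_delta_step, min(max_delta_step, w))
    return eta * w


def leaf_score(g, h, lam, alpha):
    """Quality of a node if it were turned into a leaf."""
    return soft_threshold(g, alpha) ** 2 / (h + lam)


def split_gain(gl, hl, gr, hr, lam=1.0, alpha=0.0):
    """Loss reduction obtained by splitting a node into a left and a right child."""
    parent = leaf_score(gl + gr, hl + hr, lam, alpha)
    return 0.5 * (leaf_score(gl, hl, lam, alpha) + leaf_score(gr, hr, lam, alpha) - parent)


def accept_split(gl, hl, gr, hr, lam=1.0, alpha=0.0, gamma=0.0, min_child_weight=1.0):
    """A split is kept only if both children are heavy enough and the gain exceeds gamma."""
    if hl < min_child_weight or hr < min_child_weight:
        return False
    return split_gain(gl, hl, gr, hr, lam, alpha) > gamma


print(leaf_weight(-4.0, 3.0, lam=1.0, eta=0.5))
print(split_gain(-4.0, 2.0, 2.0, 2.0, lam=0.0))
print(accept_split(-4.0, 2.0, 2.0, 2.0, lam=0.0, gamma=5.0))
```

In this sketch, larger values of Lambda, Alpha, Gamma or Minimum child weight all either shrink the leaf outputs or reject more splits, which is what "more conservative" refers to in the option descriptions.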

- Sketch Epsilon
- Only used for the approximate tree method. Usually it does not have to be set manually, but consider setting it to a lower value for a more accurate enumeration of split candidates.
- Scale positive weight
- Controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative instances) / sum(positive instances).
- Grow policy
- Controls the way new nodes are added to the trees.
Currently only supported for tree method hist.
One of
- Depthwise: Split at nodes closest to the root.
- Lossguide: Split at nodes with highest loss change.
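
The typical value suggested for Scale positive weight, sum(negative instances) / sum(positive instances), can be computed as in the following sketch. The helper name and the example labels are hypothetical; row weights are optional and default to 1.

```python
def suggested_scale_pos_weight(labels, positive_label, weights=None):
    """Ratio of the total weight of negative rows to the total weight of positive rows."""
    if weights is None:
        weights = [1.0] * len(labels)
    positive = sum(w for y, w in zip(labels, weights) if y == positive_label)
    negative = sum(w for y, w in zip(labels, weights) if y != positive_label)
    if positive == 0:
        raise ValueError("no positive instances found")
    return negative / positive


# Example with 8 negative rows and 2 positive rows.
labels = ["no"] * 8 + ["yes"] * 2
ratio = suggested_scale_pos_weight(labels, positive_label="yes")
print(ratio)
```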

- Maximum number of leaves
- Maximum number of nodes to be added. Only relevant for grow policy lossguide.
- Maximum number of bins
- Only used for tree method hist. Maximum number of discrete bins to bucket continuous features. Increasing this number improves the optimality of splits at the cost of higher computation time.
- Sample type
- Only relevant for DART booster. Uniform will drop trees uniformly while weighted will drop trees in proportion to weight.
- Normalize type
- Only relevant for DART booster.
- Tree: New trees have the same weight as each of the dropped trees. Weights of new trees are 1 / (k + eta), where k is the number of dropped trees. Dropped trees are scaled by a factor of k / (k + eta).
- Forest: New trees have the same weight as the sum of the dropped trees. Weights of new trees are 1 / (1 + eta). Dropped trees are scaled by a factor of 1 / (1 + eta).
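
The two normalization rules for the DART booster can be written down directly from the formulas given above. The following sketch only evaluates those formulas; the function name is hypothetical, and `k` is taken to be the number of trees dropped in the current iteration.

```python
def dart_normalization(k, eta, normalize_type="tree"):
    """Return (weight of the new tree, scale factor applied to the dropped trees)."""
    if normalize_type == "tree":
        return 1.0 / (k + eta), k / (k + eta)
    if normalize_type == "forest":
        return 1.0 / (1.0 + eta), 1.0 / (1.0 + eta)
    raise ValueError("normalize_type must be 'tree' or 'forest'")


tree_new, tree_scale = dart_normalization(k=3, eta=0.5, normalize_type="tree")
forest_new, forest_scale = dart_normalization(k=3, eta=0.5, normalize_type="forest")
print(tree_new, tree_scale)
print(forest_new, forest_scale)
```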

- Dropout rate
- Only relevant for DART booster. Fraction of previous trees to drop during the dropout.
- Drop at least one tree
- Only relevant for DART booster. When this flag is enabled, at least one tree is always dropped during the dropout.
- Skip dropout rate
- Only relevant for DART booster. Probability of skipping the dropout procedure during a boosting iteration. If a dropout is skipped, new trees are added in the same manner as for the vanilla tree booster. Note that a non-zero skip rate has a higher priority than the "drop at least one tree" flag.
- Subsampling rate
- Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost would randomly sample half of the training data prior to growing trees. This is equivalent to bagging and can help to reduce overfitting. Subsampling will occur once in every boosting iteration.
- Column sampling rate by tree
- Subsample ratio of columns/features when constructing each tree. Subsampling will occur once in every boosting iteration.
- Column sampling rate by level
- Subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.
- Column sampling rate by node
- Subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
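
Because each column sampling rate draws from the columns kept by the previous stage, the three rates act cumulatively. The sketch below illustrates this nesting with 64 features and a rate of 0.5 at every stage. The helper `sample_columns` and its rounding rule (at least one column, rounded down) are assumptions made for illustration, not the library's exact behavior.

```python
import random


def sample_columns(columns, rate, rng):
    """Draw a random subset of the given columns without replacement."""
    count = max(1, int(rate * len(columns)))  # assumed rounding rule
    return rng.sample(columns, count)


rng = random.Random(42)  # fixed seed, comparable to a static random seed
features = [f"f{i}" for i in range(64)]

tree_columns = sample_columns(features, 0.5, rng)       # once per tree
level_columns = sample_columns(tree_columns, 0.5, rng)  # once per depth level
node_columns = sample_columns(level_columns, 0.5, rng)  # once per split

print(len(tree_columns), len(level_columns), len(node_columns))
```

Under these assumptions only 8 of the 64 features remain as candidates at a given split.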

- The trained model.
- The feature importance measures for the training features.
If the values are missing, then this indicates that the feature isn't used by the model at all.
- Feature name column: The column containing feature names.
- Weight column: The weight of a feature is the number of times a feature is used to split the data across all trees.
- Gain column: The gain is the average gain across all splits the feature is used in. A higher value compared to another feature implies that the feature is more important for generating a prediction.
- Cover column: The cover of a feature is the average coverage across all splits the feature is used in.
- Total gain column: The total gain sums up the gain across all splits the feature is used in.
- Total cover column: The total cover sums up the total coverage across all splits the feature is used in.
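
The relationship between the five importance measures can be illustrated with a small sketch that aggregates them from a list of splits. The input format (one `(feature, gain, cover)` tuple per split) and the function name are hypothetical; the node computes these values internally from the trained model.

```python
from collections import defaultdict


def feature_importance(splits, feature_names):
    """Aggregate per-split gain and cover into the five importance measures."""
    totals = defaultdict(lambda: {"weight": 0, "total_gain": 0.0, "total_cover": 0.0})
    for feature, gain, cover in splits:
        totals[feature]["weight"] += 1
        totals[feature]["total_gain"] += gain
        totals[feature]["total_cover"] += cover

    result = {}
    for name in feature_names:
        if name not in totals:
            result[name] = None  # feature is never used for a split: missing values
            continue
        t = totals[name]
        result[name] = {
            "weight": t["weight"],
            "gain": t["total_gain"] / t["weight"],
            "cover": t["total_cover"] / t["weight"],
            "total_gain": t["total_gain"],
            "total_cover": t["total_cover"],
        }
    return result


# Hypothetical splits collected across all trees of a model.
splits = [("age", 4.0, 10.0), ("age", 2.0, 6.0), ("income", 1.0, 8.0)]
importance = feature_importance(splits, ["age", "income", "zip"])
print(importance)
```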

- This node has no views



To use this node in KNIME, install the extension KNIME XGBoost Integration from the below update site following our NodePit Product and Node Installation Guide:

v5.2
