Learns Gradient Boosted Trees for classification. The algorithm uses very shallow regression trees and a special form of boosting to build an ensemble of trees. The implementation follows the algorithm in Section 4.6 of the paper "Greedy Function Approximation: A Gradient Boosting Machine" by Jerome H. Friedman (1999).
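The boosting idea can be illustrated with a minimal sketch. It is not the node's implementation and it simplifies Friedman's procedure considerably: it assumes a binary target, a single numeric feature, depth-1 trees (stumps) fitted to the log-loss residuals, and a fixed learning rate, and it omits the per-leaf update step from the paper. All function names and parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one numeric feature (least squares)."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Return a function mapping x to the predicted probability of class 1."""
    trees = []
    scores = [0.0] * len(xs)  # current log-odds estimate per row
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        scores = [s + learning_rate * tree(x) for s, x in zip(scores, xs)]
    return lambda x: sigmoid(sum(learning_rate * t(x) for t in trees))


predict = fit_boosted_stumps([1, 2, 3, 6, 7, 8], [0, 0, 0, 1, 1, 1])
```

Each tree is a regression tree even though the task is classification, because it is fitted to numeric residuals rather than to class labels.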
The base learner for this ensemble method is a simple regression tree, as used in the Tree Ensemble, Random Forest and Simple Regression Tree nodes. By default, a tree is built using binary splits for both numeric and nominal attributes (the latter can be changed to multiway splits). The built-in missing value handling finds the best direction for missing values by testing each possible direction and selecting the one that yields the best result (i.e. the largest gain).
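The missing value handling described above can be sketched as follows. This is only an illustration: it assumes a binary split on a numeric attribute and uses the reduction in the sum of squared errors as the gain, which may differ from the gain function the node actually uses. The function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_missing_direction(rows, threshold):
    """rows: list of (feature_value_or_None, target) pairs.

    Tries sending all rows with a missing feature value to the left and to
    the right child and returns the (direction, gain) with the larger gain.
    """
    targets = [t for _, t in rows]
    left = [t for x, t in rows if x is not None and x <= threshold]
    right = [t for x, t in rows if x is not None and x > threshold]
    missing = [t for x, t in rows if x is None]

    gain_left = sse(targets) - sse(left + missing) - sse(right)
    gain_right = sse(targets) - sse(left) - sse(right + missing)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


rows = [(1.0, 0.0), (2.0, 0.0), (8.0, 10.0), (9.0, 10.0), (None, 10.0)]
direction, gain = best_missing_direction(rows, threshold=5.0)
```

In this example the row with the missing value has the same target as the rows on the right, so sending it to the right gives the larger gain.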
This node supports row sampling (bagging) and attribute sampling (attribute bagging), similar to the Random Forest and Tree Ensemble nodes. When sampling is used, the method is usually referred to as Stochastic Gradient Boosted Trees. The respective settings can be found in the Advanced Options tab.
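As a rough sketch of row sampling, each boosting iteration can draw a fresh subset of row indices and fit its tree only on those rows. The fraction, the seed, the rounding and the choice between sampling with and without replacement are assumptions made for illustration, not settings taken from the node.

```python
import random


def sample_rows(n_rows, fraction, rng, with_replacement=False):
    """Return the row indices used to fit one tree."""
    k = max(1, round(fraction * n_rows))
    if with_replacement:
        return [rng.randrange(n_rows) for _ in range(k)]
    return rng.sample(range(n_rows), k)


rng = random.Random(42)
# One independent row sample per boosting iteration.
samples = [sample_rows(10, 0.5, rng) for _ in range(3)]
```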
Select the attributes on which the model should be learned. You can choose from two modes.
Fingerprint attribute: Uses a fingerprint/vector column (bit, byte and double vectors are possible) to learn the model by treating each entry of the vector as a separate attribute (e.g. a bit vector of length 1024 is expanded into 1024 binary attributes). The node requires all vectors to be of the same length.
Column attributes: Uses ordinary columns in your table (e.g. String, Double, Integer, etc.) as attributes to learn the model on. The dialog lets you select the columns manually (by moving them to the right panel) or via a wildcard/regex selection (all columns whose names match the wildcard/regex are used for learning). In case of manual selection, the behavior for new columns (i.e. columns that are not available at the time you configure the node) can be specified as either Enforce exclusion (new columns are excluded and therefore not used for learning) or Enforce inclusion (new columns are included and therefore used for learning).
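The expansion performed in the fingerprint mode can be pictured with a small sketch. It assumes bit vectors are given as strings of "0" and "1"; the function name and the generated attribute names are hypothetical and do not reflect how the node names its attributes internally.

```python
def expand_fingerprints(fingerprints, prefix="bit"):
    """Turn equal-length bit strings into one binary attribute per position."""
    lengths = {len(fp) for fp in fingerprints}
    if len(lengths) != 1:
        raise ValueError(
            f"all vectors must have the same length, got {sorted(lengths)}"
        )
    (length,) = lengths
    names = [f"{prefix}_{i}" for i in range(length)]
    rows = [[int(ch) for ch in fp] for fp in fingerprints]
    return names, rows


names, rows = expand_fingerprints(["1010", "0111"])
```

Vectors of different lengths are rejected, mirroring the requirement that all vectors have the same length.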
All columns (no sampling): Each sample consists of all columns, which corresponds to no sampling at all.
Sample (square root): Uses the square root of the total number of attributes as the sample size. This method is typically used in random forests.
Sample (linear fraction): Uses the specified fraction of the total number of attributes as the sample size. A linear fraction of 0.5 corresponds to using 50% of all attributes.
Sample (absolute value): Uses the specified number as the sample size.
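The four attribute sampling options above differ only in how the sample size is derived. The following sketch makes that explicit; the mode names, the rounding and the clamping to the range 1 to the number of attributes are assumptions, since the description does not state how the node handles fractional or out-of-range sizes.

```python
import math


def attribute_sample_size(n_attributes, mode, value=None):
    """Number of attributes drawn per sample for a given sampling mode."""
    if mode == "all":
        size = n_attributes
    elif mode == "sqrt":
        size = round(math.sqrt(n_attributes))
    elif mode == "fraction":
        size = round(value * n_attributes)
    elif mode == "absolute":
        size = value
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumed behavior: keep the size between 1 and the number of attributes.
    return max(1, min(n_attributes, size))


sizes = {
    "all": attribute_sample_size(1024, "all"),
    "sqrt": attribute_sample_size(1024, "sqrt"),
    "fraction": attribute_sample_size(1024, "fraction", 0.5),
    "absolute": attribute_sample_size(1024, "absolute", 2000),
}
```

For a fingerprint of length 1024, the square root option yields 32 attributes per sample and a linear fraction of 0.5 yields 512.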
Use same set of attributes for each tree: Attributes are sampled once per tree. One attribute sample is drawn and used to learn an individual tree, so every node of that tree sees the same attributes.
Use different set of attributes for each tree node: A new attribute sample is drawn for every tree node. A random forest typically uses this strategy to make the trees more diverse. (Note that diversity is not important for gradient boosted trees, so the effect won't be as large.)
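The difference between the two strategies can be sketched by recording which attributes each node of a tree is allowed to consider. The tree here is just a fixed number of nodes, no actual splitting is done, and all names are hypothetical.

```python
import random


def attributes_seen_per_node(attributes, sample_size, n_nodes, rng, per_node):
    """Return the attribute sample available at each node of one tree."""
    # Drawn once and reused by every node when sampling per tree.
    tree_sample = sorted(rng.sample(attributes, sample_size))
    seen = []
    for _ in range(n_nodes):
        if per_node:
            # Fresh sample for this node only.
            seen.append(sorted(rng.sample(attributes, sample_size)))
        else:
            seen.append(tree_sample)
    return seen


attributes = [f"attr_{i}" for i in range(10)]
rng = random.Random(0)
per_tree = attributes_seen_per_node(attributes, 3, 7, rng, per_node=False)
per_node = attributes_seen_per_node(attributes, 3, 7, rng, per_node=True)
```

With per-tree sampling all seven entries are identical, while per-node sampling generally produces a different subset at each node.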