TD_XGBoost

TD_XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. The function supports both regression and classification predictive modeling problems. The model it creates is passed to the TD_XGBoostPredict function to make predictions.

Options

ColumnSampling
Specify the fraction of features to sample during boosting. The sample_fraction is a DOUBLE PRECISION value in the range (0, 1].
CoverageFactor
Specify the level of coverage of the dataset while boosting trees, expressed as a fraction (for example, 1.25 means 125% coverage). CoverageFactor can be used only if NumBoostedTrees is not supplied. When NumBoostedTrees is specified, coverage depends on the value of NumBoostedTrees. When it is not specified, NumBoostedTrees is chosen to achieve the requested level of coverage.
InputColumns
Specify the names of the input table columns to use for training the model (predictors, features, or independent variables).
IterNum
Specify the number of iterations (rounds) to boost the weak classifiers. The value must be an INTEGER in the range [1, 100000].
MaxDepth
Specify a decision tree stopping criterion. If the tree reaches a depth past this value, the algorithm stops looking for splits. Decision trees can grow to (2^(max_depth+1)-1) nodes. This stopping criterion has the greatest effect on the performance of the function.
MinImpurity
Specify the minimum impurity at which the tree stops splitting. For regression, the impurity criterion is squared error; for classification, it is Gini impurity.
MinNodeSize
Specify a decision tree stopping criterion: the minimum size of any node within each decision tree.
ModelType
Specify whether the analysis is a regression (continuous response variable) or a multiple-class classification (predicting one of a fixed number of classes).
NumBoostedTrees
Specify the number of parallel boosted trees. Each boosted tree operates on a sample of data that fits in an AMP's memory. By default, NumBoostedTrees equals the number of AMPs with data.
RegularizationLambda
Specify the L2 regularization that the loss function uses while boosting trees. The lambda is a DOUBLE PRECISION value in the range [0, 100000]. The higher the lambda, the stronger the regularization effect. The value 0 specifies no regularization.
ResponseColumn
Specify the name of the column that contains the class label for classification or target value (dependent variable) for regression.
Seed
Specify an integer value to use in determining the random seed for column sampling. By default, seed is 1.
ShrinkageFactor
Specify the learning rate (weight) of a learned tree in each boosting step. After each boosting step, the algorithm multiplies the learner by shrinkage to make the boosting process more conservative. The shrinkage is a DOUBLE PRECISION value in the range (0, 1]. The value 1 specifies no shrinkage.
Output Schema
Specify the schema of the output table. If Volatile is true, the user login is used as the schema.
Output Table
Specify the name of the output table.
VAL Location
VAL Location
Volatile
Specify whether the output table should be a VOLATILE table. If true, the table is deleted automatically; otherwise, it is the user's responsibility to remove it and free the space.
TreeSize
Specify the number of rows that each tree uses as its input dataset.
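
The documented ranges and the MaxDepth node bound can be checked before the node runs. The following Python sketch is illustrative only: the helper names max_tree_nodes and validate_options are hypothetical and are not part of TD_XGBoost or of this node. It assumes the option values are collected in a plain dictionary keyed by the option names listed above.

```python
def max_tree_nodes(max_depth):
    """Upper bound on nodes per tree, from the MaxDepth text: 2^(max_depth+1) - 1."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    return 2 ** (max_depth + 1) - 1


def validate_options(options):
    """Return a list of problems found in a dict of option values.

    Only the ranges stated in the option descriptions are checked;
    options that are absent from the dict are skipped.
    """
    problems = []

    column_sampling = options.get("ColumnSampling")
    if column_sampling is not None and not (0 < column_sampling <= 1):
        problems.append("ColumnSampling must be in (0, 1]")

    iter_num = options.get("IterNum")
    if iter_num is not None and not (
        isinstance(iter_num, int) and 1 <= iter_num <= 100000
    ):
        problems.append("IterNum must be an integer in [1, 100000]")

    reg_lambda = options.get("RegularizationLambda")
    if reg_lambda is not None and not (0 <= reg_lambda <= 100000):
        problems.append("RegularizationLambda must be in [0, 100000]")

    shrinkage = options.get("ShrinkageFactor")
    if shrinkage is not None and not (0 < shrinkage <= 1):
        problems.append("ShrinkageFactor must be in (0, 1]")

    # CoverageFactor can only be used when NumBoostedTrees is not supplied.
    if "CoverageFactor" in options and "NumBoostedTrees" in options:
        problems.append("CoverageFactor cannot be combined with NumBoostedTrees")

    return problems


if __name__ == "__main__":
    print(max_tree_nodes(5))
    print(validate_options({"ShrinkageFactor": 0, "IterNum": 10}))
```

For example, a MaxDepth of 5 allows at most 2^6 - 1 = 63 nodes per tree, which is why this option has such a large effect on run time.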

Input Ports

Connection to a Teradata Database Instance
Specifies the table containing the input data.

Output Ports

Output of TD_XGBoost.
