TreeSHAP Tree Ensemble

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. While SHAP can explain the output of any machine learning model, Lundberg and his collaborators have developed a high-speed exact algorithm for tree ensemble methods [1], [2].

Usage

The Tree SHAP Tree Ensemble Predictor is used as a substitute for the Tree Ensemble Predictor. Simply replace every Tree Ensemble Predictor with this node to get started. If you are using a different tree-based method, consider the other nodes in this package.

Interpretation

The beautiful thing about SHAP values is their intuitive interpretation. Every model has an expected output, the average prediction. The model prediction for a data row is this expected output plus the sum of the row's SHAP values. This leads to intuitive explanations, for example in predictive maintenance: "The high production output over the last three months contributed +20% to the probability that the machine breaks down in the next month."
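
To make this concrete, the same additivity property can be checked with Lundberg's shap package in Python, the library this node builds on. The following is a minimal sketch, assuming shap and scikit-learn are installed; it uses a regression forest for brevity, but for classification the same identity holds for each class probability.

    # Minimal sketch (assumed dependencies: shap, scikit-learn).
    # It checks: prediction = expected output + sum of the row's SHAP values.
    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)    # exact TreeSHAP for tree ensembles
    shap_values = explainer.shap_values(X)   # shape: (n_rows, n_features)

    row = 0
    reconstructed = explainer.expected_value + shap_values[row].sum()
    print(reconstructed)                     # matches model.predict(X[[row]])[0]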

Enterprise Support

If you need help integrating explainable machine learning methods in your company, please contact me at morriskurz@gmail.com.

Credits

All credit for the original research and for the development of the C++ and Python code goes to Lundberg and his collaborators.

Options

Change prediction column name
Select if you want to change the name of the column containing the prediction.
Prediction column name
The name of the column that will contain the prediction of the tree ensemble model.
Append overall prediction confidence
The confidence of the predicted class. It is the maximum of all confidence values (which can be appended separately).
Append individual class probabilities
The prediction confidence for each class. It is the number of trees predicting the current class (as per the column name) divided by the total number of trees.
Suffix for probability columns
Here you can enter a suffix for the names of the class probability columns.
Use soft voting
By default ("hard voting"), the class that receives the most votes is predicted. With "soft voting", the class probabilities of all trees are aggregated and the class with the highest aggregated probability is returned. For this to work properly, the tree ensemble model needs to contain the class distributions, which can be specified in the learner node by selecting the option "Save target distribution in tree nodes". Setting this option on models that do not have the target distributions saved will cause a warning message to be issued. The difference between the two voting schemes is illustrated in the first sketch after this options list.
Show explanation
Activate this to compute the SHAP values. If this box is unchecked, the node is equivalent to a simple predictor node.
Compute interactions
Computes the Shapley interaction values exactly. WARNING: This is computationally expensive; the runtime increases by a factor of about 2 * #features compared to computing the SHAP values without interactions. An illustration follows the options list.
Positive class
Select the value from the class column that stands for the "positive" class. In most use cases, the positive class corresponds to the class of interest; in churn prediction, for example, the positive class could be the customers who will cancel their subscription. If the node is not configured, the first possible value is selected automatically.
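
The difference between hard and soft voting, and the way class probabilities are computed from the trees, can be illustrated with a toy example. The following Python sketch is only an illustration of the aggregation idea with made-up numbers, not the node's own code.

    # Toy ensemble of five trees and two classes (values are made up).
    # Hard voting: each tree casts one vote; probability = votes / number of trees.
    tree_votes = ["break down", "ok", "ok", "break down", "ok"]
    hard_prob = {c: tree_votes.count(c) / len(tree_votes) for c in set(tree_votes)}
    hard_prediction = max(hard_prob, key=hard_prob.get)       # "ok" (3/5 vs. 2/5)

    # Soft voting: each tree contributes its stored class distribution
    # ("Save target distribution in tree nodes") and the distributions are averaged.
    tree_dists = [
        {"break down": 0.90, "ok": 0.10},
        {"break down": 0.40, "ok": 0.60},
        {"break down": 0.45, "ok": 0.55},
        {"break down": 0.80, "ok": 0.20},
        {"break down": 0.30, "ok": 0.70},
    ]
    soft_prob = {c: sum(d[c] for d in tree_dists) / len(tree_dists)
                 for c in ("break down", "ok")}
    soft_prediction = max(soft_prob, key=soft_prob.get)       # "break down" (0.57 vs. 0.43)

    print(hard_prediction, soft_prediction)                   # the two schemes can disagree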
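
The interaction values behind the "Compute interactions" option can likewise be sketched with the shap package. Again, this is a hedged illustration of the underlying method rather than the node's internal code; the model and data set are arbitrary choices.

    # Sketch of exact Shapley interaction values (assumed dependencies: shap, scikit-learn).
    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    interactions = explainer.shap_interaction_values(X[:20])  # (rows, features, features)

    # The diagonal holds per-feature main effects; summing a feature's row of the
    # interaction matrix recovers that feature's ordinary SHAP value.
    shap_values = explainer.shap_values(X[:20])
    print(abs(interactions.sum(axis=2) - shap_values).max())  # close to 0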

Input Ports

Port 1: The output of the Tree Ensemble Learner. Remember to activate the option "Save target distribution in tree nodes" in the learner node.
Port 2: Data to be predicted and explained.

Output Ports

Port 1: The input data along with prediction columns and corresponding SHAP values.

Views

This node has no views
