TreeSHAP Example Workflow

Why Explainable Machine Learning:
Discovering patterns and structures in data in an automated manner is a core component of data science. But how do we understand the decisions suggested by these systems well enough to trust them? SHAP values may be the answer.

What are SHAP values?
SHAP values interpret the impact of having a certain value for a given feature in comparison to the prediction we'd make if that feature took some baseline value.

The beautiful thing about SHAP values is their intuitive interpretation. Every model has an expected output, the average prediction. The model prediction for a data row is the expected output plus the sum of that row's SHAP values. This leads to intuitive explanations, for example in this dataset: "The high tenure contributed -15% probability that this customer churns in the next month."

But what about the nodes developed by KNIME?
The nodes developed by KNIME are applicable to any model, but this comes at the cost of runtime and accuracy. For tree models, SHAP values can be computed exactly and very quickly, which is what the TreeSHAP nodes implement. Furthermore, you can calculate SHAP interaction values with this package.

How can this benefit our use case?
1. Control and trust
Being able to verify your model on a case-by-case basis opens up the black box of big tree models and builds trust. By extracting the features your model thinks are important, you can detect potential errors in the model and deliver more robust business decisions as a result.
2. Find the actual most important features
By aggregating the SHAP values of each feature over the whole dataset (this workflow uses the sum of squares of each SHAP column), you can find the features that are actually most important for the model.
3. Discover interesting relations between features
By plotting the results, you can find interesting relations between the features. For examples, see the following workflow.

GLOBAL IMPORTANCE

How do I use these nodes?
The Tree SHAP Predictor nodes are used as a substitute for the corresponding Tree Ensemble Predictor. Simply replace every Tree Ensemble Predictor with a node from the Tree SHAP package to get started.

What is the positive class in my use case?
The positive class is the class of interest. The interpretation of the algorithm is always with respect to the positive class, i.e. a SHAP value of +0.2 means a +0.2 contribution to the prediction of the positive class.
For binary classification, the SHAP values are simply sign-reversed when switching the positive class.

LOCAL IMPORTANCE
As an example, we look at the two most important columns: tenure and the type of contract. The trend in the scatter plot shows that with increasing tenure, the SHAP value for tenure increases, i.e. the higher the tenure, the higher the prediction of a customer churning.
The color-coded contract type shows that this is especially true for monthly contracts. The one-year and two-year contracts are more centered around the zero SHAP value.

SHAP Interaction Values
SHAP interaction values describe the interaction of features, e.g. whether the combination of tenure and contract type had a positive or negative impact on the churn probability.
In this example, we plot the tenure (X-axis) against the interaction value of tenure and monthly charges (Y-axis), with the monthly charges as color. As you would expect, customers with a high monthly charge and a high tenure are more likely to churn, i.e. the interaction value is positive. But for a tenure of less than twelve months, customers with a high monthly charge are less likely to churn. This is an interesting relation that should be explored further.

Node annotations in the workflow:
Read the TelCo data
Train a tree ensemble model with 100 trees. IMPORTANT: In order for the TreeSHAP nodes to work, you need to enable the "Save target distribution" option
Compute the SHAP values accurately and blazingly fast in parallel.
80:20 split
Calculate the sum of squares of each SHAP column
Global importance of the features. Higher values mean higher importance for the model.
Look at the view of this component
Compute the SHAP values accurately and blazingly fast in parallel.
Look at the view of this component

Nodes in the workflow: CSV Reader, Tree Ensemble Learner, TreeSHAP TreeEnsemble, Partitioning, Math Formula (Multi Column), Row Filter, Bar Chart, Dependence Plot, TreeSHAP TreeEnsemble, Dependence Plot

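The global importance step ("Calculate the sum of squares of each SHAP column") can be sketched outside KNIME as follows. The SHAP table, its numbers and the column names are hypothetical and chosen only for illustration; `global_importance` is an illustrative helper, not part of any package. Squaring before summing keeps positive and negative contributions from cancelling, which a plain sum would allow.

```python
# Hypothetical SHAP table: one row per customer, one column per feature.
shap_rows = [
    {"tenure": -0.15, "Contract": 0.10, "MonthlyCharges": 0.02},
    {"tenure": 0.20, "Contract": -0.05, "MonthlyCharges": -0.01},
    {"tenure": 0.05, "Contract": 0.12, "MonthlyCharges": 0.03},
]


def global_importance(rows):
    """Sum of squared SHAP values per feature, sorted from most to least important."""
    totals = {}
    for row in rows:
        for feature, shap_value in row.items():
            totals[feature] = totals.get(feature, 0.0) + shap_value ** 2
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


importance = global_importance(shap_rows)

# For comparison: plain sums, where opposite signs partly cancel each other.
raw_sums = {f: sum(row[f] for row in shap_rows) for f in shap_rows[0]}

for feature, score in importance.items():
    print(f"{feature}: sum of squares = {score:.4f}, plain sum = {raw_sums[feature]:.2f}")
```

In this made-up table the plain sum would rank Contract above tenure, while the sum of squares ranks tenure first because its large positive and negative contributions no longer offset each other.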