SHAP stands for SHapley Additive exPlanations, a unified approach for explaining the predictions of any machine learning model. For a single output (e.g. the probability of the positive class in a binary classification), it assigns each feature a so-called Shapley Value that quantifies how that feature changed the output. If the model has multiple outputs, one set of Shapley Values is calculated per output. The Shapley Values for a single output sum to the deviation from the mean prediction (also called the null prediction), which is the prediction the model would have made if no feature had been available.
KNIME Analytics Platform offers a second way to calculate Shapley Values via the Shapley Values loop nodes. In contrast to these, SHAP can also find sparse explanations via regularization with the LASSO. The advantage is that you can set the maximal number of features you want in an explanation, which makes explanations far easier to understand when there are hundreds or thousands of features. If a maximal number of features is specified, SHAP finds, for each row to be explained, the features with the most impact on its prediction and considers only those when calculating the Shapley Values.
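The additivity property described above can be checked on a toy example. The following is a minimal sketch, not the node's implementation: it enumerates all feature subsets to compute exact Shapley Values, whereas SHAP approximates them and can add LASSO regularization. The `model` function, the `background` rows, and the explained `row` are all hypothetical values chosen for illustration.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: a linear term plus an interaction.
    return 2.0 * x[0] + 1.0 * x[1] * x[2] - 0.5 * x[2]


def coalition_value(model, row, background, subset):
    # Average model output when the features in `subset` keep the values
    # of `row` and all other features are replaced by background values.
    total = 0.0
    for b in background:
        mixed = [row[i] if i in subset else b[i] for i in range(len(row))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, row, background):
    # Exact Shapley Values by enumerating every subset of the other features.
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                with_i = coalition_value(model, row, background, set(s) | {i})
                without_i = coalition_value(model, row, background, set(s))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background (sampling) rows and one row to explain.
background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
row = [3.0, 2.0, 1.0]

phi = shapley_values(model, row, background)
mean_prediction = sum(model(b) for b in background) / len(background)
deviation = model(row) - mean_prediction

# The Shapley Values sum to the deviation from the mean prediction.
print(sum(phi), deviation)
```

Exact enumeration needs a number of model evaluations that grows exponentially with the number of features, which is why approximations such as SHAP are used in practice.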
The first input table of this node contains the rows of interest (ROIs) for which an explanation is required. The SHAP algorithm replaces certain subsets of an ROI's features and observes how the model output changes. The replacement values are taken from the second input table, the sampling table. Note that, in contrast to the Shapley Values and LIME loops, this sampling table should not be much larger than 100 rows in order to keep the runtime reasonable (even so, SHAP is usually still on par with the other methods).
The output of the SHAP Loop Start node contains only the columns specified as feature columns in the dialog. In the loop body, use the model whose predictions you want to understand to predict this table, then feed the table with the appended predictions into the SHAP Loop End node, which calculates the explanations. Note that SHAP can only explain numerical predictions, so for a classification task you have to configure your predictor to output probabilities.
The SHAP loop has n + 1 iterations, where n is the number of ROIs (rows of the first input table). The first iteration is special: it does not explain an ROI but is used to estimate the mean prediction by letting the model predict the sampling table (the second input of the SHAP Loop Start node).
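The iteration scheme described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the node's actual sampling strategy: the function `shap_loop_iterations`, its `n_subsets` parameter, the random choice of feature subsets, and the `predict` stand-in for a predictor node are all hypothetical.

```python
import random


def shap_loop_iterations(rois, sampling_table, n_subsets=4, seed=0):
    """Yield one table per loop iteration (hypothetical helper)."""
    rng = random.Random(seed)
    # Iteration 0: the sampling table itself, used to estimate the mean prediction.
    yield [list(r) for r in sampling_table]
    n_features = len(sampling_table[0])
    # Iterations 1..n: one table of perturbed rows per row of interest.
    for roi in rois:
        table = []
        for _ in range(n_subsets):
            # Assumption: pick a random subset of features that keep the ROI's values.
            keep = [rng.random() < 0.5 for _ in range(n_features)]
            for s in sampling_table:
                # All other features are replaced by values from a sampling row.
                table.append([roi[i] if keep[i] else s[i] for i in range(n_features)])
        yield table


def predict(table):
    # Hypothetical model standing in for the predictor in the loop body.
    # It returns a numerical output per row, as SHAP requires.
    return [0.5 * r[0] + 0.25 * r[1] for r in table]


rois = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sampling_table = [[0.0, 0.0], [1.0, 1.0]]

iterations = list(shap_loop_iterations(rois, sampling_table))
predictions = [predict(t) for t in iterations]

# The first iteration's predictions give the estimate of the mean prediction.
mean_prediction = sum(predictions[0]) / len(predictions[0])
print(len(iterations), mean_prediction)
```

With three ROIs the generator yields four tables, matching the n + 1 iterations of the loop. Each table after the first grows with the size of the sampling table, which illustrates why that table should stay small.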
To use this node in KNIME, install the KNIME Machine Learning Interpretability Extension.