SHAP Summarizer

This Component can be connected to the bottom input port of SHAP Loop Start. It uses k-means to summarize the validation set and creates a sampling table to use when building coalitions.
The sampling table has n rows, each of which is a different prototype of the data. You can adjust n in the configuration dialog of the Component; the default value is 100.
For each of the n clusters created by k-means, the output sampling table contains a prototype row with a value in the column "SHAP Summarizer Sampling weight", which the SHAP Loop Start node can use.
This Component can summarize data of the following domains: Number (integer), Number (double) and String.
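
To illustrate the idea of summarizing a table into weighted prototypes, here is a minimal Python sketch for numeric data only. It is not the Component's implementation: the function name kmeans_summarize, the random initialization, and the choice of weight (the fraction of input rows assigned to each cluster) are assumptions made for this example.

```python
import random


def kmeans_summarize(rows, n_prototypes, n_iter=50, seed=0):
    """Return (prototypes, weights) for numeric rows using plain Lloyd's k-means."""
    rng = random.Random(seed)
    # Assumption: start from randomly chosen input rows as initial centers.
    centers = [list(r) for r in rng.sample(rows, n_prototypes)]

    def assign(current_centers):
        # Put each row into the cluster of its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in current_centers]
        for row in rows:
            nearest = min(
                range(len(current_centers)),
                key=lambda i: sum(
                    (a - b) ** 2 for a, b in zip(row, current_centers[i])
                ),
            )
            clusters[nearest].append(row)
        return clusters

    for _ in range(n_iter):
        clusters = assign(centers)
        # Move each center to the mean of its members; an empty cluster
        # keeps its previous center.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    # Assumed weight: share of the input rows that belong to each cluster.
    final_clusters = assign(centers)
    weights = [len(members) / len(rows) for members in final_clusters]
    return centers, weights


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
prototypes, weights = kmeans_summarize(data, n_prototypes=2)
```

Each returned prototype is the per-feature average of its cluster, and the weights sum to 1, so larger clusters are drawn more often when the table is used for sampling.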

DISCLAIMER: statistically sound sampling is not guaranteed when the input table contains String columns. Training k-means on one-hot encoded categorical columns and decoding the result is a workaround; current computer science research is still looking for a more solid solution.
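
The limitation can be seen in a small Python sketch of one-hot encoding and decoding. The helper names and the decoding rule (pick the category with the largest averaged indicator) are assumptions for illustration; the Component's actual decoding step is not described here.

```python
def one_hot_encode(values, categories):
    """Encode each string as a 0/1 indicator vector over the known categories."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def decode_prototype(vector, categories):
    """Assumed decoding rule: return the category with the largest indicator value."""
    best = max(range(len(categories)), key=lambda i: vector[i])
    return categories[best]


categories = ["blue", "red"]
cluster_values = ["red", "red", "blue"]

encoded = one_hot_encode(cluster_values, categories)
# Averaging the indicator columns is what a k-means centroid does per cluster.
centroid = [sum(col) / len(col) for col in zip(*encoded)]
prototype_value = decode_prototype(centroid, categories)
```

Here the centroid holds the category frequencies of the cluster, but the decoded prototype keeps only the majority value "red". The minority value "blue" is lost, which is one way the original distribution of a String column can fail to be preserved.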

Options

Normalize Data
If this option is selected, all columns are normalized to the range 0 to 1 using Min-Max normalization. In this case, the output sampling table is denormalized back to its original domain.
Number of Prototypes
The number of prototypes to generate as summary of the input table.
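
As a rough illustration of the Normalize Data option, the following Python sketch applies Min-Max normalization to one numeric column and maps a value back to the original domain. The function names are hypothetical, and mapping a constant column to 0.0 is an assumption made for this example.

```python
def min_max_fit(column):
    """Return the (minimum, maximum) of a numeric column."""
    return min(column), max(column)


def normalize(column, lo, hi):
    """Scale values to the range 0..1; a constant column maps to 0.0 (assumption)."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def denormalize(column, lo, hi):
    """Map values from the range 0..1 back to the original domain."""
    return [x * (hi - lo) + lo for x in column]


ages = [10.0, 20.0, 30.0]
lo, hi = min_max_fit(ages)
scaled = normalize(ages, lo, hi)
# A prototype computed in normalized space is mapped back before output.
restored = denormalize([0.25], lo, hi)
```

Normalizing first keeps columns with large value ranges from dominating the distance computation in k-means, while denormalizing afterwards means the prototypes are expressed in the same units as the input table.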

Input Ports

Data to be summarized, containing all the features that SHAP Loop Start needs. Supported domains: Number (double), Number (integer), String.

Output Ports

A table with one prototype per cluster, in which each feature takes the average value of its cluster. Additionally, the table contains a column called "SHAP Summarizer Sampling weight" that SHAP Loop Start can use as a sampling weight.
