SHAP Summarizer

This Component can be placed upstream of the bottom input port of the SHAP Loop Start node. It uses k-means to summarize the validation set into a sampling table, which is then used when creating coalitions.
The sampling table has n rows, each of which is a different prototype of the data. You can adjust n in the configuration dialog of the Component; the default value is 100.
For each of the n clusters found by k-means, the output sampling table contains one prototype row. It also contains a column "SHAP Summarizer Sampling weight" that the SHAP Loop Start node can use.
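The summarization step can be sketched in Python with NumPy. This is a minimal illustration, not the Component's implementation: the function name `summarize` is hypothetical, the input is assumed to be purely numeric, and the sampling weight is assumed to be the fraction of input rows assigned to each cluster (the description above does not state how the weight is computed).

```python
import numpy as np

def summarize(data, n_prototypes=100, n_iter=50, seed=0):
    """Hypothetical sketch: summarize rows with k-means (Lloyd's algorithm).

    Returns (prototypes, weights). ASSUMPTION: the weight of a prototype is
    the fraction of input rows assigned to its cluster.
    """
    data = np.asarray(data, dtype=float)
    k = min(n_prototypes, len(data))
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct rows picked at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; empty clusters stay put.
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the last centroids, then cluster sizes -> weights.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    counts = np.bincount(labels, minlength=k)
    weights = counts / counts.sum()
    return centroids, weights

rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
prototypes, weights = summarize(rows, n_prototypes=2)
```

With two well-separated groups of three rows and n = 2, each prototype is the mean of one group and each receives a weight of 0.5, so the weights still reflect how much of the original data every prototype stands for.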
This Component can summarize data of the following domains: Number (integer), Number (double) and String.

DISCLAIMER: the statistical soundness of the sampling is not guaranteed when the input table contains String columns. Research is still looking for a more robust approach than training k-means on one-hot encoded categorical columns and decoding the resulting prototypes back to categories.
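To see why String columns weaken the guarantee, here is a minimal sketch of one-hot encoding and decoding. It assumes decoding picks the category with the largest averaged value (argmax); the helper names are hypothetical and the Component may decode differently.

```python
import numpy as np

def one_hot_encode(values):
    """Hypothetical helper: encode strings as a 0/1 matrix, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    matrix = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        matrix[row, index[value]] = 1.0
    return matrix, categories

def decode(centroid, categories):
    """Hypothetical helper (ASSUMPTION: argmax decoding) mapping a centroid to one category."""
    return categories[int(np.argmax(centroid))]

# One cluster containing two "red" rows and one "blue" row.
encoded, cats = one_hot_encode(["red", "red", "blue"])
centroid = encoded.mean(axis=0)  # fractional values, not a valid one-hot vector
prototype_value = decode(centroid, cats)
```

Averaging inside a cluster gives the fractional vector [1/3, 2/3] over ["blue", "red"], which no real row can take; decoding forces it back to "red" and the information that one third of the cluster was "blue" is lost.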

Options

Normalize Data: If this option is selected, all columns are normalized to the range [0, 1] using Min-Max normalization before clustering. The output sampling table is then denormalized back to the original domain of each column.
Number of Prototypes: The number of prototypes (n) to generate as a summary of the input table.
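The normalize/denormalize round trip can be sketched as follows. This is a minimal illustration assuming column-wise Min-Max scaling; the function names are hypothetical, and mapping constant columns to 0 is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def min_max_fit(data):
    """Hypothetical helper: per-column minimum and range of the input table."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    # ASSUMPTION: constant columns use a range of 1 so they normalize to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def normalize(data, lo, span):
    """Scale each column to [0, 1]."""
    return (np.asarray(data, dtype=float) - lo) / span

def denormalize(scaled, lo, span):
    """Map normalized values (e.g. prototypes) back to the original domain."""
    return np.asarray(scaled, dtype=float) * span + lo

table = [[1, 100], [3, 300], [2, 200]]
lo, span = min_max_fit(table)
scaled = normalize(table, lo, span)
# A prototype computed in normalized space (mean of rows 0 and 2) ...
prototype = scaled[[0, 2]].mean(axis=0)
# ... is reported in the original units.
restored = denormalize(prototype, lo, span)
```

Normalizing first keeps a wide-range column (here the second one) from dominating the Euclidean distances k-means relies on; denormalizing afterwards returns the prototype [1.5, 150] in the units of the input table.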

Input Ports

: Data to be summarized containing all the features SHAP Loop Start needs. Supported domains: Numeric (double), Numeric (integer), String.

Output Ports

: A table with one prototype row per cluster, in which each feature holds the average value of the rows belonging to that cluster. Additionally, it contains a column called "SHAP Summarizer Sampling weight" that SHAP Loop Start can use as a sampling weight.
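One way a sampling weight can be consumed is to draw prototype rows with probability proportional to their weight. The sketch below illustrates that idea only; it assumes proportional sampling with replacement, the function name is hypothetical, and it does not describe how SHAP Loop Start actually uses the column.

```python
import numpy as np

def sample_prototypes(prototypes, weights, n_samples, seed=None):
    """Hypothetical sketch: draw rows with probability proportional to their weight.

    ASSUMPTION: sampling is with replacement and weights are non-negative.
    """
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()  # normalize weights into probabilities
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(prototypes), size=n_samples, replace=True, p=probs)
    return np.asarray(prototypes)[idx]

# A prototype with weight 0 is never drawn.
drawn = sample_prototypes([[0, 0], [10, 10]], [1.0, 0.0], n_samples=5, seed=1)
```

Sampling proportionally lets a table of n prototypes stand in for the full validation set: frequent regions of the data are drawn often, rare ones seldom.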
