This component generates an interactive visualization to help the user understand their model’s behavior on a single example data point. It works in two main steps.
COUNTERFACTUALS
1) The component synthetically oversamples the example dataset (the sample set connected to the second input port) by shuffling features and then applying the SMOTE algorithm. It then searches this expanded dataset for nearby data points that, when scored by the input model, receive the classification selected in the configuration dialog. We call these data points, which produce the desired classification, Counterfactuals.
* "Nearby" here means with as little numeric variation as possible. For example, we might call <1,1> and <1,1.1> close but <1,1> and <17,23> distant.
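Step 1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the component's implementation. "Shuffling features" is assumed to mean appending copies of the sample set in which each column is permuted independently. SMOTE-style interpolation is applied to the whole sample set without class labels. "Nearby" is measured with the Manhattan Distance on min-max normalized features. The function names `shuffle_features`, `smote_oversample` and `find_counterfactuals`, their parameters (`n_copies` and `n_new` as rough stand-ins for the permutation and oversampling rates), and the toy `predict` model are all hypothetical.

```python
import numpy as np


def shuffle_features(X, n_copies, rng):
    """Append n_copies of X in which every column is permuted independently
    (hypothetical reading of "shuffling features")."""
    copies = [X]
    for _ in range(n_copies):
        shuffled = X.copy()
        for j in range(X.shape[1]):
            shuffled[:, j] = rng.permutation(X[:, j])
        copies.append(shuffled)
    return np.vstack(copies)


def smote_oversample(X, n_new, k, rng):
    """Append n_new SMOTE-style points: each lies on the segment between a
    random row and one of its k nearest (Euclidean) neighbours. Class labels
    are ignored here, unlike classic minority-class SMOTE."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return np.vstack([X, X[base] + gap * (X[nn] - X[base])])


def find_counterfactuals(x0, candidates, predict, desired, n=5):
    """Return up to n candidates the model assigns to `desired`, ordered by
    Manhattan Distance to x0 on min-max normalised features."""
    lo, hi = candidates.min(axis=0), candidates.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)      # guard constant columns
    hits = candidates[predict(candidates) == desired]
    dist = np.abs((hits - x0) / scale).sum(axis=1)
    return hits[np.argsort(dist)[:n]]


# Usage with toy data; the lambda is a stand-in for the captured model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
predict = lambda A: (A[:, 0] + A[:, 1] > 0).astype(int)
pool = smote_oversample(shuffle_features(X, 2, rng), 300, 5, rng)
x0 = np.array([-1.0, -1.0])                      # toy model predicts class 0
counterfactuals = find_counterfactuals(x0, pool, predict, desired=1)
```

The rows of `counterfactuals` are ordered from closest to farthest, so the first row is the candidate that changes x0 least while flipping the toy model's prediction.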
FEATURE IMPORTANCE FROM GLM
2) Next we define a neighborhood around the original data point, for example all data points within a distance of 0.5 of the original point. Specifically, we measure distance with the Manhattan Distance on the normalized data points, and we use the smallest neighborhood that still includes examples of the desired class from the configuration dialog. On this set of data points we train a Surrogate GLM to mimic the input model. We then extract and normalize the coefficients of this model and display them in a bar chart as a local feature importance measure.
* The Manhattan Distance between two vectors is the sum of the absolute differences between their corresponding elements. For example, the Manhattan Distance between <1,2> and <1.5,1> is:
|1.5-1| + |1-2| = 0.5 + 1 = 1.5
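The neighborhood and surrogate steps can be sketched the same way. Again, this is a hedged sketch, not the component's implementation. The neighborhood is assumed to be all candidates within the smallest normalized Manhattan radius that contains at least `min_hits` points of the desired class. The surrogate is a logistic-regression GLM fitted by plain gradient descent with a small L2 penalty, since the component's actual GLM learner is not documented here. "Normalize the coefficients" is assumed to mean scaling them so their absolute values sum to 1. All function names, `min_hits`, and the toy model are hypothetical.

```python
import numpy as np


def manhattan(a, b):
    """Manhattan Distance: sum of absolute element-wise differences."""
    return float(np.abs(np.asarray(a, float) - np.asarray(b, float)).sum())


def smallest_neighbourhood(x0, X, labels, desired, scale, min_hits=5):
    """Rows of X inside the smallest normalised Manhattan radius around x0
    that holds at least min_hits rows of the desired class (ties included)."""
    dist = np.abs((X - x0) / scale).sum(axis=1)
    hit_dist = np.sort(dist[labels == desired])
    radius = hit_dist[min(min_hits, len(hit_dist)) - 1]
    return X[dist <= radius + 1e-12]


def fit_logistic_glm(Z, y, lr=0.5, steps=2000, l2=1e-3):
    """Binomial GLM (logistic regression) fitted by gradient descent; the
    small L2 penalty keeps weights finite when the classes are separable."""
    Zb = np.hstack([np.ones((len(Z), 1)), Z])     # prepend intercept column
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 0.5 * (1.0 + np.tanh(0.5 * (Zb @ w)))  # stable form of the sigmoid
        grad = Zb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]                    # intercept is not penalised
        w -= lr * grad
    return w[1:]                                  # coefficients without intercept


def local_importance(x0, X, predict, desired, min_hits=5):
    """Fit the surrogate to the model's own predictions in the neighbourhood
    and scale its coefficients so their absolute values sum to 1."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    hood = smallest_neighbourhood(x0, X, predict(X), desired, scale, min_hits)
    y = (predict(hood) == desired).astype(float)  # mimic the model, not ground truth
    Z = (hood - hood.mean(axis=0)) / scale        # centring speeds up descent
    coef = fit_logistic_glm(Z, y)
    return coef / np.abs(coef).sum()


# Usage: the worked example above, then a toy model that only uses feature 0.
print(manhattan([1, 2], [1.5, 1]))                # 1.5
g = np.arange(-10, 11, dtype=float)
X = np.array([(a, b) for a in g for b in g])      # 21 x 21 grid of candidates
predict = lambda A: (A[:, 0] > 0).astype(int)
x0 = np.array([-3.0, 0.0])                        # toy model predicts class 0
importance = local_importance(x0, X, predict, desired=1)
```

Because the toy model depends only on the first feature and the grid is symmetric in the second feature around x0, the surrogate assigns essentially all of the normalized importance to the first feature. The sign of each normalized coefficient shows whether increasing that feature pushes the surrogate toward the desired class.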
HOW TO USE
1) Drag and drop the component into your workflow.
2) Connect a model captured by the Integrated Deployment framework, such as a model from the AutoML component, to the first input port.
3) Connect a sample set of data points to the second input port. The component uses these to generate artificial data points that serve as counterfactual candidates.
4) Connect a table containing a single row to the third input port: the data point you want to explain and generate counterfactuals for.
5) In the configuration dialog, select the target column and the class desired in the counterfactuals. This class must be different from the target value of the data point in input 3.
6) Optionally, tune the other configuration parameters. Higher oversampling and permutation rates reduce sparsity in the data but increase processing time. If you raise these settings, or if you use a large sample set in input 2, we recommend increasing the expansion rate parameter as well.