
**KNIME Machine Learning Interpretability Extension** version **4.3.0.v202011191524** by **KNIME AG, Zurich, Switzerland**

LIME stands for Local Interpretable Model-agnostic Explanations. It tries to explain individual predictions of a black box model by training a local surrogate model that is easier to understand (e.g. a linear model). The intuition behind this approach is that a globally nonlinear model might actually be linear within a small local region of the feature space. In order to learn such a local surrogate model, LIME creates a dataset of perturbed rows for a single row of interest, predicts them with the black box model, and then learns a local surrogate that approximates the predictions of the black box model. For more details on the algorithm, please see the paper *"Why Should I Trust You?": Explaining the Predictions of Any Classifier* by Ribeiro et al.
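The core idea can be sketched in a few lines of numpy. Everything here is illustrative, not the node's actual implementation: `black_box` is a toy stand-in for your model, and the sampling scale, sample count, and kernel width are assumed defaults.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical black-box model: a nonlinear function of two features.
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x_of_interest = np.array([0.5, 1.0])   # the single row to explain
n_features = x_of_interest.size

# 1. Perturb: sample rows around the row of interest (normally distributed).
samples = rng.normal(loc=x_of_interest, scale=1.0, size=(1000, n_features))

# 2. Predict the perturbed rows with the black-box model.
preds = black_box(samples)

# 3. Weight each sample by its similarity to the row of interest
#    (exponential kernel; default width sqrt(#features) * 0.75).
d = np.linalg.norm(samples - x_of_interest, axis=1)
w = np.sqrt(n_features) * 0.75
weights = np.sqrt(np.exp(-(d ** 2) / (w ** 2)))

# 4. Fit a weighted linear surrogate via weighted least squares.
A = np.column_stack([np.ones(len(samples)), samples])  # intercept + features
sw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)

# coef[1:] are the local feature attributions for x_of_interest.
print(coef)
```

Because both toy features increase the prediction near the row of interest, both local coefficients come out positive; their magnitudes approximate the local slopes of the black-box function.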

The top input of this node contains the rows of interest for which the predictions of your model should be explained. Each row in the top table corresponds to one loop iteration, so its size will directly affect the runtime of the loop. The bottom input table is used for sampling, which, in this case, means that column statistics are calculated for all of the feature columns. These statistics are later used to sample new values for the feature columns.

In each iteration of the loop one row of interest is explained. This node produces two tables used for these explanations. The top table contains rows created by sampling according to the statistics of the feature columns in the sampling table. Note that numeric columns (including bit and byte vectors) are assumed to be normally distributed. This table has to be predicted with the Predictor node appropriate to your model. The bottom table is intended for training a local surrogate model (e.g. a linear model). It differs from the top table as follows:

- Nominal feature columns are replaced by Double columns where a 1.0 indicates that the sampled value matches that of the row of interest.
- Bit and byte vector columns are split up into multiple columns, one for each element.
- A weight column is appended, which indicates how similar the sampled row is to the row of interest. A higher value indicates greater similarity.
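The nominal-to-Double replacement and the weight column can be illustrated with a small sketch. The column names, values, and the distance used here are invented for illustration; only the 1.0-means-match encoding and the appended weight mirror the node's behavior.

```python
import numpy as np

# Hypothetical sampled rows with one nominal and one numeric feature.
roi_color = "red"                       # nominal value of the row of interest
roi_size = 1.0                          # numeric value of the row of interest
sampled_colors = np.array(["red", "blue", "red", "green"])
sampled_sizes = np.array([1.2, 0.4, 1.0, 2.3])

# Nominal column -> Double column: 1.0 where the sample matches the ROI value.
color_match = (sampled_colors == roi_color).astype(float)

# Weight column: similarity to the row of interest (illustrative distance;
# the node uses an exponential kernel over all encoded features).
d = np.abs(sampled_sizes - roi_size) + (1.0 - color_match)
weights = np.sqrt(np.exp(-(d ** 2) / 0.75 ** 2))

print(color_match)   # [1. 0. 1. 0.]
print(weights)       # rows closer to the ROI get weights closer to 1
```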

The intended workflow within the loop is:

- Predict the top table with the black box model (the predictions must be numerical, i.e. in case of a classification model the class probabilities).
- Append the prediction column(s) to the bottom table.
- Train a local surrogate model that uses the features from the bottom table, weights each row according to the weight column, and approximates the predictions of the black box model. The currently recommended Learner for this task is the H2O Generalized Linear Model Learner (Regression).
- Extract and collect the local explanations from the local surrogate model (e.g. the linear coefficients) in one of our Loop End nodes.
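The steps above — predict, weight, fit a linear surrogate, collect coefficients per iteration — can be sketched as a plain Python loop. The `black_box` function and all settings are illustrative assumptions; in KNIME these roles are played by your Predictor node, the weight column, and a Loop End node.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):                      # stand-in for your Predictor node
    return 3.0 * X[:, 0] - 2.0 * X[:, 1]

rows_of_interest = np.array([[0.0, 0.0], [1.0, 2.0]])
explanations = []

for x in rows_of_interest:             # one loop iteration per row of interest
    samples = rng.normal(loc=x, scale=1.0, size=(500, 2))
    preds = black_box(samples)         # "predict the top table"
    d = np.linalg.norm(samples - x, axis=1)
    w = np.sqrt(2) * 0.75              # default kernel width for 2 features
    weights = np.sqrt(np.exp(-(d ** 2) / w ** 2))
    A = np.column_stack([np.ones(len(samples)), samples])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    explanations.append(coef[1:])      # collect, like a Loop End node would

explanations = np.vstack(explanations)
print(explanations)   # each row: local coefficients for one row of interest
```

Since this toy black box is itself linear, the surrogate recovers its coefficients (3 and -2) exactly in every iteration; for a real nonlinear model the coefficients vary per row of interest.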

Since the number of elements in a vector column is not known during configuration, the spec for the second table can't be generated if vectors are among the feature columns. In this case downstream nodes can only be configured once this node has been executed.

- Feature columns
- The feature columns, which are used by your model. These columns will be contained in the top table that has to be predicted by your model. For non-vector columns, the bottom table will also contain one column per feature, where nominal columns are replaced by numeric columns.
- Retain non-feature columns
- If this option is set, all non-feature columns of the current row of interest are appended to the rows in the first output table of this node. This is useful if you want to evaluate only a subset of the actual features your model uses. Note that the second output table is not affected by this option.
- Explanation set size
- The number of rows to use for learning the local surrogate model for a single incoming row of interest.
- Sample around instances
- If checked, samples for numerical columns are drawn around the value of the current row of interest. Otherwise samples are drawn around the mean of the feature (which is calculated from the sampling table).
- Use seed
- Using a seed allows you to reproduce the results of the loop. If this box is checked the seed displayed in the text box is used, otherwise a new seed is generated for each execution.
- Use element names for vector features
- Vector columns like Bit and Byte vectors can contain names for their individual elements. If this option is set, these names are used if possible, i.e. if the number of element names matches the element count. If this option is not set, or the number of names doesn't match the number of elements, new names based on the vector name are created.
- Manual kernel width
- LIME uses an exponential kernel to calculate the similarity of a sampled row to the row being explained. The exponential kernel is defined as *sqrt(exp(-(d^2) / w^2))*, where *d* is the Euclidean distance between two datapoints and *w* is the kernel width. Intuitively, the kernel width controls how local the surrogate model is: a larger kernel width means a larger region around the row to be explained is considered. By default, the kernel width LIME uses for its exponential kernel is *sqrt(number of features) * 0.75*, but by checking this box it is also possible to provide a custom kernel width.
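The kernel formula above is small enough to check by hand. The helper name `lime_weight` is invented for this sketch; it just evaluates *sqrt(exp(-(d^2) / w^2))* with the default width *sqrt(number of features) * 0.75* when no manual width is given.

```python
import math

def lime_weight(d, n_features, kernel_width=None):
    # Default kernel width, as described above: sqrt(#features) * 0.75.
    w = kernel_width if kernel_width is not None else math.sqrt(n_features) * 0.75
    return math.sqrt(math.exp(-(d ** 2) / w ** 2))

# A sample at distance 1.0 with 4 features: default width is sqrt(4)*0.75 = 1.5.
print(lime_weight(1.0, 4))                     # ≈ 0.8007
# A narrower manual kernel width makes the surrogate more local:
# the same sample now contributes far less to the fit.
print(lime_weight(1.0, 4, kernel_width=0.5))   # ≈ 0.1353
```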

- Table containing the rows to be explained.
- Table containing rows used to perturb rows in the first table.

- This table contains samples that have to be predicted by the Predictor node corresponding to your particular model.
- This table contains the data used to learn a local surrogate model, including a **weight** column. (The name of the column holding the weights is output as a flow variable with the name *weightColumnName*.)


- 01_Compute_LIMEs (KNIME Hub)
- 03_Titantic_Prediction_Explanations (KNIME Hub)

To use this node in KNIME, install KNIME Machine Learning Interpretability Extension from the following update site:

KNIME 4.3

