Numeric Outliers

This node detects and treats outliers in each of the selected columns individually by means of the interquartile range (IQR).

To detect the outliers for a given column, the first and third quartiles (Q₁, Q₃) are computed. An observation is flagged as an outlier if it lies outside the range R = [Q₁ - k(IQR), Q₃ + k(IQR)] with IQR = Q₃ - Q₁ and k >= 0. With k = 1.5, the smallest value in R typically corresponds to the lower end of a boxplot's whisker and the largest value to its upper end.
Providing grouping information allows outliers to be detected only within their respective groups.
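
The detection rule can be sketched in a few lines of Python. This is a minimal illustration, not the node's implementation: the function names are hypothetical, and the quartiles are taken from the standard library's `statistics.quantiles` with the "inclusive" method, which is only one of several possible estimators.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) of R = [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: "inclusive" linear interpolation; other estimators give other quartiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Flag every value that lies outside the permitted interval R."""
    lower, upper = iqr_bounds(values, k)
    return [v < lower or v > upper for v in values]


data = [2, 3, 4, 5, 6, 7, 8, 50]
lower, upper = iqr_bounds(data)   # (-1.5, 12.5) for this estimator
flags = flag_outliers(data)       # only the last value, 50, is flagged
```

Increasing k widens R, so fewer values are flagged; with k = 0 every value outside [Q₁, Q₃] counts as an outlier.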

If an observation is flagged as an outlier, it can either be replaced by some other value, or the corresponding row can be removed or retained.

Missing values in the data are ignored, i.e., they are neither used for the outlier computation nor flagged as outliers.
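
As a rough sketch of this behavior, the Python snippet below represents missing values as `None`, excludes them when estimating the quartiles, and never flags them. The function name and the choice of the "inclusive" quartile estimator are assumptions made for illustration.

```python
from statistics import quantiles


def flag_outliers_ignoring_missing(values, k=1.5):
    """Flag outliers; None entries are skipped in the statistics and never flagged."""
    present = [v for v in values if v is not None]
    # Assumption: "inclusive" quartile estimator, chosen only for this sketch.
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v is not None and (v < lower or v > upper) for v in values]


flags = flag_outliers_ignoring_missing([2, 3, None, 4, 5, 6, 7, 8, 50])
# The None entry stays unflagged; only 50 is flagged.
```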

Options

Outlier Selection

Outlier selection: Allows the selection of columns for which outliers have to be detected and treated. If "Compute outlier statistics on groups" (see tab "Group Settings") is selected, the outliers for each of the columns are computed solely with respect to the different groups.

General Settings

Interquartile range multiplier (k)

Scales the interquartile range (IQR). The default is k = 1.5. Larger values cause fewer values to be considered outliers.

Quartile calculation

Specifies how the quartiles are computed.

Use heuristic (memory friendly): This option ensures that the quartiles are calculated using a heuristic approach. It is recommended for large data sets due to its low memory requirements. However, for small data sets the results of this approach can deviate considerably from the exact results.
Full data estimate using: This option typically produces more accurate results than its counterpart, but also requires considerably more memory. It is therefore recommended for smaller data sets.
Since the value of a quartile often lies between two observations, this option additionally allows specifying how the actual value is computed, which is encoded by the various estimation types (LEGACY, R_1, ..., R_9). A detailed explanation of the different types can be found here.
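
To illustrate why the estimation type matters, the sketch below computes the quartiles of the same data with the two interpolation methods offered by Python's `statistics.quantiles`. These are generally understood to correspond to the R_6 ("exclusive") and R_7 ("inclusive") definitions; that mapping, and whether it matches the node's estimation types exactly, is an assumption here.

```python
from statistics import quantiles


def quartiles(values, method):
    """Return (Q1, Q3) using the given statistics.quantiles method."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    return q1, q3


data = [2, 3, 4, 5, 6, 7, 8, 50]

q_exclusive = quartiles(data, "exclusive")   # (3.25, 7.75)
q_inclusive = quartiles(data, "inclusive")   # (3.75, 7.25)

# The IQR differs between the two estimators, and so do the bounds of R.
iqr_exclusive = q_exclusive[1] - q_exclusive[0]   # 4.5
iqr_inclusive = q_inclusive[1] - q_inclusive[0]   # 3.5
```

On small data sets such differences can decide whether a borderline value is flagged, which is why the estimation type is exposed as a setting.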

Update domain

If checked, the domain of the selected outlier columns is updated.

Outlier Treatment

Apply to

Applies the selected treatment strategy to:

All outliers
Outliers below lower bound
Outliers above upper bound

Treatment option

Defines three different strategies to treat outliers:

Replace outlier values: Replaces outliers according to the selected "Replacement strategy"
Remove outlier rows: Removes all rows from the input data that contain at least one outlier in any of the selected columns
Remove non-outlier rows: Retains only those rows of the input data that contain at least one outlier in any of the selected columns
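
The two row-based strategies are complements of each other, as the following sketch suggests. Rows are modeled as dictionaries, and the column names and interval bounds are made up for illustration; in practice the bounds would come from the IQR computation.

```python
def row_has_outlier(row, bounds):
    """True if any selected column's value lies outside its permitted interval."""
    for column, (lower, upper) in bounds.items():
        value = row[column]
        # Missing values (None) are never treated as outliers.
        if value is not None and (value < lower or value > upper):
            return True
    return False


def remove_outlier_rows(rows, bounds):
    """Keep only rows without any outlier in the selected columns."""
    return [row for row in rows if not row_has_outlier(row, bounds)]


def remove_non_outlier_rows(rows, bounds):
    """Keep only rows with at least one outlier in the selected columns."""
    return [row for row in rows if row_has_outlier(row, bounds)]


# Hypothetical data and bounds, chosen only for this example.
rows = [
    {"id": 1, "temp": 20.0, "load": 0.4},
    {"id": 2, "temp": 95.0, "load": 0.5},
    {"id": 3, "temp": 21.0, "load": 7.0},
    {"id": 4, "temp": None, "load": 0.6},
]
bounds = {"temp": (10.0, 30.0), "load": (0.0, 1.0)}

clean = remove_outlier_rows(rows, bounds)          # rows with id 1 and 4
suspicious = remove_non_outlier_rows(rows, bounds)  # rows with id 2 and 3
```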

Replacement strategy

Defines two different strategies to replace outliers:

Missing values: Replaces every outlier by a missing value
Closest permitted value: Replaces the value of each outlier by the closest value within the permitted interval R. If the column is of type integer, the replacement value is the closest integer within the permitted interval.

Note that this option is only enabled if outliers have to be replaced.
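
A minimal sketch of the two replacement strategies follows. The function and parameter names are hypothetical, missing values are modeled as `None`, and the integer case assumes that the permitted interval contains at least one integer.

```python
import math


def replace_outlier(value, lower, upper, strategy="closest", integer_column=False):
    """Return the treated value for one cell, given the bounds of R."""
    # Missing values and values inside R are left untouched.
    if value is None or lower <= value <= upper:
        return value
    if strategy == "missing":
        return None
    if value < lower:
        # Closest permitted value from below; round inward for integer columns.
        return math.ceil(lower) if integer_column else lower
    # Closest permitted value from above; round inward for integer columns.
    return math.floor(upper) if integer_column else upper


# Using the interval R = [-1.5, 12.5] as an example:
replace_outlier(50, -1.5, 12.5)                        # 12.5
replace_outlier(50, -1.5, 12.5, integer_column=True)   # 12
replace_outlier(-7, -1.5, 12.5, integer_column=True)   # -1
replace_outlier(50, -1.5, 12.5, strategy="missing")    # None
```

Rounding inward (ceiling for the lower bound, floor for the upper bound) keeps the replacement inside R, which plain rounding would not guarantee.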

Group Selection

Compute outlier statistics on groups: If selected, allows the selection of columns to identify groups. A group comprises all rows of the input that have the same values in every selected group column (similar to the GroupBy node). The outliers are then computed with respect to each individual group.
Column Filter: Move the columns defining the groups into the Include list. The group definition takes priority, i.e., if a column is selected for both group definition and outlier handling, it is used to define groups and no outlier handling is performed for it.
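
Group-wise detection can be sketched as follows: rows are partitioned by the values of the group columns, and the bounds are computed separately for each partition. The function name, the column names, and the "inclusive" quartile estimator are assumptions for illustration, as is the choice to skip groups with fewer than two non-missing values.

```python
from collections import defaultdict
from statistics import quantiles


def flag_outliers_by_group(rows, group_cols, value_col, k=1.5):
    """Flag outliers in value_col, computing the bounds separately per group."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[c] for c in group_cols)
        groups[key].append(index)

    flags = [False] * len(rows)
    for indices in groups.values():
        present = [rows[i][value_col] for i in indices
                   if rows[i][value_col] is not None]
        if len(present) < 2:
            continue  # assumption: too few values to estimate quartiles
        q1, _, q3 = quantiles(present, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for i in indices:
            v = rows[i][value_col]
            flags[i] = v is not None and (v < lower or v > upper)
    return flags


# Hypothetical readings from two sensors with very different value ranges.
rows = (
    [{"sensor": "A", "value": v} for v in (10, 11, 12, 13, 100)]
    + [{"sensor": "B", "value": v} for v in (100, 101, 102, 103, 10)]
)
flags = flag_outliers_by_group(rows, ["sensor"], "value")
# Within A the value 100 is flagged; within B the value 10 is flagged.
```

The same values would look unremarkable if the bounds were computed over both sensors together, which is the main reason to provide grouping information.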

Memory Policy

Process groups in memory: Processes the groups in memory. This option comes with higher memory requirements, but is faster since the table does not need any additional treatment.

Input Ports

Numeric input data to evaluate, plus optional group information

Output Ports

Data table in which outliers were either replaced or rows containing outliers/non-outliers were removed
Data table holding, for each outlier group, the number of members (i.e., non-missing values), the number of outliers, and the lower and upper bounds
Model holding the permitted interval bounds for each outlier group and the outlier treatment specifications

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Statistics Nodes from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191419

On NodePit since: 2025-07-02

Last update: 2025-08-12

KNIME versions: Since v3.6
