This node detects and treats outliers in each of the selected columns individually, using the interquartile range (IQR).
To detect the outliers in a given column, its first and third quartiles (Q1, Q3)
are computed. An observation is flagged as an outlier if it lies outside the range
R = [Q1 - k(IQR), Q3 + k(IQR)], where IQR = Q3 - Q1 and k >= 0. With k = 1.5,
the smallest observed value within R typically corresponds to the lower end of a boxplot's whisker,
and the largest observed value within R to its upper end.
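The detection rule can be illustrated with a minimal Python sketch. It is not the node's implementation: the function names iqr_bounds and flag_outliers are hypothetical, and the quartiles are estimated with the "inclusive" method of Python's statistics.quantiles, which is only one of several common quartile definitions and may differ from the estimate the node uses.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    if k < 0:
        raise ValueError("k must be >= 0")
    # Assumption: quartiles via linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if the value lies outside R."""
    lower, upper = iqr_bounds(values, k)
    return [v < lower or v > upper for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_bounds(data))     # (-3.0, 13.0)
print(flag_outliers(data))  # only the last value, 100, is flagged
```

Here Q1 = 3 and Q3 = 7, so IQR = 4 and R = [-3, 13]. A larger k widens R and flags fewer observations.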
Providing grouping information restricts outlier detection to the individual groups,
so that each observation is compared only with the other members of its own group.
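Group-wise detection might look like the following sketch, in which the range R is computed separately for every group. The function name flag_outliers_by_group and the (group, value) row layout are assumptions made for illustration, as is the "inclusive" quartile method. The sketch also assumes every group contains at least two values.

```python
from collections import defaultdict
from statistics import quantiles


def flag_outliers_by_group(rows, k=1.5):
    """rows is a list of (group, value) pairs; returns one boolean per row."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    # Compute a separate range R = [lower, upper] for each group.
    bounds = {}
    for group, values in by_group.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        bounds[group] = (q1 - k * iqr, q3 + k * iqr)

    return [not (bounds[g][0] <= v <= bounds[g][1]) for g, v in rows]


rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("a", 50),
        ("b", 48), ("b", 49), ("b", 50), ("b", 51), ("b", 52)]
flags = flag_outliers_by_group(rows)
print([row for row, flagged in zip(rows, flags) if flagged])  # [('a', 50)]
```

The value 50 is an outlier within group "a", whose range is [-1, 7], but an ordinary observation within group "b", whose range is [46, 54].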
If an observation is flagged as an outlier, one can either replace it with another value or remove/retain the corresponding row.
Missing values in the data are ignored, i.e., they are neither used in the outlier computation nor flagged as outliers.
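The treatment options and the handling of missing values can be sketched together. Everything named here is an assumption for illustration rather than the node's actual behavior: the function treat_outliers, the strategy names, the use of None for a missing value, replacement by the nearest bound of R as one possible "other value", and the reading of "retain" as keeping only the flagged entries.

```python
from statistics import quantiles


def treat_outliers(values, k=1.5, strategy="clip"):
    """Treat outliers in a list that may contain None for missing values."""
    # Missing values are excluded from the quartile computation.
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    def is_outlier(v):
        # A missing value is never flagged as an outlier.
        return v is not None and (v < lower or v > upper)

    if strategy == "clip":    # replace each outlier by the nearest bound of R
        return [min(max(v, lower), upper) if is_outlier(v) else v for v in values]
    if strategy == "remove":  # drop the entries flagged as outliers
        return [v for v in values if not is_outlier(v)]
    if strategy == "retain":  # keep only the entries flagged as outliers
        return [v for v in values if is_outlier(v)]
    raise ValueError(f"unknown strategy: {strategy!r}")


data = [1, 2, None, 3, 4, 5, 6, 7, 8, 100]
print(treat_outliers(data, strategy="clip"))    # 100 becomes 13.0, None is kept
print(treat_outliers(data, strategy="remove"))  # 100 is dropped, None is kept
print(treat_outliers(data, strategy="retain"))  # [100]
```

Because the missing entry is skipped, the quartiles are computed from the nine present values only, giving R = [-3, 13].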
To use this node in KNIME, install the KNIME Statistics Nodes extension.