Conditional Box Plot

A box plot displays robust statistical parameters: minimum, lower quartile, median, upper quartile, and maximum. These parameters are called robust, since they are not sensitive to extreme outliers.

The conditional box plot partitions the data of a numeric column into classes according to another nominal column and creates a box plot for each of the classes.
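The partitioning step can be sketched in Python as follows. This is a minimal illustration, not the node's implementation. The function name `partition_by_category` is hypothetical, and for simplicity the sketch skips rows where the category or the value is missing (`None`). The missing-value options described below control that case in the node itself.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def partition_by_category(
    rows: Iterable[Tuple[Optional[Hashable], Optional[float]]],
) -> Dict[Hashable, List[float]]:
    """Group numeric values into one class per distinct category value (hypothetical helper)."""
    classes: Dict[Hashable, List[float]] = defaultdict(list)
    for category, value in rows:
        if category is None or value is None:
            # Simplification: rows with a missing category or value are skipped here.
            continue
        classes[category].append(value)
    return dict(classes)
```

For example, `partition_by_category([("a", 1.0), ("b", 5.0), ("a", 3.0)])` yields one class per category, `{"a": [1.0, 3.0], "b": [5.0]}`, and one box is then drawn for each class.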

A box plot for one numeric attribute is constructed as follows. The box extends from the lower quartile (Q1) to the upper quartile (Q3), and the median is drawn as a horizontal bar inside the box. The distance between Q1 and Q3 is called the interquartile range (IQR). Above and below the box are the so-called whiskers: horizontal bars drawn at the minimum and the maximum value and connected to the box by a dotted line. The whiskers never extend more than 1.5 * IQR beyond the box. If some data points lie below Q1 - (1.5 * IQR) or above Q3 + (1.5 * IQR), the whiskers are drawn at the most extreme values that still lie within these limits, and the remaining data points are drawn separately as outliers.

Outliers are further divided into mild and extreme outliers. A data point p is a mild outlier if Q1 - (3 * IQR) < p < Q1 - (1.5 * IQR) or Q3 + (1.5 * IQR) < p < Q3 + (3 * IQR). In other words, mild outliers lie between 1.5 * IQR and 3 * IQR away from the box. A data point p is an extreme outlier if p < Q1 - (3 * IQR) or p > Q3 + (3 * IQR). Thus, a distance of 3 * IQR from the box marks the boundary between "mild" and "extreme" outliers. Mild outliers are painted as dots, while extreme outliers are displayed as crosses. To identify the outliers, you can select and hilite them. This gives a quick overview of the extreme characteristics of a dataset.
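The rules above can be sketched in Python to show how the quartiles, whisker ends, and outlier classes relate. This is a sketch under assumptions, not the node's code. The quartile method is not stated above, so the sketch assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`). `BoxStats` and `box_stats` are hypothetical names. The text leaves open how a value lying exactly on a 3 * IQR fence is classified, and the sketch counts it as a mild outlier.

```python
import statistics
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoxStats:
    q1: float
    median: float
    q3: float
    iqr: float
    lower_whisker: float
    upper_whisker: float
    mild_outliers: List[float] = field(default_factory=list)
    extreme_outliers: List[float] = field(default_factory=list)


def box_stats(values: List[float]) -> BoxStats:
    """Compute box, whisker, and outlier positions for one class (hypothetical helper)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Assumption: quartiles by linear interpolation; the node's method may differ.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    # Whiskers end at the most extreme values still inside the 1.5 * IQR limits.
    inside = [v for v in values if inner_low <= v <= inner_high]
    stats = BoxStats(q1, median, q3, iqr, min(inside), max(inside))

    for v in sorted(values):
        if v < outer_low or v > outer_high:
            stats.extreme_outliers.append(v)  # drawn as a cross
        elif v < inner_low or v > inner_high:
            stats.mild_outliers.append(v)  # drawn as a dot (fence values land here)
    return stats
```

For the values 1 to 10 plus 20 and 40, this sketch gives Q1 = 3.75, median = 6.5, Q3 = 9.25, and IQR = 5.5. The whiskers end at 1 and 10, 20 is a mild outlier, and 40 is an extreme outlier.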

The node supports custom CSS styling. You can simply put CSS rules into a single string and set it as a flow variable 'customCSS' in the node configuration dialog. You will find the list of available classes and their description on our documentation page.

Options

Category Column
Select the column that contains the category values.
Included columns
Select the columns for which you wish to plot boxes. Missing values in data columns are ignored, and a corresponding warning message is shown.
Selected Column
Select the column that contains the numeric values.
Report on missing values
Check to display detailed warning messages about missing values in the view and to enable the 'Missing values' class. If unchecked, missing values are ignored without a warning, and the 'Missing values' class is not present.
Include 'Missing values' class
If checked, missing values in the category column will form a separate class named "Missing values". Otherwise they will be ignored.
Fail on special doubles
If checked, the node execution fails when it encounters a special double in the input data, i.e. NaN, negative infinity, or positive infinity. If unchecked, special doubles are treated the same as missing values and are reported together with them if 'Report on missing values' is checked.
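One way to model how these options interact is sketched below in Python. It is an interpretation of the option descriptions, not the node's actual code. `classify_rows` and its keyword arguments are hypothetical stand-ins for 'Report on missing values', 'Include 'Missing values' class', and 'Fail on special doubles'. Following the descriptions, the 'Missing values' class only appears when both of the first two options are enabled, and special doubles are converted to missing values unless the node is set to fail.

```python
import math
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

MISSING_CLASS = "Missing values"


def classify_rows(
    rows: Iterable[Tuple[Optional[Hashable], Optional[float]]],
    report_missing: bool = True,
    include_missing_class: bool = True,
    fail_on_special_doubles: bool = False,
) -> Tuple[Dict[Hashable, List[float]], List[str]]:
    """Group (category, value) rows under the option semantics described above (hypothetical)."""
    classes: Dict[Hashable, List[float]] = {}
    warnings: List[str] = []
    for row_index, (category, value) in enumerate(rows):
        # Special doubles: NaN, -inf, +inf.
        if value is not None and (math.isnan(value) or math.isinf(value)):
            if fail_on_special_doubles:
                raise ValueError(f"special double {value!r} in row {row_index}")
            value = None  # otherwise treated like a missing value
        if value is None:
            if report_missing:
                warnings.append(f"row {row_index}: missing value ignored")
            continue
        if category is None:
            if report_missing and include_missing_class:
                category = MISSING_CLASS
            else:
                if report_missing:
                    warnings.append(f"row {row_index}: missing category ignored")
                continue
        classes.setdefault(category, []).append(value)
    return classes, warnings
```

With the default settings, a row with a missing category ends up in the "Missing values" class, while NaN and infinite values are dropped with a warning each. With `report_missing=False`, the same rows are dropped silently.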

General Plot Options

Title (*)
The chart title.
Subtitle (*)
The chart subtitle.
Display fullscreen button
Check to display a button which switches the view into fullscreen mode. The button is only available in the KNIME WebPortal.
Image
Settings for image generation.
Background color
The color of the background.
Data area color
The background color of the data area, within the axes.
Apply colors by category
Check to apply a color scheme to the boxes by category. The colors are defined by a table, provided at the second input port, with a single column that contains the category names and has color settings applied to it.
If no table with the desired color scheme is provided, a standard color scheme is used.
Box color
The fill color of the boxes. Not available if 'Apply colors by category' is checked.
Show warnings in view
If checked, warning messages will be displayed in the view when they occur.

Control Options

Enable view controls
Check to enable controls in the chart.
Enable column selection
Check to enable the selection of the numeric column to show the box plot for.
Enable Title editing
Check to enable the editing of the title within the view.
Enable Subtitle editing
Check to enable the editing of the subtitle within the view.
Enable switching 'Missing values' class
Check to allow showing and hiding the 'Missing values' class in the view.

Input Ports

Data table containing the categories and values to be plotted in a box plot.
Data table containing the category names with colors applied.

Output Ports

SVG image of the box plot.

Views

Interactive View: Conditional Box Plot
A JavaScript implementation of a conditional box plot.
