2D Kernel Density Plot

This node plots a 2D kernel density function based on an incoming data table.

Kernel Estimators

A variety of kernel estimators are available, as shown in the table:

Name            Function
UNIFORM         K(u) = 0.5 (|u| ≤ 1), 0 (|u| > 1); aka 'Uniform' or 'Boxcar'
TRIANGLE        K(u) = 1-|u| (|u| ≤ 1), 0 (|u| > 1)
EPANECHNIKOV    K(u) = 3•(1-u²)/4 (|u| ≤ 1), 0 (|u| > 1)
QUARTIC         K(u) = 15•(1-u²)²/16 (|u| ≤ 1), 0 (|u| > 1)
TRIWEIGHT       K(u) = 35•(1-u²)³/32 (|u| ≤ 1), 0 (|u| > 1)
TRICUBE         K(u) = 70•(1-|u|³)³/81 (|u| ≤ 1), 0 (|u| > 1)
GAUSSIAN        K(u) = e^(-u²/2) / √(2π)
COSINUS         K(u) = (π/4)•cos(πu/2) (|u| ≤ 1), 0 (|u| > 1)
LOGISTIC        K(u) = 1/(e^u + 2 + e^-u)
SIGMOID         K(u) = 2/(π•(e^u + e^-u))
SILVERMAN       K(u) = 0.5•e^(-|u|/√2)•sin((|u|/√2) + (π/4))
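As a quick sanity check on the formulas in the table, the sketch below writes a few of them as plain Python functions and integrates each one numerically; a valid kernel should integrate to approximately 1. The function names and the `integrate` helper are hypothetical and chosen for illustration only; they are not taken from the node's source code.

```python
import math

# Illustrative only: hypothetical helper names, not the node's implementation.

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def tricube(u):
    return 70 * (1 - abs(u) ** 3) ** 3 / 81 if abs(u) <= 1 else 0.0

def gaussian(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def logistic(u):
    return 1 / (math.exp(u) + 2 + math.exp(-u))

def silverman(u):
    a = abs(u) / math.sqrt(2)
    return 0.5 * math.exp(-a) * math.sin(a + math.pi / 4)

def integrate(kernel, lo=-40.0, hi=40.0, steps=80_000):
    """Midpoint-rule approximation of the integral of kernel over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(kernel(lo + (i + 0.5) * h) for i in range(steps))
```

Calling `integrate(gaussian)` or `integrate(silverman)` should return a value very close to 1, which is what makes each function usable as a density kernel.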

In the 2D case, u is a vector. The 'Kernel Symmetry' option controls how the 1-dimensional 'Kernel Estimator' is applied in two dimensions, as shown in the table:

Name                     Function
RADIAL_MULTIPLICATIVE    The kernel estimator is applied multiplicatively across dimensions, i.e. K(u) = K(u(x)) • K(u(y)), where u(x) is the x-dimension component of u, and u(y) the y-dimension component
SPHERICAL                The kernel estimator is applied with spherical symmetry, i.e. all points at the same distance from the kernel estimator centre have the same value. This is equivalent to K(u) = K(√(uᵀu))
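The difference between the two symmetry options can be illustrated with a minimal sketch. The function names are hypothetical, the Epanechnikov kernel is used only as an example, and any renormalisation the node may apply in the spherical case is not described here and is therefore omitted.

```python
import math

def epanechnikov(u):
    # 1-dimensional kernel from the table above
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def multiplicative(kernel, ux, uy):
    # RADIAL_MULTIPLICATIVE: K(u) = K(u(x)) * K(u(y))
    return kernel(ux) * kernel(uy)

def spherical(kernel, ux, uy):
    # SPHERICAL: K(u) = K(sqrt(u^T u)), i.e. the kernel of the Euclidean norm
    return kernel(math.hypot(ux, uy))
```

For a kernel with bounded support, the multiplicative form is non-zero over a square and the spherical form over a disc. At the point (0.9, 0.9), for example, `multiplicative` is still positive while `spherical` is zero, because the point lies more than one unit from the centre.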

Bandwidth estimation

The bandwidth affects the 'smoothness' of the kernel density function. A number of methods exist for estimating a suitable bandwidth automatically; this node offers three options, as shown in the table below. For further details see the Wikipedia Multivariate Kernel Density estimation page. Bandwidths and estimation methods are set independently for each dimension. The bandwidth matrix, H, is a diagonal matrix; off-diagonal elements are currently not supported.

The methods offered are:

Name            Function
Silverman       Bandwidth is estimated using the Silverman approximation (H = stdDev * [4 / ((d + 2) * n)]^(1 / (d + 4)), where d is the number of dimensions and n the number of datapoints)
Scott           Bandwidth is estimated using the Scott approximation (H = stdDev / n^(1 / (d + 4)), where d is the number of dimensions and n the number of datapoints)
User Defined    The user specifies the bandwidth (H)

All methods are for a constant bandwidth across the whole data series.
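The two automatic estimates can be written out directly from the formulas in the table. This is a minimal sketch: the function names are hypothetical, and the use of the sample standard deviation (rather than the population value) is an assumption, since the description does not say which one the node uses.

```python
import statistics

def silverman_bandwidth(values, d=2):
    # H = stdDev * [4 / ((d + 2) * n)]^(1 / (d + 4))
    n = len(values)
    # Assumption: sample standard deviation
    return statistics.stdev(values) * (4 / ((d + 2) * n)) ** (1 / (d + 4))

def scott_bandwidth(values, d=2):
    # H = stdDev / n^(1 / (d + 4))
    n = len(values)
    # Assumption: sample standard deviation
    return statistics.stdev(values) / n ** (1 / (d + 4))
```

Each function would be called once per dimension, on the x-values and the y-values separately, giving the two diagonal elements of H. With d = 2 the factor 4 / (d + 2) equals 1, so the two formulas as written give the same value; they differ for other values of d.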

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

Options

Kernel Options

X-Values Column
The column in the incoming table containing the x-values from which to generate the kernel(s)
Y-Values Column
The column in the incoming table containing the y-values from which to generate the kernel(s)
Kernel Estimator
The Kernel function to apply at each data point. See above for details of the individual kernel estimators
Kernel Symmetry
The kernel symmetry function used to combine the kernel estimators from the x- and y-dimensions. See above for further details
X-Values Column Bandwidth
The bandwidth estimation method to use for the x-dimension. See above for details
Bandwidth (H x)
User-defined bandwidth
Y-Values Column Bandwidth
The bandwidth estimation method to use for the y-dimension. See above for details
Bandwidth (H y)
User-defined bandwidth
Number of grid points along axis
The number of grid points along each axis at which the kernel density function is evaluated
Number of outliers (% of dataset)
The percentage of the dataset to show as outliers. Outliers are defined here as the first n points when sorted by increasing value of the kernel density function
Outlier Size
The size of the outlier symbols
Outlier shape
The plot symbol to use for the outliers
Outlier Colour
The colour of the outlier symbols
Auto-range x-Axis
Should the x-Axis range be calculated automatically?
x-Axis Range
The manual axis range
Auto-range y-Axis
Should the y-Axis range be calculated automatically?
y-Axis Range
The manual axis range
Show legend
Should a legend showing the colour spectrum or contour colours be displayed?
Show bandwidths (H) on axis labels
Should the bandwidth be shown on the axis label (or in the legend if a grouping column is selected)?
Upper bound
The colour used for the highest density regions
Lower bound
The colour to use for the lowest density regions
Number of Contours
The number of contours to plot. If this value is '0', then a continuous colour gradient will be used
Fill Contours
Should the contours be filled with solid colour, or only drawn as contour lines? Filled contours show all areas between contour levels as the same block colour
Contour Interval Schema
The method used to determine contour intervals. Options are 'LINEAR', where contours are spaced equally across the intensity range, and 'QUANTILE' where the contours are spaced to give equal areas of each contour interval
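Two of the options above, outlier selection and the contour interval schema, describe small algorithms that can be sketched as follows. All function names are hypothetical. The rounding of the outlier count, the placement of levels strictly inside the intensity range, and the use of grid-value quantiles to approximate 'equal areas' are assumptions made for illustration; the node's actual implementation may differ.

```python
def lowest_density_indices(densities, percent):
    """Indices of the first n points when sorted by increasing density."""
    # Assumption: n is the given percentage of the dataset, rounded down
    n = int(len(densities) * percent / 100)
    order = sorted(range(len(densities)), key=lambda i: densities[i])
    return order[:n]

def linear_levels(grid_values, n_contours):
    """'LINEAR': levels spaced equally across the intensity range."""
    lo, hi = min(grid_values), max(grid_values)
    step = (hi - lo) / (n_contours + 1)
    return [lo + step * (k + 1) for k in range(n_contours)]

def quantile_levels(grid_values, n_contours):
    """'QUANTILE': levels at equally spaced quantiles of the grid values,
    so each interval covers roughly the same number of grid cells."""
    ordered = sorted(grid_values)
    last = len(ordered) - 1
    return [ordered[round(last * (k + 1) / (n_contours + 1))]
            for k in range(n_contours)]
```

Here `densities` would hold the kernel density value at each data point, and `grid_values` the flattened density values on the evaluation grid. On a sharply peaked density, linear levels cluster most of the plot area into the lowest interval, whereas quantile levels spread the area evenly between intervals.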

General Plot Options

Type of Image
The type of image to create, either PNG or SVG. PNGs are usually smaller; SVGs retain details about the plot and allow individual elements to be edited
Title of Graph
The title of the graph shown above the generated image. If the title is not activated, no title will be shown
Width of Image (in pixel)
The width of the generated image, not the plot width
Height of Image (in pixel)
The height of the generated image, not the plot height
Background Colour
The colour of the plot background, i.e. the colour used for the empty space in the plot
Plot background Alpha
The transparency of the plot background can be modified using an additional alpha value. An alpha value of 1 does not change the background transparency. Decreasing the alpha value will increase the plot background transparency
Scale Font Size
Factor by which the font sizes within the JFreeChart view are scaled. A value greater than 1 increases all view fonts; a value between 0 and 1 decreases them

Input Ports

The incoming data table for the plot to be generated from

Output Ports

The image of the plot (SVG or PNG)

Views

2D Kernel Density Plot
View showing the 2D Kernel Density Plot
