Auto-Binner

This node groups numeric data into intervals, called bins. There are three naming options for the bins and two methods that define the number of bins and the range of values that fall into each bin. Use the "Numeric Binner" node if you want to define custom bins.

Options

Column Selection:
Columns in the include list are processed separately. The columns in the exclude list are omitted by the node.
Binning Method:
Use Fixed number of bins for bins with equal width over the domain range or for bins with an equal frequency of element occurrences. Use Sample quantiles to produce bins corresponding to the given list of probabilities. The smallest element corresponds to a probability of 0 and the largest to a probability of 1. The applied estimation method is Type 7, which is the default method in R, S and Excel.
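As a rough illustration of the equal-width and Sample quantiles methods, the following Python sketch computes bin edges by hand. The function names `equal_width_edges` and `quantile_type7` are hypothetical and not part of the node, and the node's own implementation may differ in details such as the handling of ties or missing values.

```python
import math

def equal_width_edges(values, n_bins):
    """Return n_bins + 1 edges that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Use hi itself as the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(n_bins)] + [hi]

def quantile_type7(values, p):
    """Type 7 sample quantile: linear interpolation between order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * p      # fractional 0-based position of the quantile
    j = math.floor(h)
    if j + 1 >= len(xs):       # p == 1 maps to the largest element
        return xs[-1]
    return xs[j] + (h - j) * (xs[j + 1] - xs[j])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(equal_width_edges(data, 3))
print([quantile_type7(data, p) for p in (0, 0.25, 0.5, 0.75, 1)])
```

With probabilities 0, 0.25, 0.5, 0.75 and 1, the quantile values act as the edges of four bins, and the first and last edges coincide with the smallest and largest elements.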
Bin Naming:
Use Numbered for bins labeled by an integer with the prefix "Bin", Borders for labels using "(a,b]" interval notation, or Midpoints for labels that show the midpoint of the interval.
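To make the three naming options concrete, here is a minimal Python sketch that builds labels from a list of bin edges. The function `bin_labels` and the exact label strings (for example the space in "Bin 1", or the first interval being closed on the left) are assumptions for illustration; the node's actual label formatting may differ.

```python
def bin_labels(edges, naming):
    """Build one label per bin from a sorted list of edges."""
    labels = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        if naming == "numbered":
            labels.append(f"Bin {i + 1}")
        elif naming == "borders":
            # Assumption: only the first interval is closed on the left.
            left = "[" if i == 0 else "("
            labels.append(f"{left}{lo},{hi}]")
        elif naming == "midpoints":
            labels.append(str((lo + hi) / 2))
        else:
            raise ValueError(f"unknown naming: {naming}")
    return labels

edges = [0, 1, 2]
print(bin_labels(edges, "numbered"))   # ['Bin 1', 'Bin 2']
print(bin_labels(edges, "borders"))    # ['[0,1]', '(1,2]']
print(bin_labels(edges, "midpoints"))  # ['0.5', '1.5']
```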
Force integer bounds
Forces the bounds of the interval to be integers. The decimal bounds will be converted so that the lower bound of the first interval will be the floor of the lowest value and the upper bound of the last interval will be the ceiling of the highest value. The edges that separate the intervals will be the ceiling of the decimal edges. Duplicates of edges will be removed.

Examples:
[0.1,0.9], (0.9,1.8] -> [0,1], (1,2]
[3.9,4.1], (4.1,4.9], (4.9,5.1] -> [3,5], (5,6]
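The conversion rule can be sketched in a few lines of Python. The helper `force_integer_edges` is a hypothetical name used only for this illustration; it takes the sorted decimal edges and applies the floor, ceiling and de-duplication steps described above.

```python
import math

def force_integer_edges(edges):
    """Convert sorted decimal edges to integer edges."""
    result = [math.floor(edges[0])]   # lower bound of the first interval: floor
    for edge in edges[1:]:
        rounded = math.ceil(edge)     # separating edges and upper bound: ceiling
        if rounded != result[-1]:     # drop duplicate edges
            result.append(rounded)
    return result

print(force_integer_edges([0.1, 0.9, 1.8]))       # [0, 1, 2]
print(force_integer_edges([3.9, 4.1, 4.9, 5.1]))  # [3, 5, 6]
```

The two calls reproduce the examples: the edges 0, 1, 2 give [0,1], (1,2], and the edges 3, 5, 6 give [3,5], (5,6], because 4.1 and 4.9 both round up to 5 and the duplicate is removed.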
Replace target column(s):
If set, the columns in the include list are replaced by the binned columns; otherwise, columns named with the suffix '[binned]' are appended.
Advanced formatting
If enabled, the format of the doubles in the labels can be configured with the options in this tab.
Output format
Specify the output format. The number 0.00000035239 will be displayed as 3.52E-7 with Standard String, 0.000000352 with Plain String (no exponent) and 352E-9 with Engineering String.
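The three formats can be reproduced with Python's `decimal` module, used here only as a stand-in and not as the node's actual code. The sketch assumes the value is first rounded to three significant figures, which matches the digits shown in the example.

```python
from decimal import Context, ROUND_HALF_UP

# Assumption: round to three significant figures to match the example above.
ctx = Context(prec=3, rounding=ROUND_HALF_UP)
value = ctx.create_decimal("0.00000035239")

standard = str(value)                # scientific notation when an exponent is needed
plain = format(value, "f")           # no exponent
engineering = value.to_eng_string()  # exponent is a multiple of 3

print(standard)     # 3.52E-7
print(plain)        # 0.000000352
print(engineering)  # 352E-9
```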
Precision
The scale of the double values to round to. If the scale is reduced, the specified rounding mode is applied.
Precision mode
The type of precision to which the values are rounded. Decimal places, the default option, rounds to the specified number of decimal places, whereas Significant figures rounds to the specified number of significant figures.
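The difference between the two precision modes is easiest to see on a small value. This sketch again uses Python's `decimal` module as an illustration; the precision of 2 and the HALF_UP rounding are arbitrary choices for the example.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

value = Decimal("0.012345")

# Decimal places: keep 2 digits after the decimal point.
decimal_places = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Significant figures: keep 2 digits starting at the first non-zero digit.
significant = Context(prec=2, rounding=ROUND_HALF_UP).create_decimal(value)

print(decimal_places)  # 0.01
print(significant)     # 0.012
```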
Rounding mode
The rounding mode which is applied when double values are rounded. The rounding mode specifies the rounding behavior. Seven different rounding modes are available:
  • UP: Rounding mode to round away from zero.
  • DOWN: Rounding mode to round towards zero.
  • CEILING: Rounding mode to round towards positive infinity.
  • FLOOR: Rounding mode to round towards negative infinity.
  • HALF_UP: Rounding mode to round towards "nearest neighbor" unless both neighbors are equidistant, in which case round up.
  • HALF_DOWN: Rounding mode to round towards "nearest neighbor" unless both neighbors are equidistant, in which case round down.
  • HALF_EVEN: Rounding mode to round towards the "nearest neighbor" unless both neighbors are equidistant, in which case, round towards the even neighbor.
For a detailed description of each rounding mode, please see the Java documentation.
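The seven modes differ mainly in how they treat ties and negative values. The sketch below uses the equivalent rounding constants from Python's `decimal` module (an analogue chosen for illustration, not the node's Java code) and rounds a few sample values to integers. The `MODES` mapping and `round_to_int` helper are hypothetical names.

```python
from decimal import (Decimal, ROUND_UP, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_HALF_EVEN)

# Map the mode names listed above to Python's decimal rounding constants.
MODES = {
    "UP": ROUND_UP,
    "DOWN": ROUND_DOWN,
    "CEILING": ROUND_CEILING,
    "FLOOR": ROUND_FLOOR,
    "HALF_UP": ROUND_HALF_UP,
    "HALF_DOWN": ROUND_HALF_DOWN,
    "HALF_EVEN": ROUND_HALF_EVEN,
}

def round_to_int(text, mode):
    """Round a decimal string to an integer using the named mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))

samples = ("2.5", "3.5", "-2.5", "1.1")
for name in MODES:
    print(name, [round_to_int(v, name) for v in samples])
```

For the ties 2.5 and 3.5, HALF_EVEN yields 2 and 4, while HALF_DOWN yields 2 and 3 and HALF_UP yields 3 and 4. For -2.5, CEILING yields -2 and FLOOR yields -3.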

Input Ports

Data to be categorized

Output Ports

Data with bins defined
The PMML model fragment containing information on how to bin the data

Views

This node has no views
