Equal Size Sampling

Removes rows from the input data set so that the values in a categorical column are equally distributed. This can be useful, for instance, if a learning algorithm is sensitive to unequal class distributions and you want to downsize the data set so that each class value occurs equally often.

The node removes random rows belonging to the majority classes. The output contains all records from the minority class(es) and a random sample from each of the majority classes, where each sample contains as many rows as the minority class.
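The behaviour described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the node's actual implementation: the function name `equal_size_sample`, the `class_of` accessor, and the choice to keep the surviving rows in their original order are all assumptions made for the example.

```python
import random
from collections import defaultdict

def equal_size_sample(rows, class_of, seed=None):
    """Keep all rows of the smallest class and an equally sized
    random sample of every other class (hypothetical helper)."""
    rng = random.Random(seed)

    # First pass: group row indices by class value.
    # None stands in for a missing value and forms its own class.
    by_class = defaultdict(list)
    for i, row in enumerate(rows):
        by_class[class_of(row)].append(i)
    if not by_class:
        return []

    # The smallest class determines the sample size for every class.
    target = min(len(indices) for indices in by_class.values())

    keep = set()
    for indices in by_class.values():
        keep.update(rng.sample(indices, target))

    # Second pass: emit the kept rows (original order is an assumption).
    return [row for i, row in enumerate(rows) if i in keep]

rows = [(i, "a") for i in range(6)] + [(i, "b") for i in range(6, 9)] + [(9, None), (10, None)]
sampled = equal_size_sample(rows, class_of=lambda r: r[1], seed=42)
```

Here the class with missing values is the smallest one (two rows), so the result holds two rows for each of the three class values.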

Options

Nominal Column: Select the class column here. The node runs over the data set once to count the occurrences of each value in the selected column and then filters the rows in a second pass. Note that missing values in this column are treated as a separate category, which can itself become the minority class.
Use exact sampling: If selected, the rows to output are determined up front. Each class will have the same number of instances in the output table. This sampling is slightly more memory-intensive, as each class needs to be represented by a bit set marking the corresponding rows. In most cases it is safe to select this option unless you have very large data with many different class labels.
Use approximate sampling: If selected, the rows to output are determined on the fly. The number of occurrences of each class may differ slightly, as the final number cannot be determined beforehand.
Enable static seed: If selected, the removal of rows is driven by a static seed, so the result is reproducible.
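The difference between the two sampling modes can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not the node's source code: both function names are hypothetical, the approximate variant assumes each row is kept independently with probability (minority size / class size), and a `bytearray` stands in for a real bit set.

```python
import random
from collections import Counter

def exact_sample(labels, seed=None):
    """Return indices to keep; every class ends up with exactly the minority count."""
    rng = random.Random(seed)
    counts = Counter(labels)               # first pass: count each class
    if not counts:
        return []
    target = min(counts.values())

    # One flag array per class (a stand-in for a bit set): position k is set
    # if the k-th occurrence of that class is kept. This is the extra memory
    # the exact mode needs.
    keep_bits = {}
    for label, n in counts.items():
        bits = bytearray(n)
        for pos in rng.sample(range(n), target):
            bits[pos] = 1
        keep_bits[label] = bits

    seen = Counter()
    kept = []
    for i, label in enumerate(labels):     # second pass: filter
        if keep_bits[label][seen[label]]:
            kept.append(i)
        seen[label] += 1
    return kept

def approximate_sample(labels, seed=None):
    """Return indices to keep; class sizes are only equal in expectation.
    The per-row keep probability is an assumption made for this sketch."""
    rng = random.Random(seed)
    counts = Counter(labels)               # first pass: count each class
    if not counts:
        return []
    target = min(counts.values())
    # Second pass: decide per row on the fly, no per-class bookkeeping.
    return [i for i, label in enumerate(labels)
            if rng.random() < target / counts[label]]

labels = ["a"] * 50 + ["b"] * 5
exact = exact_sample(labels, seed=1)
approx = approximate_sample(labels, seed=1)
```

With a fixed `seed`, both functions return the same indices on every call, which mirrors the static seed option. The exact variant always keeps five rows of each class here, while the approximate variant keeps all five "b" rows and a number of "a" rows that is only close to five on average.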

Input Ports

Arbitrary input data.

Output Ports

The input data with fewer rows.

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Base nodes from the update site below, following the NodePit Product and Node Installation Guide:

v5.4

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.4.4.v202504301443

On NodePit since: 2024-12-06

Last update: 2025-06-13

KNIME versions: Since v3.6
