Partitioning

The input table is split row-wise into two partitions, for instance into a train and test data set. The two partitions are available at the two output ports.

Options

Absolute
Specify the absolute number of rows in the first partition. If there are less rows than specified here, all rows are entered into the first table, while the second table contains no rows.
Relative
The percentage of the number of rows in the input table that are in the first partition. It must be between 0 and 100, inclusively.
Take from top
This mode allows you to specify the number of top-most rows to be put into the first output table, with the remainder going into the second table.
Linear sampling
This mode always includes the first and the last row and selects the remaining rows linearly over the whole table (e.g. every third row). This is useful to downsample a sorted column while maintaining minimum and maximum value.
Draw randomly
Random sampling of all rows, you may optionally specify a fixed seed (see below).
Stratified sampling
Check this button if you want stratified sampling, i.e. the distribution of values in the selected column is (approximately) retained in the output tables. You may optionally specify a fixed seed (see below).
Use random seed
If either random or stratified sampling is selected, you may enter a seed to ensure reproducible results upon re-execution. Entering a seed makes row selection random but fixed, meaning the same rows will be selected each time. If you do not specify a seed different rows will be selected each time.

Input Ports

Icon
Table to partition.

Output Ports

Icon
Rows from the input table that have been selected as per the node configuration.
Icon
Remaining rows from the input table.

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.