TVT (Train-Validate-Test) Split

Splits a single input dataset into training, validation, and testing subsets. The user sets the size of the training set as a percentage; the remaining rows are split equally between the validation and testing sets.
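
As a worked illustration of the size arithmetic, the sketch below computes the three subset sizes from a row count and a training percentage. The function name `tvt_sizes` is hypothetical. How the node rounds fractional row counts, and which subset receives an odd leftover row, are not documented here. This sketch simply assumes the training count is rounded down and any extra row goes to validation.

```python
def tvt_sizes(n_rows: int, train_percent: float = 60.0) -> tuple[int, int, int]:
    """Return (train, validation, test) row counts for a TVT-style split.

    Hypothetical helper. Assumption: the training count is rounded down and
    an odd leftover row goes to validation; the node's actual rounding may differ.
    """
    if not 0 <= train_percent <= 100:
        raise ValueError("train_percent must be between 0 and 100")
    n_train = int(n_rows * train_percent / 100)
    remainder = n_rows - n_train
    n_test = remainder // 2
    n_val = remainder - n_test  # validation gets the extra row if remainder is odd
    return n_train, n_val, n_test
```

With the default 60%, `tvt_sizes(1000)` gives 600/200/200 rows. With 101 rows, the remaining 41 rows cannot be halved evenly, so one subset ends up one row larger.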

The node also supports stratified sampling based on a string or boolean column of the input data. The stratification column must have a domain (its set of possible values) defined.
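
To show what stratification means for a three-way split, here is a minimal sketch that groups row indices by the value of a label column, shuffles each group, and splits every group with the same proportions. As a result, each subset keeps roughly the class distribution of the input. The function `stratified_tvt_split` is hypothetical and not the node's implementation. The labels are assumed to be a plain Python sequence of strings or booleans, per-group rounding is an assumption, and the domain requirement is not modeled.

```python
import random
from collections import defaultdict
from typing import Hashable, Optional, Sequence


def stratified_tvt_split(
    labels: Sequence[Hashable],
    train_percent: float = 60.0,
    seed: Optional[int] = None,
) -> tuple[list[int], list[int], list[int]]:
    """Return row indices for (train, validation, test), split per label value.

    Hypothetical sketch: each label group is shuffled and divided separately,
    so every subset keeps roughly the class proportions of the input.
    Rounding down per group is an assumption.
    """
    rng = random.Random(seed)
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    train: list[int] = []
    val: list[int] = []
    test: list[int] = []
    for indices in groups.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train_percent / 100)
        rest = indices[n_train:]
        n_test = len(rest) // 2
        train.extend(indices[:n_train])
        val.extend(rest[: len(rest) - n_test])
        test.extend(rest[len(rest) - n_test :])
    return train, val, test
```

For 50 rows labeled `"a"` and 50 labeled `"b"` with the default 60%, each label contributes 30 rows to the training set and 10 rows each to the validation and testing sets.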

A seed can be supplied for the random sampling so that repeated executions produce the same split. If no seed is given, a new random seed is used on each execution.
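
The effect of the seed can be illustrated with Python's `random.Random`. Here, a fixed seed reproduces the same shuffle on every call, while `seed=None` seeds from system entropy (or the current time) and so gives a different split each time. The function `random_tvt_split` is a hypothetical, unstratified sketch and does not reproduce the node's actual random number generator. The same seed value will not yield the same rows as the node does.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def random_tvt_split(
    rows: Sequence[T], train_percent: float = 60.0, seed: Optional[int] = None
) -> tuple[list[T], list[T], list[T]]:
    """Shuffle rows with an optional seed, then cut them into train/validation/test.

    Hypothetical sketch. A fixed seed makes random.Random produce the same
    shuffle every time; seed=None gives a fresh, non-reproducible shuffle.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_percent / 100)
    rest = shuffled[n_train:]
    n_test = len(rest) // 2  # validation gets the extra row if rest is odd
    return shuffled[:n_train], rest[: len(rest) - n_test], rest[len(rest) - n_test :]
```

Calling `random_tvt_split(range(100), seed=42)` twice returns identical subsets. Leaving `seed` unset corresponds to leaving the Sampling Seed field empty.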

Requires the "KNIME Streaming Execution" extension.

Options

Stratify By
Column for stratified sampling. If "(None)" is selected, sampling is purely random (not stratified).
Training Set Size
Size of the training set as a percentage of the input rows (default: 60%).
Sampling Seed
Seed used for sampling. If left empty, a new random integer is used on each execution.

Input Ports

Input dataset

Output Ports

Training set. The size of this table is defined by the slider control in the configuration dialog.
Validation set. The size of this table is half of the remaining data after the training set has been split off.
Testing set. The size of this table is half of the remaining data after the training set has been split off.