TVT (Train-Validate-Test) Split

Splits a single input dataset into training, validation, and testing subsets. The user sets the size of the training set as a percentage of the input; the remaining rows are split equally between the validation and testing sets.
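The split described above can be sketched in Python as follows. This is an illustration of the general technique, not the node's actual implementation: the function name `tvt_split`, the rounding of the training-set size, and sending any odd leftover row to the testing set are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tvt_split(
    rows: Sequence[T], train_percent: float = 60.0, seed: Optional[int] = None
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle, take train_percent for training, halve the rest."""
    if not 0 <= train_percent <= 100:
        raise ValueError("train_percent must be between 0 and 100")
    rng = random.Random(seed)  # seed=None: Python seeds from system entropy
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)  # rounding rule is an assumption
    rest = shuffled[n_train:]
    n_val = len(rest) // 2  # assumption: an odd leftover row goes to the testing set
    return shuffled[:n_train], rest[:n_val], rest[n_val:]


train, val, test = tvt_split(range(10), train_percent=60, seed=42)
print(len(train), len(val), len(test))  # 6 2 2
```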

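The node also supports stratified sampling on a string or boolean column of the input data, so that each subset keeps approximately the same distribution of that column's values as the input. The column must have a domain (its set of possible values) defined.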
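Stratified sampling can be sketched by splitting each group of rows that share a value in the stratification column separately and then concatenating the pieces. The Python below is a minimal illustration under assumptions: the function name `stratified_tvt_split`, the `key` parameter standing in for the chosen column, and the rounding rule are hypothetical, not the node's API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_tvt_split(
    rows: Sequence[T],
    key: Callable[[T], Hashable],
    train_percent: float = 60.0,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: split each stratum separately so every subset keeps the class mix."""
    rng = random.Random(seed)
    # Group rows by their value in the stratification column.
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    train: List[T] = []
    val: List[T] = []
    test: List[T] = []
    # Dicts keep insertion order, so the same input and seed give the same split.
    for members in groups.values():
        rng.shuffle(members)
        n_train = round(len(members) * train_percent / 100)  # rounding is an assumption
        rest = members[n_train:]
        n_val = len(rest) // 2
        train.extend(members[:n_train])
        val.extend(rest[:n_val])
        test.extend(rest[n_val:])
    return train, val, test


rows = [("a", i) for i in range(10)] + [("b", i) for i in range(5)]
train, val, test = stratified_tvt_split(rows, key=lambda r: r[0], seed=1)
print(sum(r[0] == "a" for r in train), sum(r[0] == "b" for r in train))  # 6 3
```

With 10 rows of class "a" and 5 of class "b", each class contributes 60% of its own rows to the training set, so the 2:1 ratio is preserved in every subset.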

A seed value can be supplied for randomized sampling to make results reproducible. If no seed is given, a different random seed is used on each execution.
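The seed behavior can be illustrated with Python's `random` module: a fixed seed reproduces the same shuffle every time, while an empty seed is replaced by a freshly drawn random integer. This is a sketch under assumptions; `resolve_seed`, `shuffled_order`, and the seed range `[0, 2**31)` are hypothetical choices, not taken from the node.

```python
import random
from typing import List, Optional


def resolve_seed(seed: Optional[int]) -> int:
    """Hypothetical helper: return the user's seed, or draw a new random integer."""
    if seed is None:
        # SystemRandom uses OS entropy, so each call yields a different seed.
        return random.SystemRandom().randrange(2**31)
    return seed


def shuffled_order(n_rows: int, seed: Optional[int] = None) -> List[int]:
    """Hypothetical helper: a random permutation of row indices."""
    rng = random.Random(resolve_seed(seed))
    order = list(range(n_rows))
    rng.shuffle(order)
    return order


# A fixed seed reproduces the same row order on every run.
print(shuffled_order(8, seed=123) == shuffled_order(8, seed=123))  # True
```

Drawing an explicit integer when no seed is given (rather than leaving the generator unseeded) has the side benefit that the drawn value could be logged and reused to reproduce a particular run.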

Requires the "KNIME Streaming Execution" extension.

Options

Stratify By: Column for stratified sampling. If "(None)" is selected, sampling will be random.
Training Set Size: Size of training set in percent (default: 60%).
Sampling Seed: Seed used for sampling. If left empty, a new random integer is used on each execution.

Input Ports

Input dataset: The table to be split into training, validation, and testing sets.

Output Ports

Training set: The size of this table is set by the Training Set Size slider in the configuration dialog.
Validation set: Half of the rows that remain after the training set has been split off.
Testing set: The other half of the rows that remain after the training set has been split off.
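As a worked example of the output sizes, 1000 input rows with the default 60% training size give 600 training rows, leaving 400 rows that are split into 200 validation and 200 testing rows. The sketch below computes these counts; how the node rounds a fractional training size and where an odd leftover row goes are not documented here, so both rules in `tvt_sizes` (a hypothetical helper) are assumptions.

```python
from typing import Tuple


def tvt_sizes(n_rows: int, train_percent: float = 60.0) -> Tuple[int, int, int]:
    """Hypothetical helper: expected row counts of the three output tables."""
    n_train = round(n_rows * train_percent / 100)  # assumed rounding rule
    remaining = n_rows - n_train
    n_val = remaining // 2  # assumed: an odd leftover row goes to the testing set
    n_test = remaining - n_val
    return n_train, n_val, n_test


print(tvt_sizes(1000))      # (600, 200, 200)
print(tvt_sizes(1000, 80))  # (800, 100, 100)
```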
