DB Row Sampler

This node extracts a sample of rows from the input database table. Use the dialog to specify the sample size and the sampling strategy.

Options

Output size type
Defines how the size of the output is specified: as a percentage of total rows (relative) or as an absolute number of rows.
Relative size
Specifies the percentage of rows in the database table to extract. Must be between 0 and 100 (inclusive).
Number of rows
Specifies the absolute number of rows to include in the output. If the input table contains fewer rows than specified, all rows are placed in the output.
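As an illustration of how the two size types relate, the following minimal sketch resolves either setting to a concrete row count. The function name `resolve_sample_size` and the truncating rounding for the relative case are assumptions made for this example; the node's actual rounding behavior is not documented here.

```python
def resolve_sample_size(total_rows: int, size_type: str, value: float) -> int:
    """Return the number of rows a sample would contain.

    size_type is "relative" (value is a percentage, 0 to 100 inclusive)
    or "absolute" (value is a row count). Hypothetical helper, not node code.
    """
    if size_type == "relative":
        if not 0 <= value <= 100:
            raise ValueError("Relative size must be between 0 and 100 (inclusive)")
        # Assumption: fractional row counts are truncated.
        return int(total_rows * value / 100)
    if size_type == "absolute":
        if value < 0:
            raise ValueError("Number of rows must not be negative")
        # If the table has fewer rows than requested, all rows are returned.
        return min(int(value), total_rows)
    raise ValueError(f"Unknown size type: {size_type!r}")


ten_percent = resolve_sample_size(1000, "relative", 10)
capped = resolve_sample_size(50, "absolute", 200)
```

Here `capped` equals the table size because the requested 200 rows exceed the 50 rows available.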
Sampling strategy
Determines how rows are selected for the output. Strategies include random, stratified, and first rows (sequential).
  • Random: Randomly selects rows from the input table if the connected database supports random sampling. Note that this method might be very slow for large database tables.
  • Stratified: Preserves the distribution of values in the selected group column.
  • First rows: Selects the top-most rows of the input table. Note that the order of the rows depends on the connected database.
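To make the difference between the first rows and random strategies concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `measurements` and the queries are assumptions chosen for illustration; the SQL the node generates depends on the connected database and may look different.

```python
import sqlite3

# In-memory SQLite database with a hypothetical example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(i * 0.5,) for i in range(100)],
)

# First rows: take whatever the database returns first.
# Without an ORDER BY clause, the row order is up to the database.
first_rows = conn.execute(
    "SELECT id, value FROM measurements LIMIT 5"
).fetchall()

# Random: in SQLite, one common approach is ORDER BY RANDOM().
# This assigns a random sort key to every row and sorts the whole table,
# which is one reason random sampling can be slow on large tables.
random_rows = conn.execute(
    "SELECT id, value FROM measurements ORDER BY RANDOM() LIMIT 5"
).fetchall()

conn.close()
```

Both queries return five rows, but only the random query is expected to change between runs.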
Group column
Specifies the column whose value distribution should be preserved in stratified sampling. This ensures that both the selected and the non-selected rows reflect the same distribution of values.
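The idea behind stratified sampling can be sketched in plain Python: rows are grouped by the group column, and the same fraction is drawn from each group. This is an in-memory illustration under stated assumptions (the helper `stratified_sample`, the dictionary row format, and rounding each group's share to the nearest integer are all hypothetical), not the query the node runs in the database.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, group_column, fraction, seed=None):
    """Draw the same fraction of rows from every group in group_column."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row)

    sample = []
    for members in groups.values():
        # Assumption: each group's share is rounded to the nearest integer.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical input: 80 rows of class "a" and 20 rows of class "b".
rows = [{"id": i, "label": "a" if i < 80 else "b"} for i in range(100)]
sample = stratified_sample(rows, "label", fraction=0.1, seed=1)
counts = Counter(row["label"] for row in sample)
```

In this example the 80/20 split of the input carries over to the sample, which contains 8 rows labeled "a" and 2 rows labeled "b".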
Fixed random seed
Optional seed value for random or stratified sampling. Using a seed ensures the same rows are selected each time the node is executed. Without a seed, a different random selection will occur each time.
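The effect of a fixed seed can be illustrated with Python's `random` module. This sketch only demonstrates the general principle of seeded pseudo-random selection; how the node passes the seed to the database is not shown here, and the seed value 42 is arbitrary.

```python
import random

row_ids = range(1000)

# Two samplers created with the same seed select the same rows.
first_run = random.Random(42).sample(row_ids, 5)
second_run = random.Random(42).sample(row_ids, 5)

# A sampler without a seed is initialized from a system source of
# randomness, so its selection is expected to differ between executions.
unseeded_run = random.Random().sample(row_ids, 5)
```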

Input Ports

DB Data to sample from.

Output Ports

DB Data with sampled rows.

Views

This node has no views.
