
Spark Partitioning

KNIME Extension for Apache Spark core infrastructure version 4.1.0.v201911281435 by KNIME AG, Zurich, Switzerland

The input data is split row-wise into two partitions, e.g. training and test data. The two partitions are available at the two output ports.
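A minimal plain-Python sketch of the split follows. It assumes a small in-memory list of rows and is not the node's implementation, which runs on Spark. The helpers `sample_size` and `split_rows` are hypothetical. They show how an Absolute or Relative size and the Take from top or Draw randomly mode described below could decide which rows go to the first partition, with every other row going to the second. Truncating a fractional relative size to a whole number is also an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_size(n_rows: int, absolute: Optional[int] = None,
                relative: Optional[float] = None) -> int:
    """Rows for the first partition (hypothetical helper).

    absolute: a row count, capped at n_rows (fewer input rows -> all rows used).
    relative: a percentage between 0 and 100, inclusive; truncated to an int.
    """
    if (absolute is None) == (relative is None):
        raise ValueError("specify exactly one of absolute or relative")
    if absolute is not None:
        if absolute < 0:
            raise ValueError("absolute must be non-negative")
        return min(absolute, n_rows)
    if not 0 <= relative <= 100:
        raise ValueError("relative must be between 0 and 100, inclusive")
    return int(n_rows * relative / 100)  # truncation is an assumption


def split_rows(rows: Sequence[T], k: int, mode: str = "top",
               seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Return (first_partition, second_partition); rows drawn without replacement."""
    if not 0 <= k <= len(rows):
        raise ValueError("k must be between 0 and len(rows)")
    if mode == "top":
        # "Take from top": the first k rows.
        chosen = set(range(k))
    elif mode == "random":
        # "Draw randomly": k distinct positions. seed=None seeds from fresh
        # entropy, so each run differs; a fixed seed makes the split reproducible.
        rng = random.Random(seed)
        chosen = set(rng.sample(range(len(rows)), k))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Both partitions keep the input order; every row lands in exactly one.
    first = [r for i, r in enumerate(rows) if i in chosen]
    second = [r for i, r in enumerate(rows) if i not in chosen]
    return first, second


if __name__ == "__main__":
    rows = list(range(10))
    k = sample_size(len(rows), relative=30)    # 3
    top, rest = split_rows(rows, k, mode="top")
    print(top, rest)                           # [0, 1, 2] [3, 4, 5, 6, 7, 8, 9]
    a, _ = split_rows(rows, k, mode="random", seed=42)
    b, _ = split_rows(rows, k, mode="random", seed=42)
    print(a == b)                              # True
```

Tracking row positions rather than row values keeps the two partitions disjoint even when the input contains duplicate rows.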

Options

Absolute
Specify the absolute number of rows in the sample. If there are fewer rows than specified here, all rows are used.
Relative
The sample size as a percentage of the number of input rows. Must be between 0 and 100, inclusive.
Take from top
This mode selects the topmost rows of the input data.
Draw randomly
Random sampling of all rows. You may optionally specify a fixed seed and adjust the sample with replacement setting (see below).
Stratified sampling
Select this option for stratified sampling, i.e. the distribution of values in the selected column is (approximately) retained in the output table. You may optionally specify a fixed seed and adjust the exact sampling and sample with replacement settings (see below).
Exact sampling
Exact sampling requires significantly more resources than the per-stratum simple random sampling used by default, but provides the exact sample size with 99.99% confidence.
Use random seed
If either random or stratified sampling is selected, you may enter a fixed seed here to get reproducible results on re-execution. If you do not specify a seed, a new random seed is used for each execution.
Sample with replacement
If selected, a row from the input data can be chosen more than once.
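The difference between default stratified sampling and Exact sampling can be illustrated with a small plain-Python sketch. It assumes every stratum is sampled at the same percentage and leaves out sampling with replacement to stay short; `stratified_sample` is a hypothetical helper, not the node's code. The default path keeps each row independently with probability percent/100, so each stratum's share is only approximately right. The exact path takes exactly ceil(n_k × percent / 100) rows from a stratum of n_k rows. That ceiling rule follows how Apache Spark documents `sampleByKeyExact` on pair RDDs, whose 99.99% confidence wording matches this page, but that the node calls it is an assumption.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Row = Tuple[Hashable, object]  # (value of the stratification column, payload)


def stratified_sample(rows: Sequence[Row], percent: float, exact: bool = False,
                      seed: Optional[int] = None) -> Tuple[List[Row], List[Row]]:
    """Split rows into (sample, remainder), sampling each stratum separately.

    Hypothetical helper, without replacement. exact=False keeps each row
    independently with probability percent/100, so stratum sizes are only
    approximately right; exact=True takes exactly ceil(n_k * percent / 100)
    rows from a stratum with n_k rows.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100, inclusive")
    rng = random.Random(seed)
    # Group row positions by stratum; dicts keep first-seen order, so a fixed
    # seed always visits the strata in the same order.
    strata: Dict[Hashable, List[int]] = defaultdict(list)
    for i, (key, _) in enumerate(rows):
        strata[key].append(i)
    chosen = set()
    for positions in strata.values():
        if exact:
            k = math.ceil(len(positions) * percent / 100)
            chosen.update(rng.sample(positions, k))
        else:
            chosen.update(p for p in positions if rng.random() < percent / 100)
    sample = [r for i, r in enumerate(rows) if i in chosen]
    remainder = [r for i, r in enumerate(rows) if i not in chosen]
    return sample, remainder


if __name__ == "__main__":
    data = [("a", i) for i in range(6)] + [("b", i) for i in range(4)]
    sample, rest = stratified_sample(data, 50, exact=True, seed=1)
    # ceil(6 * 0.5) = 3 rows from "a" and ceil(4 * 0.5) = 2 rows from "b"
    print(sum(k == "a" for k, _ in sample), sum(k == "b" for k, _ in sample))  # 3 2
```

The exact path needs the size of every stratum before it can draw, which hints at why exact sampling costs more than the single-pass default on a distributed engine.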

Input Ports

Spark DataFrame/RDD to take the sample from.

Output Ports

First partition (as defined in the dialog).
Second partition (remaining rows).

Installation

To use this node in KNIME, install KNIME Extension for Apache Spark from the following update site:

KNIME 4.1
