Spark Partitioning

The input data is split row-wise into two partitions, e.g. training and test data. The two partitions are available at the two output ports.

Options

Absolute: Specify the absolute number of rows in the sample (the first partition). If there are fewer rows than specified here, all rows are used.
Relative: Specify the size of the sample as a percentage of the number of input rows. Must be between 0 and 100, inclusive.
Take from top: This mode selects the topmost rows of the input data.
Draw randomly: Random sampling of all rows. You may optionally specify a fixed seed and adapt the sample with replacement setting (see below).
Stratified sampling: Check this button if you want stratified sampling, i.e. the distribution of values in the selected column is (approximately) retained in the output table. You may optionally specify a fixed seed and adapt the exact sampling and sample with replacement settings (see below).
Exact sampling: Exact sampling requires significantly more resources than the per-stratum simple random sampling used by default, but will provide the exact sampling size with 99.99% confidence.
Use random seed: If either random or stratified sampling is selected, you may enter a fixed seed here in order to get reproducible results upon re-execution. If you do not specify a seed, a new random seed is taken for each execution.
Sample with replacement: If selected, a row from the input data can be chosen more than once.
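
The sizing and selection options above can be illustrated with a small plain-Python sketch. This is not the node's implementation and does not use Spark. The helper names sample_size and partition are hypothetical, rounding a percentage to the nearest whole row is an assumption, and sampling with replacement is not covered.

```python
import random


def sample_size(n_rows, absolute=None, relative=None):
    """Resolve the size of the first partition (hypothetical helper)."""
    if absolute is not None:
        # Absolute: if fewer rows exist than requested, all rows are used.
        return min(absolute, n_rows)
    if relative is None or not 0 <= relative <= 100:
        raise ValueError("relative must be between 0 and 100, inclusive")
    # Assumption: a percentage is rounded to the nearest whole row.
    return round(n_rows * relative / 100)


def partition(rows, size, mode="top", seed=None):
    """Split rows into (first, second) partitions without replacement."""
    indices = range(len(rows))
    if mode == "top":
        # Take from top: the first `size` rows form the first partition.
        chosen = set(indices[:size])
    elif mode == "random":
        # Draw randomly: a fixed seed makes the split reproducible;
        # seed=None draws a fresh seed on every call.
        rng = random.Random(seed)
        chosen = set(rng.sample(indices, size))
    else:
        raise ValueError("mode must be 'top' or 'random'")
    first = [row for i, row in enumerate(rows) if i in chosen]
    second = [row for i, row in enumerate(rows) if i not in chosen]
    return first, second


rows = list(range(10))
size = sample_size(len(rows), relative=70)
train, test = partition(rows, size, mode="random", seed=42)
```

With ten input rows and a relative size of 70, the first partition holds seven rows and the second holds the remaining three. Calling partition again with the same seed returns the same split.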

Input Ports

Spark DataFrame/RDD to take the sample from.

Output Ports

First partition (as defined in dialog).
Second partition (remaining rows).
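
As a rough illustration of how a stratified split could fill the two output ports, the following plain-Python sketch samples each stratum separately and sends the remaining rows to the second partition. It is not the node's implementation and does not use Spark. The function stratified_partition and the example column name "label" are hypothetical, and the per-stratum size is simply rounded to the nearest whole row.

```python
import random
from collections import defaultdict


def stratified_partition(rows, key, fraction, seed=None):
    """Split rows so that each value of `key` contributes roughly
    `fraction` of its rows to the first partition."""
    rng = random.Random(seed)
    # Group row indices by the value of the stratification column.
    strata = defaultdict(list)
    for index, row in enumerate(rows):
        strata[row[key]].append(index)
    chosen = set()
    for indices in strata.values():
        # Assumption: the per-stratum size is rounded to a whole row.
        k = round(len(indices) * fraction)
        chosen.update(rng.sample(indices, k))
    first = [row for i, row in enumerate(rows) if i in chosen]
    second = [row for i, row in enumerate(rows) if i not in chosen]
    return first, second


# Eight rows labelled "a" and four rows labelled "b".
rows = [{"id": i, "label": "a" if i < 8 else "b"} for i in range(12)]
train, test = stratified_partition(rows, "label", 0.75, seed=1)
```

Here the first partition receives six "a" rows and three "b" rows, so the 2:1 ratio of the input is retained in both partitions.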

Popular Predecessors

Parquet to Spark: 6%
Hive to Spark: 6%
Spark Category To Number: 6%
Create Local Big Data Environment: 5%
PySpark Script (2 to 1): 4%

Popular Successors

Spark Predictor: 27%
Spark Decision Tree Learner: 5%
Spark Transformations Applier: 4%
Spark Decision Tree Learner (MLlib): 3%
Spark Gradient-Boosted Trees Learner: 3%

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Extension for Apache Spark from the update site for KNIME v5.11, following the NodePit Product and Node Installation Guide.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.9.0.v202511131754

On NodePit since: 2026-03-10

Last update: 2026-06-15

KNIME versions: From v3.6 to v5.11