Database to Spark (legacy)

This node is part of the legacy database framework and will be deprecated at the end of 2022. For more information on how to migrate to the new database framework, see the migration section of the database documentation.

Reads a database query/table into a Spark RDD/DataFrame. See the Spark documentation for more information.

Notice: This feature requires at least Apache Spark 1.5.


Upload the local JDBC driver (the one used in this KNIME instance), or rely on a driver provided on the cluster side.
Fetch size
Optional: The JDBC fetch size, which determines how many rows are fetched per round trip. This can improve performance with JDBC drivers that default to a low fetch size (e.g. Oracle with 10 rows).
Partition column, lower bound, upper bound, num partitions
These options must all be specified if any one of them is specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric column of the table in question. Note that lowerBound and upperBound are used only to determine the partition stride, not to filter rows, so all rows in the table are partitioned and returned.
Query DB for upper and lower count
Either fetch the bounds with a min/max query against the database, or use manually entered bounds.
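
To illustrate why lowerBound and upperBound only set the partition stride and do not filter rows, here is a minimal Python sketch. It approximates how a partitioned JDBC read can split a numeric column into per-partition WHERE predicates. The function name partition_predicates and the exact rounding are assumptions made for illustration; this is not the code used by the node or by Spark.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE predicate per partition (None means 'no predicate').

    Hypothetical helper: a simplified approximation of stride-based
    partitioning, not the node's or Spark's actual implementation.
    """
    if num_partitions <= 1 or lower_bound >= upper_bound:
        return [None]  # single partition reads the whole table

    # The bounds are used only to compute the stride between partitions.
    stride = max((upper_bound - lower_bound) // num_partitions, 1)

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound] are still returned.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None

        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        elif upper:
            # NULL values are assigned to the first partition in this sketch.
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            predicates.append(lower)
    return predicates


for predicate in partition_predicates("id", 0, 100, 4):
    print(predicate)
```

With these example values, a row with id = 500 still lands in the last partition and a row with id = -3 in the first. Bounds that are far from the real minimum and maximum of the column therefore do not lose rows; they only produce unevenly sized partitions.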

Input Ports

Input query
Required Spark context.

Output Ports

Spark RDD/DataFrame


This node has no views




