Spark Column Filter

This node filters columns out of the input Spark DataFrame/RDD; only the remaining columns are passed to the output Spark DataFrame/RDD. In the dialog, columns can be moved between the Include and Exclude lists.

Options

Manual Selection

Include: This list contains the names of those columns in the input Spark DataFrame/RDD to be included in the output Spark DataFrame/RDD.
Exclude: This list contains the names of those columns in the input Spark DataFrame/RDD to be excluded from the output Spark DataFrame/RDD.
Filter: Use one of these fields to filter either the Include or Exclude list for certain column names or name substrings.
Buttons: Use these buttons to move columns between the Include and Exclude lists. Single-arrow buttons move the selected columns. Double-arrow buttons move all columns (taking the current filter into account).
Enforce Exclusion: Select this option to keep the current exclusion list unchanged even if the input Spark DataFrame/RDD specification changes. If some of the excluded columns are no longer available, a warning is displayed. (New columns are automatically added to the inclusion list.)
Enforce Inclusion: Select this option to keep the current inclusion list unchanged even if the input Spark DataFrame/RDD specification changes. If some of the included columns are no longer available, a warning is displayed. (New columns are automatically added to the exclusion list.)
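
The difference between the two enforce options can be sketched in a few lines of plain Python. This is only an illustration of the behavior described above, not the node's implementation; the function name `resolve_columns` and its parameters are hypothetical.

```python
def resolve_columns(input_columns, selected, enforce="exclusion"):
    """Return (output_columns, missing) for a possibly changed input spec.

    `selected` is the enforced list: the excluded names when
    enforce="exclusion", the included names when enforce="inclusion".
    All names here are illustrative assumptions.
    """
    available = set(input_columns)
    chosen = set(selected)
    # Enforced names that the input no longer provides (the warning case).
    missing = [name for name in selected if name not in available]
    if enforce == "exclusion":
        # Everything not explicitly excluded passes through,
        # so newly appearing columns end up in the output.
        output = [c for c in input_columns if c not in chosen]
    elif enforce == "inclusion":
        # Only explicitly included columns pass through,
        # so newly appearing columns are dropped.
        output = [c for c in input_columns if c in chosen]
    else:
        raise ValueError("enforce must be 'exclusion' or 'inclusion'")
    return output, missing
```

With an input of `id`, `name`, `age`, `city`, enforcing the exclusion of `age` and `zip` keeps the other three columns and reports `zip` as missing, while enforcing the inclusion of `id` and `name` keeps only those two regardless of what else the input contains.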

Wildcard/Regex Selection

Type a search pattern that matches the columns to move into the Include or Exclude list; you can specify which list is used. The pattern can be either a wildcard expression ('?' matches any single character, '*' matches any sequence of characters) or a regular expression. You can also specify whether the pattern is case sensitive.
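
As a rough illustration of the wildcard semantics described above, the following Python sketch translates a wildcard pattern into a regular expression and applies it to a list of column names. The helper names are hypothetical, and matching against the full column name is an assumption; the node itself runs on Java, whose regex dialect differs from Python's in some details.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '?' and '*' wildcards into an equivalent regex string."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def match_columns(columns, pattern, use_regex=False, case_sensitive=True):
    """Return the column names matched by a wildcard or regex pattern.

    Assumption: the pattern must match the whole column name.
    """
    expr = pattern if use_regex else wildcard_to_regex(pattern)
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(expr, flags)
    return [c for c in columns if compiled.fullmatch(c)]
```

For example, `match_columns(["price_usd", "price_eur", "Price_gbp", "qty"], "price_*")` selects the two lowercase `price_` columns, and passing `case_sensitive=False` also selects `Price_gbp`.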

Type Selection

Select the column types that you want to include. Column types that are not currently present in the input are shown in italics.
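
Type-based selection amounts to keeping every column whose type is in the chosen set. A minimal sketch, assuming the schema is available as (name, type) pairs; the function name and the type labels used in the example are hypothetical and not the node's actual type names.

```python
def select_by_type(schema, included_types):
    """Return the names of columns whose type is in included_types.

    `schema` is a list of (column_name, type_name) pairs; the type
    names are placeholders chosen for illustration.
    """
    wanted = set(included_types)
    return [name for name, dtype in schema if dtype in wanted]
```

Called with `[("id", "integer"), ("name", "string"), ("score", "double")]` and the types `{"integer", "double"}`, this keeps `id` and `score`. A type that is selected but absent from the input simply matches nothing.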

Input Ports

Spark DataFrame/RDD from which columns are to be excluded.

Output Ports

Spark DataFrame/RDD without the excluded columns.

Popular Predecessors

Hive to Spark: 20 %
CSV to Spark: 10 %
Parquet to Spark: 6 %
Spark SQL Query: 5 %
~~Table Row to Variable~~: 3 %

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Extension for Apache Spark (legacy) from the update site below, following the NodePit Product and Node Installation Guide:

v5.11

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.9.0.v202511131754

On NodePit since: 2026-03-10

Last update: 2026-03-13

KNIME versions: Since v3.6
