
Spark Correlation Filter

KNIME Extension for Apache Spark core infrastructure version 4.2.0.v202007072005 by KNIME AG, Zurich, Switzerland

This node uses the model as generated by a Correlation node to determine which columns are redundant (i.e. correlated) and filters them out. The output will contain the reduced set of columns.

The filtering step works roughly as follows: For each column in the correlation model, the number of correlated columns is determined, given a threshold value for the correlation coefficient (specified in the dialog). The column with the most correlated columns is chosen to "survive", and all columns correlated with it are filtered out. This procedure is repeated on the remaining columns until no further correlated columns can be identified. Finding a minimum set of columns that satisfies the constraints is difficult to solve analytically; however, the method applied here is known to be a good approximation.
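The procedure above can be sketched in plain Python. This is a minimal, non-authoritative sketch, not the node's Spark implementation. It assumes that the correlation model is available as a square matrix `corr` indexed like `columns`, that a pair counts as correlated when the absolute coefficient exceeds `threshold`, and that ties are broken by picking the earliest column. The function name `correlation_filter` and the sample matrix are hypothetical.

```python
from typing import List, Sequence


def correlation_filter(columns: Sequence[str],
                       corr: Sequence[Sequence[float]],
                       threshold: float) -> List[str]:
    """Greedy correlation filter (hypothetical sketch, not the KNIME API).

    Assumes corr[i][j] is the correlation coefficient between columns[i]
    and columns[j], and that |corr| > threshold means "correlated".
    """
    def correlated(i: int, j: int) -> bool:
        return i != j and abs(corr[i][j]) > threshold

    remaining = set(range(len(columns)))
    kept: List[int] = []
    while remaining:
        # Count correlated partners of each column among the remaining ones.
        counts = {i: sum(correlated(i, j) for j in remaining) for i in remaining}
        # Survivor: most correlated partners; ties go to the earliest column.
        best = max(remaining, key=lambda i: (counts[i], -i))
        if counts[best] == 0:
            # No correlated pairs left: every remaining column survives.
            kept.extend(remaining)
            break
        kept.append(best)
        # Drop the survivor's correlated partners, then retire the survivor.
        remaining -= {j for j in remaining if correlated(best, j)}
        remaining.discard(best)
    # Report survivors in their original column order.
    return [columns[i] for i in sorted(kept)]


columns = ["a", "b", "c", "d"]
corr = [
    [1.00, 0.95, 0.90, 0.10],
    [0.95, 1.00, 0.92, 0.05],
    [0.90, 0.92, 1.00, 0.00],
    [0.10, 0.05, 0.00, 1.00],
]
print(correlation_filter(columns, corr, 0.8))
print(correlation_filter(columns, corr, 0.93))
```

With this sample matrix, a threshold of 0.8 keeps `['a', 'd']`, while raising it to 0.93 keeps `['a', 'c', 'd']`, which illustrates that a higher threshold filters out fewer columns.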

Options

Columns from Model
Displays the set of columns for which the model has information. These columns must also be present in the input data table. The (automatically) selected elements in the list will be present in the output table. This list cannot be edited.
Correlation Threshold
Choose the correlation threshold here. The higher the value, the fewer columns get filtered out. Press Enter or click "Calculate" to see a preview of the filtered columns. The numbers of included and excluded columns are shown in the label.
Calculate
Click this button to update the statistics. It will determine the reduced set of columns using the procedure outlined above.

Input Ports

The model from the correlation node.
Numeric input data to filter. It must contain the set of columns that were used to create the correlation model. (Typically you connect the input data from the correlation node here.)

Output Ports

Filtered data from input.
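How the input data relate to the filtered output can be illustrated with a small plain-Python sketch, not the node's Spark code. It assumes that the data arrive as a dictionary mapping column names to values and that the set of kept columns has already been computed (for example by the procedure outlined above). The `apply_filter` name, the `Table` type, and the pass-through of columns not covered by the model are assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical column-oriented table: column name -> column values.
Table = Dict[str, List[float]]


def apply_filter(table: Table,
                 model_columns: Sequence[str],
                 kept_columns: Sequence[str]) -> Table:
    """Drop the model columns that the correlation filter excluded."""
    # The input must contain every column the correlation model was built on.
    missing = [c for c in model_columns if c not in table]
    if missing:
        raise ValueError(f"input table lacks model columns: {missing}")
    excluded = set(model_columns) - set(kept_columns)
    # Assumption: columns outside the model are passed through unchanged.
    return {name: values for name, values in table.items()
            if name not in excluded}


data = {"a": [1.0, 2.0], "b": [2.0, 4.1], "d": [0.3, 0.7]}
print(apply_filter(data, ["a", "b", "d"], ["a", "d"]))
```

Here `b` is removed because it is a model column that did not survive, and an input lacking any model column is rejected with a `ValueError`.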

Installation

To use this node in KNIME, install KNIME Extension for Apache Spark from the following update site:

KNIME 4.2

A zipped version of the software site can be downloaded here.

You don't know what to do with this link? Read our NodePit Product and Node Installation Guide, which explains in detail how to install nodes in your KNIME Analytics Platform.
