
Spark PCA

KNIME Extension for Apache Spark core infrastructure version 4.3.1.v202101261633 by KNIME AG, Zurich, Switzerland

This node performs a principal component analysis (PCA) on the given data using the Apache Spark implementation. The input data is projected from its original feature space into a space of (possibly) lower dimension with a minimum of information loss.


Fail if missing values are encountered
If checked, execution fails when the selected columns contain missing values. By default, rows containing missing values are ignored and are not considered when computing the principal components.
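
The two behaviours can be illustrated with a minimal plain-Python sketch. It is not the node's implementation: `prepare_rows` and `fail_on_missing` are hypothetical names, and missing values are represented as `None` for illustration only.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def prepare_rows(rows: Sequence[Row], fail_on_missing: bool) -> List[Row]:
    """Return the rows that would take part in the PCA (hypothetical helper).

    fail_on_missing=True  -> raise on the first row with a missing value
    fail_on_missing=False -> silently skip rows with missing values
    """
    complete: List[Row] = []
    for index, row in enumerate(rows):
        if any(value is None for value in row):
            if fail_on_missing:
                raise ValueError(f"Missing value in row {index}")
            continue  # default behaviour: ignore the incomplete row
        complete.append(row)
    return complete


rows = [(1.0, 2.0), (None, 3.0), (4.0, 5.0)]
print(prepare_rows(rows, fail_on_missing=False))  # [(1.0, 2.0), (4.0, 5.0)]
```

With `fail_on_missing=True`, the same input raises a `ValueError` for row 1 instead of dropping it.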
Target dimensions
Select the number of dimensions the input data is projected to. Choose one of the following:
  • Dimensions to reduce to: Directly specify the number of target dimensions. The specified number must be less than or equal to the number of input columns.
  • Minimum information fraction to preserve (%): Specify the percentage of information from the input columns that must be preserved. This option requires Apache Spark 2.0 or higher.
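
As a rough illustration of the second option, the sketch below picks the smallest number of components whose cumulative share reaches the requested percentage. It assumes that "information" means explained variance, the usual interpretation in PCA; the description above does not state how the node computes it. The function name and the example variances are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def dimensions_for_fraction(variances: Sequence[float], min_percent: float) -> int:
    """Smallest k such that the k largest variances cover min_percent of the total."""
    if not 0 < min_percent <= 100:
        raise ValueError("min_percent must be in (0, 100]")
    ordered = sorted(variances, reverse=True)  # largest component first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    threshold = min_percent / 100.0 * total
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        # Small tolerance so rounding does not push k one step too far.
        if cumulative >= threshold - 1e-12:
            return k
    return len(ordered)


# Hypothetical per-component variances for four input columns.
variances = [4.0, 2.0, 1.0, 1.0]
print(dimensions_for_fraction(variances, 75))  # 2
```

Here the first two components account for 6 of 8 units of variance, so 75% is reached with two dimensions, while asking for 90% would require all four.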
Replace original data columns
If checked, the projected DataFrame/RDD will not contain columns that were included in the principal component analysis. Only the projected columns and the input columns that were not included in the principal component analysis remain.
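
The effect on the output schema can be sketched as follows. The helper is illustrative only: the projected column names (`PCA dimension 0`, ...) and the column order are assumptions, as is the behaviour when the option is unchecked (projected columns appended to all input columns).

```python
from typing import List, Sequence


def output_columns(
    input_columns: Sequence[str],
    pca_columns: Sequence[str],
    n_components: int,
    replace_original: bool,
) -> List[str]:
    """Column names of the projected output (hypothetical naming and order)."""
    projected = [f"PCA dimension {i}" for i in range(n_components)]
    if replace_original:
        # Drop the columns that went into the PCA; keep everything else.
        kept = [name for name in input_columns if name not in pca_columns]
    else:
        # Assumption: unchecked means all input columns are kept.
        kept = list(input_columns)
    return kept + projected


columns = ["id", "x1", "x2", "x3"]
print(output_columns(columns, ["x1", "x2", "x3"], 2, replace_original=True))
# ['id', 'PCA dimension 0', 'PCA dimension 1']
```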
Select the columns to include in the principal component analysis, i.e., the original features.

Input Ports

Input Spark DataFrame/RDD

Output Ports

The input DataFrame/RDD projected onto the principal components. Input columns that were not included in the principal component analysis are retained.
A DataFrame/RDD with the principal components matrix.
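
The relationship between the two outputs can be sketched in plain Python: each projected row is the input row multiplied by the principal components matrix. This sketch assumes the matrix has one row per input column and one column per principal component; the actual layout of the node's second output is not stated above, and `project` is a hypothetical helper.

```python
from typing import List, Sequence


def project(
    rows: Sequence[Sequence[float]],
    components: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Multiply each row (length d) by a d x k components matrix."""
    d = len(components)
    k = len(components[0])
    projected = []
    for row in rows:
        if len(row) != d:
            raise ValueError("row length must match the number of matrix rows")
        projected.append(
            [sum(row[j] * components[j][c] for j in range(d)) for c in range(k)]
        )
    return projected


# Two input columns reduced to one dimension along the direction (0.6, 0.8).
components = [[0.6], [0.8]]
print(project([(3.0, 4.0), (-4.0, 3.0)], components))
# first row is close to 5.0, second row close to 0.0 (floating-point rounding)
```

The second row is orthogonal to the chosen direction, so its projection is approximately zero: that part of the data is what the reduction discards.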




To use this node in KNIME, install the KNIME Extension for Apache Spark.

