Dimensionality Reduction (LDA)

This component reduces the number of columns in the input data using linear discriminant analysis (LDA). LDA finds directions that separate two or more classes in the data, so one string column must be selected as the target column. The numeric columns are projected onto linear combinations of themselves, the linear discriminants, that best separate the target classes.
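As a rough illustration of what a linear discriminant is, the following plain-Python sketch computes Fisher's discriminant direction for the special case of two classes and two numeric columns. The sample values and all function names are made up for illustration; the component itself handles any number of classes and columns and may compute the discriminants differently.

```python
def column_means(rows):
    """Mean of each of the two numeric columns."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, center):
    """2x2 scatter matrix of one class around its own mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_b - mean_a) for two classes in 2D."""
    m_a, m_b = column_means(class_a), column_means(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    # Within-class scatter: sum of the per-class scatter matrices.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """Reduce two numeric columns to one linear discriminant value per row."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Made-up sample rows for two target classes.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]

w = fisher_direction(class_a, class_b)
print("class A on the discriminant:", project(class_a, w))
print("class B on the discriminant:", project(class_b, w))
```

For this sample, the two columns collapse into a single discriminant column on which the two classes do not overlap.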

This component can be used for dimensionality reduction, for example before training a machine learning model. Linear discriminant analysis also works as a classifier: assuming the data are normally distributed, the linear discriminants separate them into two or more target classes.

Linear discriminant analysis can create at most as many linear discriminants as the number of target classes minus one or the number of numeric columns in the input data, whichever is smaller. It may fail when the input data are high-dimensional and there are few target classes. In such cases, it is recommended to first reduce the dimensions with PCA and then apply linear discriminant analysis to the principal components. This component applies that method internally.
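The upper bound on the number of linear discriminants can be written as a one-line rule. The helper below is a hypothetical illustration of that rule, not part of the component:

```python
def max_linear_discriminants(n_classes: int, n_numeric_columns: int) -> int:
    """Upper bound on the number of linear discriminants LDA can create."""
    if n_classes < 2:
        raise ValueError("LDA needs at least two target classes")
    if n_numeric_columns < 1:
        raise ValueError("LDA needs at least one numeric column")
    # Classes minus one, or the number of numeric columns if that is smaller.
    return min(n_classes - 1, n_numeric_columns)


# Three target classes and ten numeric columns: at most two discriminants.
print(max_linear_discriminants(3, 10))
# Five target classes but only two numeric columns: also at most two.
print(max_linear_discriminants(5, 2))
```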

Note that the input data of the component must be normalized, and handling missing values beforehand is recommended.

If you want to apply the dimensionality reduction model to new data, for example a test set, the LDA model is available in the table at the second output port of the component. If the LDA model cannot be applied directly, the table also contains a PCA model, a normalizer model, and the number of dimensions for PCA. In that case, first apply the PCA model to the new data using the number of dimensions given in the “PCA-dimensions” column, then use the normalizer model to normalize the reduced dimensions, i.e. the principal components. Finally, apply the LDA model to the reduced, normalized dimensions.
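The order of the three steps (PCA, then normalization, then LDA) can be sketched in plain Python. Everything in this sketch is an assumption made for illustration: the model values are invented, the function names are hypothetical, and min-max scaling stands in for whatever the stored normalizer model actually does. In a workflow, the models from the second output port would be applied with the corresponding nodes instead.

```python
def multiply_rows(rows, matrix):
    """Multiply each row vector by a matrix given as a list of rows."""
    n_out = len(matrix[0])
    return [[sum(x * matrix[i][k] for i, x in enumerate(row))
             for k in range(n_out)] for row in rows]


def apply_pca(rows, pca_mean, pca_components, n_dims):
    """Step 1: center the data and keep the first n_dims principal components."""
    centered = [[x - m for x, m in zip(row, pca_mean)] for row in rows]
    kept = [component_row[:n_dims] for component_row in pca_components]
    return multiply_rows(centered, kept)


def apply_normalizer(rows, mins, maxs):
    """Step 2: rescale the principal components (min-max is an assumption)."""
    return [[(x - lo) / (hi - lo) for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]


def apply_lda(rows, lda_weights):
    """Step 3: project the normalized components onto the discriminants."""
    return multiply_rows(rows, lda_weights)


# Invented stand-ins for the models stored in the output table.
pca_mean = [1.0, 2.0, 3.0]
pca_components = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]]   # one row per input column
pca_dimensions = 2                   # value of the "PCA-dimensions" column
normalizer_mins = [-1.0, -2.0]
normalizer_maxs = [3.0, 6.0]
lda_weights = [[2.0], [-1.0]]        # two components -> one discriminant

new_data = [[2.0, 4.0, 10.0]]        # one row of new data, three numeric columns

components = apply_pca(new_data, pca_mean, pca_components, pca_dimensions)
normalized = apply_normalizer(components, normalizer_mins, normalizer_maxs)
discriminants = apply_lda(normalized, lda_weights)
print(discriminants)
```

The point of the sketch is the ordering: the normalizer is applied to the principal components, not to the original columns, and LDA is applied last.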

Required extensions:
- KNIME Ensemble Learning Wrappers (https://hub.knime.com/knime/extensions/org.knime.features.ensembles/latest)
- KNIME Data Generation (https://hub.knime.com/knime/extensions/org.knime.features.datageneration/latest)
- KNIME Math Expression (JEP) (https://hub.knime.com/knime/extensions/org.knime.features.ext.jep/latest)
- KNIME Quick Forms (https://hub.knime.com/knime/extensions/org.knime.features.js.quickforms/latest)
- KNIME Statistics Nodes (Labs) (https://hub.knime.com/knime/extensions/org.knime.features.stats2/latest)

Options

Select target column
The string column that holds the target classes. The linear discriminants separate the data according to the values in this column.
Select number of linear discriminants
The number of reduced dimensions to create, each a linear combination of the numeric columns.

Input Ports

Data that contain at least one string column and one numeric column. One string column is selected as the target column. Numeric columns in the data will be reduced to linear discriminants. Numeric columns have to be normalized.

Output Ports

Linear discriminants together with the original string columns in the input data
LDA model. Additional output columns, if LDA cannot be applied directly:
- PCA model to apply to the data before LDA
- Normalizer model to apply to the principal components before applying LDA
- Number of dimensions for PCA
