04_Dimensionality_Reduction_solution

Dimensionality Reduction - solution
Workflow sections: Dimensionality Reduction by Low Variance, Dimensionality Reduction by Linear Correlation, Dimensionality Reduction by PCA

Exercise: Dimensionality Reduction

1) Use the Normalizer node to apply min-max normalization to the training set.
2) Filter out columns in the training set that have a variance lower than 0.01 (Low Variance Filter node).
3) Filter out columns in the training set that have a linear correlation higher than or equal to 0.8 with another column (Linear Correlation and Correlation Filter nodes).
4) Apply automatic dimensionality reduction by replacing the numeric columns with principal components. Retain 90 % of the information in the original numeric columns (PCA Compute and PCA Apply nodes).
   - Note: PCA requires z-score normalized input values.
   - Use the Reference Column Filter node to apply the filtering of the Low Variance Filter and Correlation Filter nodes to the original input values.
   - Normalize using z-score (Normalizer node).
5) Apply these dimensionality reduction techniques to the test set (Reference Column Filter, Normalizer (Apply) and PCA Apply nodes).

Node annotations: exclude columns with variance < 0.01; Max 0.8; Compute PCAs; Fraction to preserve; apply to the test set; Fraction to preserve; Calculate linear correlation; Read AmesHousing.csv; z-score; Apply to test set; min-max normalization; continue with unnormalized values

Nodes in the workflow: Low Variance Filter, Correlation Filter, PCA Compute, PCA Apply, Reference Column Filter, PCA Apply, Missing Value Handling, Linear Correlation, CSV Reader, Normalizer, Normalizer (Apply), Normalizer, Reference Column Filter, Preprocessing, Outlier Detection
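For readers who want to check the logic of the exercise outside KNIME, the steps can be sketched in Python with pandas and scikit-learn. This is only an approximation under stated assumptions, not the workflow's implementation: the function names `select_columns`, `fit_reduction` and `apply_reduction` are hypothetical, the correlation filter is a simple greedy rule on absolute correlation (the Correlation Filter node may choose which columns to drop differently), pandas computes the sample variance, and the data is synthetic rather than AmesHousing.csv.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def select_columns(train, var_threshold=0.01, corr_threshold=0.8):
    """Return the numeric columns of `train` that survive both filters."""
    # Step 1: min-max normalize so variances are comparable across columns.
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(train), columns=train.columns
    )
    # Step 2: drop columns whose variance is lower than the threshold.
    variances = scaled.var()
    kept = [c for c in scaled.columns if variances[c] >= var_threshold]
    # Step 3 (assumed greedy rule): keep a column only if its absolute
    # correlation with every already selected column is below the threshold.
    corr = scaled[kept].corr().abs()
    selected = []
    for col in kept:
        if all(corr.loc[col, prev] < corr_threshold for prev in selected):
            selected.append(col)
    return selected


def fit_reduction(train, keep_fraction=0.9):
    """Fit column selection, z-score scaling and PCA on the training set."""
    cols = select_columns(train)
    # Step 4: continue with the unnormalized values of the selected columns,
    # z-score normalize them, then fit PCA keeping 90 % of the variance.
    scaler = StandardScaler().fit(train[cols])
    pca = PCA(n_components=keep_fraction, svd_solver="full")
    pca.fit(scaler.transform(train[cols]))
    return cols, scaler, pca


def apply_reduction(df, cols, scaler, pca):
    """Step 5: reuse the fitted selection, scaler and PCA on another table."""
    return pca.transform(scaler.transform(df[cols]))


# Synthetic stand-in data (not the Ames housing data).
rng = np.random.default_rng(0)
n = 200
a = rng.uniform(size=n)
data = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + rng.normal(scale=0.01, size=n),  # almost a duplicate of "a"
    "b": rng.uniform(size=n),
    "c": rng.uniform(size=n),
    "constant": np.ones(n),  # zero variance
})
train, test = data.iloc[:150], data.iloc[150:]

cols, scaler, pca = fit_reduction(train)
test_reduced = apply_reduction(test, cols, scaler, pca)
print(cols, test_reduced.shape)
```

The point mirrored from the workflow is that everything is fitted on the training set only: the test set reuses the selected columns, the z-score parameters and the principal components, which corresponds to the Reference Column Filter, Normalizer (Apply) and PCA Apply nodes.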
