
04_Dimensionality_Reduction_exercise

Dimensionality Reduction - exercise

Introduction to Machine Learning Algorithms course - Session 4
Exercise 4
Apply the following dimensionality reduction techniques to the data:
- Filter out columns with a low variance
- Filter out one of two columns with a high linear correlation
- Replace numeric columns with principal components
- Filter out columns which are not important in predicting the target column
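
The first two techniques can be sketched outside KNIME in plain Python. This is only an illustration under stated assumptions: the toy table and the helper names (`min_max`, `pearson`, `low_variance_filter`, `correlation_filter`) are hypothetical, the variance is the population variance of the min-max normalized values, and a correlated column is dropped by a simple greedy rule (keep the first column seen). The KNIME nodes may compute the variance or choose which column to drop differently.

```python
from math import sqrt
from statistics import mean, pvariance


def min_max(col):
    """Rescale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0 for _ in col]
    return [(v - lo) / (hi - lo) for v in col]


def pearson(x, y):
    """Pearson linear correlation; returns 0.0 if either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def low_variance_filter(table, threshold=0.01):
    """Keep columns whose min-max normalized variance is at least the threshold."""
    return [name for name, col in table.items()
            if pvariance(min_max(col)) >= threshold]


def correlation_filter(table, columns, threshold=0.8):
    """Greedy rule (an assumption): keep a column only if its absolute
    correlation with every already-kept column is below the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(table[name], table[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, not taken from AmesHousing.csv.
toy = {
    "area": [50.0, 80.0, 120.0, 200.0, 90.0],
    "area_sqft": [538.0, 861.0, 1292.0, 2153.0, 969.0],  # near-duplicate of "area"
    "rooms": [4.0, 2.0, 3.0, 3.0, 5.0],
    "pool": [0.0, 0.0, 0.0, 0.0, 0.0],  # constant, so zero variance
}

after_variance = low_variance_filter(toy)
after_correlation = correlation_filter(toy, after_variance)
print(after_variance)
print(after_correlation)
```

In this toy table the constant `pool` column fails the variance threshold, and `area_sqft` is dropped because it is almost perfectly correlated with `area`.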


Exercise: Dimensionality Reduction
1) Use the Normalizer node to apply min-max normalization to the training set.
2) Filter out columns in the training set that have a variance lower than 0.01 (Low Variance Filter node).
3) Filter out columns in the training set that have a linear correlation higher than or equal to 0.8 with another column (Linear Correlation and Correlation Filter nodes).
4) Apply automatic dimensionality reduction by replacing the numeric columns with principal components. Retain 90 % of the information in the original numeric columns (PCA Compute and PCA Apply nodes).
   - Note: PCA requires z-score normalized input values.
   - Use the Reference Column Filter node to apply the filtering of the Low Variance Filter and Correlation Filter nodes to the original input values.
   - Normalize using z-score (Normalizer node).
5) Apply these dimensionality reduction techniques to the test set (Reference Column Filter, Normalizer (Apply) and PCA Apply nodes).

Workflow annotations: Read AmesHousing.csv, CSV Reader, Missing Value Handling, Preprocessing, Outlier Detection, Dimensionality Reduction by Low Variance, Dimensionality Reduction by Linear Correlation, Dimensionality Reduction by PCA
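
Steps 4 and 5 of the exercise follow a fit-on-training, apply-to-test pattern. It can be sketched with NumPy as below. This is a rough analogue and not the KNIME implementation. The synthetic data, the `kept_columns` list (standing in for the Reference Column Filter) and the function names are all hypothetical. "Information" is interpreted as the cumulative share of variance explained, and the z-score uses the sample standard deviation, which is an assumption.

```python
import numpy as np


def fit_zscore(train):
    """Learn per-column mean and standard deviation on the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return mean, std


def apply_zscore(data, mean, std):
    """Normalize any table with parameters learned on the training set."""
    return (data - mean) / std


def fit_pca(train_z, min_information=0.90):
    """Return the loadings of the fewest components whose cumulative
    explained variance reaches min_information, plus the retained share."""
    _, singular, vt = np.linalg.svd(train_z, full_matrices=False)
    variance = singular ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    n = min(int(np.searchsorted(cumulative, min_information)) + 1, len(cumulative))
    return vt[:n].T, float(cumulative[n - 1])


# Synthetic data: five columns driven by two latent factors, plus one
# pure-noise column that we pretend the earlier filters removed.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 300))
noise = rng.normal(scale=0.05, size=(300, 6))
data = np.column_stack([a, a, b, b, a + b, np.zeros(300)]) + noise
train, test = data[:200], data[200:]

kept_columns = [0, 1, 2, 3, 4]  # hypothetical output of the earlier filters
train_kept, test_kept = train[:, kept_columns], test[:, kept_columns]

mean, std = fit_zscore(train_kept)                       # fit on training data
components, retained = fit_pca(apply_zscore(train_kept, mean, std))

train_pcs = apply_zscore(train_kept, mean, std) @ components
test_pcs = apply_zscore(test_kept, mean, std) @ components  # reuse the training fit

print(components.shape, round(retained, 3))
print(train_pcs.shape, test_pcs.shape)
```

The test set is never used to learn the column selection, the normalization parameters or the loadings. It is only transformed with what was learned on the training set, which mirrors the role of the Normalizer (Apply) and PCA Apply nodes.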
