
02_Techniques_for_Dimensionality_Reduction

Techniques for Dimensionality Reduction

This workflow performs classification on data sets that were reduced using the following dimensionality reduction techniques:
- Linear Discriminant Analysis (LDA)
- Auto-encoder
- t-SNE
- Missing values ratio
- Low variance filter
- High correlation filter
- Ensemble tree
- PCA
- Backward feature elimination
- Forward feature selection
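
As an illustration of how a few of the filter-based techniques above work, here is a minimal Python/scikit-learn sketch of the missing values ratio, low variance, and high correlation filters plus PCA. It is not the KNIME implementation; the thresholds and helper names are assumptions for demonstration only.

```python
# Illustrative sketch of a few of the reduction techniques listed above.
# Thresholds (30% missing, variance < 1e-3, correlation > 0.9) are assumptions.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def filter_reduce(df: pd.DataFrame) -> pd.DataFrame:
    # Missing values ratio: drop columns with more than 30% missing values
    df = df.loc[:, df.isna().mean() <= 0.30]

    # Low variance filter: drop numeric columns with (near) zero variance
    variances = df.select_dtypes(include=np.number).var()
    df = df.drop(columns=variances[variances < 1e-3].index)

    # High correlation filter: drop one column of each highly correlated pair
    corr = df.select_dtypes(include=np.number).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
    return df.drop(columns=to_drop)

def pca_reduce(X: np.ndarray, n_components: int = 10) -> np.ndarray:
    # PCA: project onto the first principal components
    return PCA(n_components=n_components).fit_transform(X)
```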
---
The performance of each classification model is compared, in terms of overall accuracy and AUC, with the performance achieved when all columns are retained. These evaluation metrics are produced by the best-performing classification model from the following bag of models:
- Multilayer Feedforward Neural Networks
- Naive Bayes
- Decision Tree
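
The sketch below shows one way such a bag of models can be trained and the best one selected by AUC with scikit-learn, assuming a preprocessed numeric feature matrix and a binary target; the model settings are illustrative and not the exact configuration used in the workflow.

```python
# Sketch of the "bag of models" comparison; settings are illustrative only.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def best_model_scores(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    models = {
        "MLP": MLPClassifier(max_iter=500, random_state=0),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]   # positive class probabilities
        scores[name] = {
            "accuracy": accuracy_score(y_te, model.predict(X_te)),
            "auc": roc_auc_score(y_te, proba),
        }
    # Report the model with the highest AUC for this reduced data set
    return max(scores.items(), key=lambda kv: kv[1]["auc"])
```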

Workflow annotations:

This part takes a very long time! Execute with discretion!

Dimensionality Reduction

This workflow shows methods for dimensionality reduction and calculates a baseline where no dimensionality reduction technique is applied:
1. Baseline evaluation
2. Linear Discriminant Analysis (LDA)
3. Auto-encoder
4. t-SNE
5. High ratio of missing values
6. Low variance
7. High correlation with other data columns
8. Tree ensemble based
9. Principal Component Analysis (PCA)
10. Backward feature elimination
11. Forward feature selection

The ROC Curve shows the final performances using the different dimensionality reduction techniques. The positive class probabilities are accessible via the top output ports of the components; the accuracies obtained using the different techniques are accessible via the bottom output ports of the components.

Reading Data

Read the KDD train small data set as text files (233 columns, 50K rows), eliminate columns with more than 30% missing values, select the prediction task (churn, appetency, or upselling), and separate the target. A sample of 2500 rows, stratified on the target, is taken for better performance.

Components on the workflow canvas: Reading full small data set, Target Selection, Column Selection by Missing Values, Column Splitter, Row Sampling, Joiner, Column Appender, Baseline Evaluation, Reduction based on LDA, Auto-encoder based Reduction, Reduction based on t-SNE, Reduction based on Missing Values, Reduction based on Low Variance, Reduction based on High Corr., Tree Ensemble based Reduction, Reduction based on PCA, Backward Feature Elimination, Forward Feature Selection, ROC Curve (positive class probabilities), Bar Chart (accuracies).
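
For reference, here is a hedged sketch of the data preparation described in the annotation above, assuming the KDD Cup 2009 "orange small" file names; the file names, the churn task choice, and the random seed are assumptions, not taken from the workflow.

```python
# Sketch of the data preparation step; file names and seed are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the small data set (233 columns, 50K rows)
X = pd.read_csv("orange_small_train.data", sep="\t")
# Select the prediction task: churn, appetency, or upselling (churn shown here)
y = pd.read_csv("orange_small_train_churn.labels", header=None)[0]

# Eliminate columns with more than 30% missing values
X = X.loc[:, X.isna().mean() <= 0.30]

# Take 2500 rows, stratified on the target, for better performance
X_sample, _, y_sample, _ = train_test_split(
    X, y, train_size=2500, stratify=y, random_state=0
)
```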
