Solution 7 Practicing with Data Preprocessing Techniques

This workflow shows a solution to a hands-on exercise in the L4-ML Introduction to Machine Learning Algorithms self-paced course.

Task 1: Perform outlier detection, missing value imputation, dimensionality reduction, and encoding
1. Visualize the numeric columns in a box plot
2. Replace the numeric outliers with missing values. Apply k=1.5.
3. Replace the missing values in the numeric columns with the column mean
4. Remove rows with missing values in the class column
5. Transform the numeric columns into principal components. Preserve 90% of the information.
6. Perform category and one-hot encoding on the “category” column in parallel workflow branches

Task 2: Normalize data using two alternative methods and compare the statistics of the normalized columns
1. Min-max normalize column 0 into the range [0,1]
2. Normalize column 0 using the decimal scaling method in a parallel workflow branch
3. Calculate the statistics of the normalized columns

Node annotations: Read preprocessing-data.table; numeric: mean, class: remove row; 90% information; Read normalization-example.table; min-max; decimal scaling

Workflow nodes: Table Reader, Numeric Outliers, Box Plot, Missing Value, Category To Number, One to Many, PCA, Table Reader, Normalizer, Normalizer, Statistics, Statistics
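Steps 2 and 3 of Task 1 (replace outliers with missing values using k=1.5, then impute the column mean) can be sketched outside KNIME as follows. This is a minimal illustration in plain Python, not the implementation of the Numeric Outliers or Missing Value nodes. The function names and the sample column are hypothetical, and the quartile estimation method used here (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ from the node's setting.

```python
import statistics


def replace_outliers_with_missing(values, k=1.5):
    """Return a copy of `values` where IQR outliers become None (missing)."""
    present = [v for v in values if v is not None]
    # Assumption: linear interpolation between data points for the quartiles;
    # other estimation methods can shift the fences slightly.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [
        v if v is not None and lower <= v <= upper else None
        for v in values
    ]


def impute_mean(values):
    """Replace every missing entry (None) with the mean of the present values."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]


# Hypothetical numeric column with one extreme value and one missing value.
column = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0, None, 11.0]

cleaned = replace_outliers_with_missing(column, k=1.5)
imputed = impute_mean(cleaned)

print(cleaned)
print(imputed)
```

Doing the outlier replacement before the imputation, as the task orders it, means the extreme value does not distort the mean that is later used to fill the gaps.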
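The two normalization methods compared in Task 2 can be sketched in plain Python to show what each Normalizer branch computes conceptually. This is an illustrative sketch under assumptions, not the node's implementation: the function names and the sample column are hypothetical. Min-max scaling maps the column linearly onto [0,1]; decimal scaling divides every value by the smallest power of ten that brings the largest absolute value below 1.

```python
import statistics


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def decimal_scaling_normalize(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def summarize(values):
    """Small stand-in for a statistics view: min, max, and mean of a column."""
    return {"min": min(values), "max": max(values), "mean": statistics.fmean(values)}


# Hypothetical values for "column 0".
column_0 = [50.0, 100.0, 150.0, 250.0]

min_max = min_max_normalize(column_0)
decimal = decimal_scaling_normalize(column_0)

print("min-max:        ", summarize(min_max))
print("decimal scaling:", summarize(decimal))
```

Comparing the two summaries shows the difference the task asks about: min-max always fills the full target range, while decimal scaling preserves the relative magnitudes of the values and usually occupies only part of the interval.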
