02 Preprocessing for FFNN Training

02 Preprocessing for FFNN Training - Exercise

This workflow is a hands-on exercise from the L4-DL Introduction to Deep Learning self-paced course.

Task 1. Missing values handling
1. Remove columns with more than 90% missing values
2. Replace missing values with the string "missing"

Task 2. Encodings and Partitioning
1. Replace the categorical values of the education column with the numerical encoding read by the Excel Reader node
2. Convert all the other categorical features to integer encodings using the Category to Number node. Do not convert the income column, since it will be used as target column.
3. Partition the data using stratified sampling on the income column. Use 70% of the entries for training and the rest for testing
4. Only for the training data, convert the income column to one-hot-encoding using the One to Many node

Task 3. Normalization
1. Normalize the training data into the range [0, 1] using min-max normalization
2. Apply the normalization on the testing data

Task 4. Build a FFNN Network
1. Create an input layer with the appropriate number of units according to the input data
2. Create two dense layers with 6 units and ReLU activation function
3. Create the output layer with a Keras Dense Layer. Since in Task 2 the target column is converted to one-hot-encoding, the output has two units and Softmax activation function

Task 5. Train and apply the network
1. Train the network for 20 epochs using the Keras Network Learner node. Use the categorical cross entropy as the loss function. Make sure to select the correct input and target columns.
2. Execute the trained network on the testing data. Select the last dense layer as output layer

Task 6. Evaluate the trained network
1. Rename the two output columns produced by the network. The first output refers to the "<=50K" class and the second to ">50K"
2. Condense the two output columns with a Many to One node. Retain the column with the highest value
3. Add a Scorer node to evaluate the model performance

Node annotations
CSV Reader: Read adult.csv. Target column: income
Excel Reader: Reading dictionary table for education categories
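
The exercise itself is solved with KNIME nodes, but the logic behind Task 3 (fit min-max normalization on the training data, then reuse it on the testing data) and Task 6 (keep the class whose output column has the highest value) may be easier to follow as a small plain-Python sketch. Everything below is illustrative: the function names, the toy rows, and the two-feature layout are assumptions made for this example and are not part of the workflow.

```python
# Illustrative sketch only: plain-Python stand-ins for the normalization
# step (Task 3) and the "retain the highest value" step (Task 6).
# All names and data here are hypothetical.

def fit_min_max(rows):
    """Learn per-column (min, max) from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def apply_min_max(rows, params):
    """Rescale each column with previously learned (min, max) values."""
    scaled = []
    for row in rows:
        scaled.append([
            # Constant columns (hi == lo) are mapped to 0.0 to avoid division by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, params)
        ])
    return scaled

def predicted_class(outputs, labels=("<=50K", ">50K")):
    """Return the label whose output column has the highest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]

# Hypothetical toy data with two numeric features per row.
train = [[20.0, 10.0], [40.0, 30.0], [60.0, 50.0]]
test = [[30.0, 50.0], [70.0, 20.0]]

params = fit_min_max(train)                # statistics come from training data only
train_scaled = apply_min_max(train, params)
test_scaled = apply_min_max(test, params)  # same statistics reused on the test data

print(train_scaled)
print(test_scaled)
print(predicted_class([0.8, 0.2]), predicted_class([0.3, 0.7]))
```

In this sketch the second test row scales to 1.25 in its first feature, because 70 is larger than any training value. That is the expected consequence of reusing the training statistics rather than refitting on the test data, which would leak information about the test set into preprocessing.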
