02 Preprocessing for FFNN Training - Solution

This workflow shows a solution to a hands-on exercise in the L4-DL Introduction to Deep Learning self-paced course.


Task 1. Missing values handling
1. Remove columns with more than 90% missing values
2. Replace missing values with the string "missing"

Task 2. Encodings and Partitioning
1. Replace the categorical values of the education column with the numerical encoding read by the Excel Reader node
2. Convert all the other categorical features to integer encodings using the Category to Number node. Do not convert the income column, since it will be used as the target column.
3. Partition the data using stratified sampling on the income column. Use 70% of the entries for training and the rest for testing
4. Only for the training data, convert the income column to one-hot encoding using the One to Many node

Task 3. Normalization
1. Normalize the training data into the range [0, 1] using min-max normalization
2. Apply the normalization to the testing data

Task 4. Build a FFNN Network
1. Create an input layer with the appropriate number of units according to the input data
2. Create two dense layers with 6 units and ReLU activation function
3. Create the output layer with a Keras Dense Layer. Since in Task 2 the target column is converted to one-hot encoding, the output has two units and Softmax activation function

Task 5. Train and apply the network
1. Train the network for 20 epochs using the Keras Network Learner node. Use categorical cross entropy as the loss function. Make sure to select the correct input and target columns.
2. Execute the trained network on the testing data. Select the last dense layer as output layer

Task 6. Evaluate the trained network
1. Rename the two output columns produced by the network. The first output refers to the "<=50K" class and the second to ">50K"
2. Condense the two output columns with a Many to One node. Retain the column with the highest value
3. Add a Scorer node to evaluate the model performance

Workflow annotations
Read adult.csv. Target column: income
Reading dictionary table for education categories
Changing education classes to numbers
Replace missing values with string "missing"
Integer encoding input features
70% for training, 30% for testing
11 input features
Hidden layer: 6 units, ReLU
Hidden layer: 6 units, ReLU
Output: 2 units, Softmax
Training network. Batch size: 100, Epochs: 20, Optimizer: adam
Prediction from the testing set
Normalize training data [0, 1]
Apply normaliz. testing data

Nodes in the workflow
CSV Reader, Excel Reader, Cell Replacer, Missing Value, Category To Number, Partitioning, Missing Value, Column Filter, Keras Input Layer, Keras Dense Layer, Keras Dense Layer, Keras Dense Layer, Keras Network Learner, Keras Network Executor, One to Many, Normalizer, Normalizer (Apply), Many to One, Column Rename, Scorer
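The logic behind Tasks 2.4, 3 and 6.2 can be illustrated outside KNIME with a minimal plain-Python sketch. It is not how the KNIME nodes are implemented. The function names (fit_min_max, apply_min_max, one_hot, many_to_one) and the tiny two-column data set are hypothetical and chosen only for illustration. The point it shows is that the min and max are computed on the training data only and then reused unchanged on the testing data, so in this sketch testing values may fall slightly outside [0, 1].

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, params):
    """Scale rows with previously fitted (min, max) pairs: (x - min) / (max - min)."""
    scaled = []
    for row in rows:
        scaled_row = []
        for value, (lo, hi) in zip(row, params):
            # A constant column has hi == lo; map it to 0.0 to avoid dividing by zero.
            scaled_row.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        scaled.append(scaled_row)
    return scaled


def one_hot(labels, classes):
    """Encode each label as a 0/1 vector with one position per class."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def many_to_one(outputs, classes):
    """For each row of network outputs, keep the class with the highest value."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in outputs]


# Hypothetical toy data: two numeric features per row.
CLASSES = ["<=50K", ">50K"]
train = [[20.0, 10.0], [40.0, 30.0], [60.0, 50.0]]
test = [[30.0, 20.0], [70.0, 50.0]]

params = fit_min_max(train)                 # fitted on training data only
train_scaled = apply_min_max(train, params)
test_scaled = apply_min_max(test, params)   # same parameters reused for testing data

targets = one_hot(["<=50K", ">50K", "<=50K"], CLASSES)
predicted = many_to_one([[0.8, 0.2], [0.3, 0.7]], CLASSES)
```

The order of CLASSES matters: it must match the order of the one-hot columns used for training, which is why Task 6 names the first output "<=50K" and the second ">50K".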