Adult_Data_Classification

Adult data income classification with ANN
Use Case Description
The dataset contains demographic information from 32k adults collected by the US Census. The goal of this exercise is to predict whether a person's income is above $50k. We do so by building an ANN model for binary classification.

Data Preparation
There are 13 features available in this dataset, as well as the binary target Income. Some of the features are numerical; the others are categorical and need to be converted to numbers. Since the education level is ordered, we use a table of education levels with the corresponding numerical encoding. The other categorical features are converted to numbers by the Category to Number node. The dataset is partitioned into a training set (70%) and the rest (30%). The second partition is further split into the validation and testing sets (30% and 70%, respectively). All features are then normalized to [0, 1] before the analysis by the ANN. These preprocessing steps are implemented in two metanodes.

Exercise: Adult data classification with ANN
1). Build a network with 13-6-6-1 units
- Layer 1 (input layer): Keras Input Layer node with Shape 13
- Layer 2 (hidden layer): Keras Dense Layer node with Shape 6, ReLU Activation function
- Layer 3 (hidden layer): Keras Dense Layer node with Shape 6, ReLU Activation function
- Layer 4 (output layer): Keras Dense Layer node with Shape 1, Sigmoid Activation function
2). Train the network model with the Keras Network Learner node
- The training and validation sets should be supplied
- Input: Conversion is From Number (double); all 13 features are included as the input (i.e., everything except income)
- Target Data: Conversion is From Number (double), and income is selected. Loss function is Binary cross entropy
- Options: Batch size 100 for both training and validation. 20 Epochs. Adam Optimizer
3). Apply the trained network to the testing set with the Keras Network Executor node
- Keep input columns in output table
- Inputs: Conversion is From Number (double), include all available features
- Outputs: click on add output, Conversion is To Number (double)
4). Convert the network outcome to binary categories with the Rule Engine node
- If the predicted probability > 0.5, then the predicted category is >50K. Otherwise the predicted category is <=50K.
5). Assess the accuracy with the Scorer node

Workflow annotations
- Read adult.csv (CSV Reader)
- Reading dictionary table for education categories (Excel Reader)
- Converts nominal features to numerical (Nominal feature conversion)
- Partition data into 70% training, 30% rest; partitioning the rest to 30% validation, 70% testing; normalize all features to [0,1] (Partitioning, Normalization)
- 13 input features (Keras Input Layer)
- Hidden layer, 6 units, ReLU (Keras Dense Layer)
- Hidden layer, 6 units, ReLU (Keras Dense Layer)
- Output, 1 unit, Sigmoid (Keras Dense Layer)
- Training network, Batch size: 100, Epochs: 20, Optimizer: adam (Keras Network Learner)
- Prediction from the testing set (Keras Network Executor)
- Convert output (Rule Engine)
- Checking model performance (Scorer)
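The two kinds of categorical conversion described under Data Preparation can be sketched in plain Python. This is only an illustration: the education levels and ranks below are a hypothetical excerpt, not the contents of the workflow's dictionary table, and the order-of-first-appearance coding in category_to_number is an assumption about how unordered categories might be numbered, not a description of the KNIME node's internals.

```python
# Hypothetical excerpt of an education lookup table (level -> rank).
# The real dictionary table used by the workflow is not shown in the text.
education_rank = {"HS-grad": 1, "Bachelors": 2, "Masters": 3, "Doctorate": 4}


def encode_ordered(values, lookup):
    """Replace each ordered category with its rank from the lookup table."""
    return [lookup[v] for v in values]


def category_to_number(values):
    """Assign integer codes in order of first appearance (no order implied)."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


education = ["Masters", "HS-grad", "Doctorate"]        # hypothetical rows
occupation = ["Sales", "Tech-support", "Sales"]        # hypothetical rows

education_encoded = encode_ordered(education, education_rank)
occupation_encoded, occupation_codes = category_to_number(occupation)

print(education_encoded)
print(occupation_encoded, occupation_codes)
```

The lookup table preserves the ordering of education levels, whereas the integer codes for an unordered feature are arbitrary labels.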
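The two-stage partitioning (70% training, then 30%/70% of the rest) determines the overall share of each set. The arithmetic can be checked with a short sketch; the row count of 32,000 is an assumed round figure standing in for the "32k adults" mentioned in the description.

```python
# Assumed round row count; the text only says "32k adults".
n_rows = 32_000

n_train = round(n_rows * 0.70)   # first split: 70% training
n_rest = n_rows - n_train        # remaining 30%
n_val = round(n_rest * 0.30)     # second split: 30% of the rest
n_test = n_rest - n_val          # remaining 70% of the rest

# Share of the full dataset that ends up in each partition.
shares = {
    "training": n_train / n_rows,
    "validation": n_val / n_rows,
    "testing": n_test / n_rows,
}

print(n_train, n_val, n_test)
print(shares)
```

Under these assumptions the validation set is 9% and the testing set is 21% of the full dataset.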
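Normalizing a feature to [0, 1] is commonly done with min-max scaling, x' = (x - min) / (max - min). A minimal sketch for a single column follows; the sample values are hypothetical, and mapping a constant column to 0.0 is a choice made here for illustration, not something the workflow description specifies.

```python
def min_max_normalize(column):
    """Rescale numbers linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to rescale (illustrative choice).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


ages = [17, 38, 90, 53.5]   # hypothetical values
scaled = min_max_normalize(ages)
print(scaled)
```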
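To make the 13-6-6-1 architecture from step 1 concrete, the following dependency-free sketch runs one forward pass through such a network and counts its trainable parameters. It is not the Keras implementation used by the workflow: the weights are random and untrained, the initialization range is arbitrary, and all function names are hypothetical.

```python
import math
import random

LAYER_SIZES = [13, 6, 6, 1]   # input, two hidden layers, output


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights (arbitrary range) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation):
    """One dense layer: activation(W x + b), computed unit by unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]

# Weights plus biases per layer: (13*6+6) + (6*6+6) + (6*1+1)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def predict(features):
    """ReLU on the hidden layers, sigmoid on the single output unit."""
    h = features
    for i, (w, b) in enumerate(layers):
        activation = sigmoid if i == len(layers) - 1 else relu
        h = dense(h, w, b, activation)
    return h[0]


probability = predict([0.5] * 13)   # hypothetical normalized feature vector
print(n_params, probability)
```

The sigmoid output always lies strictly between 0 and 1, which is why it can be read as the predicted probability of the >50K class.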
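The binary cross entropy loss selected in step 2 averages -[y log(p) + (1 - y) log(1 - p)] over the rows, where y is the 0/1 target and p is the predicted probability. The sketch below is a plain-Python illustration of that formula; the clipping constant eps is an assumption added to avoid log(0), and the sample targets and probabilities are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] with probabilities clipped to (eps, 1-eps)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


targets = [1, 0, 1]                                    # hypothetical: 1 means income >50K
good = binary_cross_entropy(targets, [0.9, 0.1, 0.8])  # confident and correct
poor = binary_cross_entropy(targets, [0.2, 0.7, 0.4])  # mostly wrong
print(good, poor)
```

Predictions that put high probability on the wrong class are penalized much more heavily than uncertain ones, so the loss for the second set of probabilities is larger.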
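Steps 4 and 5 (thresholding the predicted probability and assessing accuracy) can be sketched as follows. The probabilities and actual labels are hypothetical, and the function names are illustrative; the Scorer node reports more than plain accuracy, which is the only metric computed here.

```python
def to_category(probability, threshold=0.5):
    """Rule from step 4: strictly above the threshold means >50K."""
    return ">50K" if probability > threshold else "<=50K"


def accuracy(actual, predicted):
    """Fraction of rows where the predicted category matches the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


probabilities = [0.91, 0.50, 0.12, 0.64]          # hypothetical network outputs
actual = [">50K", "<=50K", "<=50K", "<=50K"]      # hypothetical true labels

predicted = [to_category(p) for p in probabilities]
print(predicted)
print(accuracy(actual, predicted))
```

Note that a probability of exactly 0.5 falls into the <=50K category, because the rule uses a strict inequality.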
