
09_Wide_and_Deep_Learning_on_Census_Dataset

Workflow

Wide and Deep Learning on the Census Dataset
This workflow shows one way of applying deep learning to tabular data. Its main focus is data preparation and semi-automatic network creation.
Tags: deep learning, wide learning, tensorflow, keras, machine learning
0 Data Selection
Remove rows containing missing values (for simplicity) and split the dataset into a training and a testing set. You can also load a different dataset and handle missing values differently.

1 Preprocessing
Data preprocessing is always one of the most important steps in data mining, but for neural networks it is even more important to ensure that the training data is properly prepared. The two sections below give an idea of what to keep in mind for continuous and categorical data.

Continuous columns
Continuous columns are already in a format that a neural network can understand, but it is good practice to normalize the data, e.g. using z-score normalization.

Categorical columns
Categorical data has to be transformed into a numerical representation that a neural network can understand. Here we show two possibilities for such a transformation:
- The representation as a one-hot vector
- The representation as an index that is used for an embedding
Open the wrapped metanode (Ctrl + double-click) to check out the two variants. Note that we extract a dictionary for the columns in both cases. We need those dictionaries to ensure that unseen data is preprocessed in exactly the same way as the training data.

2 Model Creation
Depending on the kind of input columns, a different input layer has to be created. Here we use a network that is inspired by TensorFlow's Wide & Deep Learning tutorial (https://www.tensorflow.org/tutorials/wide_and_deep), but you can easily replace it with an arbitrary deep learning network of your choice.

3 Training
Train for 3 epochs using the Adam optimizer with default parameters. Open the learner view to check whether the learning converged.

4 Deployment Preprocessing
Applies the same preprocessing steps as during training.

5 Deployment

6 Evaluation
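The preprocessing steps above (z-score normalization, one-hot vectors, and embedding indices with a reusable dictionary) can be sketched outside KNIME with pandas. The tiny frame and column names below are illustrative, not the actual Census columns:

```python
import pandas as pd

# Hypothetical Census-style frame; values and column names are illustrative only.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education": ["Bachelors", "Bachelors", "HS-grad", "Masters"],
})

# Continuous column: z-score normalization. The mean/std must be computed on
# the training set and reused on unseen data.
mean, std = df["age"].mean(), df["age"].std()
df["age_z"] = (df["age"] - mean) / std

# Categorical column, variant 1: one-hot vector.
one_hot = pd.get_dummies(df["education"], prefix="education")

# Categorical column, variant 2: integer index for an embedding layer.
# The dictionary is extracted from the training data so that unseen data
# is mapped to exactly the same indices.
vocab = {v: i for i, v in enumerate(df["education"].unique())}
df["education_idx"] = df["education"].map(vocab)
```

The key point in both variants is that the mapping (normalization statistics, one-hot columns, index dictionary) is a fixed artifact of the training data, applied unchanged during deployment.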
Wide and Deep Learning on the Census Dataset

This workflow shows one way of applying deep learning to tabular data. The main focus lies on data preparation and semi-automatic network creation: semi-automatic because the network structure is created in a data-dependent way, while the user can specify certain architectural parameters, e.g. the number of hidden layers and the number of neurons per hidden layer.

This workflow is heavily influenced by TensorFlow's Wide & Deep Learning tutorial (https://www.tensorflow.org/tutorials/wide_and_deep) and also uses the Census dataset. Please note that both the dataset and the network architecture are just examples and can be switched out to fit your specific use case.

In order to run the example, please make sure you have the following KNIME extensions installed:
* KNIME Deep Learning - Keras Integration (Labs)
* KNIME Deep Learning - TensorFlow Integration (Labs)
* KNIME JavaScript Views (Labs)

You also need a local Python installation that includes Keras (we recommend version 2.1.6). Please refer to https://www.knime.com/deeplearning#keras for installation recommendations and further information.
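For readers who want to see the wide & deep idea in plain Keras code rather than KNIME nodes, a minimal sketch follows. All sizes (input widths, vocabulary size, embedding dimension, hidden layer sizes) are made-up placeholders; in the workflow they are derived from the data and from the user's architectural parameters:

```python
from keras.layers import Concatenate, Dense, Embedding, Flatten, Input
from keras.models import Model

# Illustrative sizes only; the workflow derives these from the data.
n_continuous = 5      # number of z-scored numeric columns
n_one_hot = 20        # width of the one-hot encoded "wide" input
vocab_size = 16       # distinct values of one embedded categorical column
embedding_dim = 8

# Wide part: the one-hot input feeds a linear model.
wide_in = Input(shape=(n_one_hot,), name="one_hot")

# Deep part: continuous inputs plus a learned embedding.
cont_in = Input(shape=(n_continuous,), name="continuous")
emb_in = Input(shape=(1,), name="category_index")
emb = Flatten()(Embedding(vocab_size, embedding_dim)(emb_in))
deep = Concatenate()([cont_in, emb])
for units in (100, 50):  # e.g. two hidden layers with 100 and 50 neurons
    deep = Dense(units, activation="relu")(deep)

# Concatenate the output of both parts and add a sigmoid output layer.
merged = Concatenate()([wide_in, deep])
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[wide_in, cont_in, emb_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(..., epochs=3)  # the workflow trains for 3 epochs with default Adam
```

This mirrors the structure of the workflow: separate input layers per column kind, a deep tower over continuous and embedded columns, and a wide linear path over the one-hot columns, concatenated before the output.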
Results
According to the annotations on the workflow canvas, the two models reach accuracies of roughly 85% and 79% (AUC ~0.9 and ~0.8, respectively) on the test set.
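The accuracy and AUC figures reported on the workflow canvas come from the Scorer and ROC Curve nodes; the same metrics can be computed outside KNIME with scikit-learn. The predictions below are made-up values, purely for illustration:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical network outputs on a small test set (illustrative values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # predicted class probabilities

# Postprocess the prediction: threshold the network output at 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
auc = roc_auc_score(y_true, y_score)       # area under the ROC curve
```

Note that accuracy is computed from the thresholded predictions, while AUC uses the raw scores, which is why the workflow keeps both the rounded and the continuous network outputs.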


Resources

Nodes

09_Wide_and_Deep_Learning_on_Census_Dataset consists of the following 237 nodes:

Plugins

09_Wide_and_Deep_Learning_on_Census_Dataset contains nodes provided by the following 11 plugins: