
Building Sentiment Predictor - Deep Learning

Building a Sentiment Analysis Predictive Model - Deep Learning using an RNN
This workflow uses a Kaggle dataset of about 14K customer tweets directed at six US airlines: https://www.kaggle.com/crowdflower/twitter-airline-sentiment. Contributors annotated the valence of each tweet as positive, negative, or neutral. Once you are satisfied with the model evaluation, export 1) the dictionary, 2) the Category To Number model, and 3) the trained network, so that they can be deployed to classify non-annotated data.

1. Define the Network Architecture
The Keras Layer nodes define an LSTM-based recurrent neural network. The network structure can be extended by adding more Keras Layer nodes.

2. Read the annotated Twitter dataset.

3. Manipulate and Encode Data
The metanode performs an index encoding, which encodes each word with an index. This blog post describes different encoding options: https://www.knime.com/blog/text-encoding-a-review
In general, recurrent neural networks can handle sequences of different lengths. During training, though, all sequences must have the same length. Therefore, the metanode adds zeros to the end of the shorter sequences so that all sequences have the same length. This approach is known as zero padding.

4. Train and Apply Network
The Keras Network Learner node trains the defined network. In its configuration window you can define the input column(s), the target column(s), the loss function, and the training parameters, e.g. number of epochs, batch size, and optimizer.
The Keras Network Executor node applies the trained network to the input data. In its configuration window you can select the input column(s) and define the output by clicking the "add output" button. In this workflow the softmax output layer is defined as the output, so the output consists of the probabilities for the three classes.

The Conda Environment Propagation node ensures the existence of a Conda environment with all required packages.
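The index encoding and zero padding described in step 3 can be illustrated outside KNIME with a minimal Python sketch. It is not the metanode's actual implementation: the function names are hypothetical, the tweets are assumed to be already tokenized, and starting the word indices at 1 so that 0 remains free for padding is an assumption made here.

```python
from typing import Dict, List


def build_dictionary(tokenized_tweets: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance.

    Assumption: indices start at 1 so that 0 can be used for padding.
    """
    dictionary: Dict[str, int] = {}
    for tokens in tokenized_tweets:
        for word in tokens:
            if word not in dictionary:
                dictionary[word] = len(dictionary) + 1
    return dictionary


def encode_and_pad(
    tokenized_tweets: List[List[str]], dictionary: Dict[str, int]
) -> List[List[int]]:
    """Replace each word by its index, then append zeros to shorter sequences."""
    encoded = [[dictionary[word] for word in tokens] for tokens in tokenized_tweets]
    max_len = max(len(seq) for seq in encoded)
    return [seq + [0] * (max_len - len(seq)) for seq in encoded]


# Made-up example tweets, already split into words.
tweets = [["flight", "was", "late"], ["great", "flight"]]
dictionary = build_dictionary(tweets)
padded = encode_and_pad(tweets, dictionary)
print(dictionary)  # {'flight': 1, 'was': 2, 'late': 3, 'great': 4}
print(padded)      # [[1, 2, 3], [4, 1, 0]]
```

The second tweet is shorter, so one zero is appended to bring it to the length of the longest sequence. The same dictionary has to be reused at deployment time, which is why the workflow saves it.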
Instead of relying on the Conda Environment Propagation node, you can set up your Python integration to use a Conda environment with all required packages, as described here: https://docs.knime.com/2019-06/deep_learning_installation_guide/index.html#dl_python_setup

5. Evaluate and Save Trained Network
The Extract Prediction metanode uses the probabilities produced by the Keras Network Executor node and extracts the class with the highest probability.

Nodes and their annotations in the workflow:
CSV Reader: Kaggle dataset, N=14640, tweets from consumers to airlines
Create Collection Column
Keras Network Learner: Loss function: categorical cross entropy. Epochs: 50
Partitioning: 80% training, 20% testing
Category To Number: Encode each class with an index
Keras Network Executor: Output: softmax layer, i.e. the probability for the different classes
Scorer: 74% accuracy
Keras Embedding Layer: Input: number of words in the dictionary. Output: 128 units
Keras Dense Layer: Softmax with 3 units. Note: An appropriate output layer for a multiclass classification task is a softmax layer with as many units as classes.
Keras LSTM Layer: Units for cell state: 256
Keras Input Layer: Shape: ?. Note: Using ? as the input shape allows the network to handle different sequence lengths.
Index encoding and zero padding (metanode)
Extract Prediction: Class with highest probability
Keras Network Writer: Save network
PMML Writer: Save model
Table Writer: Save dictionary
Conda Environment Propagation: Set up a conda environment: dl_sentiment_keras
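The prediction extraction and scoring in step 5 amount to taking the class with the highest softmax probability and comparing it with the annotated class. A minimal Python sketch of that idea follows. The function names, the order of the classes, and the probability values are illustrative assumptions. In the workflow itself, the class-to-index mapping comes from the Category To Number model.

```python
from typing import List, Sequence

# Assumption: hypothetical index order of the three classes.
CLASSES = ["negative", "neutral", "positive"]


def extract_prediction(
    probabilities: Sequence[float], classes: Sequence[str] = CLASSES
) -> str:
    """Return the class whose softmax probability is highest."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best_index]


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of rows where the predicted class equals the annotated class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up softmax outputs for four tweets (one row per tweet).
softmax_rows = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
actual = ["negative", "positive", "neutral", "neutral"]

predicted = [extract_prediction(row) for row in softmax_rows]
print(predicted)                    # ['negative', 'positive', 'neutral', 'negative']
print(accuracy(predicted, actual))  # 0.75
```

The last tweet is misclassified because "negative" has the highest probability, so three of the four predictions match.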
