Lexicon Text Mining on Titles with VAD

Lexicon Text Mining on Movie Titles with VAD

This workflow can be run on its own, but it is actually complementary to a main workflow. The main workflow must be executed at least once beforehand, at least up to the node just before the lexicon text mining module and its associated table writer, so that the required .table file is saved in the working directory. The main workflow is in the same directory as this one on KNIME Hub.

This workflow performs lexicon-based text mining on the movie titles using the three features of the VAD dictionary: Valence, Arousal and Dominance. After the title length is added as a new variable, words are tagged with a POS tagger and with the VAD dictionary. Then, in the preprocessing step, punctuation is erased, numbers are filtered out, everything is converted to lower case, and all titles without any tag are filtered out (for example, titles consisting only of proper names).
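
The preprocessing described above can be approximated outside KNIME. The following is a minimal Python sketch, not the workflow's actual implementation: the tiny `TOY_VAD` lexicon and its values are made up for illustration, the function names are hypothetical, and title length is assumed to be counted in characters (the workflow does not state the unit). POS tagging is omitted.

```python
import re
import string

# Hypothetical miniature lexicon: word -> (valence, arousal, dominance).
# The values are invented for illustration and are NOT taken from the VAD dictionary.
TOY_VAD = {"love": (1.0, 0.5, 0.5), "war": (0.0, 1.0, 0.5)}


def preprocess_title(title, lexicon):
    """Return the title length, the cleaned tokens, and the tokens found in the lexicon."""
    title_length = len(title)  # assumption: length in characters
    # Erase ASCII punctuation.
    no_punct = title.translate(str.maketrans("", "", string.punctuation))
    # Drop purely numeric tokens and convert the rest to lower case.
    tokens = [t.lower() for t in no_punct.split() if not re.fullmatch(r"\d+", t)]
    # "Tagged" here simply means the term appears in the lexicon.
    tagged = [t for t in tokens if t in lexicon]
    return {"title": title, "title_length": title_length,
            "tokens": tokens, "tagged": tagged}


def preprocess_titles(titles, lexicon=TOY_VAD):
    """Preprocess every title and keep only those with at least one tagged term."""
    rows = [preprocess_title(t, lexicon) for t in titles]
    return [r for r in rows if r["tagged"]]


if __name__ == "__main__":
    for row in preprocess_titles(["Love & War 2", "Amélie"]):
        print(row)
```

In this sketch a title such as "Amélie" is dropped because none of its terms appear in the lexicon, mirroring the filter on titles without any tag.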

After the bag of words is created (and a Term column is added to keep each term without its tag attached), a joiner adds to each tagged term its Valence, Arousal and Dominance values from the VAD dictionary. Finally, a GroupBy node groups the terms by movie title, keeping the main data about the film together with the Valence, Arousal and Dominance values of the title.
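
The join and group-by steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a description of the KNIME nodes: the lexicon values are invented, the function names are hypothetical, and the per-title aggregation is assumed to be the mean, which the description does not specify.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical miniature lexicon: word -> (valence, arousal, dominance).
# Values are invented for illustration only.
TOY_VAD = {"love": (1.0, 0.5, 0.5), "war": (0.0, 1.0, 0.5)}


def bag_of_words(titles_tokens):
    """Emit one (title, term) row per distinct term of each title."""
    rows = []
    for title, tokens in titles_tokens.items():
        for term in dict.fromkeys(tokens):  # de-duplicate, keep order
            rows.append((title, term))
    return rows


def score_titles(titles_tokens, lexicon=TOY_VAD):
    """Join each term with its VAD values, then aggregate per title."""
    grouped = defaultdict(list)
    for title, term in bag_of_words(titles_tokens):
        if term in lexicon:  # inner join: terms missing from the lexicon are dropped
            grouped[title].append(lexicon[term])
    # Assumption: the title-level score is the mean of its terms' scores.
    return {
        title: tuple(mean(dim) for dim in zip(*values))
        for title, values in grouped.items()
    }


if __name__ == "__main__":
    tokens = {"Love & War": ["love", "war"], "Amélie": ["amélie"]}
    for title, (v, a, d) in score_titles(tokens).items():
        print(title, v, a, d)
```

A different aggregation (for example the sum or the maximum) could be substituted in the last step if it better matches the GroupBy configuration.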

ENRICHMENT
Enrichment is useful for adding semantic and syntactic information to the words contained in a document. Here we are using a Part of Speech tagger and the VAD dictionary.

VAD DICTIONARY
This is the dictionary we are going to use for the text analysis based on the title of the movie. The VAD dictionary contains the values, ranging from 0 to 1, of the following properties: Valence, Arousal and Dominance. On the side, you can find a visualization of the dataset, to make its potential and its features clearer.

Node annotations: convert title string to document; VAD dictionary tags assigned; Dataset with VAD values; VAD dictionary; insert right names to the columns; part of speech tags assigned; This node may supersede the previous table reader (only in case this workflow is called by another); We read textmining.table, a table produced by the main workflow, which needs to be executed once, at least partially (see the description); Main output port for external Workflows; Dictionary Visualization Output Port; TagClouds Table Outport.

Nodes in the workflow: Strings To Document, Dictionary Tagger, Preprocessing, GroupBy, Bag of Words, CSV Reader, Dictionary Visualization, Column Rename, Title Length, POS Tagger, Bar Chart and Tag Clouds, Container Input (Table), Table Reader, Container Output (Table) (three instances).
