
Scraping and analyzing Breaking Bad subtitles with Redfield NLP nodes

This workflow scrapes the subtitles of the Breaking Bad series with Selenium nodes and analyzes them with the Redfield NLP Nodes extension.
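The scraping part of the workflow carries the step labels "Only keep English", "Sort by number of downloads" and "Keep only one entry with highest download count". A minimal Python sketch of that selection logic could look like the following. It is not the KNIME implementation: the `SubtitleRow` type, its field names, and `pick_subtitle` are hypothetical names introduced only for illustration, and the rows are assumed to have already been extracted from the page.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SubtitleRow:
    """One scraped subtitle entry (hypothetical structure, not from the workflow)."""
    language: str
    downloads: int
    link: str


def pick_subtitle(rows: Iterable[SubtitleRow],
                  language: str = "English") -> Optional[SubtitleRow]:
    """Keep only rows in the given language and return the most downloaded one."""
    wanted = language.strip().lower()
    candidates = [r for r in rows if r.language.strip().lower() == wanted]
    if not candidates:
        return None  # no subtitle in the requested language for this episode
    return max(candidates, key=lambda r: r.downloads)
```

Picking the maximum directly gives the same result as sorting by download count and keeping the first entry, which is how the workflow labels describe the step.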

The demo shows how to use Spacy nodes together with KNIME Text Processing nodes for topic modeling, clustering, dimensionality reduction, and knowledge graph visualization.

The extension can be obtained here: https://nlpnodes.com/






[Workflow diagram. Recoverable annotations and node names:]

Scrape the subtitles: Bundled Chromium; navigate to the Seasons overview page; get links to all episodes; navigate to the current episode as provided by flow variable; find subtitle row; extract language; extract number of downloads; only keep English; sort by number of downloads; keep only one entry with highest download count; click the DOWNLOAD button; close browser.
Nodes: WebDriver Factory, Start WebDriver, Navigate, Extract Attribute, Table Row To Variable Loop Start, Find Elements, Extract Text, Row Filter, Sorter, Click, Wait, Loop End (deprecated), Quit WebDriver.

Read and process all files: add your path here (select folder for subs); list scraped sub files; read scraped files; get season and episode from filename; get season and episode as columns; get entire text content.
Nodes: List Files/Folders, String Manipulation, Column Expressions, Table Creator.

Text processing with Redfield NLP Nodes and Knime Text Processing Nodes: remove HTML tags; replace abbreviations; converting strings to documents; en_core_web_sm; get named entities; remove names, locations, organizations; leave only nouns, verbs, adjectives; stop words dictionary; filter short words.
Nodes: Spacy Model Selector, Spacy Tokenizer, Spacy NER, Spacy POS Tagger, Spacy Lemmatizer, Spacy Vectorizer, Punctuation Erasure, Stop Word Filter, Case Converter, Tag Filter, N Chars Filter.

Topic modeling, vectorization and clustering: iterate by season; get number of episodes per season; topics per section; attach season to terms; Season+Episode+Topic; join topics with clusters.
Nodes: Group Loop Start, Create parameters, Topic Extractor (Parallel LDA), Constant Value Column, GroupBy, Loop End (2 ports) (deprecated), Joiner.

Visualization and analysis: Clustering analysis; Knowledge graph; Clustering and dimensionality reduction.
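For readers who want a feel for what the text-cleaning steps named in the diagram do (removing HTML tags, erasing punctuation, converting case, filtering stop words and short words), here is a rough standard-library Python sketch. It is only an analogy to the KNIME nodes, not their implementation. It assumes the subtitles are in SRT-like format with cue numbers and `-->` timing lines, which the page does not state, and the tiny `STOP_WORDS` set and the `clean_subtitle` function are hypothetical stand-ins for the workflow's stop words dictionary and node chain.

```python
import html
import re

# Assumed SRT layout: a cue-number line, then "00:00:01,000 --> 00:00:03,000".
TIMING_RE = re.compile(r"^\d+$|^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->.*$", re.MULTILINE)
TAG_RE = re.compile(r"<[^>]+>")               # HTML-style tags such as <i>...</i>
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")   # lowercase words, optional apostrophe

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = frozenset({"the", "and", "you", "that", "this", "was", "are"})


def clean_subtitle(text, min_chars=3, stop_words=STOP_WORDS):
    """Return the lowercase content words of one subtitle file's text."""
    text = TIMING_RE.sub(" ", text)           # drop cue numbers and timing lines
    text = html.unescape(TAG_RE.sub(" ", text))  # strip tags, decode entities
    words = WORD_RE.findall(text.lower())     # also discards punctuation and digits
    return [w for w in words if len(w) >= min_chars and w not in stop_words]
```

In the workflow itself, part-of-speech and named-entity filtering rely on the Spacy nodes and the `en_core_web_sm` model. Those steps are left out here because they need the model to be installed.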
