01_Text_Processing

This directory contains 27 workflows.

10_Discover_Secret_Ingredient

This workflow reads a docx file with recipes, crawls an additional recipe, and extracts and compares the ingredients used.

11_Lemmatizer_Preprocessing

A lemmatizer removes inflections, e.g. in the case of plurals, pronoun case, and verb endings, to revert a word back to its base form (a lemma). To use the […]

12_DocumentVector_FeatureSpaceAdaption

The workflow shows how to use a Document Vector Adapter node in order to adjust the feature space of a second set of documents to make it identical to the […]

13_DocumentVector_Hashing

This workflow shows an alternative way to execute the Sentiment Analysis example with streaming enabled using the Document Vector Hashing node. The node […]

14_NER_Tagger_Model_Training

This workflow shows how to train a model for named-entity recognition. The workflow starts with reading the file. In this case each row represents a […]

15_RSS_Feed_Reader

The workflow starts with URLs to some RSS news feeds. The news feed is downloaded, parsed and transformed into documents. Names of persons, organizations and […]

16_Tika_Parsing

The goal of the workflow is to show how to parse the content of files using Tika nodes, detect the languages of the content using the Tika language detector and […]

17_TopicExtraction_with_the_ElbowMethod

This workflow shows how to extract topics from text documents using the Topic Extractor node. It reads textual data from a table (or, alternatively, the […]

18_epub_JPEG_Romeo_Juliet

The challenge here is to blend together text and image data. The text data is in epub format while images are in JPEG format. The goal is to build a […]

19_Analyse_and_Visualize_Job_Postings

The workflow performs text processing of the Job Posts dataset (only IT-related postings). The upper branch extracts the most frequent required skills and […]
