01_Text_Processing

This directory contains 27 workflows.

10_Discover_Secret_Ingredient

This workflow reads a docx file with recipes, crawls an additional recipe, and extracts and compares the ingredients used.

11_Lemmatizer_Preprocessing

A lemmatizer removes inflections from a word, e.g. plurals, pronoun case, and verb endings, to revert it to its base form (a lemma). To use the […]

12_DocumentVector_FeatureSpaceAdaption

The workflow shows how to use a Document Vector Adapter node in order to adjust the feature space of a second set of documents to make it identical to the […]

13_DocumentVector_Hashing

This workflow shows an alternative way to execute the Sentiment Analysis example with streaming enabled using the Document Vector Hashing node. The node […]

14_NER_Tagger_Model_Training

This workflow shows how to train a model for named-entity recognition. The workflow starts with reading the file. In this case, each row represents a […]

15_RSS_Feed_Reader

The workflow starts with URLs to some RSS news feeds. The news feed is downloaded, parsed, and transformed into documents. Names of persons, organizations and […]

16_Tika_Parsing

The goal of the workflow is to show how to parse the content of files using Tika nodes, detect the languages of the content using the Tika language detector and […]

17_TopicExtraction_with_the_ElbowMethod

This workflow shows how to extract topics from text documents using the Topic Extractor node. It reads textual data from a table (or, alternatively, the […]

18_epub_JPEG_Romeo_Juliet

The challenge here is to blend text and image data. The text data is in epub format, while the images are in JPEG format. The goal is to build a […]

19_Analyse_and_Visualize_Job_Postings

The workflow performs text processing of the Job Posts dataset (only IT-related postings). The upper branch extracts the most frequent required skills and […]