
Text Processing

This directory contains 12 workflows.

Document Preprocessing 

Document Preprocessing applies a common sequence of preprocessing steps to clean and prepare text for subsequent analysis and comparison with other text. As […]
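The component's exact steps are cut off above, so the following is only a minimal sketch of a typical cleaning sequence (case folding, punctuation and digit removal, tokenization, stop-word and short-token filtering). The function name `preprocess` and the tiny `STOP_WORDS` set are hypothetical illustrations, not part of the component.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def preprocess(text: str, min_len: int = 3) -> list[str]:
    """Apply a common cleaning sequence and return the remaining tokens."""
    # 1. Case folding so "Text" and "text" count as the same term.
    text = text.lower()
    # 2. Replace punctuation and digits with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # 3. Tokenize on whitespace.
    tokens = text.split()
    # 4. Drop stop words and tokens shorter than min_len characters.
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]


print(preprocess("The 3 cleaning steps, applied to a sample of Text!"))
# ['cleaning', 'steps', 'applied', 'sample', 'text']
```

Real pipelines often add stemming or lemmatization after step 4; it is omitted here to keep the sketch dependency-free.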

Document Similarity Learner 

The Document Similarity Learner develops a model for identifying a new document's most similar matches in an existing corpus of documents. It consumes […]
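The description does not say which document representation the learner stores, so this sketch assumes a TF-IDF model, one common choice: it learns inverse document frequency (IDF) weights from a tokenized corpus and returns one weighted vector per document. `fit_tfidf` is a hypothetical name, and the smoothed IDF formula is one of several standard variants.

```python
import math
from collections import Counter


def fit_tfidf(corpus: list[list[str]]) -> tuple[dict[str, float], list[dict[str, float]]]:
    """Return IDF weights and one TF-IDF vector per tokenized document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    # Smoothed IDF: rarer terms get larger weights; a term in every doc gets 1.0.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in corpus:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return idf, vectors


idf, vectors = fit_tfidf([["cat", "sat"], ["cat", "ran"]])
print(idf["cat"])  # 1.0, since "cat" occurs in both documents
```

Keeping the IDF table alongside the vectors lets a later step weight a new document's terms the same way as the corpus.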

Document Similarity Predictor 

The Document Similarity Predictor applies the model obtained by the Document Similarity Learner to a test document. It computes the cosine similarity […]
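As a rough illustration of the cosine-similarity step, the sketch below scores a tokenized test document against each corpus document and returns the top `k` matches. It uses plain term counts for brevity; the component's actual vector weighting is not visible in the truncated text. `cosine` and `most_similar` are hypothetical names.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(test_doc: list[str], corpus: list[list[str]], k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus index, similarity) pairs, highest similarity first."""
    query = Counter(test_doc)
    scores = [(i, cosine(query, Counter(doc))) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


corpus = [["cat", "sat", "mat"], ["dog", "ran"], ["cat", "mat"]]
print(most_similar(["cat", "mat"], corpus))
```

Cosine similarity ignores vector length, so a long corpus document is not favored merely because it repeats terms more often.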

Keyword Search 

This component extracts the most relevant English keywords in a corpus (a collection of documents) using three techniques: Topic Extraction […]
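Only the first of the component's three techniques (Topic Extraction) is named above, so the sketch below is not one of them. It is a swapped-in plain term-frequency baseline, shown only to make "most relevant keywords" concrete: count alphabetic tokens across the corpus, skip stop words, and return the most frequent. `top_keywords` and `STOP_WORDS` are hypothetical.

```python
from collections import Counter

# Hypothetical, minimal English stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def top_keywords(corpus: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Rank terms by raw frequency across all documents (a simple baseline)."""
    counts = Counter(
        token
        for doc in corpus
        for token in doc.lower().split()
        if token.isalpha() and token not in STOP_WORDS
    )
    return counts.most_common(n)


print(top_keywords(["topic models find topics", "topic models and keywords"], n=2))
```

Raw frequency favors generic words; weighting schemes such as TF-IDF or topic models exist precisely to push more distinctive terms to the top.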

News API Advanced Search 

This component queries the News API (newsapi.org) and returns news articles for specific search terms and parameters. The component can […]
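For readers who want to see what such a query looks like outside the component, here is a minimal sketch that builds a request URL for News API's `/v2/everything` search endpoint. It assumes the parameters `q`, `language`, `sortBy`, `pageSize`, and `apiKey`; the component may expose different options, and `build_everything_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

EVERYTHING_ENDPOINT = "https://newsapi.org/v2/everything"


def build_everything_url(query: str, api_key: str, language: str = "en",
                         sort_by: str = "publishedAt", page_size: int = 20) -> str:
    """Build a search URL; urlencode escapes spaces and special characters."""
    params = {
        "q": query,             # search terms
        "language": language,   # two-letter language code
        "sortBy": sort_by,      # e.g. "relevancy", "popularity", "publishedAt"
        "pageSize": page_size,  # results per page
        "apiKey": api_key,      # your personal News API key
    }
    return f"{EVERYTHING_ENDPOINT}?{urlencode(params)}"


print(build_everything_url("text mining", "YOUR_KEY"))
```

Sending the request needs network access and a valid key; the URL can be passed to `urllib.request.urlopen` as is.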

News API Headlines 

This component queries the News API (newsapi.org) and returns the current top headlines for a particular country. The component can be […]
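A minimal sketch of the equivalent raw request, assuming News API's `/v2/top-headlines` endpoint with a two-letter `country` code and an optional `category` filter. `build_headlines_url` and `fetch_headlines` are hypothetical helpers; the second one needs network access and a valid key, and assumes the JSON response carries its results in an `articles` list.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

HEADLINES_ENDPOINT = "https://newsapi.org/v2/top-headlines"


def build_headlines_url(country: str, api_key: str, category: Optional[str] = None) -> str:
    """Build a top-headlines request URL for a two-letter country code."""
    params = {"country": country, "apiKey": api_key}
    if category is not None:
        # Optional filter, e.g. "business" or "technology".
        params["category"] = category
    return f"{HEADLINES_ENDPOINT}?{urlencode(params)}"


def fetch_headlines(country: str, api_key: str) -> list:
    """Send the request and return the list of article objects."""
    with urlopen(build_headlines_url(country, api_key), timeout=10) as response:
        data = json.load(response)
    return data.get("articles", [])


print(build_headlines_url("us", "YOUR_KEY"))
# https://newsapi.org/v2/top-headlines?country=us&apiKey=YOUR_KEY
```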

News API Sources 

This component queries the News API (newsapi.org) and returns the list of news sources (e.g. TechCrunch) currently registered with the […]
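Once a sources response has been fetched, turning it into a lookup table is straightforward. This sketch assumes the JSON shape News API documents for its sources endpoint: a `status` field plus a `sources` array whose entries carry `id` and `name`, with a `message` field on errors. The sample payload is a hypothetical, trimmed example, and `source_names` is a hypothetical helper.

```python
import json


def source_names(payload: str) -> dict[str, str]:
    """Map each source id to its display name; raise on an error response."""
    data = json.loads(payload)
    if data.get("status") != "ok":
        raise ValueError(f"News API error: {data.get('message', 'unknown')}")
    return {src["id"]: src["name"] for src in data.get("sources", [])}


# Hypothetical, trimmed response used only to exercise the parser.
sample = '{"status": "ok", "sources": [{"id": "techcrunch", "name": "TechCrunch"}]}'
print(source_names(sample))  # {'techcrunch': 'TechCrunch'}
```

Source ids like `techcrunch` are what the search and headline queries accept when you want to restrict results to particular outlets.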

PubMed Document Extractor 

Searches PubMed for life sciences and biomedical topics. PubMed is a free search engine developed and maintained by the National Center for […]
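How the component talks to PubMed internally is not stated, but NCBI's public E-utilities interface is one documented way to run the same kind of search. The sketch below builds an `esearch` URL that returns matching PubMed IDs as JSON. `build_pubmed_search_url` is a hypothetical helper, and retrieving abstracts would take a follow-up `efetch` call.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an E-utilities search URL for PubMed."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # query, e.g. "CRISPR gene editing"
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_search_url("CRISPR gene editing", retmax=5))
```

NCBI rate-limits anonymous E-utilities clients, so batch jobs should pause between requests.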

Topic Assigner (STM) 

Use this component to apply the model trained with the 'Topic Extractor (STM)' component; see that component for more information. This component […]
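A topic model like STM describes each document as a mixture of topics, usually stored as a document-topic proportion matrix (often called theta). Estimating that matrix for new documents is the job of STM itself. The sketch below shows only the simple final step a topic assigner might take afterwards: labeling each document with its dominant topic. `assign_topics` is a hypothetical helper, and the proportions are made-up inputs.

```python
def assign_topics(theta: list[list[float]]) -> list[int]:
    """Return the index of the highest-proportion topic for each document.

    theta[d][k] is the estimated share of topic k in document d.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Made-up proportions for two documents over three topics.
print(assign_topics([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))  # [1, 0]
```

Keeping the full proportion rows, not just the argmax, is often more informative, because many documents mix several topics in similar amounts.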

Topic Extractor (STM) 

The component trains an STM topic model via unsupervised learning. It integrates with the R implementation of Structural Topic Models (STM), following […]