NGram Creator

This node creates ngrams from the documents in the input table and counts their frequencies. Either word or character ngrams can be created. The output table is either a bag-of-words-like structure or a data table containing the ngrams and their frequencies in the corpus and in the documents.
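
To illustrate the difference between word and character ngrams, here is a minimal Python sketch. It is not the node's implementation: the function names `word_ngrams` and `char_ngrams` are hypothetical, tokenization is a naive whitespace split, and building character ngrams within each word is an assumption.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all runs of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all runs of n consecutive characters inside each word.

    Assumption: ngrams do not span word boundaries.
    """
    grams = []
    for word in text.split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Counting frequencies is then a matter of tallying the generated ngrams.
word_counts = Counter(word_ngrams("the quick brown fox", 2))
char_counts = Counter(char_ngrams("fox box", 2))
```

Words or documents shorter than N simply yield no ngrams in this sketch.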

Options

N
The number of consecutive items (words or characters) in each ngram.
NGram type
The type (Word or Character) specifies whether word or character ngrams are created.
Document column
The column containing the documents from which the ngrams are created.
Output table
Specifies the structure of the output data table. The "Ngram frequencies" option creates an output table listing every ngram with its frequency in the complete corpus, as well as its frequencies in documents and in sentences or words. Note that this option is memory-intensive for a large number of documents.
The "NGram bag of words" option creates an output data table of ngram and document tuples. Each tuple represents the occurrence of an ngram in a document, and the frequency column holds the number of times the ngram occurs in that document. This option is also suitable for large document sets.
Number of maximal parallel processes
Specifies the maximum number of parallel processes used for ngram computation.
Number of documents per process
Specifies the number of documents processed by a single thread.
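
The interplay of the last two options can be sketched in Python as follows. This is an illustration under stated assumptions rather than the node's actual code: the names `chunk`, `count_batch`, `parallel_ngram_counts`, `max_workers`, and `docs_per_worker` are hypothetical, and only word ngrams with whitespace tokenization are shown.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_batch(docs, n):
    """Count word ngrams over one batch of documents."""
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        counts.update(" ".join(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts


def parallel_ngram_counts(docs, n, max_workers=2, docs_per_worker=2):
    """Count ngrams batch-wise in a thread pool and merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = chunk(docs, docs_per_worker)
        for partial in pool.map(lambda batch: count_batch(batch, n), batches):
            total.update(partial)  # adds the batch counts to the running total
    return total
```

Here `max_workers` plays the role of the maximum number of parallel processes and `docs_per_worker` the number of documents handed to a single thread. Smaller batches spread work more evenly, while larger batches reduce scheduling overhead.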

Input Ports

The input table containing the documents.

Output Ports

The output table containing the ngrams and their corresponding frequencies, or a bag of ngrams.
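
The two output structures can be pictured with a small Python sketch. The row layouts and the names `ngram_bag_of_words` and `ngram_frequencies` are assumptions made for illustration; the node's actual column names and ordering may differ, and only corpus and document frequencies of word ngrams are shown.

```python
from collections import Counter


def word_ngrams(text, n):
    """Naive whitespace-tokenized word ngrams (an assumption of this sketch)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_bag_of_words(docs, n):
    """One (ngram, document index, frequency in that document) row per pair."""
    rows = []
    for doc_id, text in enumerate(docs):
        for ngram, freq in Counter(word_ngrams(text, n)).items():
            rows.append((ngram, doc_id, freq))
    return rows


def ngram_frequencies(docs, n):
    """One (ngram, corpus frequency, document frequency) row per distinct ngram."""
    corpus_freq = Counter()  # total occurrences across all documents
    doc_freq = Counter()     # number of documents containing the ngram
    for text in docs:
        counts = Counter(word_ngrams(text, n))
        corpus_freq.update(counts)
        doc_freq.update(counts.keys())
    return [(g, corpus_freq[g], doc_freq[g]) for g in sorted(corpus_freq)]
```

The bag-of-words form emits rows one document at a time, whereas the frequency form has to hold counts for every distinct ngram in the corpus before it can emit anything, which is consistent with the memory note on that option.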

Views

This node has no views.
