Document Similarity Learner

The Document Similarity Learner builds a model for finding the documents in an existing corpus that are most similar to a new document. It takes already preprocessed documents as input (see the Document Preprocessing Component) and outputs both the corpus of documents and a model for use with the Document Similarity Predictor Component.

Options

Select preprocessed document column
Select the column containing the preprocessed documents.
Select the term column
Select the column containing the terms.
Number of keywords to extract
Number of keywords to extract from each input document.

Input Ports

Documents which have already been preprocessed (via Document Preprocessing).

Output Ports

The reference corpus of documents for future comparison with new documents.
Model for creating document vectors on new documents in the appropriate, compatible format.
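
The description above does not say which algorithm the learner uses, so the following is only a rough sketch of the input/output contract, assuming a TF-IDF keyword weighting and cosine similarity. The function names (learn, vectorize, cosine, most_similar) and the model layout are hypothetical and are not part of the component. The sketch takes preprocessed documents as term lists, keeps a fixed number of keywords per document, and returns both the reference corpus and a model that can vectorize new documents in a compatible format.

```python
import math
from collections import Counter


def vectorize(terms, model):
    """Turn a list of terms into a sparse keyword vector using a learned model."""
    # Terms unseen during learning are ignored (an assumption of this sketch).
    counts = Counter(t for t in terms if t in model["idf"])
    weights = {t: c * model["idf"][t] for t, c in counts.items()}
    # Keep only the highest-weighted keywords; ties are broken alphabetically.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(top[: model["n_keywords"]])


def learn(documents, n_keywords=3):
    """Return (corpus, model) from preprocessed documents given as term lists."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    # Smoothed inverse document frequency (assumed weighting scheme).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    model = {"idf": idf, "n_keywords": n_keywords}
    corpus = [vectorize(doc, model) for doc in documents]
    return corpus, model


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(new_terms, corpus, model):
    """Index of the corpus document most similar to a new document."""
    query = vectorize(new_terms, model)
    scores = [cosine(query, vec) for vec in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)


if __name__ == "__main__":
    docs = [
        ["cat", "sat", "mat"],
        ["dog", "bark", "loud"],
        ["stock", "market", "fall"],
    ]
    corpus, model = learn(docs, n_keywords=2)
    print(most_similar(["dog", "bark"], corpus, model))
```

In this sketch, the pair returned by learn plays the role of the two output ports, and most_similar stands in for what a downstream predictor would do with them.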
