Document Similarity Predictor

The Document Similarity Predictor applies the model obtained by the Document Similarity Learner to a test document. It computes the cosine similarity between the test document and each document in the original corpus table.
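As an illustration of the measure itself, the following minimal Python sketch computes cosine similarity between a test vector and a small corpus. It is not the Component's implementation: the function name `cosine_similarity`, the document names, and the vector values are hypothetical, and treating an all-zero vector as having similarity 0 is an assumption made here for convenience.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical document vectors over a shared term space (illustrative values).
corpus_vectors = {
    "doc_a": [1.0, 0.0, 2.0],
    "doc_b": [0.0, 3.0, 0.0],
}
test_vector = [2.0, 0.0, 4.0]

# One similarity score per corpus document.
similarities = {
    name: cosine_similarity(vector, test_vector)
    for name, vector in corpus_vectors.items()
}
```

In this sketch `doc_a` points in the same direction as the test vector, so its score is close to 1, while `doc_b` shares no terms with it and scores 0.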

Options

Document column from Similarity Learner
The column containing the original documents against which the similarity is calculated. It is part of the output of the "Document Similarity Learner" Component.
Document Vector column associated with documents
Select the document vector column associated with the documents selected above. It is also part of the output of the "Document Similarity Learner" Component.
Preprocessed document column for similarity prediction
Select the column containing preprocessed documents that will be used to predict the similarity to the original documents.
Term column associated with preprocessed documents
Select the column containing the terms associated with the preprocessed documents.
Minimum similarity
Sets the minimum similarity value a document must reach to be included in the output.
Neighbor count
Sets the number of similar neighboring documents to output.
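
One way the "Minimum similarity" and "Neighbor count" options could interact is sketched below in Python. This is an assumption, not the Component's documented behavior: the helper `select_neighbors`, the inclusive threshold, and the order of operations (threshold first, then keep the highest-scoring documents) are all hypothetical, as are the scores.

```python
def select_neighbors(similarities, min_similarity, neighbor_count):
    """Keep documents at or above the threshold, then the top-scoring ones."""
    # Assumption: the threshold is inclusive.
    kept = [
        (name, score)
        for name, score in similarities.items()
        if score >= min_similarity
    ]
    # Highest similarity first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:neighbor_count]

# Hypothetical similarity scores between one test document and four corpus documents.
scores = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.75, "doc_d": 0.88}

selected = select_neighbors(scores, min_similarity=0.5, neighbor_count=2)
```

Here `doc_b` falls below the threshold, and of the three remaining documents only the two with the highest scores are kept.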

Input Ports

An input table containing the original corpus with the related document vectors.
A model containing node settings as well as column names of the term feature space.
An input table containing the new test document.

Output Ports

A table containing the selected similar documents.
A single variable set to the count of matching documents per input document.
