
relating_terms_with_their_stems

A simple workflow that keeps a mapping of each term in a document to the stem produced for it by the Snowball English stemmer.

The Snowball Stemmer node creates a new document in which the original terms are replaced by the stems produced by the Snowball stemmer, but the Bag Of Words Creator node groups equal terms into a single entry. Therefore, we can't establish a correspondence between the original terms and their stems from their order in that node's output table: several original terms might have the same stem, which will appear only once.

Instead of using the Snowball Stemmer node, we call the Snowball stemmer directly in the Java Snippet node for each original term and produce a second column containing its stem. The path to the Snowball.jar file under the Additional Libraries tab of the Java Snippet node might need to be adapted.

Nodes used in the workflow: Term To String, Column Rename, Table Creator, Strings To Document, Bag Of Words Creator, Column Filter, Java Snippet.
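The idea of stemming each term individually can be sketched in plain Java outside KNIME. This is a minimal illustration, not the code inside the workflow's Java Snippet node: the class `TermStemMapping`, its methods, and the `toyStem` function are hypothetical names introduced here, and `toyStem` is a crude suffix stripper standing in for the Snowball English stemmer so that the example runs without Snowball.jar.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class TermStemMapping {

    // Hypothetical stand-in for the Snowball English stemmer: strips a few
    // common suffixes. This is NOT the Snowball algorithm.
    static String toyStem(String term) {
        String t = term.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.length() > suffix.length() + 2 && t.endsWith(suffix)) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    // Stems each term on its own, so every original term keeps its own entry
    // (term -> stem), mirroring the "second column" produced per row.
    static Map<String, String> mapTermsToStems(List<String> terms,
                                               UnaryOperator<String> stemmer) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String term : terms) {
            mapping.put(term, stemmer.apply(term));
        }
        return mapping;
    }

    // Mimics grouping equal terms after stemming a whole document:
    // terms that share a stem collapse into a single entry.
    static Set<String> distinctStems(List<String> terms,
                                     UnaryOperator<String> stemmer) {
        Set<String> stems = new LinkedHashSet<>();
        for (String term : terms) {
            stems.add(stemmer.apply(term));
        }
        return stems;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("connect", "connected", "connecting", "stems");
        // One entry per original term.
        System.out.println(mapTermsToStems(terms, TermStemMapping::toyStem));
        // Fewer entries than terms, so order alone cannot recover the mapping.
        System.out.println(distinctStems(terms, TermStemMapping::toyStem));
    }
}
```

With Snowball.jar on the classpath, the stand-in would presumably be replaced by a call to the real stemmer (in the standard Snowball Java distribution, `setCurrent`, `stem`, and `getCurrent` on `org.tartarus.snowball.ext.englishStemmer`). Whether the jar referenced by this workflow uses exactly that class name is an assumption.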
