79614 - Count Words

Important: Always bear in mind the "90 cm problem": the user in front of the keyboard.

Forum post: https://forum.knime.com/t/counting-words-in-a-text-string/79614

Challenge: Counting words might sound easy, but human-generated data can have lots of surprises!

Note: Mind the duplicated whitespace after the first word and at the very end! Other issues, such as control characters in the text, can cause problems too.

Options
1. Remove everything but whitespace and count the length of the remaining string.
   Problems:
   - Account for duplicated whitespace by replacing each run with a single space.
   - Account for leading and trailing whitespace.
2. Count the collection size.
3. Count using GroupBy.
4. Use the Text Processing nodes.

Important: The TF node also counts the words in the document title, but not in the metadata!
https://forum.knime.com/t/text-processing-tf-node-counts-incorrectly/79718/3

Node annotations: Option 1, Count Whitespaces, Split by Space into List, Clean up Whitespaces, Absolute, Count of rows, Total Count of each word occurrence, Calc relative share, Relative, Create one Cell with line break

Nodes used: Table Creator, String Manipulation, Cell Splitter, Collection Size, String Cleaner, Strings to Document, Bag Of Words Creator, TF, Ungroup, Extract Table Dimension, GroupBy, Column Appender, Column Renamer, Table Row to Variable, Math Formula, Term to String, Value Lookup
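Options 1 and 2, including the clean-up that Option 1's problems call for, can be sketched in plain Python outside KNIME. This is a minimal sketch under stated assumptions, not the workflow's implementation: words are assumed to be whitespace-separated, invisible control characters (Unicode category Cc that are not whitespace) are assumed to be noise and are dropped, and the function names are hypothetical.

```python
import re
import unicodedata


def clean_whitespace(text: str) -> str:
    """Drop invisible control characters, collapse whitespace runs, trim the ends."""
    # Assumption: control characters (category "Cc") that are not whitespace are noise.
    # Tabs and line breaks are whitespace, so they are kept here and collapsed below.
    visible = "".join(
        ch for ch in text if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Replace every run of whitespace with one space and strip both ends.
    return re.sub(r"\s+", " ", visible).strip()


def count_by_whitespace(text: str) -> int:
    """Option 1: remove all but the separators and count them (words = spaces + 1)."""
    cleaned = clean_whitespace(text)
    if not cleaned:
        return 0  # empty or whitespace-only input contains no words
    spaces_only = re.sub(r"[^ ]", "", cleaned)  # remove everything except spaces
    return len(spaces_only) + 1


def count_by_split(text: str) -> int:
    """Option 2: split into a list and take the collection size."""
    # str.split() without arguments ignores leading, trailing and repeated whitespace.
    return len(text.split())


# Duplicated whitespace after the first word and at the very end.
messy = "Count  words might sound easy "
naive = len(re.sub(r"[^ ]", "", messy)) + 1  # Option 1 without clean-up
print(naive, count_by_whitespace(messy), count_by_split(messy))  # → 7 5 5
```

In this sketch, Option 2 gets the clean-up for free because Python's `str.split()` without arguments skips repeated and surrounding whitespace, while the separator count of Option 1 is only reliable after `clean_whitespace`.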
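Option 3 can be illustrated along the lines of the workflow's node annotations (one row per word, a count of rows, the total count of each word occurrence, then a relative share). The following is a rough plain-Python analogue, not the workflow itself: `collections.Counter` stands in for the GroupBy aggregation, words are assumed to be whitespace-separated, matching is assumed to be case-sensitive, and `word_frequencies` is a hypothetical name.

```python
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int, float]]:
    """Absolute and relative count of each word, in the spirit of Option 3."""
    # One entry per word (the "Ungroup" step); whitespace runs are ignored by split().
    words = text.split()
    # "Count of rows": total number of words.
    total = len(words)
    # "GroupBy": absolute count per distinct word (case-sensitive).
    counts = Counter(words)
    # Relative share = absolute count / total; most frequent words first.
    return [(word, n, n / total) for word, n in counts.most_common()]


for word, absolute, relative in word_frequencies("to be or not  to be "):
    print(f"{word}\t{absolute}\t{relative:.2f}")
```

Because matching is case-sensitive here, "To" and "to" would form separate groups unless the text is lower-cased first; an empty or whitespace-only input yields an empty list, so no division by zero occurs.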
