
20200603 Pikairos iterate over a list and optimize speed of the searching

What I have so far:

A list of strings (around 2000 words per row)
Columns of words (around 300 words per column) that I have to compare against the list of strings
A user input giving the number of hits that have to occur

What I want to do:

I want to compare each word from the list of strings with each word from the columns of words. How can I do that?
I want to optimize my search. How? Example: if the word at position 133 in the list of strings matches the 53rd word in the column of words, I want KNIME to check whether the user asked for one hit or for more. If just one hit is required, KNIME should stop iterating over the rest of the list of strings. If the number of hits is 2, 3, 4, 5 or more, KNIME should continue iterating until the count of hits equals the number the user entered.
How can I do that?
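Outside KNIME, the early-stopping idea described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a KNIME node configuration: the function name `has_enough_hits` and its parameters are hypothetical, every matching occurrence is counted as one hit, and the column of words is turned into a set so that each lookup avoids scanning all 300 words.

```python
def has_enough_hits(words, dictionary, required_hits):
    """Return True as soon as `required_hits` words from `words`
    have been found in `dictionary`; stop iterating at that point.

    Hypothetical helper: names and counting rule (one hit per matching
    occurrence) are assumptions, not taken from the workflow.
    """
    if required_hits <= 0:
        return True  # nothing to find
    lookup = set(dictionary)  # set membership replaces the inner loop over the column
    hits = 0
    for word in words:
        if word in lookup:
            hits += 1
            if hits >= required_hits:
                return True  # early exit: the rest of the list is never read
    return False


if __name__ == "__main__":
    row = ["alpha", "beta", "gamma", "delta"]
    column = ["beta", "delta", "omega"]
    print(has_enough_hits(row, column, 1))
    print(has_enough_hits(row, column, 3))
```

With a set lookup the cost per row is proportional to the number of words read before the threshold is reached, rather than to the product of both list lengths.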

An example solution for this would be amazing. Thanks in advance.

Kind regards,
Denis

Workflow steps (node: annotation):

File Reader: Read sentences.
Cell Splitter: Create lists from sentences.
RowID: Create a column with RowID to remember the RowID origin of words.
Ungroup: Ungroup the lists, keeping their RowID origin.
GroupBy: Generate a unique column of words with count as a dictionary.
String Manipulation: Remove any undesired character from sentences.
Joiner: Join the words from sentences with the words in the dictionary.
Row Filter: Keep only the most frequent.
GroupBy: Group "detected" words by sentence of origin (RowID) to compute which dictionary words appear in sentences and count them. I suppose here that you count the "same" word only once, and not every time it appears if repeated in the sentence. Otherwise change unique count to count and set to list in the aggregation.
Row Filter: Here comes your magic number to decide which sentences are kept, based on the minimum word occurrences. I set it to 4 minimum!
Joiner: Join results with initial sentences.
Interactive Table (local): The sentences having at least 4 words from the dictionary.
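For readers who want to check the logic outside KNIME, the workflow above can be approximated in plain Python. This is a rough sketch under assumptions, not a translation of the actual node settings: `filter_sentences` and its parameters are hypothetical names, the dictionary is passed in as a ready-made list instead of being derived from word counts, matching is case-sensitive, and each dictionary word is counted once per sentence (the "unique count" choice mentioned in the annotations).

```python
import re


def filter_sentences(sentences, dictionary, min_hits=4):
    """Keep sentences containing at least `min_hits` distinct dictionary words.

    Hypothetical sketch: the list index stands in for the RowID, and the
    cleaning rule (drop anything that is not a word character or whitespace)
    is an assumption about what "undesired characters" means.
    """
    lookup = set(dictionary)
    kept = []
    for row_id, sentence in enumerate(sentences):
        # Rough analogue of the String Manipulation step: strip punctuation.
        cleaned = re.sub(r"[^\w\s]", " ", sentence)
        # Rough analogue of Cell Splitter + Ungroup: one entry per distinct word.
        unique_words = set(cleaned.split())
        # Rough analogue of Joiner + GroupBy (unique count): distinct matches.
        matched = unique_words & lookup
        # Rough analogue of the Row Filter with the "magic number".
        if len(matched) >= min_hits:
            kept.append((row_id, sentence, sorted(matched)))
    return kept


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "a dog, a cat; a bird and a fish!",
        "cat cat cat cat",
    ]
    dictionary = ["cat", "dog", "bird", "fish", "mat"]
    for row_id, sentence, words in filter_sentences(sentences, dictionary):
        print(row_id, sentence, words)
```

Because repeated words collapse into a set, the third sentence counts as a single hit. Counting every occurrence instead would mean iterating over `cleaned.split()` directly rather than over its set.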
