Icon

Search reference in different columns-takbb 2

Adjust the Values in RowFilter to include/excludemore based on"similarity" Add or remove any manual rulesthat you might consider. As anexample Included a rule that saidif City is present in both, but isnot the same, then filter out therow (even though the rest of theaddress may be the same) More problematic might be how to deal with "street number"difference, where the only difference in the strings is the buildingnumber!You may need to apply additional logic, but at least you have areduced data set on which to apply such rules Table 1Table 2Match column valueswe don't care about the "columnName" theycame from as the data is notconsistentCreate our ownrow id seq numberfor future referencingCreate our ownrow id seq numberfor future referencingTurn into "key value pairs"Just to IDsThis is list of "potential" matches, no"definite" matchesPull in original datafrom Table1Pull in original datafrom Table2Bring "potentially matching" data togetherfrom Table1 and Table2Tab1IDTab2IDOrigTab2IDOrigTab1IDUse ngram tofind similarityof addressesoverallUnique concatenated string of alladdress componentsUnique concatenated string of alladdress componentsTidy upKeep those above asimilarity thresholde.g. > 50%Sort into order of table 1Include additional ruleshere to dictate if two addresses matche.g. filter out if city differsTurn into "key value pairs"Include rows that we didn't establish a match forGet unique set from Table1(ignore duplicates caused by potential match to multiple Table2 rows)Count matches for each Tab1-Tab2combinationOnly include wheretables match on more than 1address element? Table Creator Table Creator Joiner Counter Generation Counter Generation Unpivoting Column Filter Joiner Joiner Joiner Column Rename Column Rename Column Rename Column Rename String Similarity Column Aggregator Column Aggregator Column Filter Row Filter Sorter Rule-basedRow Filter Unpivoting Joiner DuplicateRow Filter GroupBy Row Filter Adjust the Values in RowFilter to include/excludemore based on"similarity" Add or remove any manual rulesthat you might consider. As anexample Included a rule that saidif City is present in both, but isnot the same, then filter out therow (even though the rest of theaddress may be the same) More problematic might be how to deal with "street number"difference, where the only difference in the strings is the buildingnumber!You may need to apply additional logic, but at least you have areduced data set on which to apply such rules Table 1Table 2Match column valueswe don't care about the "columnName" theycame from as the data is notconsistentCreate our ownrow id seq numberfor future referencingCreate our ownrow id seq numberfor future referencingTurn into "key value pairs"Just to IDsThis is list of "potential" matches, no"definite" matchesPull in original datafrom Table1Pull in original datafrom Table2Bring "potentially matching" data togetherfrom Table1 and Table2Tab1IDTab2IDOrigTab2IDOrigTab1IDUse ngram tofind similarityof addressesoverallUnique concatenated string of alladdress componentsUnique concatenated string of alladdress componentsTidy upKeep those above asimilarity thresholde.g. > 50%Sort into order of table 1Include additional ruleshere to dictate if two addresses matche.g. filter out if city differsTurn into "key value pairs"Include rows that we didn't establish a match forGet unique set from Table1(ignore duplicates caused by potential match to multiple Table2 rows)Count matches for each Tab1-Tab2combinationOnly include wheretables match on more than 1address element? Table Creator Table Creator Joiner Counter Generation Counter Generation Unpivoting Column Filter Joiner Joiner Joiner Column Rename Column Rename Column Rename Column Rename String Similarity Column Aggregator Column Aggregator Column Filter Row Filter Sorter Rule-basedRow Filter Unpivoting Joiner DuplicateRow Filter GroupBy Row Filter

Nodes

Extensions

Links