kn_​forum_​42568_​address_​deduplication

Match similar addresses from one list together into similar groups


The workflow loops through every single line of the address list and selects all matches for it, based on the normalized string and the Range Filter (0.25 is the default, though the right value may depend on your data).

The way you construct the key1 string influences the result of the similarity search. If you want to stress the street, you might enter it two or three times in the key. If you want to use only the start of the address string, you can do that as well.

The Range Filter determines which strings are considered a match. Total (exact) matches can be handled separately.

The workflow creates temporary tables that store the lines already processed. If you want to run the loop again, you will have to reset this part!

Annotations on the workflow canvas:
- Levenshtein distance
- only lowercase alphanumeric left
- key1 - construct the string that will be compared
- $duplicate-type-classifier$ = "chosen" => TRUE
- $duplicate-type-classifier$ = "unique" => TRUE
- keep only string and IDs
- current selection
- START / END
- exclude Processed
- PROCESSED / REMAIN
- nearest neighbor - index
- nearest neighbor - Counter
- distance
- TRUE => FALSE
- compare
- identify the children of parent nodes
- Join the Parent Informations
- Parent Counter
- range-filter

Files referenced:
- Test_Datenquelle_Duplicate.xlsx
- temp_store_processed.table
- temp_store_remain.table
- matched_addresses.xlsx
- address_sample.table

Node types used: Similarity Search, String Manipulation, Excel Reader, RowID, Duplicate Row Filter, Number To String (PMML), Missing Value, Rule-based Row Filter, Column Filter, Table Row To Variable Loop Start, Counter Generation, Table Writer, Table Reader, Reference Row Filter, GroupBy, Table Row to Variable, Constant Value Column, Variable Loop End, Merge Variables, Concatenate, Empty Table Switch, Try (Variable Ports), Catch Errors (Var Ports), Joiner, Column Rename, Rule Engine, Sorter, Integer Slider Configuration, Variable to Table Row, Math Formula, Excel Writer.
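For readers who want to follow the logic outside of KNIME, the loop described above can be sketched in plain Python. This is only an illustration, not the workflow itself: the function names (build_key, normalize, group_addresses) and the street_weight parameter are hypothetical, and it assumes that the 0.25 range filter applies to a Levenshtein distance normalized to the range 0 to 1 by the length of the longer string.

```python
import re


def build_key(street, rest, street_weight=1):
    """Build the comparison key; repeating the street stresses it (hypothetical helper)."""
    return " ".join([street] * street_weight + [rest])


def normalize(text):
    """Keep only lowercase alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def group_addresses(rows, range_filter=0.25):
    """Greedy grouping: rows is a list of (id, key) pairs.

    Each unprocessed row becomes a parent; every remaining row whose
    normalized key lies within range_filter becomes its child and is
    marked as processed so it is not matched again.
    """
    keys = {row_id: normalize(key) for row_id, key in rows}
    processed = set()
    groups = {}
    for row_id, _ in rows:
        if row_id in processed:
            continue
        processed.add(row_id)
        children = [
            other for other, _ in rows
            if other not in processed
            and normalized_distance(keys[row_id], keys[other]) <= range_filter
        ]
        processed.update(children)
        groups[row_id] = children
    return groups


rows = [
    (1, "Main Street 12, 12345 Berlin"),
    (2, "Main Str. 12, 12345 Berlin"),
    (3, "Oak Avenue 7, 54321 Hamburg"),
    (4, "main street 12 12345 berlin"),
]
groups = group_addresses(rows)
```

In this sketch the in-memory processed set plays the role of the temporary "processed" table: it starts empty on every call, which corresponds to resetting that part before running the loop again. A smaller range_filter makes matching stricter, a larger one more permissive.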
