Address deduplication, string similarity and fingerprinting (a collection)
A few links and resources I collected about address deduplication, string similarity and fingerprinting.
A meta collection of KNIME resources for address deduplication or ‘fingerprinting’:
https://forum.knime.com/t/namensabgleich/19232/2?u=mlauber71
---------
Mr. Wiswedel is the man when it comes to address deduplication ...
https://forum.knime.com/u/wiswedel/summary
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/05_Indexing_Searching/03_Example_for_Fuzzy_Address_Matching
https://forum.knime.com/t/approach-fuzzy-match-or-supervised-learning/10900
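Comparing every address with every other address grows quadratically with the data. A common trick in deduplication is "blocking": only compare records that share a cheap key, such as the postal code. A minimal sketch, assuming records are dicts with hypothetical `zip` and `street` fields; the blocking key is an illustrative choice, not taken from the linked workflows.

```python
from collections import defaultdict
from itertools import combinations

def block_key(record):
    # Hypothetical blocking key: postal code plus first letter of the street.
    return (record["zip"], record["street"][:1].lower())

def candidate_pairs(records):
    """Yield index pairs that share a blocking key; only these get compared."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    for members in blocks.values():
        yield from combinations(members, 2)

records = [
    {"zip": "10115", "street": "Hauptstrasse 5"},
    {"zip": "10115", "street": "Hauptstr. 5"},
    {"zip": "80331", "street": "Hauptstrasse 5"},
]
print(list(candidate_pairs(records)))  # [(0, 1)]
```

The trade-off: duplicates with a typo in the blocking key itself (e.g. a wrong postal code) are never compared, so keys should be coarse.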
Fingerprinting for addresses
https://forum.knime.com/t/rule-based-filter-question/13419/7?u=mlauber71
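One widely used fingerprinting approach is the key-collision "fingerprint" known from OpenRefine: normalize the string, split it into tokens, deduplicate and sort them, then join them again, so differently formatted versions of the same address get the same key. A minimal sketch of that idea; it is an assumption of how such a key can be built, not the exact rules from the linked thread.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(text):
    """Key that ignores case, accents, punctuation and word order."""
    # Strip accents: decompose characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # Deduplicate and sort tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))

def group_by_fingerprint(values):
    """Group the original strings by their fingerprint key."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return dict(groups)

addresses = ["Main Street 5", "5 main street.", "Rue de l'Église 3", "rue de l eglise 3"]
print(group_by_fingerprint(addresses))
```

Fingerprints only catch formatting differences, not typos ("Mian Street" gets its own key); that is where the distance measures below come in.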
Simple Fuzzy Match Example with Levenshtein distance (scottf)
https://forum.knime.com/t/getting-started-with-ml/26531/3?u=mlauber71
https://hub.knime.com/scottf/spaces/Public/latest/ForumWorkflows/2020/09/Simple%20Fuzzy%20Match%20Example
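The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal pure-Python sketch of the classic dynamic-programming algorithm plus a helper that picks the closest entry from a reference list; the function names and example data are illustrative, not taken from scottf's workflow.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the rows stay small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]

def closest_match(query, candidates):
    """Return (candidate, distance) with the smallest edit distance to query."""
    return min(((c, levenshtein(query, c)) for c in candidates), key=lambda t: t[1])

print(levenshtein("kitten", "sitting"))  # 3
print(closest_match("Hauptstrase 5", ["Hauptstrasse 5", "Bahnhofstrasse 5"]))  # ('Hauptstrasse 5', 1)
```

Raw distances depend on string length, so for a threshold it often helps to normalize, e.g. `1 - d / max(len(a), len(b))`.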
---------
Compare strings by their similarity
https://forum.knime.com/t/comparing-strings/12939/8?u=mlauber71
You have to install the Palladian extension to do that
https://nodepit.com/product/palladian
(it requires a special installation)
You need to add this repository:
https://download.nodepit.com/palladian/4.2
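String similarity tools usually offer several measures. As an illustration of one common measure (not necessarily how Palladian implements it), here is the Sørensen-Dice coefficient on character bigrams, 2·|A∩B| / (|A|+|B|), where A and B are the bigram sets of the two strings. Using sets rather than multisets is a simplifying assumption.

```python
def bigrams(text):
    """Set of overlapping two-character substrings (lowercased)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}

def dice_similarity(a, b):
    """Sørensen-Dice on bigram sets: 1.0 = same bigrams, 0.0 = no overlap."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both strings are too short for bigrams; fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

print(dice_similarity("night", "nacht"))  # 0.25 (only "ht" is shared)
```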
---------
You can group addresses (and names) by their similarity without a 'ground truth' (i.e. without labelled duplicates)
https://forum.knime.com/t/how-can-i-define-and-list-the-duplication-in-an-adress-data-set-sucessfully-with-using-string-distances-node-and-similarity-search-node/42568/12?u=mlauber71
https://kni.me/w/a5sHElCCuSKV7j2Q
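Without labelled duplicates, one simple unsupervised approach is to link every pair whose similarity exceeds a threshold and take the connected components as groups. A minimal sketch, assuming the standard library's `difflib.SequenceMatcher` ratio as the similarity and a hypothetical threshold of 0.85; this is not the method of the linked workflow, and the threshold has to be tuned on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    # Ratio in [0, 1]: 2*M/T, M = matched characters, T = total characters.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_similar(values, threshold=0.85):
    """Connected components of the 'similarity >= threshold' graph (union-find)."""
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(values)), 2):
        if similarity(values[i], values[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())

names = ["Hauptstrasse 5", "Hauptstrase 5", "Haupt-Strasse 5", "Bahnhofstrasse 12"]
print(group_similar(names))
```

Connected components can chain: if A resembles B and B resembles C, all three end up together even when A and C differ a lot, so large groups deserve a manual check.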
Fuzzy Address Matching
https://kni.me/w/sZfJYtD2BpTGNWnW
Address Deduplication
https://kni.me/w/QiS--QnukXBeL3mZ
-----------------------------------------------------------------
Additional Python resources (not yet transferred into a KNIME workflow)
Super Fast String Matching in Python
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
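The linked post matches strings by turning them into TF-IDF vectors of character n-grams and comparing those by cosine similarity, using scikit-learn and sparse matrices for speed. The sketch below only illustrates the idea with the standard library (trigrams, dicts as sparse vectors, the same smoothed idf formula as scikit-learn's default); it is not the post's code and will not be "super fast" on large data.

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-grams of a lowercased, space-padded string."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf_vectors(docs, n=3):
    """L2-normalised TF-IDF vectors as {ngram: weight} dicts."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    total = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf: ln((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors

def cosine(u, v):
    # Vectors are unit length, so the dot product is the cosine similarity.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())

def best_matches(queries, reference, n=3):
    """For each query, return (best reference string, cosine similarity)."""
    vecs = tfidf_vectors(list(reference) + list(queries), n)
    ref_vecs, query_vecs = vecs[:len(reference)], vecs[len(reference):]
    results = []
    for q, qv in zip(queries, query_vecs):
        scores = [cosine(qv, rv) for rv in ref_vecs]
        best = max(range(len(reference)), key=scores.__getitem__)
        results.append((reference[best], scores[best]))
    return results

reference = ["Hauptstrasse 5, Berlin", "Bahnhofstrasse 12, Munich"]
print(best_matches(["hauptstr. 5 berlin"], reference))
```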
Python - Address matching I
https://github.com/dedupeio/address-matching
Python - Address matching II
https://github.com/RobinL/AddressMatcher
libpostal: international street address NLP
https://github.com/openvenues/libpostal
https://datascience.stackexchange.com/questions/10810/how-to-do-postal-addresses-fuzzy-matching
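Normalizing addresses before matching (expanding abbreviations, unifying spellings) removes many differences that would otherwise cost similarity points; libpostal does this properly for many languages. To illustrate the idea only, here is a tiny stand-in with a hand-written, hypothetical rule table for German street names. It does not use libpostal and will miss most real-world variants.

```python
import re

# Hypothetical rewrite rules (pattern, replacement), applied in order.
RULES = [
    (r"ß", "ss"),           # Straße -> Strasse
    (r"str\.", "strasse"),  # Hauptstr. -> Hauptstrasse
    (r"pl\.", "platz"),     # Marienpl. -> Marienplatz (naive, may over-match)
    (r"[^\w\s]", " "),      # remaining punctuation -> space
    (r"\s+", " "),          # collapse runs of whitespace
]

def normalize_address(text):
    """Lowercase the address and apply the rewrite rules in order."""
    text = text.lower()
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text.strip()

print(normalize_address("Hauptstr. 5"), "|", normalize_address("Hauptstraße 5"))
```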
Fuzzy String Matching in Python
https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/
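A common trick in fuzzy string matching is to make the comparison insensitive to word order, e.g. the "token sort" idea known from the FuzzyWuzzy library: sort the tokens of both strings before computing a similarity ratio. A minimal sketch with the standard library's `difflib`; it is not FuzzyWuzzy's implementation (which reports scores from 0 to 100), so values can differ.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """Plain similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def token_sort_ratio(a, b):
    """Similarity after lowercasing and sorting whitespace-separated tokens."""
    def sort_tokens(s):
        return " ".join(sorted(s.lower().split()))
    return ratio(sort_tokens(a), sort_tokens(b))

a, b = "Mueller Hans", "hans mueller"
print(ratio(a.lower(), b.lower()), token_sort_ratio(a, b))  # token sort gives 1.0
```

This helps with names ("Mueller Hans" vs "Hans Mueller") but can also merge things that should stay apart, e.g. swapped street and city names.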