address deduplication, string similarity and fingerprinting (a collection)
A few links and resources I collected on address deduplication, string similarity, and fingerprinting.
A meta collection of KNIME resources for address deduplication or ‘fingerprinting’:
https://forum.knime.com/t/namensabgleich/19232/2?u=mlauber71
---------
Mr. Wiswedel is the go-to person when it comes to address deduplication:
https://forum.knime.com/u/wiswedel/summary
https://hub.knime.com/knime/spaces/Examples/latest/50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data
https://hub.knime.com/knime/spaces/Examples/latest/02_ETL_Data_Manipulation/05_Indexing_Searching/03_Example_for_Fuzzy_Address_Matching
https://forum.knime.com/t/approach-fuzzy-match-or-supervised-learning/10900
Fingerprinting for addresses
https://forum.knime.com/t/rule-based-filter-question/13419/7?u=mlauber71
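A minimal Python sketch of what a fingerprint for addresses can look like. This is a generic key-collision approach (normalize, tokenize, sort) and not necessarily the method used in the linked thread; the function name fingerprint and the sample addresses are made up for illustration.

```python
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Return a normalized key; strings sharing a key are duplicate candidates."""
    # Case-fold first (this also turns the German sharp s into "ss").
    folded = text.casefold()
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", folded)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of characters that is not a-z or 0-9 with one space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", plain)
    # Deduplicate and sort the tokens so that word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


# Hypothetical sample data: the first two entries produce the same key.
addresses = [
    "Main Street 12, Springfield",
    "12 main street springfield",
    "Hauptstraße 5, München",
]
keys = [fingerprint(a) for a in addresses]
```

Grouping records by this key finds duplicates that differ only in case, punctuation, accents, or word order. It will not catch typos; for those a string distance is needed.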
Simple Fuzzy Match Example with Levenshtein distance (scottf)
https://forum.knime.com/t/getting-started-with-ml/26531/3?u=mlauber71
https://hub.knime.com/scottf/spaces/Public/latest/ForumWorkflows/2020/09/Simple%20Fuzzy%20Match%20Example
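For readers who want to see what the Levenshtein distance computes, here is a small standard-library Python sketch. It is not taken from the linked workflow; the helper similarity (distance scaled to a 0..1 score) is an assumption added for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or replacements."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical (hypothetical helper)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A fuzzy match then keeps the pairs whose similarity is above a chosen threshold; the right threshold depends on the data and has to be found by trial.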
---------
Compare strings by their similarity
https://forum.knime.com/t/comparing-strings/12939/8?u=mlauber71
You have to install the Palladian extension to do that:
https://nodepit.com/product/palladian
It is a separate installation that needs this repository:
https://download.nodepit.com/palladian/4.2
---------
You can group addresses (and names) by their similarity without a 'ground truth'
https://forum.knime.com/t/how-can-i-define-and-list-the-duplication-in-an-adress-data-set-sucessfully-with-using-string-distances-node-and-similarity-search-node/42568/12?u=mlauber71
https://kni.me/w/a5sHElCCuSKV7j2Q
Fuzzy Address Matching
https://kni.me/w/sZfJYtD2BpTGNWnW
Address Deduplication
https://kni.me/w/QiS--QnukXBeL3mZ
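Grouping by similarity without a 'ground truth' can be sketched in Python as follows. This is not the logic of the linked workflows, only an illustration under assumptions: it uses difflib from the standard library as the similarity measure, a threshold of 0.85 picked arbitrarily, and a hypothetical function name group_by_similarity.

```python
from difflib import SequenceMatcher


def group_by_similarity(values, threshold=0.85):
    """Put values into the same group when they are linked by similar pairs."""
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    normalized = [v.lower().strip() for v in values]
    # Compare every pair once; this is quadratic and only suits small data sets.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            score = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if score >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, value in enumerate(values):
        groups.setdefault(find(i), []).append(value)
    return list(groups.values())


# Hypothetical sample data
sample = ["12 Main Street", "12 Main Stret", "12 main street", "99 Oak Avenue"]
groups = group_by_similarity(sample)
```

The merge is transitive: if A is similar to B and B to C, all three land in one group even when A and C are below the threshold. For larger tables the pairwise loop should be restricted first, for example to records sharing a postal code.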
-----------------------------------------------------------------
Additional Python resources - not yet transferred into a KNIME workflow
Super Fast String Matching in Python
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
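As far as I understand it, the linked post compares strings via character n-grams weighted with TF-IDF and cosine similarity. The sketch below shows that idea with the Python standard library only; it is not the code from the post and makes no attempt at the speed-ups described there. The function names and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Split a string into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one L2-normalized {ngram: weight} dict per input string."""
    counts = [Counter(ngrams(s, n)) for s in strings]
    # Document frequency: in how many strings does each n-gram occur?
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(strings)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption): rare n-grams get a higher weight.
        weighted = {
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values()))
        vectors.append({g: w / norm for g, w in weighted.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


# Hypothetical sample data
names = ["Main Street 12", "Main Street 12a", "Oak Avenue 7"]
vectors = tfidf_vectors(names)
close = cosine(vectors[0], vectors[1])
far = cosine(vectors[0], vectors[2])
```

Here close should come out much higher than far, because the first two strings share almost all of their trigrams while the third shares none with the first.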
Python - Address matching I
https://github.com/dedupeio/address-matching
Python - Address matching II
https://github.com/RobinL/AddressMatcher
libpostal: international street address NLP
https://github.com/openvenues/libpostal
https://datascience.stackexchange.com/questions/10810/how-to-do-postal-addresses-fuzzy-matching
Fuzzy String Matching in Python
https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/