The “Location Extractor” node extracts mentions of geographic locations (also known as toponyms) from unstructured English text.
The location extraction algorithm first performs several steps to recognize potential locations within a given text, then disambiguates them. The disambiguation step checks hierarchical (containment) relations and identifies the correct locations by their proximity to other locations mentioned in the text (for example, it tries to distinguish between Paris, France and Paris, Texas based on the context in the input text).
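The proximity idea behind the disambiguation step can be illustrated with a minimal Python sketch. This is not Palladian's actual algorithm: the `Candidate` class, the `disambiguate` function, and the "smallest total distance to unambiguous anchor locations" heuristic are assumptions made for illustration, and the coordinates are approximate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical gazetteer candidate; coordinates are WGS84 decimal degrees."""
    name: str
    description: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(candidates, anchors):
    """Pick the candidate with the smallest total distance to the anchors.

    `anchors` are locations from the same text that are assumed to be
    unambiguous. Without anchors, fall back to the first candidate.
    """
    if not anchors:
        return candidates[0]
    return min(
        candidates,
        key=lambda c: sum(haversine_km(c.lat, c.lon, a.lat, a.lon) for a in anchors),
    )


# Approximate coordinates, for illustration only.
PARIS = [
    Candidate("Paris", "Paris, France", 48.8566, 2.3522),
    Candidate("Paris", "Paris, Texas", 33.6609, -95.5555),
]
DALLAS = Candidate("Dallas", "Dallas, Texas", 32.7767, -96.7970)
LYON = Candidate("Lyon", "Lyon, France", 45.7640, 4.8357)

print(disambiguate(PARIS, [DALLAS]).description)  # Paris, Texas
print(disambiguate(PARIS, [LYON]).description)    # Paris, France
```

A text that also mentions Dallas pulls “Paris” toward the Texan candidate, while a text mentioning Lyon pulls it toward the French one. The node additionally checks hierarchical relations, which this sketch leaves out.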
Each identified location in the text is returned; a location that occurs multiple times is returned once per occurrence. Extracted locations are classified into the following categories:
Location Type | Explanation, Example |
---|---|
CONTINENT | e.g. “Asia” |
COUNTRY | e.g. “Japan” |
CITY | e.g. “Tokyo” |
ZIP | Zip code of a city |
STREET | Name of a street |
STREETNR | Number of a building within a street |
UNIT | A political or administrative unit like a federal state, a county, or a city district (e.g. “California”, “Bavaria”, or “Manhattan”) |
REGION | An area which is independent from or spanning multiple political or administrative units (e.g. “Midwest”) |
POI | A human-made point of interest or a building, like hotels, museums, universities, monuments, etc. (e.g. “Stanford University” or “Tahrir Square”) |
LANDMARK | Geographic features like rivers, canyons, lakes, islands, waterfalls, etc. (e.g. “Rocky Mountains”) |
UNDETERMINED | An undetermined or unknown type |
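To make the output shape concrete, the following Python sketch models the categories above as an enum and returns one result per occurrence. It is a toy dictionary lookup, not the node's recognition logic: `LocationType`, `TOY_GAZETTEER`, and `extract` are hypothetical names, and the real node works against a connected Location Source rather than a hard-coded dictionary.

```python
import re
from enum import Enum


class LocationType(Enum):
    """The location categories listed in the table above."""
    CONTINENT = "CONTINENT"
    COUNTRY = "COUNTRY"
    CITY = "CITY"
    ZIP = "ZIP"
    STREET = "STREET"
    STREETNR = "STREETNR"
    UNIT = "UNIT"
    REGION = "REGION"
    POI = "POI"
    LANDMARK = "LANDMARK"
    UNDETERMINED = "UNDETERMINED"


# Hypothetical stand-in for a Location Source, using examples from the table.
TOY_GAZETTEER = {
    "Asia": LocationType.CONTINENT,
    "Japan": LocationType.COUNTRY,
    "Tokyo": LocationType.CITY,
    "California": LocationType.UNIT,
    "Midwest": LocationType.REGION,
    "Stanford University": LocationType.POI,
    "Rocky Mountains": LocationType.LANDMARK,
}


def extract(text):
    """Return (offset, name, type) for every occurrence, in text order."""
    hits = []
    for name, loc_type in TOY_GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), name, loc_type))
    return sorted(hits, key=lambda hit: hit[0])


text = "Tokyo is the capital of Japan. Tokyo lies in Asia."
for offset, name, loc_type in extract(text):
    print(offset, name, loc_type.value)
```

“Tokyo” appears twice in the sample text, so it shows up twice in the result, once per occurrence.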
For each location, geographical coordinates with longitude and latitude values are provided. They are in WGS84 decimal degrees.
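As a small illustration of what “WGS84 decimal degrees” means in practice, the sketch below validates a latitude/longitude pair and formats it as degrees, minutes, and seconds. The helper names `validate_wgs84` and `to_dms` are hypothetical and not part of the node; the Tokyo coordinates are approximate.

```python
def validate_wgs84(lat, lon):
    """Check that a coordinate pair lies within the valid decimal-degree ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def to_dms(value, positive, negative):
    """Format decimal degrees as degrees, minutes, and whole seconds.

    `positive` and `negative` are the hemisphere letters to use,
    e.g. "N"/"S" for latitude and "E"/"W" for longitude.
    """
    hemisphere = positive if value >= 0 else negative
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes:02d}'{seconds:02d}\"{hemisphere}"


# Approximate coordinates of Tokyo, for illustration only.
lat, lon = validate_wgs84(35.6762, 139.6503)
print(to_dms(lat, "N", "S"), to_dms(lon, "E", "W"))
```

Positive latitudes are north of the equator and positive longitudes are east of the prime meridian, so a location in the western hemisphere such as Paris, Texas has a negative longitude.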
In order to use the “Location Extractor”, a “Location Source” (also known as a gazetteer) must be connected to the node’s input port. The Location Source is a database of real-world locations and meta information such as alternative names, population counts, coordinates, and hierarchical relations. Currently, the following location sources are available:
This node uses Palladian’s location extraction mechanism – for more information see: “To Learn or to Rule: Two Approaches for Extracting Geographical Information from Unstructured Text”; Philipp Katz and Alexander Schill; Proc. of the 11th Australasian Data Mining & Analytics Conference (AusDM 2013).
To use this node in KNIME, install the extension Palladian for KNIME from the update site below, following our NodePit Product and Node Installation Guide:
A zipped version of the software site can be downloaded here.