Location Extractor

This Node Is Deprecated: This version of the node has been replaced with a new and improved version. The old version is kept for backwards compatibility, but for all new workflows we suggest using the newer Location Extractor node instead.

The “Location Extractor” node extracts geographic locations from unstructured English text. It uses Palladian’s location extraction mechanism.

The location extraction algorithm first performs several steps to recognize potential locations within a given text and then disambiguates them. The disambiguation step checks hierarchical (containment) relations and identifies the correct locations by their proximity to other locations mentioned in the text.
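Palladian’s exact disambiguation heuristics are not described on this page. The following is only a minimal sketch of the proximity idea, assuming that each ambiguous name has several gazetteer candidates with WGS84 coordinates and that “proximity” means the summed great-circle (haversine) distance to other, already resolved locations in the same text. `Candidate`, `haversine_km`, and `disambiguate` are hypothetical names, not part of Palladian’s API, and the coordinates are approximate.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass(frozen=True)
class Candidate:
    """A hypothetical gazetteer candidate with WGS84 decimal-degree coordinates."""
    name: str
    lat: float
    lon: float


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance in kilometers between two points (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def disambiguate(candidates: Sequence[Candidate], anchors: Sequence[Candidate]) -> Candidate:
    """Pick the candidate with the smallest summed distance to the anchor locations.

    With no anchors every score is 0 and the first candidate wins; a real system
    would need another signal (e.g. population) in that case.
    """
    return min(candidates, key=lambda c: sum(haversine_km(c, a) for a in anchors))


# Approximate coordinates, for illustration only.
paris_fr = Candidate("Paris, France", 48.8566, 2.3522)
paris_tx = Candidate("Paris, Texas", 33.6609, -95.5555)
dallas = Candidate("Dallas, Texas", 32.7767, -96.7970)

best = disambiguate([paris_fr, paris_tx], anchors=[dallas])
print(best.name)  # Paris, Texas
```

A text mentioning “Paris” next to “Dallas” thus resolves to Paris, Texas, because that candidate lies roughly 150 km from the anchor instead of several thousand.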

Each identified location in the text is returned; a location that occurs multiple times is returned once per occurrence. Extracted locations are classified into the following categories:

  • CONTINENT (e.g. “Asia”)
  • COUNTRY (e.g. “Japan”)
  • CITY (e.g. “Tokyo”)
  • ZIP (ZIP code of a city)
  • STREET (name of a street)
  • STREETNR (number of a building within a street)
  • UNIT (a political or administrative unit like state, county, district)
  • REGION (an area independent from or spanning multiple political or administrative units)
  • POI (a human-made point of interest, like hotels, museums, universities, monuments, etc.)
  • LANDMARK (geographic features like rivers, canyons, lakes, islands, waterfalls, etc.)
  • UNDETERMINED (an undetermined or unknown type)
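When post-processing the results outside KNIME, the categories above map naturally onto an enumeration. The sketch below assumes a hypothetical row shape (`ExtractedLocation`) and falls back to UNDETERMINED for unrecognized labels. Because every occurrence is returned as its own row, counting rows per location yields mention frequencies.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class LocationType(Enum):
    """The location categories listed above."""
    CONTINENT = "CONTINENT"
    COUNTRY = "COUNTRY"
    CITY = "CITY"
    ZIP = "ZIP"
    STREET = "STREET"
    STREETNR = "STREETNR"
    UNIT = "UNIT"
    REGION = "REGION"
    POI = "POI"
    LANDMARK = "LANDMARK"
    UNDETERMINED = "UNDETERMINED"


@dataclass(frozen=True)
class ExtractedLocation:
    """One output row: a single occurrence of a location (hypothetical shape)."""
    name: str
    location_type: LocationType
    lat: Optional[float] = None
    lon: Optional[float] = None


def parse_type(label: str) -> LocationType:
    """Map a type label to the enum, falling back to UNDETERMINED for unknown labels."""
    try:
        return LocationType(label.strip().upper())
    except ValueError:
        return LocationType.UNDETERMINED


def mentions_per_location(rows: Iterable[ExtractedLocation]) -> Counter:
    """Count occurrences per (name, type); valid because each occurrence is its own row."""
    return Counter((r.name, r.location_type) for r in rows)


rows = [
    ExtractedLocation("Tokyo", parse_type("CITY")),
    ExtractedLocation("Japan", parse_type("COUNTRY")),
    ExtractedLocation("Tokyo", parse_type("city")),
]
counts = mentions_per_location(rows)
print(counts[("Tokyo", LocationType.CITY)])  # 2
```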

For each location, geographical coordinates (latitude and longitude) are provided in WGS84 decimal degrees.
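In WGS84 decimal degrees, latitude lies in [-90, 90] and longitude in [-180, 180]; positive values mean north and east, negative values mean south and west. The helper below (a hypothetical name, not part of the node) validates a coordinate pair and renders it with hemisphere letters, which can make exported coordinates easier to read.

```python
def validate_wgs84(lat: float, lon: float) -> None:
    """Raise ValueError if the pair is outside the WGS84 decimal-degree ranges (also rejects NaN)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")


def format_wgs84(lat: float, lon: float, digits: int = 4) -> str:
    """Render decimal degrees with N/S and E/W instead of signs."""
    validate_wgs84(lat, lon)
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.{digits}f}° {ns}, {abs(lon):.{digits}f}° {ew}"


print(format_wgs84(-33.8688, 151.2093))  # 33.8688° S, 151.2093° E
```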

Location Source Setup

In order to use the “Location Extractor”, a “Location Source” (also known as a gazetteer) must be configured. The Location Source provides a database of real-world locations with metadata such as alternative names, population figures, coordinates, and hierarchical relations. You can select and configure Location Sources in the KNIME Preferences under KNIME → Palladian → Location Extractor.
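To illustrate what a gazetteer contributes, here is a minimal in-memory sketch. It does not reflect Palladian’s actual Location Source schema; `GazetteerEntry` and `InMemoryGazetteer` are hypothetical. Indexing alternative names is what allows a short form such as “L.A.” to be resolved to “Los Angeles”, and parent links provide the hierarchical (containment) relations used during disambiguation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class GazetteerEntry:
    """A hypothetical gazetteer record; field names are illustrative only."""
    entry_id: int
    name: str
    location_type: str
    alternative_names: List[str] = field(default_factory=list)
    parent_id: Optional[int] = None  # id of the containing location, if any
    population: Optional[int] = None


class InMemoryGazetteer:
    def __init__(self, entries: Iterable[GazetteerEntry]) -> None:
        self.by_id: Dict[int, GazetteerEntry] = {}
        self.by_name: Dict[str, List[GazetteerEntry]] = {}
        for entry in entries:
            self.by_id[entry.entry_id] = entry
            # Index the primary name and every alternative name, case-insensitively.
            for name in [entry.name, *entry.alternative_names]:
                self.by_name.setdefault(name.lower(), []).append(entry)

    def lookup(self, mention: str) -> List[GazetteerEntry]:
        """All candidates whose primary or alternative name matches the mention."""
        return self.by_name.get(mention.lower(), [])

    def ancestors(self, entry: GazetteerEntry) -> List[GazetteerEntry]:
        """Walk parent links upward (nearest first); stops on unknown ids or cycles."""
        chain: List[GazetteerEntry] = []
        seen = {entry.entry_id}
        parent_id = entry.parent_id
        while parent_id is not None and parent_id in self.by_id and parent_id not in seen:
            seen.add(parent_id)
            parent = self.by_id[parent_id]
            chain.append(parent)
            parent_id = parent.parent_id
        return chain


gazetteer = InMemoryGazetteer([
    GazetteerEntry(1, "North America", "CONTINENT"),
    GazetteerEntry(2, "United States", "COUNTRY", ["USA", "U.S."], parent_id=1),
    GazetteerEntry(3, "California", "UNIT", parent_id=2),
    GazetteerEntry(4, "Los Angeles", "CITY", ["L.A."], parent_id=3),
])

la = gazetteer.lookup("L.A.")[0]
print(la.name)                                    # Los Angeles
print([a.name for a in gazetteer.ancestors(la)])  # ['California', 'United States', 'North America']
```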

There are two Location Sources available:

GeoNames: We currently provide a freely usable Location Source for the GeoNames API. This Location Source allows up to 30,000 REST requests per day and 2,000 REST requests per hour. To add the GeoNames source, click the “New…” button and follow the instructions and the link to create a free GeoNames account. We suggest enabling the option to retrieve location hierarchies to improve the Location Extractor’s results; however, this causes an additional API request for every found location.
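To reason about the GeoNames quota (or to throttle your own GeoNames calls alongside the node), a client-side sliding-window limiter that enforces both the hourly and the daily limit at once is one option. This is a minimal sketch and not part of Palladian; `QuotaLimiter` and `GEONAMES_QUOTAS` are hypothetical names, and how many requests the node itself issues per text beyond the hierarchy lookups is not stated here.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# Limits quoted above: 2,000 requests per hour and 30,000 per day (window in seconds).
GEONAMES_QUOTAS: List[Tuple[int, float]] = [(2_000, 3_600.0), (30_000, 86_400.0)]


class QuotaLimiter:
    """Sliding-window limiter that enforces several (limit, window) quotas at once."""

    def __init__(self, quotas: Sequence[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.windows: List[Tuple[int, float, Deque[float]]] = [
            (limit, window, deque()) for limit, window in quotas
        ]

    def try_acquire(self, n: int = 1) -> bool:
        """Reserve n requests if every quota allows it; otherwise reserve nothing."""
        now = self.clock()
        for limit, window, stamps in self.windows:
            # Drop timestamps that have left this sliding window.
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
            if len(stamps) + n > limit:
                return False
        for _, _, stamps in self.windows:
            stamps.extend([now] * n)
        return True


# Simulated clock so the example runs instantly.
fake_now = [0.0]
limiter = QuotaLimiter(GEONAMES_QUOTAS, clock=lambda: fake_now[0])
granted = sum(limiter.try_acquire() for _ in range(2_500))
print(granted)                # 2000: the hourly quota is exhausted
fake_now[0] = 3_600.0
print(limiter.try_acquire())  # True: the first hour's requests have left the hourly window
```

Because 2,000 × 24 = 48,000 exceeds 30,000, the daily quota is the binding one under sustained load: running at the full hourly rate exhausts it after 15 hours. With location hierarchies enabled, a text yielding k locations costs k additional requests for the hierarchies alone, which `try_acquire(k)` can reserve in one step.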

Local Gazetteer: If you want to keep your data private, are running into the GeoNames request limits, or want to significantly speed up operations, we provide a separate plugin that lets you set up a local gazetteer on your machine without accessing the Web. Contact us at mail@palladian.ai if you are interested.

Options

Input
The column in the input table which contains the text.

Input Ports

Table with a column holding text from which to extract locations.

Output Ports

Table with one row per extracted location from the provided text inputs. It contains columns with the normalized name of the location (e.g. the short form “L.A.” occurring in the text is returned in its full form “Los Angeles”), the type of the location, the geographical coordinates in WGS84 decimal degrees, and the population of the location (if applicable).


Views

This node has no views
