Location Extractor

Palladian for KNIME version 2.3.0.202009251618 by palladian.ws; Philipp Katz, Klemens Muthmann, David Urbansky

The “Location Extractor” node extracts geographic locations from unstructured English text using Palladian’s location extraction mechanism.

The location extraction algorithm performs several steps to recognize potential locations within a given text, followed by a disambiguation step. The disambiguation step checks hierarchical (“contains”) relations and identifies the correct locations by their proximity to other locations mentioned in the text.
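The proximity idea can be illustrated with a minimal sketch. This is not Palladian’s actual implementation; the function names, the candidate and anchor records, and the approximate coordinates are assumptions made for illustration only. The sketch picks, among several candidates for an ambiguous name, the one with the smallest total great-circle distance to other locations already resolved in the text.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pick_by_proximity(candidates, anchors):
    """Return the candidate with the smallest summed distance to all anchors."""
    def total_distance(candidate):
        return sum(
            haversine_km(candidate["lat"], candidate["lng"], a["lat"], a["lng"])
            for a in anchors
        )
    return min(candidates, key=total_distance)

# Hypothetical candidates for the ambiguous mention "Paris" (approximate coordinates).
candidates = [
    {"name": "Paris, France", "lat": 48.8566, "lng": 2.3522},
    {"name": "Paris, Texas", "lat": 33.6609, "lng": -95.5555},
]
# Hypothetical anchor: another location already resolved in the same text.
anchors = [{"name": "Dallas", "lat": 32.7767, "lng": -96.7970}]

best = pick_by_proximity(candidates, anchors)
print(best["name"])
```

A real disambiguation step would also weigh the hierarchical relations mentioned above; this sketch covers the distance criterion only.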

Each identified location in the text is returned; a location that occurs multiple times is returned once per occurrence. Extracted locations are classified into the following categories:

  • CONTINENT (e.g. “Asia”)
  • COUNTRY (e.g. “Japan”)
  • CITY (e.g. “Tokyo”)
  • ZIP (ZIP code of a city)
  • STREET (name of a street)
  • STREETNR (number of a building within a street)
  • UNIT (a political or administrative unit like state, county, district)
  • REGION (an area independent from or spanning multiple political or administrative units)
  • POI (a human-made point of interest, like hotels, museums, universities, monuments, etc.)
  • LANDMARK (geographic features like rivers, canyons, lakes, islands, waterfalls, etc.)
  • UNDETERMINED (an undetermined or unknown type)

For each location, geographical coordinates with longitude and latitude values are provided. They are in WGS84 decimal degrees.
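For downstream processing outside KNIME, the categories and coordinates described above could be modelled roughly as follows. This is a sketch under assumptions: the class and field names (`LocationType`, `ExtractedLocation`, `latitude`, `longitude`) are hypothetical and do not reflect Palladian’s API. Only the category names and the WGS84 decimal-degree convention come from the description.

```python
from dataclasses import dataclass
from enum import Enum

class LocationType(Enum):
    """Categories as listed in the node description."""
    CONTINENT = "CONTINENT"
    COUNTRY = "COUNTRY"
    CITY = "CITY"
    ZIP = "ZIP"
    STREET = "STREET"
    STREETNR = "STREETNR"
    UNIT = "UNIT"
    REGION = "REGION"
    POI = "POI"
    LANDMARK = "LANDMARK"
    UNDETERMINED = "UNDETERMINED"

@dataclass(frozen=True)
class ExtractedLocation:
    """Hypothetical record for one extracted location (not Palladian's own class)."""
    name: str
    type: LocationType
    latitude: float   # WGS84 decimal degrees, -90 to 90
    longitude: float  # WGS84 decimal degrees, -180 to 180

    def __post_init__(self):
        # Catch swapped or out-of-range coordinates early.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

# Approximate coordinates, for illustration only.
tokyo = ExtractedLocation("Tokyo", LocationType.CITY, 35.68, 139.69)
```

The range check is useful because latitude and longitude are easy to swap: passing Tokyo’s longitude as a latitude raises a `ValueError`.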

Location Source Setup

To use the “Location Extractor”, a “Location Source” (also known as a gazetteer) must be configured. The Location Source provides a database of real-world locations and meta information such as alternative names, population figures, coordinates, and hierarchical relations. You can select and configure Location Sources in the KNIME Preferences under KNIME → Palladian → Location Extractor.

There are two Location Sources available:

GeoNames: We currently provide a freely usable Location Source for the GeoNames API. This Location Source allows 30,000 REST requests per day and 2,000 REST requests per hour. To add the GeoNames source, click the “New…” button, then follow the instructions and the link to create a free GeoNames account. We suggest enabling the option to retrieve location hierarchies to improve the Location Extractor’s results. Note, however, that this causes an additional API request for every location found.

Local Gazetteer: If you want to keep your data private, are running into the GeoNames request limit, or want to speed up operations significantly, we provide a separate plugin that lets you set up a local gazetteer on your machine without accessing the Web. Contact us at mail@palladian.ai if you are interested.
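To judge whether the GeoNames limits are sufficient for a workload, a rough budget estimate can help. The sketch below uses the hourly and daily limits stated above and the one additional request per found location caused by the hierarchy option. The base cost of one request per found location is an assumption, not a documented figure; the actual number of requests the node issues may differ. All names are hypothetical.

```python
# Limits as stated for the GeoNames Location Source.
DAILY_LIMIT = 30_000
HOURLY_LIMIT = 2_000

def max_locations(limit, with_hierarchy, base_requests_per_location=1):
    """Upper bound on found locations that fit into a request limit.

    base_requests_per_location is an ASSUMPTION (one lookup per found location);
    enabling hierarchies adds one more request per found location.
    """
    per_location = base_requests_per_location + (1 if with_hierarchy else 0)
    return limit // per_location

print(max_locations(HOURLY_LIMIT, with_hierarchy=False))
print(max_locations(HOURLY_LIMIT, with_hierarchy=True))
print(max_locations(DAILY_LIMIT, with_hierarchy=True))
```

Under these assumptions, enabling hierarchies halves the number of locations that can be processed within either limit, which is one reason to consider the local gazetteer for large inputs.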

Options

Input
The column in the input table which contains the text.

Input Ports

Table with a column holding text from which to extract locations.

Output Ports

Table with one row for each location extracted from the provided text inputs. It provides columns with the normalized name of the location (e.g. the short form “L.A.” occurring in the text is returned as its full form “Los Angeles”), the type of the location, the geographical coordinates in WGS84 decimal degrees, and the population of the location (if applicable).
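Because a location that occurs several times yields several rows, a common follow-up step is to count mentions per normalized name. The sketch below shows this on a hand-written stand-in for the output table; the column names `name` and `type` and the sample rows are hypothetical, and the coordinate and population columns are omitted for brevity.

```python
from collections import Counter

# Hypothetical rows, as they might look after exporting the output table.
rows = [
    {"name": "Los Angeles", "type": "CITY"},
    {"name": "Japan", "type": "COUNTRY"},
    {"name": "Los Angeles", "type": "CITY"},  # second mention, e.g. written as "L.A."
]

# Count how often each (normalized name, type) pair was extracted.
mentions = Counter((row["name"], row["type"]) for row in rows)

for (name, loc_type), count in mentions.most_common():
    print(f"{name} ({loc_type}): {count}")
```

Inside KNIME, the same aggregation can be done with a grouping node on the name column instead of external code.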

Installation

To use this node in KNIME, install Palladian for KNIME from the following update site:

KNIME 4.2

A zipped version of the software site can be downloaded here.