Location Extractor

The “Location Extractor” node extracts mentions of geographic locations (also known as toponyms) from unstructured English text.

The location extraction algorithm performs various steps for recognizing potential locations within a given text, followed by a disambiguation step. The disambiguation step checks hierarchical (containment) relations and identifies the correct locations by their proximity to other locations mentioned in the text (for example, it tries to distinguish between Paris, France and Paris, Texas based on the context in the given text input).
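The proximity idea can be illustrated with a minimal Python sketch. This is not Palladian's actual algorithm (the node uses trained models or its own heuristic rules); it only shows how nearby locations in the same text can favor one candidate over another. The miniature gazetteer `CANDIDATES`, the approximate coordinates, and the functions `haversine_km` and `disambiguate` are all assumptions made for illustration.

```python
from itertools import product
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


# Hypothetical miniature gazetteer: toponym -> list of (label, (lat, lon)).
# Coordinates are approximate and only serve the illustration.
CANDIDATES = {
    "Paris": [("Paris, France", (48.8566, 2.3522)), ("Paris, Texas", (33.6609, -95.5555))],
    "Dallas": [("Dallas, Texas", (32.7767, -96.7970))],
}


def disambiguate(toponyms):
    """Pick the candidate combination with the smallest total pairwise distance."""
    options = [CANDIDATES[t] for t in toponyms]

    def spread(combo):
        return sum(
            haversine_km(p[1], q[1])
            for i, p in enumerate(combo)
            for q in combo[i + 1:]
        )

    best = min(product(*options), key=spread)
    return [label for label, _ in best]


resolved = disambiguate(["Paris", "Dallas"])
print(resolved)  # ['Paris, Texas', 'Dallas, Texas']
```

Because “Dallas” appears in the same text, the Texan candidate for “Paris” yields the more compact combination and is preferred in this toy setup.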

Each identified location in the text is returned; a location that occurs multiple times is returned once per occurrence. Extracted locations are classified into the following categories:

Location Type	Explanation, Example
`CONTINENT`	e.g. “Asia”
`COUNTRY`	e.g. “Japan”
`CITY`	e.g. “Tokyo”
`ZIP`	Zip code of a city
`STREET`	Name of a street
`STREETNR`	Number of a building within a street
`UNIT`	A political or administrative unit like a federal state, a county, or a city district (e.g. “California”, “Bavaria”, or “Manhattan”)
`REGION`	An area which is independent from or spanning multiple political or administrative units (e.g. “Midwest”)
`POI`	A human-made point of interest or a building, like hotels, museums, universities, monuments, etc. (e.g. “Stanford University” or “Tahrir Square”)
`LANDMARK`	Geographic features like rivers, canyons, lakes, islands, waterfalls, etc. (e.g. “Rocky Mountains”)
`UNDETERMINED`	An undetermined or unknown type

For each location, geographical coordinates with longitude and latitude values are provided. They are in WGS84 decimal degrees.
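When the coordinates are consumed downstream, it can help to check the value ranges and, if needed, convert decimal degrees to degrees, minutes, and seconds. The following is a small sketch of that conversion; the helper names `to_dms` and `format_wgs84` are hypothetical and not part of the node.

```python
def to_dms(value, positive, negative):
    """Convert one decimal-degree value to a degrees/minutes/seconds string."""
    hemisphere = positive if value >= 0 else negative
    total_seconds = round(abs(value) * 3600)  # rounded to whole arc seconds
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes:02d}'{seconds:02d}\"{hemisphere}"


def format_wgs84(lat, lon):
    """Validate a latitude/longitude pair and format it as DMS."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates outside the valid decimal-degree range")
    return f"{to_dms(lat, 'N', 'S')} {to_dms(lon, 'E', 'W')}"


print(format_wgs84(-33.5, -70.25))  # 33°30'00"S 70°15'00"W
```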

To use the “Location Extractor”, a “Location Source” (also known as a gazetteer) must be connected to the node’s input port. The Location Source is a database of real-world locations and meta information such as alternative names, population counts, coordinates, and hierarchical relations. Currently, the following location sources are available:

The “GeoNames Location Source” connects to the GeoNames REST API.
The “Local Location Source” is a locally hosted database.

This node uses Palladian’s location extraction mechanism – for more information see: “To Learn or to Rule: Two Approaches for Extracting Geographical Information from Unstructured Text”; Philipp Katz and Alexander Schill; Proc. of the 11th Australasian Data Mining & Analytics Conference (AusDM 2013).

Options

Input

The column in the input table which contains the text.

Disambiguation

The disambiguation method to use. Currently the following methods are supported:

ML (730-docs-10T): Machine-learning based disambiguation trained on 730 documents from different datasets using a sophisticated set of features.
ML (TUD-Loc-2013-10T): Machine-learning based disambiguation trained on the TUD-Loc-2013 dataset using a sophisticated set of features.
Heuristic: Disambiguation based on several rules.

Minimum Trust

Trust probability threshold in the range 0 to 1. It regulates the precision/recall trade-off: the lower the value, the more locations are extracted, but the higher the probability of invalid extractions. With an increasing threshold, fewer locations are extracted, but with a higher probability that all of them are correct.
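The effect of the threshold can be sketched in a few lines of Python. The candidate list and its trust scores are invented for illustration, `filter_by_trust` is a hypothetical helper, and whether the node compares with `>=` or `>` is an assumption.

```python
# Hypothetical candidates with made-up trust scores from a classifier.
candidates = [("Tokyo", 0.97), ("Paris", 0.81), ("Mobile", 0.42), ("Reading", 0.18)]


def filter_by_trust(items, minimum_trust):
    """Keep only candidates whose trust reaches the threshold (assumed inclusive)."""
    if not 0.0 <= minimum_trust <= 1.0:
        raise ValueError("minimum_trust must lie between 0 and 1")
    return [name for name, trust in items if trust >= minimum_trust]


lenient = filter_by_trust(candidates, 0.1)  # favors recall
strict = filter_by_trust(candidates, 0.5)   # favors precision

print(lenient)  # ['Tokyo', 'Paris', 'Mobile', 'Reading']
print(strict)   # ['Tokyo', 'Paris']
```

With the low threshold, ambiguous words such as “Mobile” or “Reading” pass through; the higher threshold drops them at the risk of also missing genuine locations.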

Output

Specify how the extracted locations should be mapped to column values:

Rows: Create one new row for each location found in the text. In case there is more than one location, this will append multiple rows per input, or no row in case no match was found.
Rows or Missing: Same as “Rows”, but append a row with missing value cells when no location was found for an input row.
JSON: Append a JSON array with the extracted locations and detailed location information.
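The difference between the three output modes can be sketched as follows. This is an illustration only: the field names `name` and `type`, the use of `None` for a missing value cell, and the empty array for inputs without matches in JSON mode are assumptions, not the node's documented schema.

```python
import json

MISSING = None  # stand-in for a KNIME missing value cell


def as_rows(inputs, keep_empty=False):
    """'Rows' mode; with keep_empty=True it mimics 'Rows or Missing'."""
    rows = []
    for text, locations in inputs:
        for loc in locations:
            rows.append((text, loc["name"], loc["type"]))
        if not locations and keep_empty:
            rows.append((text, MISSING, MISSING))
    return rows


def as_json(inputs):
    """'JSON' mode: one output row per input, locations serialized as an array."""
    return [(text, json.dumps(locations)) for text, locations in inputs]


# Hypothetical extraction results for two input rows.
inputs = [
    ("A trip from Tokyo across Japan", [{"name": "Tokyo", "type": "CITY"},
                                        {"name": "Japan", "type": "COUNTRY"}]),
    ("No place names here", []),
]

print(len(as_rows(inputs)))                   # 2
print(len(as_rows(inputs, keep_empty=True)))  # 3
print(len(as_json(inputs)))                   # 2
```

“Rows” changes the number of rows depending on the matches, whereas the JSON mode keeps exactly one output row per input row, which can be easier to join back to the original table.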

Output Column Prefix (*)

Set a prefix for the appended column names.

Input Ports

: Connector to a Location Source.
: Table with a column holding text from which to extract locations.

Output Ports

: Table with rows for each extracted location from the provided text inputs. It provides columns with the normalized name of the location (e.g. the short form “L.A.” occurring in the text is returned by its full form “Los Angeles”), the type of the location, the geographical coordinates in WGS84 decimal degrees, and the population of the location (if applicable).

Views

This node has no views

Installation

To use this node in KNIME, install the extension Palladian for KNIME from the update site below, following our NodePit Product and Node Installation Guide:

v5.12

A zipped version of the software site can be downloaded here.

Plugin provider: palladian.ws

Plugin version: 3.4.0.202601041906

On NodePit since: 2026-07-07

Last update: 2026-08-01

Tags: Streamable

KNIME versions: Since v4.4

NodePit Exclusive: Only available on NodePit
