URL Normalizer

Palladian for KNIME version 2.3.0.202009251618 by palladian.ws; Philipp Katz, Klemens Muthmann, David Urbansky

This node transforms URLs into a canonicalized representation, e.g. for matching them in data mining scenarios. It is not intended to produce equally working URLs in all cases: a normalized URL may no longer resolve to the original resource. The node performs the following steps:

  • Lower case URL
  • Add http:// protocol, if not present
  • Transform https:// to http:// protocol
  • Remove session IDs from URLs
  • Normalize relative path components (e. g. "..")
  • Remove trailing slashes from URLs
  • Remove “index.htm*” part from URLs
  • Sort query parameters alphabetically
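
The steps above can be approximated outside of KNIME with a short Python sketch. This is not Palladian's implementation; it only illustrates the listed steps using the standard library. The function name `normalize_url` and the set of session ID parameter names in `SESSION_PARAMS` are assumptions made for this example, since the node description does not say which session IDs it recognizes.

```python
import posixpath
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session ID parameter names (hypothetical; the node's own list is not documented here).
SESSION_PARAMS = {"phpsessid", "jsessionid", "sid", "sessionid"}


def normalize_url(url: str) -> str:
    """Illustrative approximation of the normalization steps listed above."""
    # Lower-case the whole URL (this also changes case-sensitive paths).
    url = url.strip().lower()

    # Add the http:// protocol if no protocol is present.
    if not re.match(r"^[a-z][a-z0-9+.-]*://", url):
        url = "http://" + url

    scheme, netloc, path, query, fragment = urlsplit(url)

    # Transform https:// to http://.
    if scheme == "https":
        scheme = "http"

    # Remove session IDs given as path parameters, e.g. ";jsessionid=123".
    path = re.sub(r";jsessionid=[^/?#]*", "", path)

    # Normalize relative path components such as "..".
    if path:
        path = posixpath.normpath(path)
        if path == ".":
            path = ""

    # Remove a trailing "index.htm*" part, then any trailing slashes.
    path = re.sub(r"/index\.htm[^/]*$", "", path)
    path = path.rstrip("/")

    # Remove session IDs given as query parameters and sort the rest alphabetically.
    params = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    query = urlencode(sorted(params))

    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize_url("HTTPS://Example.com/a/../b/index.html?z=1&a=2&PHPSESSID=abc"))
```

Two differently written URLs that point to the same page then map to the same string, which is what makes the result usable as a join or grouping key. As with the node itself, the output is meant for matching, not for fetching.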

Options

URL
The column which holds the URLs to process.

Input Ports

Table which contains a column with URLs to process.

Output Ports

Input table with appended column for canonicalized URLs.

Installation

To use this node in KNIME, install Palladian for KNIME from the following update site:

KNIME 4.2

A zipped version of the software site can be downloaded here.

Not sure what to do with this link? Read our NodePit Product and Node Installation Guide, which explains in detail how to install nodes into your KNIME Analytics Platform.

Wait a sec! You want to explore and install nodes even faster? We highly recommend our NodePit for KNIME extension for your KNIME Analytics Platform. Browse NodePit from within KNIME, install nodes with just one click and share your workflows with NodePit Space.
