URL Normalizer

Go to Product

This node allows to transform URLs to a canonicalized representation, e. g. for matching them in data mining scenarios. It is not intended, to produce equally working URLs in all cases. The following steps are performed by this node:

  • Lower case URL
  • Add http:// protocol, if not present
  • Transform https:// to http:// protocol
  • Remove session IDs from URLs
  • Normalize relative path components (e. g. "..")
  • Remove trailing slashes from URLs
  • Remove “index.htm*” part from URLs
  • Sort query parameters alphabetically


The column which holds the URLs to process.

Input Ports

Table which contains a column with URLs to process.

Output Ports

Input table with appended column for canonicalized URLs.


This node has no views


  • No workflows found

Further Links


You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.