This node is not currently available in KNIME v5.5; this page describes it as of KNIME v5.1.

Clean HTML Retriever

This node takes a URL from a column and retrieves its content (assumed to be HTML) for parsing. If HTML content is already available in another column, the node uses that content directly instead of retrieving it from the URL. The HTML is then parsed and cleaned up with HtmlCleaner, which produces XHTML. The result can be output as either a String or an XML cell.
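
The per-row choice between the content column and the URL can be sketched in Python as follows. This is a minimal sketch, not the node's source: `resolve_html` and the injected `fetch` callable are hypothetical names. Treating a missing or whitespace-only content cell as "not available" is also an assumption, because the node description does not say how blank cells are handled.

```python
from typing import Callable, Optional


def resolve_html(url: Optional[str], content: Optional[str],
                 fetch: Callable[[str], str]) -> Optional[str]:
    """Pick the HTML to clean for one row (hypothetical helper).

    Follows the rule in the node description: HTML in the content column
    wins; otherwise the page is retrieved from the URL column.
    """
    # Assumption: a missing or blank content cell counts as "not available".
    if content is not None and content.strip():
        return content
    if url:
        return fetch(url)  # `fetch` stands in for the node's HTTP retrieval
    return None  # neither input present; the node's real behaviour is undocumented
```

Injecting `fetch` keeps the selection rule separate from networking, so it can be checked without any HTTP traffic. For example, `resolve_html("https://example.com", "<p>cached</p>", fetch=lambda u: "")` returns `"<p>cached</p>"` and never calls `fetch`.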

Options

URL Column Name: The column containing the URLs to retrieve.
Content Column Name: The column containing HTML content. If a row has content in this column, the node uses it instead of retrieving the URL.
Output Column Name: Name of the column that holds the parsed XHTML result. The default name is "XHTML".
Output result as XML: Whether to output the result as XML type instead of String. XML type is useful when this node is part of an XML analysis workflow.
User agent: The User-Agent header sent with each HTTP request.
Number of retries: Number of times a URL request is retried after a failure.
Make absolute URLs: Convert all relative URLs in the document into absolute URLs.
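
The User agent, Number of retries, and Make absolute URLs options can be illustrated with the Python standard library. This is a hedged sketch, not the node's implementation (the node itself runs in Java with HtmlCleaner). The names `fetch_html`, `make_absolute`, and `_read_url`, the `timeout` and `delay_s` parameters, and the choice of `href`, `src`, and `action` as URL-bearing attributes are all assumptions for illustration. `make_absolute` also assumes its input is already well-formed XHTML without a namespace declaration, such as cleaned output.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.parse import urljoin


def _read_url(request: urllib.request.Request, timeout: float) -> str:
    """Perform one HTTP request and decode the body (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_html(url: str, user_agent: str, retries: int,
               read: Callable[[urllib.request.Request, float], str] = _read_url,
               timeout: float = 30.0, delay_s: float = 0.0) -> str:
    """Fetch `url` with a custom User-Agent, retrying up to `retries` times after a failure."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    last_error = None
    for attempt in range(retries + 1):  # one initial attempt plus `retries` retries
        try:
            return read(request, timeout)
        except OSError as err:  # URLError, HTTPError and socket timeouts are OSErrors
            last_error = err
            if attempt < retries:
                time.sleep(delay_s)  # assumed fixed pause; the node's back-off is undocumented
    raise last_error  # every attempt failed


URL_ATTRIBUTES = ("href", "src", "action")  # assumption: attributes treated as URLs


def make_absolute(xhtml: str, base_url: str) -> str:
    """Rewrite relative URL attributes in well-formed XHTML against `base_url`."""
    root = ET.fromstring(xhtml)
    for element in root.iter():
        for name in URL_ATTRIBUTES:
            value = element.get(name)
            if value is not None:
                # urljoin resolves relative paths; absolute URLs keep their own scheme and host
                element.set(name, urljoin(base_url, value))
    return ET.tostring(root, encoding="unicode")
```

For example, `make_absolute('<a href="/about">A</a>', "https://example.com/docs/")` returns `'<a href="https://example.com/about">A</a>'`. Passing a custom `read` function to `fetch_html` makes the retry logic testable without network access.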

Input Ports

An input table containing the URL column and/or the HTML content column.

Output Ports

An output table containing the URLs and their XHTML results.

Views

This node has no views

Installation

To use this node in KNIME, install the MMI Data Analytics Nodes extension from the update site below, following our NodePit Product and Node Installation Guide:

v5.1

A zipped version of the software site can be downloaded here.

Plugin provider: MMI Agency

Plugin version: 0.0.16.v202406140551

On NodePit since: 2023-10-15

Last update: 2025-07-24

Tags: Streamable

KNIME versions: v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6
