HTML Parser

This HTML parser is based on Validator.nu.

From the Validator.nu web page: “The Validator.nu HTML Parser is an implementation of the HTML5 parsing algorithm in Java. The parser is designed to work as a drop-in replacement for the XML parser in applications that already support XHTML 1.x content with an XML parser and use SAX, DOM or XOM to interface with the parser. Low-level functionality is provided for applications that wish to perform their own IO and support document.write() with scripting. The parser core compiles on Google Web Toolkit and can be automatically translated into C++. (The C++ translation capability is currently used for porting the parser for use in Gecko.)”

You can supply input to this node in a variety of formats:

HTTP Result cells obtained with the “HTTP Retriever” node
Binary data cells
String cells which contain a local file: URL. Although technically possible, passing http: or https: URLs directly to the parser is not recommended. Download the page with the “HTTP Retriever” node instead and feed its HTTP Results into this node to guarantee proper encoding.
String cells which contain the raw markup (e.g. <html><head> […] )
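
The encoding remark in the list above can be illustrated with a minimal sketch. It does not show how this node or the “HTTP Retriever” works internally. It only demonstrates, using the Java standard library, that the same bytes yield different text depending on the charset used to decode them. The class name EncodingDemo and the method decode are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the node or of Palladian.
public class EncodingDemo {

    /** Decodes raw bytes into text using the given charset. */
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented last letter, encoded as UTF-8 (5 bytes).
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String correct = decode(raw, StandardCharsets.UTF_8);

        // Decoding with a different charset turns the accented letter
        // into two unrelated characters.
        String garbled = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(correct.length()); // 4 characters
        System.out.println(garbled.length()); // 5 characters
    }
}
```

A downloader that sees the HTTP response headers is in a better position to pick the right charset than a parser that only receives a URL or a string, which is presumably why the note above recommends that route.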

Note: This node is only intended for “static” HTML structures. If you need to work with interactive web pages and web apps which are dynamically generated client-side in the browser, have a look at our plugin “Selenium Nodes”.

Options

Input: Column in the input table which holds the data to parse.
Drop input column: Enable to exclude the input column from the result table.
Make absolute URLs: When enabled, all relative URLs in the document are converted to absolute ones. This simplifies later processing steps that use the URLs obtained from the document, and in some cases makes them possible in the first place.
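
What “Make absolute URLs” means can be sketched with the Java standard library. This is an illustration under assumptions, not the node's actual implementation: the class name AbsoluteUrls, the method toAbsolute and the example.com addresses are hypothetical, and the sketch assumes the document's base URL is known.

```java
import java.net.URI;

// Hypothetical illustration only; not the node's actual code.
public class AbsoluteUrls {

    /**
     * Resolves a reference found in a document (for example an href value)
     * against the URL the document was loaded from.
     */
    public static String toAbsolute(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/docs/index.html";

        // Relative to the directory of the base document.
        System.out.println(toAbsolute(base, "images/logo.png"));
        // → https://example.com/docs/images/logo.png

        // Relative to the root of the host.
        System.out.println(toAbsolute(base, "/about"));
        // → https://example.com/about

        // References that are already absolute are returned unchanged.
        System.out.println(toAbsolute(base, "https://other.example/x"));
        // → https://other.example/x
    }
}
```

With absolute URLs in the parsed document, a downstream step such as an XPath query followed by a download does not need to know where the document originally came from.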

Input Ports

Input table containing (X)HTML data to be parsed.

Output Ports

Output table with the parsed (X)HTML documents appended. If a document could not be parsed, a “missing value” is appended instead.

Popular Successors

XPath: 13 %
XPath (deprecated): 4 %
Cell Splitter: 3 %
Interactive Table: 3 %
Document Viewer: 3 %

Views

This node has no views

Installation

To use this node in KNIME, install the extension Palladian for KNIME from the update site below, following our NodePit Product and Node Installation Guide:

v5.12

A zipped version of the software site can be downloaded here.

Plugin provider: palladian.ws

Plugin version: 3.4.0.202601041906

On NodePit since: 2026-07-07

Last update: 2026-07-20

Tags: Streamable

KNIME versions: Since v3.6

NodePit Exclusive: Only available on NodePit