Content Retriever (Labs)

Retrieves the HTML from the browser session for further processing in KNIME.

Options

Retrieve tag type

Select the type of element that will be extracted from the page.

Available options:

  • Page: Extract the whole page as XML.
  • Link: Extract all anchor (<a>) elements and their href attribute from the page.
  • Paragraph: Extract all paragraph (<p>) elements and their inner text from the page.
  • Button: Extract all button (<button>) elements and their text from the page.
  • Image: Extract all image (<img>) elements and their alt text from the page.
  • Heading: Extract all heading (<h1> to <h6>) elements and their text from the page.
  • Table: Extract all table (<table>) elements from the page.
  • Unordered list: Extract all unordered list (<ul>) elements from the page.
  • Ordered list: Extract all ordered list (<ol>) elements from the page.
  • Page Title: Extract the title of the page.
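
To illustrate what the Link option describes (every anchor element together with its href attribute), here is a minimal Python sketch using only the standard library. It is not the node's implementation; the names LinkCollector and extract_links and the sample HTML are hypothetical and serve only to show the kind of result to expect.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        # Start collecting when an anchor opens; href may be None if absent.
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate the visible text inside the current anchor.
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Store the finished anchor when it closes.
        if tag == "a" and self._in_anchor:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_anchor = False


def extract_links(html):
    """Return a list of (href, text) tuples for all anchors in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p>See <a href="https://www.knime.com">KNIME</a> and <a href="/docs">the docs</a>.</p>'
print(extract_links(sample))
```

The other element options (Paragraph, Button, Image, and so on) can be pictured the same way, with a different tag name and a different attribute or text being collected.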

HTML output

Retrieval delay (seconds)

Specifies how long to wait, in seconds, before the HTML is retrieved. Node execution is prolonged by exactly this amount of time.
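
The effect of the delay can be pictured with the following Python sketch, which waits a fixed time before reading the content, for example to give a page time to finish loading. This is an assumption about the behavior described above, not the node's code; retrieve_after_delay and fetch_html are hypothetical names.

```python
import time


def retrieve_after_delay(fetch_html, delay_seconds):
    """Wait delay_seconds, then call fetch_html() and return its result.

    fetch_html is a hypothetical zero-argument callable standing in for
    whatever reads the current page content from the browser session.
    """
    time.sleep(delay_seconds)  # execution is prolonged by this amount
    return fetch_html()


# Example with a stand-in fetcher that returns a fixed HTML string.
html = retrieve_after_delay(lambda: "<html><body>ready</body></html>", 0.05)
print(html)
```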

Input Ports

Information about the current session state

Output Ports

Outputs the current browser content in an HTML column.

Views

This node has no views
