HTTP Retriever

This node performs HTTP requests using the methods GET, POST, HEAD, PUT, DELETE, and PATCH. It can send content, which must be supplied as binary or string data. It handles cookies and lets you specify arbitrary HTTP headers.

Results of the “HTTP Retriever” node are provided as cells of the “HTTP Result” type. This type bundles the binary content of the response, the status code, and all HTTP response headers. To extract header information from an HTTP Result, use the “HTTP Result Data Extractor” node.

Form-encoded requests

To send content with your HTTP requests (typically for POST and PUT), select a binary column as the HTTP entity input in the HTTP Retriever node’s configuration. To build a form-encoded request, use the “Form Encoded HTTP Entity Creator” node, which transforms string columns into encoded key-value data. Remember to set the HTTP entity content type in the HTTP Retriever’s configuration afterwards.
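For orientation, the following minimal Python sketch shows what form encoding produces outside of KNIME. It is not the node's implementation; the function name `form_encode` and the sample field names are hypothetical, and the content type shown is the standard value for form-encoded bodies.

```python
from urllib.parse import urlencode


def form_encode(fields):
    """Encode key-value pairs as a form-encoded request body.

    Hypothetical helper for illustration only; it mirrors the idea of
    turning string columns into encoded key-value data.
    """
    # urlencode percent-encodes reserved characters and joins pairs with "&".
    body = urlencode(fields).encode("ascii")
    # Standard content type for form-encoded entities.
    content_type = "application/x-www-form-urlencoded"
    return body, content_type


body, content_type = form_encode({"user": "alice", "query": "a b&c"})
print(body, content_type)
```

The returned bytes correspond to the binary entity, and the returned string corresponds to the value you would enter as the HTTP entity content type.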

Multipart-encoded requests

Multipart-encoded requests can be created using the “Multipart Encoded HTTP Entity Creator” node. It requires one or more binary input columns and creates (1) a combined multipart-encoded column and (2) a column with the content type header, including the delimiter. Connect an HTTP Retriever node, select the appended binary column as the HTTP entity input, and use a flow variable to set the proper HTTP entity content type.
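The reason the content type must be passed along separately is that it carries the delimiter (boundary) that separates the parts in the body. The Python sketch below illustrates this relationship under simplifying assumptions: the helper name `multipart_encode`, the part names, and the fixed per-part content type are all hypothetical and do not describe how the node builds its output.

```python
import uuid


def multipart_encode(parts, boundary=None):
    """Build a multipart/form-data body from (name, bytes) pairs.

    Hypothetical helper for illustration only. Returns the body and the
    matching content type header value, which includes the boundary.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, payload in parts:
        lines.append(delimiter)
        lines.append(
            b'Content-Disposition: form-data; name="' + name.encode("utf-8") + b'"'
        )
        # Assumption: every part is treated as opaque binary data.
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(payload)
    # The closing delimiter has two extra trailing dashes.
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type


body, content_type = multipart_encode([("file", b"hello")])
print(content_type)
```

If the body and the content type come from different calls, the boundaries no longer match and the server cannot split the parts, which is why both values should come from the same row.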

Cookies

Cookies created during the node’s execution are output at the HTTP Retriever’s second output port. To send cookies with a request, use the second (optional) input port of the HTTP Retriever node. When performing sequential requests with multiple HTTP Retriever nodes, chain the cookie input and output ports to pass the cookies through the workflow.

Note: This node is only intended for “static” HTML structures. If you need to work with interactive web pages and web apps that are generated dynamically on the client side in the browser, have a look at our “Selenium Nodes” plugin.

Options

General

URL input
The column in the input table which contains the URLs to retrieve.
HTTP method input
The (optional) column in the input table which contains the HTTP methods to execute. If no column is selected, GET is assumed for all URLs. Supported methods are: GET, POST, HEAD, PUT, DELETE, PATCH.
HTTP entity input
The (optional) column in the input table which contains the HTTP entity to send. The data must be supplied as a binary object or string cell.
HTTP entity content type
The content type of the HTTP entity.
Maximum file size
The maximum file size in kilobytes when downloading. Once the limit is reached, the download is cancelled.
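As a rough illustration of how a size limit of this kind can work, the following Python sketch reads a stream in chunks and cancels once the limit is exceeded. This is not the node's code: the function name `read_limited`, the chunk size, and the assumption that one kilobyte equals 1024 bytes are all hypothetical.

```python
import io


def read_limited(stream, max_kilobytes, chunk_size=8192):
    """Read a stream fully, cancelling once the size limit is exceeded.

    Hypothetical helper for illustration only.
    """
    limit = max_kilobytes * 1024  # assumption: 1 kilobyte = 1024 bytes
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            # Stop reading instead of buffering the rest of the download.
            raise ValueError("download cancelled: size limit exceeded")
        chunks.append(chunk)
    return b"".join(chunks)


data = read_limited(io.BytesIO(b"x" * 500), max_kilobytes=1)
print(len(data))
```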

Headers

Header columns
Select string columns in the input table which will be sent as HTTP headers.

Advanced

# retries after error
The number of retries to perform for each URL in case the retrieval fails. A value of zero means no retrying at all.
Connection timeout (ms)
The timeout in milliseconds to wait for a connection to be established.
Socket timeout
The timeout in milliseconds to wait for new packets to arrive.
User agent
The HTTP user agent string.
Accept all certificates
If enabled, all SSL certificates will be accepted. Caution: Only enable this option if you really know what you are doing!
Fail on non-success HTTP status code
Fail execution if an HTTP status code >= 400 is returned. This is a logical error, so no retries are executed.
Fail on network error
Fail execution if a network error occurs and the specified number of retries has been reached.
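The interaction between the retry count and the two “fail on” options can be sketched in Python as follows. This is only an interpretation of the option descriptions above, not the node's code: the function `retrieve`, the `fetch` callable, and returning `None` when failing is disabled are assumptions made for the example.

```python
def retrieve(fetch, url, retries=0, fail_on_status=True, fail_on_network_error=True):
    """Call fetch(url) with retries on network errors only.

    Hypothetical sketch: fetch is assumed to return (status, body) and to
    raise OSError (e.g. ConnectionError, TimeoutError) on network problems.
    """
    last_error = None
    # A retry count of zero still means one attempt.
    for _ in range(retries + 1):
        try:
            status, body = fetch(url)
        except OSError as exc:
            # Network error: remember it and try again if attempts remain.
            last_error = exc
            continue
        if status >= 400 and fail_on_status:
            # Logical error: the server answered, so retrying is skipped.
            raise RuntimeError(f"HTTP status {status} for {url}")
        return status, body
    if fail_on_network_error:
        raise RuntimeError(f"network error for {url}") from last_error
    # Assumption: without failing, the result is simply missing.
    return None
```

Separating the two failure kinds keeps retries for transient problems such as timeouts, while a 404 or 500 response is reported immediately.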

Proxy

Use custom proxy server
Enable this to override the default proxy server (as configured in KNIME’s preferences → General → Network Connections). This is useful if you want to use a proxy server only for specific requests.
Host
Hostname or IP address of the proxy
Port
The port of the proxy
Username (*)
Username for proxy authentication
Password (*)
Password for proxy authentication

Input Ports

Table with HTTP URLs to be retrieved and, optionally, a column with the HTTP method to perform for each URL.
Table with cookies to use during execution.

Output Ports

Table with downloaded HTTP Results.
Table with cookies which were set during execution. If a cookie table was provided as input, new cookies are appended and existing cookies are replaced.
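The append-or-replace behaviour of the cookie output can be pictured with a short Python sketch. The identity rule used here (domain, path, and name together identify a cookie) is an assumption for the example, as are the function name `merge_cookies` and the dictionary layout; the node's actual table columns may differ.

```python
def merge_cookies(existing, new):
    """Merge two cookie lists: replace matching cookies, append the rest.

    Hypothetical sketch. Assumption: a cookie is identified by the
    combination of its domain, path, and name.
    """
    merged = {(c["domain"], c["path"], c["name"]): c for c in existing}
    for cookie in new:
        # Same key: the newer cookie replaces the older one in place.
        # New key: the cookie is appended at the end.
        merged[(cookie["domain"], cookie["path"], cookie["name"])] = cookie
    return list(merged.values())


incoming = [{"domain": "example.com", "path": "/", "name": "sid", "value": "old"}]
created = [
    {"domain": "example.com", "path": "/", "name": "sid", "value": "new"},
    {"domain": "example.com", "path": "/", "name": "lang", "value": "en"},
]
print(merge_cookies(incoming, created))
```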

Views

This node has no views