Web Log Reader

This Node Is Deprecated. This version of the node has been replaced with a new and improved version. The old version is kept for backwards compatibility, but for all new workflows we suggest using the version linked below.

This node reads Apache log files.

Source selection

You can select one or more sources to read from. A source can be one of the following:

Local file
e.g. /var/log/apache2/access.log
Local directory; all files inside the directory which match the pattern (see below) are read
e.g. /var/log/apache2
URL denoting a file; all supported protocols are possible, e.g. http, ftp, or sftp (if you have installed the SSH extension)
e.g. sftp://apache:password@host/var/log/apache2/access.log
URL denoting a directory; this is only supported for ftp and sftp, and the URL must end with ;type=d. Recursive reading of subdirectories is not supported. You must make sure that the directory either does not contain any subdirectories or that you exclude them using the directory contents pattern. Otherwise you may get an error while reading.
e.g. sftp://user:password@host/var/log/apache2/;type=d

When reading all files in a directory, you can specify a regular expression (not a wildcard expression!) that the file names in the directory must match.
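The difference between a regular expression and a wildcard expression is easy to get wrong, so here is a small Python sketch of it. The file names are made up, and the sketch assumes that the pattern has to match the whole file name, which the description above does not state explicitly.

```python
import fnmatch
import re

# Hypothetical directory listing; the file names are invented for illustration.
files = ["access.log", "access.log.1", "access_log", "error.log"]

# A wildcard (glob) expression, which the node does NOT accept.
wildcard = "access.log*"

# The regular expression with the same intent:
# the dot is escaped and "*" becomes ".*".
regex = r"access\.log.*"

by_wildcard = [f for f in files if fnmatch.fnmatchcase(f, wildcard)]
by_regex = [f for f in files if re.fullmatch(regex, f)]

# Typing the wildcard text into a regex field changes its meaning:
# "." matches any character and "g*" means "zero or more g".
misread = [f for f in files if re.fullmatch(wildcard, f)]

print(by_wildcard)  # ['access.log', 'access.log.1']
print(by_regex)     # ['access.log', 'access.log.1']
print(misread)      # ['access.log', 'access_log']
```

The last result shows the typical failure: the rotated file access.log.1 is skipped, while the unrelated access_log is picked up.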

Input format

Next, select the format of the log files. First, specify which locale is used on the server. This is necessary for parsing dates, since e.g. month names differ between locales. Since almost all log files are created with an English locale, the default is en_US. You only need to change this if your web server uses a different locale when writing the log files.

The next step is to specify the date format used in the log file. Again, you only need to change this if you are using a non-standard date format. The format specification is not identical to the one in the Apache configuration but instead uses the Java syntax. Take a look at the Javadoc for SimpleDateFormat for details.

The last piece is the actual format specification of a complete log line. This format is identical to the one in the Apache configuration, i.e. you can simply copy it from there. The full syntax is given in the Apache documentation. The most commonly used fields are:

%b - the size of the response in bytes
%h - the client's IP address or host name
%{foo}i - the value of the request header foo
%l - the remote logname, if ident is used
%r - the request itself
%s - the HTTP status code
%t - the request's timestamp
%u - the remote user, if authentication is used
%v - the virtual host this request was sent to
%0 - this special field can be used to process unknown fields

The input field contains the two most commonly used formats, common and combined. If you click on Analyze log, the first line of the first file is read and analyzed with the given format. If the format matches, you will get a preview of the columns and types in the table below. The types and column names are hard-coded and cannot be changed.
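To make the relationship between the log format, the date format, and the resulting columns more concrete, the following Python sketch parses one line in Apache's common format (%h %l %u %t "%r" %>s %b). It is only an illustration and not the node's implementation: the function name parse_common, the column names, and the regular expression are assumptions. The sample line is the one used in the Apache documentation.

```python
import re
from datetime import datetime

# Apache "common" format: %h %l %u %t "%r" %>s %b
# Simplified: escaped quotes inside the request are not handled.
COMMON = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_common(line):
    """Return a dict of typed columns, or None if the line does not match."""
    m = COMMON.fullmatch(line)
    if m is None:
        return None
    row = m.groupdict()
    # %b in strptime is the locale-dependent month abbreviation, which is
    # why the locale matters. The same layout in Java SimpleDateFormat
    # syntax would be dd/MMM/yyyy:HH:mm:ss Z.
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    # Apache writes "-" for %b when no bytes were sent.
    row["size"] = None if row["size"] == "-" else int(row["size"])
    return row

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
row = parse_common(line)
print(row["host"], row["time"].isoformat(), row["status"], row["size"])
```

A line that does not fit the format yields None here, which roughly corresponds to the node's second output port for request lines that could not be parsed.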

Filtering

In the second tab you can specify a time range for the requests you want to include in the output. All requests outside the specified range are filtered out. The start date is inclusive, whereas the end date is exclusive. For example, to keep only the requests from March 2013, specify 01.03.2013 00:00:00 as the start date and 01.04.2013 00:00:00 as the end date. If you use flow variables to specify dates, they must use the date format of the log file as specified in the dialog.
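The inclusive start and exclusive end form a half-open interval. A minimal Python sketch of that comparison for the March 2013 example follows; the helper name in_range and the sample timestamps are made up for illustration.

```python
from datetime import datetime

start = datetime(2013, 3, 1, 0, 0, 0)  # inclusive
end = datetime(2013, 4, 1, 0, 0, 0)    # exclusive

def in_range(ts, start, end):
    """Half-open interval test: start <= ts < end."""
    return start <= ts < end

# Hypothetical request timestamps around the boundaries.
timestamps = [
    datetime(2013, 2, 28, 23, 59, 59),  # before the range: dropped
    datetime(2013, 3, 1, 0, 0, 0),      # equal to start: kept
    datetime(2013, 3, 31, 23, 59, 59),  # last second of March: kept
    datetime(2013, 4, 1, 0, 0, 0),      # equal to end: dropped
]

kept = [ts for ts in timestamps if in_range(ts, start, end)]
print(kept)
```

Because the end is exclusive, consecutive ranges such as March and April can share a boundary without any request being counted twice.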

Options

Input options

Source files: A list of input files, directories, or URLs. See the explanation above for details.
Directory contents pattern: Pattern that files inside a directory must match in order to be processed. Note that this is a regular expression and not a wildcard expression.
Locale: The locale used in the log files for dates
Date/time format: The format used for the request timestamps. For details see above.
Log format: The log file format specification. This is identical to the one used in the Apache configuration.
Split request field: If selected, the request field (%r) will be split into three columns: one for the HTTP method (equal to %m), one for the requested URI, and one for the protocol (equal to %H). You need to ensure, however, that %m and %H are not themselves part of the log format.
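As an illustration of the Split request field option, this Python sketch splits a request line into the three parts named above. The function name split_request and the dictionary keys are assumptions, and how the node itself treats malformed request lines is not documented here; the sketch simply returns None for them.

```python
def split_request(request):
    """Split a %r value into method, URI, and protocol.

    Returns None when the request does not consist of exactly three
    space-separated parts (an assumption made for this sketch).
    """
    parts = request.split(" ")
    if len(parts) != 3:
        return None
    method, uri, protocol = parts
    return {"method": method, "uri": uri, "protocol": protocol}

print(split_request("GET /apache_pb.gif HTTP/1.0"))
print(split_request("-"))
```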

Filter options

Start date: The start date of the requests which should be included in the output. The start date is inclusive.
End date: The end date of the requests which should be included in the output. The end date is exclusive.

Input Ports

This node has no input ports

Output Ports

Data table read from the file
Request lines that could not be parsed

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Webanalytics from the below update site following our NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191419

On NodePit since: 2025-07-02

Last update: 2025-08-09

Tags: Streamable, Deprecated

KNIME versions: Since v3.6
