Apache Tika is a library used mainly to detect document types and to
extract text content and metadata from a wide range of file formats.
Internally, Tika delegates all parsing and detection work to existing
document parsers and document type detection libraries. It exposes a
single generic API that acts as a universal type detector and content
extractor for many file formats. For more information about Tika, see
the Tika website.
This node parses any kind of document that is supported by Tika. The
file types to parse can be selected in the configuration dialog,
either by file extension or by MIME type. The information to extract
from each file (metadata and content) can also be selected in the
dialog. Where possible, files embedded in the input files, such as
e-mail attachments, can also be extracted and stored in a specified
directory. An authentication setting is provided for parsing
encrypted files.
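The difference between the two selection modes can be illustrated with a minimal Python sketch. It is not the node's implementation and does not call Tika: the helper `select_files` and the sample file names are hypothetical, and Python's standard `mimetypes` module stands in for type detection by guessing from the file name only. Tika itself can also inspect file content.

```python
import mimetypes
from pathlib import PurePath

# Private lookup table built from Python's bundled defaults.
# Assumption: name-based guessing is enough for this illustration.
_MIME_DB = mimetypes.MimeTypes()


def select_files(names, extensions=None, mime_types=None):
    """Hypothetical helper: keep names matching ONE of the two filters."""
    if (extensions is None) == (mime_types is None):
        raise ValueError("give exactly one of extensions or mime_types")

    if extensions is not None:
        # Normalise "PDF", ".pdf" and "pdf" to the same form.
        wanted = {e.lower().lstrip(".") for e in extensions}
    else:
        wanted = set(mime_types)

    selected = []
    for name in names:
        if extensions is not None:
            key = PurePath(name).suffix.lower().lstrip(".")
        else:
            # guess_type returns (type, encoding); type may be None.
            key, _encoding = _MIME_DB.guess_type(name)
        if key in wanted:
            selected.append(name)
    return selected


files = ["report.pdf", "notes.txt", "page.html", "REPORT2.PDF"]
by_ext = select_files(files, extensions=["pdf"])
by_mime = select_files(files, mime_types=["text/plain", "text/html"])
print(by_ext)
print(by_mime)
```

Filtering by extension depends only on how a file is named, while filtering by MIME type groups files by what they are, so one MIME type can cover several extensions.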
To use this node in KNIME, install the KNIME Textprocessing extension by following the NodePit Product and Node Installation Guide.