Strings To Document

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

Converts the specified strings to documents. For each row, a document is created and attached to that row. The strings of the specified columns are used as title, authors, and full text. In addition, the defined category, source, type, and date are set.

Options

Use title from column
If checked, the string values of the specified column will be used as title.
Title
The column containing the string that is used as title (if "Use title from column" is checked; otherwise a default title is generated).
Use author(s) from column
If checked, the string values of the specified column will be used as author(s).
Authors
The column containing the string that is split up and used as author names (if "Use author(s) from column" is checked; otherwise the values of the "Default author first name" and "Default author last name" fields are set as author names).
Author name separator
The string separating the author names.
Default author first name
The first name to use if an author's first name is missing.
Default author last name
The last name to use if an author's last name is missing.
Full text
The column containing the string which is used as text.
Document source
The source that is set for all documents (if "Use sources from column" is not checked).
Use sources from column
If checked, the string values of the specified column will be used as document sources.
Document source column
The column containing the string used as source. No source is set for missing values.
Document category
The category that is set for all documents (if "Use categories from column" is not checked).
Use categories from column
If checked, the string values of the specified column will be used as document categories.
Document category column
The column containing the string used as category. No category is set for missing values.
Document type
The type that is set for all documents.
Publication date
The publication date that is set for all documents (if "Use publication date from column" is not checked). The date has to be formatted as "dd-mm-yyyy": two digits for the day, two for the month, and four for the year. The specified date has to be a valid date.
Use publication date from column
If checked, the string values of the specified column will be used as document publication dates. Date columns are also accepted as input columns.
Publication date column
The column containing the string that is used as publication date (if "Use publication date from column" is checked; otherwise the date currently set in the "Publication date" field is used). The date has to be formatted as "dd-mm-yyyy".
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
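
The date format and the author handling described above can be illustrated outside of KNIME. The following Python sketch is not the node's implementation. It assumes that a date string must have exactly two digits for the day, two for the month, and four for the year, that the last whitespace-separated token of a name is the last name, and that "-" is the default first and last name. The function names, the ", " separator, and the "-" defaults are hypothetical.

```python
import re
from datetime import date, datetime
from typing import List, Tuple

# Two digits for day, two for month, four for year, separated by hyphens.
DATE_PATTERN = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{4}")


def parse_publication_date(text: str) -> date:
    """Parse a "dd-mm-yyyy" string, rejecting malformed or impossible dates."""
    if not DATE_PATTERN.fullmatch(text):
        raise ValueError(f"expected dd-mm-yyyy, got {text!r}")
    # strptime raises ValueError for impossible dates such as 31-02-2020.
    return datetime.strptime(text, "%d-%m-%Y").date()


def split_authors(
    cell: str,
    separator: str = ", ",     # hypothetical "Author name separator" value
    default_first: str = "-",  # hypothetical "Default author first name"
    default_last: str = "-",   # hypothetical "Default author last name"
) -> List[Tuple[str, str]]:
    """Split an author cell into (first name, last name) pairs."""
    authors = []
    for name in cell.split(separator):
        parts = name.split()
        if not parts:
            continue  # skip empty entries
        # Assumption: the last token is the last name, the rest is the first name.
        first = " ".join(parts[:-1]) or default_first
        authors.append((first, parts[-1]))
    # Assumption: with no usable names, fall back to the default author.
    return authors or [(default_first, default_last)]
```

For example, `parse_publication_date("05-03-2021")` yields 5 March 2021, while "31-02-2020" and "5-3-2021" are rejected. How the node itself splits a name into first and last name is not stated in this description, so that rule should be checked against the node's actual behavior.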

Input Ports

An input data table containing string cells.

Output Ports

An output table containing the string data of the input table as well as the created documents in an additional column.
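
The relation between the input and output tables can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Document` class, the `append_documents` function, and the "Document" column name are hypothetical and not part of KNIME. Each row keeps its original string values and gains one additional cell holding the created document.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a document cell."""
    title: str
    text: str


def append_documents(
    rows: List[Dict[str, object]],
    title_col: str,
    text_col: str,
    doc_col: str = "Document",  # hypothetical name of the new column
) -> List[Dict[str, object]]:
    """Return copies of the rows, each with a document in an extra column."""
    result = []
    for row in rows:
        doc = Document(title=str(row[title_col]), text=str(row[text_col]))
        # Copy the row so the input table stays unchanged.
        result.append({**row, doc_col: doc})
    return result
```

Calling `append_documents([{"Title": "A", "Body": "some text"}], "Title", "Body")` returns a single row that still has the "Title" and "Body" values plus a "Document" entry built from them.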

Views

This node has no views.
