Strings to Document

Converts the specified strings to documents. For each row, a document is created and attached to that row. The strings of the specified columns are used as title, authors, and full text. In addition, the defined category, source, type, and date are set.

Options

Title
Selects whether the title is taken from the content of a column, from the row IDs, or set to an empty string.
Title column
The column containing the string used as title (applies if "Use title from column" is checked; otherwise a default title is generated).
Full text
The column containing the string which is used as text.
Document source
The source which is set to all documents (if "Use sources from column" is not checked).
Use sources from column
If checked, the string values of the specified column will be used as document sources.
Document source column
The column containing the string used as source. No source is set for missing values.
Document category
The category which is set to all documents (if "Use categories from column" is not checked).
Use categories from column
If checked, the string values of the specified column will be used as document categories.
Document category column
The column containing the string used as category. No category is set for missing values.
Use author(s) from column
If checked, the string values of the specified column will be used as author(s).
Authors column
The column containing the authors' names as a single string, which is split by the separator string. The string in the specified column should follow this pattern: FirstName LastName, FirstName LastName, ... (with a comma as separator string). Second names are appended to the first name.
Author name separator
The string separating the author names contained in the authors column.
Default author first name
The default author first name, used if "Use author(s) from column" is unchecked.
Default author last name
The default author last name, used if "Use author(s) from column" is unchecked.
Document type
The type which is set to all documents.
Date
The publication date which is set to all documents (if "Use publication date from column" is not checked).
Use publication date from column
If checked, the date value of the specified column will be used as document publication date.
Publication date column
The column containing the date used as publication date (applies if "Use publication date from column" is checked; otherwise the date from the "Publication date" field is set). Note: The column type must be date only. To convert a date & time column to date only, consider using the "Modify time" node to remove the time. If the column has the legacy date & time type, consider using the "Legacy Date&Time To Date&Time" node for conversion.
Document column
Specify the name of the document column to be created.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
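
The splitting rule described under "Authors column" can be illustrated with a small Python sketch. This is not the node's implementation; the names `Author` and `parse_authors` are hypothetical, and the handling of a name with only one word (treated here as a last name with an empty first name) is an assumption that the description above does not cover.

```python
from typing import List, NamedTuple


class Author(NamedTuple):
    first_name: str
    last_name: str


def parse_authors(cell: str, separator: str = ",") -> List[Author]:
    """Split an authors string into (first name, last name) pairs."""
    authors = []
    for chunk in cell.split(separator):
        parts = chunk.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing separator
        # The last word is taken as the last name; all preceding words
        # (first name plus any second names) form the first name.
        # Assumption: a single-word name yields an empty first name.
        authors.append(Author(" ".join(parts[:-1]), parts[-1]))
    return authors


print(parse_authors("Jane Doe, John Ronald Tolkien"))
```

In this sketch, "John Ronald Tolkien" yields the first name "John Ronald" and the last name "Tolkien", matching the note that second names are appended to the first name. A different separator, such as ";", can be passed through the `separator` argument, mirroring the "Author name separator" option.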

Input Ports

An input data table containing string cells.

Output Ports

An output table containing the data of the input table as well as the created documents in an additional column.
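
How the options combine for a single row can be sketched in Python as follows. This is only an illustration of the behavior described under Options, not the node's code: the `Document` class, the `attach_document` function, its parameter names, and the default column name "Document" are all hypothetical. Authors, document type, and tokenization are left out, and the date column is assumed to contain no missing values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class Document:
    title: str
    text: str
    source: Optional[str]
    category: Optional[str]
    pub_date: date


def attach_document(
    row: Dict[str, Any],
    *,
    text_col: str,
    default_date: date,
    document_col: str = "Document",  # hypothetical default name
    title_col: Optional[str] = None,
    source_col: Optional[str] = None,
    default_source: Optional[str] = None,
    category_col: Optional[str] = None,
    default_category: Optional[str] = None,
    date_col: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of the row with a document in an additional column."""
    doc = Document(
        # No title column configured: use an empty string as title.
        title=str(row[title_col]) if title_col else "",
        text=str(row[text_col]),
        # A configured column takes precedence over the fixed value;
        # a missing value (None) in that column leaves the field unset.
        source=row.get(source_col) if source_col else default_source,
        category=row.get(category_col) if category_col else default_category,
        pub_date=row[date_col] if date_col else default_date,
    )
    return {**row, document_col: doc}


row = {"headline": "Release notes", "body": "Version 2 is out.", "src": None}
out = attach_document(
    row,
    text_col="body",
    title_col="headline",
    source_col="src",
    default_source="Web",
    default_category="News",
    default_date=date(2024, 1, 1),
)
print(out["Document"])
```

Here the source stays unset because the source column is in use and its value is missing, while the category falls back to the fixed value because no category column is configured. The original columns are kept and the document is added alongside them.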

Views

This node has no views
