This node splits large texts into smaller overlapping chunks.
Text chunking is a technique for splitting large documents into smaller pieces (chunks). Consecutive chunks overlap so that each chunk retains some context from its neighbor. Both the chunk size and the overlap can be configured.
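The interplay of chunk size and overlap can be illustrated with a minimal sketch. It is not the node's actual implementation: the function name chunk_with_overlap is hypothetical, and it cuts at fixed character positions without regard to sentence or paragraph boundaries.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the
    previous one, so neighboring chunks share `overlap` characters.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1))
```

With a chunk size of 4 and an overlap of 1, each chunk begins with the last character of the previous one.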
For generic text, the node tries to preserve semantic relations by keeping the sentences of a paragraph in the same chunk where possible. If a specific programming or formatting language is selected, the node takes that language's syntax into account when splitting the document.
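One common way to keep paragraphs and sentences together is to try separators in order of priority, falling back to a finer separator only when a piece is still too large. The sketch below shows that idea under stated assumptions: the separator list, the function name split_by_priority, and the greedy merging are illustrative choices, not the node's documented behavior. Overlap is left out for brevity, and separators that fall between two chunks are dropped.

```python
# Assumed priority for generic text: paragraph, line, sentence, word.
GENERIC_SEPARATORS = ["\n\n", "\n", ". ", " "]


def split_by_priority(text, max_size, separators=GENERIC_SEPARATORS):
    """Split text into chunks of at most max_size characters,
    preferring the coarsest separator that yields small enough pieces."""
    if len(text) <= max_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present, try the next finer one.
        return split_by_priority(text, max_size, finer)

    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_size:
            current = candidate  # keep merging pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_size:
            current = part
        else:
            # The piece itself is too large: split it with finer separators.
            chunks.extend(split_by_priority(part, max_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


doc = "First sentence. Second sentence.\n\nThird paragraph here."
print(split_by_priority(doc, max_size=35))
```

With a limit of 35 characters, both sentences of the first paragraph stay in one chunk and the split happens at the paragraph break. A language-aware variant could pass a different separators list, for example one based on the block or heading syntax of the chosen language.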
Select the column containing the documents to be chunked.
Specify the maximum chunk size.
Specify by how many characters the chunks should overlap.
Select whether the document will be split based on separators for generic text or code/markup.
Available options:
Select the language that will be considered when splitting the text.
Available options:
Select whether the chunks should replace the original column or be appended to the table in a new column.
Available options:
Provide the name of the new column containing the chunks.
To use this node in KNIME, install the extension KNIME Python Extension Development (Labs).