StanfordNLP Open Information Extractor

Extracts relation triplets contained in sentences of a document. While the StanfordNLP Relation Extractor node extracts pre-defined types of relations between two named entities, this node extracts entailed clauses, which are then reduced to their main statement and split into subject, predicate and object.

The node can be used in two ways, depending on whether the Apply preprocessing option is checked. If it is checked, the node performs part-of-speech (POS) tagging, named-entity (NE) tagging and lemmatization itself, using the Stanford CoreNLP standard settings. The resulting tags are not written to the documents, since the preprocessing is only applied internally. If the option is unchecked, you must provide a column with tagged documents (at least POS-tagged) as well as a column containing lemmatized documents. Lemmatized documents consist of terms that have been converted to their canonical (dictionary or citation) form.
Note: Building the same pipeline with KNIME's Stanford nodes and their default settings will not necessarily produce the same results as the Apply preprocessing option, because KNIME uses the Penn Treebank (PTB) tag set, which assigns the SYM tag to any kind of punctuation and quotation mark. Stanford CoreNLP uses a modified version of the PTB tag set that distinguishes these symbols, since they are important for dependency parsing and natural logic annotation.

The node creates four new columns: a subject column, an object column, a predicate column and a column containing the confidence for the relation between the subject and the object. It is possible that the node cannot extract a relation from a document, because it only extracts positive statements. For example, no relation is extracted for the sentence "No house cats have rabies." A detailed explanation of how clauses are extracted and shortened and how the confidence is calculated can be found in this paper from the StanfordNLP group.

Note: Open Information Extraction is a computationally expensive operation. It is recommended to run KNIME with at least 4 GB of heap space when using this node. To increase the heap space, change the -Xmx setting in the knime.ini file.
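
As a rough illustration of the heap recommendation above, the following Java sketch parses the -Xmx entry from the lines of a knime.ini file and checks it against 4 GB. The class and method names (KnimeHeapCheck, parseXmx, meetsRecommendation) are hypothetical, the sample lines are made up, and the sketch assumes that the last -Xmx entry in the file is the one that takes effect.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: checks the -Xmx entry of a knime.ini file against the 4 GB recommendation. */
final class KnimeHeapCheck {

    static final long RECOMMENDED_BYTES = 4L * 1024 * 1024 * 1024; // 4 GB

    /** Converts an entry such as "-Xmx2048m" or "-Xmx4g" to bytes; returns -1 for any other line. */
    static long parseXmx(String line) {
        String s = line.trim();
        if (!s.startsWith("-Xmx") || s.length() <= 4) {
            return -1;
        }
        String value = s.substring(4).toLowerCase(Locale.ROOT);
        char unit = value.charAt(value.length() - 1);
        long factor = 1; // no suffix means plain bytes
        if (unit == 'k') {
            factor = 1024L;
        } else if (unit == 'm') {
            factor = 1024L * 1024;
        } else if (unit == 'g') {
            factor = 1024L * 1024 * 1024;
        }
        String digits = factor == 1 ? value : value.substring(0, value.length() - 1);
        try {
            return Long.parseLong(digits) * factor;
        } catch (NumberFormatException e) {
            return -1; // malformed size, treat as "no usable -Xmx entry"
        }
    }

    /** Returns true if the last -Xmx entry in the given lines is at least 4 GB (assumption: last entry wins). */
    static boolean meetsRecommendation(List<String> iniLines) {
        long heap = -1;
        for (String line : iniLines) {
            long parsed = parseXmx(line);
            if (parsed >= 0) {
                heap = parsed;
            }
        }
        return heap >= RECOMMENDED_BYTES;
    }

    public static void main(String[] args) {
        // Made-up excerpt of a knime.ini file, used only for illustration.
        List<String> ini = Arrays.asList("-vmargs", "-Xmx2048m");
        System.out.println(meetsRecommendation(ini)); // prints false
    }
}
```

In this sketch, a file containing -Xmx2048m fails the check, while changing that line to -Xmx4g passes it. KNIME has to be restarted for a changed knime.ini to take effect.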

This node is based on Stanford CoreNLP 3.9.1.
For more information about StanfordNLP and Open Information Extraction, click here.

Options

Document column
The document column to use.
Note: If the Apply preprocessing option is unchecked, the documents have to be tagged by a part-of-speech tagger. Named-entity tagging is recommended as well, but optional.
Lemmatized document column
The document column containing the lemmatized documents.
Note: If the Apply preprocessing option is checked, this option is not necessary.
Apply preprocessing
If checked, part-of-speech tagging, named-entity tagging and lemmatizing will be done by this node. These tasks are applied internally and do not affect the documents in the document column. Extracting tags and lemmas produced by checking this option is not possible.
Number of threads
The number of threads to use.
Results as lemma
If checked, results will be returned as lemmas.
Resolve co-references
If checked, co-reference resolution will be applied. Pronominal mentions will be replaced with their canonical mention in the text.
Affinity probability cap
The affinity value above which confidence of the extraction is regarded as 1.0.
Strict triple extraction
If checked, extract triples only if they consume the entire fragment. This is useful for ensuring that only logically warranted triples are extracted, but puts more burden on the entailment system to find minimal phrases.
Always extract nominal relations
If checked, extract nominal relations always and not only when a named entity tag warrants it. This greatly overproduces such triples, but can be useful in certain situations.
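
Several of the options above resemble settings of the Stanford CoreNLP OpenIE annotator. The following Java sketch shows how such options might be collected into a java.util.Properties object of the kind a CoreNLP pipeline is configured with. The class and method names are hypothetical, the property keys are quoted from memory of the CoreNLP OpenIE documentation and should be verified against the version in use, and whether this node maps its dialog options in exactly this way is an assumption.

```java
import java.util.Properties;

/** Hypothetical mapping from the node's dialog options to Stanford CoreNLP OpenIE properties. */
final class OpenIeSettings {

    static Properties toCoreNlpProperties(int threads, boolean resolveCoref,
                                          double affinityCap, boolean strict,
                                          boolean allNominals) {
        Properties props = new Properties();
        // Annotators that OpenIE builds on: tokens, sentences, POS tags, lemmas,
        // dependency parse and natural logic. Named-entity tagging and the extra
        // annotators needed for co-reference resolution are left out of this sketch.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        // Assumed property keys; check them against the CoreNLP release you use.
        props.setProperty("openie.threads", Integer.toString(threads));
        props.setProperty("openie.resolve_coref", Boolean.toString(resolveCoref));
        props.setProperty("openie.affinity_probability_cap", Double.toString(affinityCap));
        props.setProperty("openie.triple.strict", Boolean.toString(strict));
        props.setProperty("openie.triple.all_nominals", Boolean.toString(allNominals));
        return props;
    }

    public static void main(String[] args) {
        // Example values only; they are not the node's defaults.
        Properties props = toCoreNlpProperties(4, false, 0.5, true, false);
        props.list(System.out);
    }
}
```

Keeping the mapping in one method makes it easy to compare a standalone CoreNLP run with the node's output, for instance when investigating why the two produce different triples.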

Input Ports

The input table which contains the documents and lemmatized documents (if needed).

Output Ports

The output table which contains data from the input table, extracted relations and a relation confidence.

Views

This node has no views
