StanfordNLP NE Learner

The StanfordNLP NE Learner trains a conditional random field (CRF) model from a set of documents and a dictionary of entities that occur in those documents. The chosen tag and the dictionary are saved with the model, so the StanfordNLP NE tagger can use them to tag new documents and validate the model. If you want to use the model externally, the model file can be found in your workflow directory:

/%KNIMEWORKSPACE%/%WORKFLOW%/StanfordNLP NE Learner(##)/port_1/object/portobject.zip
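Before using the model file externally, it can help to check what the archive actually contains. The following is a minimal sketch using only the Python standard library. The function name list_model_archive is hypothetical, and the layout inside portobject.zip is not described here, so the sketch only lists the entries and makes no assumption about their names or format.

```python
import zipfile


def list_model_archive(archive_path):
    """Return the names of all entries stored in a zip archive.

    archive_path is the location of portobject.zip. The caller has to
    substitute the real workspace, workflow and node directory for the
    placeholders in the path shown above.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()
```

Call the function with the full path to portobject.zip after replacing the placeholders with the values from your own workspace.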

You can select the document column and the dictionary column used to train the model. The dictionary may contain multi-term entities. A second tab in the dialog sets the learner properties. Only a few options are currently exposed, because the full set of StanfordNLP parameters is very large. Please contact us if there are important or frequently used parameters that we should add.
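To illustrate how a dictionary with single- and multi-term entities can yield token-level training labels, here is a minimal sketch. It is not the node's implementation: the function label_tokens, the whitespace splitting of dictionary entries, the longest-match strategy and the background label "O" are all assumptions made for the example. The case_sensitive flag mirrors the idea of the Case Sensitivity option.

```python
def label_tokens(tokens, dictionary, tag="PERSON", case_sensitive=True):
    """Assign `tag` to every token covered by a dictionary entry.

    Entries may span several tokens. At each position the longest
    matching entry wins. Unmatched tokens get the label "O".
    """
    normalize = (lambda s: s) if case_sensitive else str.lower
    # Store each entry as a tuple of (normalized) terms.
    entries = {
        tuple(normalize(term) for term in entry.split())
        for entry in dictionary
        if entry.strip()
    }
    longest = max((len(entry) for entry in entries), default=0)

    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            candidate = tuple(normalize(t) for t in tokens[i:i + n])
            if candidate in entries:
                matched = n
                break
        if matched:
            for k in range(i, i + matched):
                labels[k] = tag
            i += matched
        else:
            i += 1
    return labels


tokens = "Marie Curie met pierre today".split()
dictionary = ["Marie Curie", "Pierre"]
print(label_tokens(tokens, dictionary))
# ['PERSON', 'PERSON', 'O', 'O', 'O']
print(label_tokens(tokens, dictionary, case_sensitive=False))
# ['PERSON', 'PERSON', 'O', 'PERSON', 'O']
```

With case-sensitive matching, the lowercase token "pierre" does not match the dictionary entry "Pierre". With case-insensitive matching it does.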

NOTE: If you are interested in the StanfordNLP toolkit, please visit http://nlp.stanford.edu/software/. Some of the following property descriptions are taken from the NERFeatureFactory class from StanfordNLP. Please look into it for further information.

Options

Learner options

Document column
The document column to train the model with.
String column
The string column containing the entities to train the model with.
Tag type
The tag type to train the model with. This information is used if you forward the model to the StanfordNLP NE tagger.
Tag value
The tag value to train the model with. This information is used if you forward the model to the StanfordNLP NE tagger.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.

Learner Properties

maxLeft
The maximum context of class features used.
Use Class Feature
Include a feature for the class (as a class marginal). This puts a prior on the classes that is equivalent to how often the feature appeared in the training data.
Use Word
Adds a feature for the word itself.
Use NGrams
Make features from letter n-grams, i.e., substrings of the word.
No Mid NGrams
Do not include character n-gram features for n-grams that contain neither the beginning nor the end of the word.
Max NGram Length
If this number is positive, n-grams longer than this will not be used in the model.
Use Prev
Enables features based on the previous word.
Use Next
Enables features based on the next word.
Use Disjunctive
Include features for disjunctions of the words found anywhere within disjunctionWidth words to the left or right (preserving direction but not position).
Use Sequences
Does not use any class combination features if this is false.
Use Prev Sequences
Does not use any class combination features using previous classes if this is false.
Use Type Seqs
Use basic zeroth order word shape features.
Use Type Seqs2
Add additional first and second order word shape features.
Use Type YSeqs
Add some first order word shape patterns.
Word Shape
Either "none" for no wordShape use, or the name of a word shape function.
Case Sensitivity
Select to handle the words from the dictionary in a case sensitive manner.
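The n-gram properties above (Use NGrams, No Mid NGrams, Max NGram Length) and the Word Shape property can be pictured with a short sketch. This is a simplified illustration, not the NERFeatureFactory code: the function names char_ngrams and simple_word_shape are hypothetical, and the shape mapping (uppercase to "X", lowercase to "x", digit to "d") is a generic one chosen for the example, not one of the named StanfordNLP word shape functions.

```python
def char_ngrams(word, max_len=0, no_mid=False):
    """Collect letter n-grams (substrings) of `word`.

    max_len: if positive, n-grams longer than this are skipped.
    no_mid:  if True, keep only n-grams that touch the beginning
             or the end of the word.
    """
    features = []
    n = len(word)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if max_len > 0 and end - start > max_len:
                break  # longer spans from this start are too long as well
            if no_mid and start != 0 and end != n:
                continue  # n-gram lies strictly inside the word
            features.append(word[start:end])
    return features


def simple_word_shape(word):
    """Map each character to a coarse class to describe the word's shape."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation and other symbols as-is
    return "".join(shape)


print(char_ngrams("cat"))
# ['c', 'ca', 'cat', 'a', 'at', 't']
print(char_ngrams("cat", no_mid=True))
# ['c', 'ca', 'cat', 'at', 't']
print(char_ngrams("cat", max_len=2))
# ['c', 'ca', 'a', 'at', 't']
print(simple_word_shape("KNIME-4"))
# XXXXX-d
```

In the second call the n-gram "a" is dropped because it touches neither end of "cat". In the third call "cat" is dropped because it exceeds the maximum length of 2.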

Input Ports

The input table containing the documents to train the model with.
The input dictionary containing known single- and/or multi-term entities to train the model.

Output Ports

The StanfordNLP NE model.

Views

This node has no views.
