
Text Classifier Model Pruner

Palladian for KNIME version 2.3.0.202009251618 by palladian.ws; Philipp Katz, Klemens Muthmann, David Urbansky

The “Text Classifier Model Pruner” reduces the size of a text classification model by applying different pruning methods. On the one hand, terms that occurred only rarely during training can be removed. Our experience shows that setting the minimum term count to, e.g., “2” roughly halves the number of terms in the model without significantly harming classification quality (your mileage may vary).
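
The low-frequency pruning described above can be illustrated with a minimal Python sketch. It assumes that a term's count is its document frequency (the number of training documents containing it, as the option description below states) and that the model can be viewed as a map from term to that count. The function names are hypothetical and do not correspond to Palladian's actual (Java) API.

```python
from collections import Counter
from typing import Iterable


def document_frequencies(docs: Iterable[list[str]]) -> Counter:
    """Count in how many documents each term occurs (not total occurrences)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # a term counts at most once per document
    return df


def prune_by_min_count(df: Counter, min_count: int) -> dict[str, int]:
    """Keep only terms that appear in at least `min_count` documents."""
    return {term: n for term, n in df.items() if n >= min_count}


docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "watches"],
    ["meeting", "agenda", "monday"],
    ["monday", "meeting", "notes"],
]
df = document_frequencies(docs)
kept = prune_by_min_count(df, min_count=2)
print(sorted(kept))  # ['buy', 'cheap', 'meeting', 'monday']
print(f"{len(kept)} of {len(df)} terms kept")  # 4 of 8 terms kept
```

With `min_count=1` every term survives, because each term in the model occurred in at least one training document.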

On the other hand, an information-gain-based pruning strategy is available, which scores the terms and their associated category probabilities. A good explanation of the information gain method can be found in “A Comparative Study on Feature Selection in Text Categorization”, Yiming Yang and Jan O. Pedersen, 1997.
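
For reference, the cited paper by Yang and Pedersen defines the information gain of a term t over categories c_1, …, c_m as follows, where \bar{t} denotes the absence of t. It measures how much knowing whether t occurs in a document reduces the entropy of the category distribution. Whether Palladian computes exactly this quantity (log base, smoothing, handling of unseen terms) is not stated in this description, so treat it as background rather than a specification of the node.

```latex
G(t) = -\sum_{i=1}^{m} P(c_i)\,\log P(c_i)
       + P(t)\sum_{i=1}^{m} P(c_i \mid t)\,\log P(c_i \mid t)
       + P(\bar{t})\sum_{i=1}^{m} P(c_i \mid \bar{t})\,\log P(c_i \mid \bar{t})
```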

Options

Minimum term count
The minimum count a term must have to be kept, i.e. the number of training documents in which it has to appear. Set to one to keep all terms.
Minimum information gain
The minimum information gain to keep a term. Set to zero to keep all terms.
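
The following minimal Python sketch shows how the two options could be applied together. It assumes a model that stores, for each term, the number of training documents per category that contain it, plus the total number of documents per category; this layout, the function names `information_gain` and `prune`, and the use of base-2 logarithms are assumptions for illustration, and Palladian's actual implementation and model format may differ. Presumably a term is kept only if it satisfies both thresholds; with the defaults (1 and 0) nothing is removed.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(term_counts, category_totals):
    """Information gain of one term, after Yang & Pedersen (1997).

    term_counts:     category -> training docs of that category containing the term
    category_totals: category -> total training docs in that category
    """
    n = sum(category_totals.values())
    n_t = sum(term_counts.values())  # documents containing the term
    n_not_t = n - n_t                # documents not containing the term
    ig = entropy([category_totals[c] / n for c in category_totals])
    if n_t > 0:
        with_t = [term_counts.get(c, 0) / n_t for c in category_totals]
        ig -= (n_t / n) * entropy(with_t)
    if n_not_t > 0:
        without_t = [(category_totals[c] - term_counts.get(c, 0)) / n_not_t
                     for c in category_totals]
        ig -= (n_not_t / n) * entropy(without_t)
    return max(ig, 0.0)  # clamp tiny negative values caused by float rounding


def prune(model, category_totals, min_count=1, min_ig=0.0):
    """Keep terms whose document count and information gain meet both thresholds."""
    kept = {}
    for term, counts in model.items():
        if sum(counts.values()) < min_count:
            continue
        if information_gain(counts, category_totals) < min_ig:
            continue
        kept[term] = counts
    return kept


# Hypothetical toy model: 2 "spam" and 2 "ham" training documents.
category_totals = {"spam": 2, "ham": 2}
model = {
    "cheap": {"spam": 2},             # only in spam: maximally informative
    "buy": {"spam": 1, "ham": 1},     # evenly spread: no information
    "meeting": {"ham": 2},
    "pills": {"spam": 1},             # rare, somewhat informative
}
print(sorted(prune(model, category_totals, min_count=2, min_ig=0.5)))
```

Evaluating the cheap document-count check first means the information gain is only computed for terms that survive the first threshold.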

Input Ports

The model data of the trained classifier.

Output Ports

The pruned model, where terms not satisfying the given properties have been removed.

Installation

To use this node in KNIME, install Palladian for KNIME from the following update site:

KNIME 4.3
