The “Text Classifier Model Pruner” reduces the size of a text classification model by applying different pruning methods. On the one hand, low-frequency terms that were encountered during training can be removed. In our experience, setting the frequency threshold to, for example, “2” roughly halves the number of terms in the model without significantly harming classification quality (your mileage may vary).
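The idea behind frequency-based pruning can be illustrated with a minimal Python sketch. This is not the node's implementation: the function name `prune_low_frequency_terms`, the parameter `min_count`, and the choice to keep terms whose count is at least the threshold are all assumptions made for illustration.

```python
from collections import Counter


def prune_low_frequency_terms(term_counts, min_count):
    """Keep only terms seen at least `min_count` times (assumed semantics)."""
    return {term: n for term, n in term_counts.items() if n >= min_count}


# Hypothetical training text; a real model would hold counts per category.
term_counts = Counter("the cat sat on the mat the dog sat".split())

pruned = prune_low_frequency_terms(term_counts, min_count=2)
print(pruned)  # {'the': 3, 'sat': 2}
```

In this toy example, four of the six distinct terms occur only once and are dropped, which shows why even a small threshold can shrink a model considerably.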
On the other hand, an information-gain-based pruning strategy is available, which scores the terms and their associated category probabilities. A good explanation of the information gain method can be found in “A Comparative Study on Feature Selection in Text Categorization”, Yiming Yang and Jan O. Pedersen, 1997.
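As a rough illustration of information-gain scoring, the following Python sketch computes, for a single term, how much knowing whether the term is present reduces the entropy of the category distribution, in the spirit of the definition discussed by Yang and Pedersen. It is a sketch under assumptions, not the node's actual scoring: documents are modelled as sets of terms with one category each, and the names `entropy`, `information_gain`, and `docs` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs, term):
    """Information gain of `term` over (set_of_terms, category) pairs."""
    categories = sorted({cat for _, cat in docs})

    def class_distribution(subset):
        if not subset:
            return []
        return [
            sum(1 for _, cat in subset if cat == c) / len(subset)
            for c in categories
        ]

    with_term = [d for d in docs if term in d[0]]
    without_term = [d for d in docs if term not in d[0]]
    n = len(docs)

    # Entropy of the categories, minus the entropy that remains once we
    # know whether the term occurs in a document.
    remaining = (
        len(with_term) / n * entropy(class_distribution(with_term))
        + len(without_term) / n * entropy(class_distribution(without_term))
    )
    return entropy(class_distribution(docs)) - remaining


# Hypothetical toy corpus with two categories.
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"vote", "match"}, "politics"),
    ({"vote", "law"}, "politics"),
]

vocabulary = sorted(set().union(*(terms for terms, _ in docs)))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, t), reverse=True)
```

Here “goal” and “vote” separate the two categories perfectly and receive the highest score, while “match” occurs equally often in both and scores zero. A pruning step would then keep only the top-ranked terms.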