This node induces a classification decision tree in main memory.
The target attribute must be nominal. The other attributes used for
decision making can be either nominal or numerical. Numeric splits
are always binary (two outcomes), dividing the domain into two partitions at a
given split point. Nominal splits can be either binary (two outcomes) or
have as many outcomes as there are nominal values. In the
case of a binary split, the nominal values are divided into two subsets.
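
To make the three kinds of split concrete, here is a minimal Python sketch. It is not the node's implementation: the function names are hypothetical, the `<=` convention at the split point is an assumption, and enumerating every two-subset division is only one simple way to list the candidates (the node may search them differently).

```python
from itertools import combinations


def numeric_binary_split(values, split_point):
    """Divide numeric values into two partitions at a split point.

    Assumption: values equal to the split point go to the left partition.
    """
    left = [v for v in values if v <= split_point]
    right = [v for v in values if v > split_point]
    return left, right


def nominal_multiway_split(values):
    """Create one outcome per distinct nominal value."""
    outcomes = {}
    for v in values:
        outcomes.setdefault(v, []).append(v)
    return outcomes


def nominal_binary_subsets(categories):
    """List every way to divide the nominal values into two non-empty subsets.

    The first category is pinned to the left subset so that mirrored
    duplicates such as ({a}, {b}) and ({b}, {a}) are not both produced.
    """
    cats = sorted(set(categories))
    first, rest = cats[0], cats[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            if right:  # skip the case where everything lands on the left
                splits.append((left, right))
    return splits
```

For k distinct nominal values this enumeration yields 2^(k-1) - 1 candidate binary splits, which grows quickly with k.
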
The algorithm provides two quality measures for split calculation:
the Gini index and the gain ratio. In addition, there is a
post-pruning method that reduces the tree size and can increase prediction
accuracy. The pruning method is based on the minimum
description length principle.
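
As an illustration of the two split quality measures mentioned above, the following Python sketch uses their usual textbook definitions. The function names are hypothetical and the node's exact computation (for example, its handling of missing values) may differ.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini_of_split(partitions):
    """Size-weighted Gini impurity over the partitions; lower is better."""
    parts = [p for p in partitions if p]  # ignore empty partitions
    total = sum(len(p) for p in parts)
    return sum(len(p) / total * gini(p) for p in parts)


def gain_ratio(parent, partitions):
    """Information gain divided by the split information; higher is better."""
    parts = [p for p in partitions if p]  # ignore empty partitions
    total = len(parent)
    weighted = sum(len(p) / total * entropy(p) for p in parts)
    gain = entropy(parent) - weighted
    split_info = -sum(len(p) / total * log2(len(p) / total) for p in parts)
    return gain / split_info if split_info > 0 else 0.0


# Usage: a split that separates the two classes perfectly.
parent = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]
print(gini_of_split(perfect), gain_ratio(parent, perfect))
```

Dividing the gain by the split information penalizes splits with many small outcomes, which matters for nominal attributes with many values.
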
The algorithm can run in multiple threads and thus exploit multiple processors or cores.
Most of the techniques used in this decision tree implementation can be found in "C4.5: Programs for Machine Learning" by J. R. Quinlan and in "SPRINT: A Scalable Parallel Classifier for Data Mining" by J. Shafer, R. Agrawal, and M. Mehta (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.152&rep=rep1&type=pdf).