JRip (3.7)

This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.

The algorithm is briefly described as follows:

Initialize RS = {}, and for each class, from the least prevalent to the most prevalent, DO:
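The class ordering can be illustrated with a small sketch (plain Python, not Weka code; the labels are made up):

```python
from collections import Counter

# Order classes from least prevalent to most prevalent, as RIPPER does
labels = ["a", "b", "b", "c", "c", "c"]
order = [cls for cls, _ in sorted(Counter(labels).items(), key=lambda kv: kv[1])]
print(order)  # ['a', 'b', 'c'] -- rules are learned for class 'a' first
```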

1. Building stage: repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL found so far, or there are no positive examples left, or the error rate >= 50%.

1.1. Grow phase: grow one rule by greedily adding antecedents (conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain: p(log(p/t) - log(P/T)).
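The gain criterion can be sketched as follows (an illustrative helper, not the Weka source; p and t are the positive and total counts covered after adding the condition, P and T the counts before, and base-2 logarithms are assumed):

```python
import math

def information_gain(p, t, P, T):
    """Grow-phase gain: p * (log(p/t) - log(P/T)).

    p, t: positive and total examples covered after adding the condition;
    P, T: positive and total examples covered before adding it.
    """
    if p == 0 or P == 0:
        return 0.0  # guard for empty covers (an assumption of this sketch)
    return p * (math.log2(p / t) - math.log2(P / T))

# Example: a condition narrowing coverage from 50/100 positives to 40/50
print(information_gain(40, 50, 50, 100))
```

The gain rewards conditions that raise the fraction of positives among the covered examples while still covering many positives.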

1.2. Prune phase: incrementally prune each rule, allowing the pruning of any final sequence of antecedents. The pruning metric is (p-n)/(p+n); but since that equals 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that it evaluates to 0.5 when p+n is 0).
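The smoothed pruning metric is simple to write down (an illustrative helper, not the Weka source; p and n are the positive and negative examples the rule covers in the pruning set):

```python
def pruning_value(p, n):
    """Prune-phase metric (p + 1) / (p + n + 2): a smoothed p / (p + n)
    that evaluates to 0.5 when the rule covers no examples at all."""
    return (p + 1) / (p + n + 2)

print(pruning_value(0, 0))  # 0.5 for an empty cover
print(pruning_value(8, 2))  # 0.75
```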

2. Optimization stage: after generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N). Then the smallest possible DL for each variant and for the original rule is computed, and the variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positives, more rules are generated from them using the building stage again.
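The optimization-stage metric amounts to plain accuracy over the pruning data (again an illustrative helper, not the Weka source; TP and TN are true positives and true negatives, P and N the total positives and negatives):

```python
def optimization_value(tp, tn, P, N):
    """Optimization-stage pruning metric: accuracy (TP + TN) / (P + N)."""
    return (tp + tn) / (P + N)

print(optimization_value(40, 45, 50, 50))  # 0.85
```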

3. Deletion stage: delete any rule from the ruleset that would increase the DL of the whole ruleset if it were kept, and add the resultant ruleset to RS.

ENDDO

Note that there appear to be two bugs in the original RIPPER program that slightly affect ruleset size and accuracy. This implementation avoids these bugs and thus differs a little from Cohen's original implementation.

Even after fixing the bugs, since the order of classes with the same frequency is not defined in RIPPER, there may still be some trivial differences between this implementation and the original, especially for the audiology data in the UCI repository, which contains many classes with few instances.

For details, please see:

William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.

PS: We have compared this implementation with the original RIPPER implementation in terms of accuracy, ruleset size and running time, on both the artificial data "ab+bcd+defg" and UCI datasets. In all of these aspects it appears to be quite comparable to the original RIPPER implementation. However, memory consumption was not optimized in this implementation.

(based on WEKA 3.7)

For further options, click the 'More' button in the dialog.

All Weka dialogs have a panel where you can specify classifier-specific parameters.


Options

JRip Options

F: Set the number of folds for REP; one fold is used as the pruning set. (default: 3)

N: Set the minimal weight of instances within a split. (default: 2.0)

O: Set the number of optimization runs. (default: 2)

D: Whether to turn on debug mode. (default: false)

S: The seed for randomization. (default: 1)

E: Whether NOT to check for error rate >= 0.5 in the stopping criterion. (default: check)

P: Whether NOT to use pruning. (default: use pruning)

Select target column
Choose the column that contains the target variable.
Preliminary Attribute Check

The Preliminary Attribute Check tests the underlying classifier against the DataTable specification at the inport of the node. Columns that are compatible with the classifier are marked with a green 'ok'. Columns which are potentially not compatible are assigned a red error message.

Important: If a column is marked as 'incompatible', it does not necessarily mean that the classifier cannot be executed! Sometimes, the error message 'Cannot handle String class' simply means that no nominal values are available (yet). This may change during execution of the predecessor nodes.

Capabilities: [Nominal attributes, Binary attributes, Unary attributes, Empty nominal attributes, Numeric attributes, Date attributes, Missing values, Nominal class, Binary class, Missing class values]
Dependencies: []
Minimum number of instances: 3

Command line options

Shows the command line options corresponding to the current classifier configuration; this mainly serves to support configuring the node via flow variables.

Additional Options

Select optional vector column
If the input table contains vector columns (e.g. double vector), the one to use can be selected here. This vector column will be used as attributes only and all other columns, except the target column, will be ignored.
Keep training instances
If checked, all training instances will be kept and stored with the classifier model. This is useful for calculating additional evaluation measures (see the Weka Predictor node) that make use of class prior probabilities. If no evaluation is performed, or those measures are not required, it is advisable NOT to keep the training instances.

Input Ports

Icon
Training data

Output Ports

Icon
Trained model


Views

Weka Node View
Each Weka node provides a summary view that provides information about the classification. If the test data contains a class column, an evaluation is generated.

