kn_example_rule_induction_weka_hotspot_and_yacaree_rules

Rule Induction with Weka Rule Nodes and Yacaree Associator

To inspect the Weka Hot Spot rules, right-click the node and choose the magnifying-glass entry "View: Weka Node View".
The rule finders require some configuration, some reading about their settings, and some experimentation.

https://forum.knime.com/t/need-idea-in-cluster-analysis-problem/15540/12?u=mlauber71

————
I used an older example from a Kaggle dataset. It is for illustration only; the values do not make much sense as a sequence.

These are the basic differences:

Weka HotSpot can handle strings and numbers and accepts a target (in your case, to differentiate between errors and non-errors). It might be able to handle your duration values.

Tertius works with strings, and setting a class (target) is optional.

GeneralizedSequentialPatterns lets you specify a sequence ID; you might be able to use your data structure with the event_id.

PredictiveApriori and FilteredAssociator are additional methods; please read about their capabilities, as I am not an expert on them.

Yacaree is special in two regards. First, it does not use variable-name/value pairs, only the sequence of values, which have to stand for themselves. Second, it considers sequences before and after. From a few experiments, it might be influenced by the differing number of events that lead to an error; it may work best with a fixed set of sequences.
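To illustrate what "a sequence of values that stand for themselves" can look like, here is a minimal Python sketch that groups an event log into one item list per event_id. The rows, the item names and the helper to_transactions are hypothetical and only serve the illustration; in the workflow itself this step would be done with KNIME nodes, not Python.

```python
from collections import defaultdict

# Hypothetical event log: (event_id, value) pairs in the order they occurred.
# T_0 / T_1 are target markers that appear as plain items among the others.
rows = [
    (1, "A"), (1, "B"), (1, "C"), (1, "T_1"),
    (2, "C"), (2, "DI"), (2, "T_0"),
    (3, "A"), (3, "C"), (3, "E"), (3, "T_1"),
]


def to_transactions(rows):
    """Group values by event_id, keeping the original order within each group."""
    grouped = defaultdict(list)
    for event_id, value in rows:
        grouped[event_id].append(value)
    return dict(grouped)


transactions = to_transactions(rows)
for event_id, items in transactions.items():
    print(event_id, "=", "[" + ",".join(items) + "]")
```

Note that the transactions here have different lengths, which is the situation described above as possibly influencing the results.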

All these nodes offer many configuration options, typically a threshold for confidence (the reliability of a rule) and a minimum coverage (a rule that applies only to a small set may be skipped). Please read about the implications and relate them to your data. Toy around with the settings to gain experience.
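The effect of the two thresholds can be sketched in a few lines of Python. The candidate rules, their counts and the function keep_rule are made up for illustration; the actual nodes apply their own, more elaborate logic.

```python
# Hypothetical candidate rules: (description, instances covered, instances where the rule holds)
candidates = [
    ("v1=A", 400, 380),  # large segment, high confidence
    ("v2=B", 30, 30),    # perfect confidence, but very small segment
    ("v3=C", 500, 350),  # large segment, low confidence
]

MIN_COVERED = 50       # assumed minimum coverage, as an absolute instance count
MIN_CONFIDENCE = 0.9   # assumed minimum confidence


def keep_rule(covered, hits, min_covered=MIN_COVERED, min_confidence=MIN_CONFIDENCE):
    """Return True if the rule covers enough instances and is reliable enough."""
    if covered < min_covered:
        return False  # applies to too small a set, skipped
    return hits / covered >= min_confidence


kept = [name for name, covered, hits in candidates if keep_rule(covered, hits)]
print(kept)
```

Raising the coverage threshold removes rules that fit only a handful of rows; raising the confidence threshold removes rules that are often wrong.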

Please also note that these nodes may need considerable computing power, especially with large datasets.

THX to D. Gutmann for useful hints





Hot Spot
========
Total population: 80024 instances
Target attribute: Target
Target value: 1 [value count in total population: 60987 instances (76.21%)]
Minimum value count for segments: 2001 instances (2.5% of total population)
Maximum branching factor: 4
Maximum rule length: unbounded
Minimum improvement in target: 1%

[v31=B, v50 > 3.8148]: 2473 ==> [Target=1]: 2421 <conf:(0.98)> lift:(1.28) lev:(0.01) conv:(11.1)
[v31=B, v66=C, v50 > 0.9743]: 2094 ==> [Target=1]: 2024 <conf:(0.97)> lift:(1.27) lev:(0.01) conv:(7.02)
[v31=B, v24=C, v50 > 1.9309]: 2165 ==> [Target=1]: 2092 <conf:(0.97)> lift:(1.27) lev:(0.01) conv:(6.96)
[v31=B, v24=C, v12 <= 8.6066]: 2132 ==> [Target=1]: 2045 <conf:(0.96)> lift:(1.26) lev:(0.01) conv:(5.76)
[v50 > 4.6086]: 2129 ==> [Target=1]: 2039 <conf:(0.96)> lift:(1.26) lev:(0.01) conv:(5.57)
[v31=B, v66=C]: 2373 ==> [Target=1]: 2268 <conf:(0.96)> lift:(1.25) lev:(0.01) conv:(5.33)
[v38 > 0, v50 > 1.0505]: 2173 ==> [Target=1]: 2067 <conf:(0.95)> lift:(1.25) lev:(0.01) conv:(4.83)
[v31=B, v24=C]: 3147 ==> [Target=1]: 2983 <conf:(0.95)> lift:(1.24) lev:(0.01) conv:(4.54)
[v31=B, v12 <= 6.7088]: 2187 ==> [Target=1]: 2052 <conf:(0.94)> lift:(1.23) lev:(0) conv:(3.83)
[v38 > 0, v14 > 12.3789]: 2145 ==> [Target=1]: 2009 <conf:(0.94)> lift:(1.23) lev:(0) conv:(3.72)
[v38 > 0, v10 > 1.291]: 2142 ==> [Target=1]: 2006 <conf:(0.94)> lift:(1.23) lev:(0) conv:(3.72)
[v31=C, v50 > 0.5649]: 2150 ==> [Target=1]: 2008 <conf:(0.93)> lift:(1.23) lev:(0) conv:(3.58)
[v38 > 0, v21 > 7.3516]: 2172 ==> [Target=1]: 2019 <conf:(0.93)> lift:(1.22) lev:(0) conv:(3.36)
[v31=C, v14 > 11.4385]: 2174 ==> [Target=1]: 2013 <conf:(0.93)> lift:(1.21) lev:(0) conv:(3.19)
[v38 > 0]: 3200 ==> [Target=1]: 2943 <conf:(0.92)> lift:(1.21) lev:(0.01) conv:(2.95)
[v31=C]: 2480 ==> [Target=1]: 2272 <conf:(0.92)> lift:(1.2) lev:(0) conv:(2.82)
[v31=B]: 13362 ==> [Target=1]: 12207 <conf:(0.91)> lift:(1.2) lev:(0.03) conv:(2.75)

Associator Model

Apriori
=======
Minimum support: 0.65 (2601 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 7

Best rules found:
 1. v31=A 3126 ==> v3=C 3126 <conf:(1)> lift:(1.03) lev:(0.02) [91] conv:(91.41)
 2. v31=A v74=B 3112 ==> v3=C 3112 <conf:(1)> lift:(1.03) lev:(0.02) [91] conv:(91)
 3. v75=D 2650 ==> v71=F 2650 <conf:(1)> lift:(1.51) lev:(0.22) [894] conv:(894.81)
 4. v71=F 2650 ==> v75=D 2650 <conf:(1)> lift:(1.51) lev:(0.22) [894] conv:(894.81)
 5. v74=B v75=D 2633 ==> v71=F 2633 <conf:(1)> lift:(1.51) lev:(0.22) [889] conv:(889.07)
 6. v71=F v74=B 2633 ==> v75=D 2633 <conf:(1)> lift:(1.51) lev:(0.22) [889] conv:(889.07)
 7. v31=A 3126 ==> v74=B 3112 <conf:(1)> lift:(1) lev:(0) [2] conv:(1.09)
 8. v3=C v31=A 3126 ==> v74=B 3112 <conf:(1)> lift:(1) lev:(0) [2] conv:(1.09)
 9. v31=A 3126 ==> v3=C v74=B 3112 <conf:(1)> lift:(1.03) lev:(0.02) [93] conv:(7.19)
10. v3=C 3884 ==> v74=B 3863 <conf:(0.99)> lift:(1) lev:(0) [0] conv:(0.93)

GeneralizedSequentialPatterns
=============================
Number of cycles performed: 8
Total number of frequent sequences: 319

Frequent Sequences Details (filtered):

- 1-sequences
[1] <{C}> (2)
[2] <{A}> (2)
[3] <{F}> (2)
[4] <{F}> (2)
[5] <{B}> (2)
[6] <{D}> (2)
[7] <{C}> (2)
[8] <{A}> (2)
[9] <{X}> (2)

- 2-sequences
[1] <{C,A}> (2)
[2] <{C,F}> (2)
[3] <{A,F}> (2)
[4] <{C,F}> (2)
[5] <{A,F}> (2)
[6] <{F,F}> (2)
[7] <{C,B}> (2)
[8] <{A,B}> (2)
[9] <{F,B}> (2)
[10] <{F,B}> (2)
[11] <{C,D}> (2)
[12] <{A,D}> (2)
[13] <{F,D}> (2)
[14] <{F,D}> (2)
[15] <{B,D}> (2)
[16] <{C,C}> (2)
[17] <{A,C}> (2)
[18] <{F,C}> (2)
[19] <{F,C}> (2)
[20] <{B,C}> (2)
[21] <{D,C}> (2)
[22] <{C,A}> (2)
[23] <{A,A}> (2)
[24] <{F,A}> (2)
[25] <{F,A}> (2)
[26] <{B,A}> (2)
[27] <{D,A}> (2)
[28] <{C,A}> (2)
[29] <{C,X}> (2)
[30] <{A,X}> (2)
[31] <{F,X}> (2)
[32] <{B,X}> (2)
[33] <{D,X}> (2)
[34] <{A,X}> (2)

Tertius
=======
 1. /* 0.199648 0.561110 */ v31 = A and v74 = B ==> Target = 0
 2. /* 0.197489 0.564359 */ v31 = A ==> Target = 0
 3. /* 0.192488 0.017746 */ v3 = C and v79 = B ==> Target = 1
 4. /* 0.190781 0.018495 */ v79 = B ==> Target = 1
 5. /* 0.181542 0.008998 */ v3 = C and v31 = B ==> Target = 1
 6. /* 0.181008 0.009498 */ v31 = B ==> Target = 1
 7. /* 0.179278 0.015246 */ v3 = C and v79 = B and v113 = _NA_ ==> Target = 1
 8. /* 0.178699 0.072232 */ v110 = B ==> Target = 1
 9. /* 0.177191 0.015996 */ v79 = B and v113 = _NA_ ==> Target = 1
10. /* 0.176697 0.336166 */ v74 = B and v110 = A ==> Target = 0

Number of hypotheses considered: 86772
Number of hypotheses explored: 5447

PredictiveApriori
===================
Best rules found:
 1. v24=C v66=A v110=B 269 ==> v113=_NA_ 269 acc:(0.99494)
 2. v24=C v79=E 262 ==> v113=_NA_ 262 acc:(0.99493)
 3. v24=C v31=A v110=B 251 ==> v113=_NA_ 251 acc:(0.99493)
 4. v24=D v79=E 285 ==> v113=_NA_ 284 acc:(0.99484)
 5. v24=C v75=B v110=B 175 ==> v113=_NA_ 175 acc:(0.9948)

Yacaree
=============================
C T_0 DI -> A D (supp: 27, conf: 0.692, cboost: 2.072)
B C E F -> DI T_1 (supp: 20, conf: 0.714, cboost: 1.887)
B C E J -> DI T_1 (supp: 18, conf: 0.667, cboost: 1.761)
A R C -> E T_1 (supp: 20, conf: 0.800, cboost: 1.667)
CF C -> A E T_1 (supp: 8, conf: 0.800, cboost: 1.600)
A C G CN -> T_0 (supp: 13, conf: 0.722, cboost: 1.599)
C DI -> B T_1 (supp: 259, conf: 0.742, cboost: 1.599)
AW -> A C E T_1 (supp: 158, conf: 0.709, cboost: 1.580)
[..]

JRIP rules:
===========
(v50 <= 0.554617) and (v110 = A) and (v114 <= 15.860638) => Target=0 (364.0/132.0)
(v50 <= 1.223682) and (v129 <= 0) and (v50 <= 0.523806) and (v79 = E) and (v82 >= 3.62763) => Target=0 (63.0/24.0)
 => Target=1 (3574.0/663.0)

Number of Rules : 3

Basic Illustration of Borgelt by AlexanderFillbrunn
https://forum.knime.com/t/association-rule-for-a-b-but-not-the-other-way-around/22740/2?u=mlauber71

Node annotations:
list of transactions
convert transactions into list
1 = [A,B,C]
with explicit Target and numeric and string vars => decide if you want to max or minimize your target
String without explicit Target, string only
String without explicit Target, set no of rules you want
String attributes without explicit Target
String with explicit Target, set which index the target is
filter columns
filter out Target
with explicit Target and numeric and string vars => possible to save as Weka Model
data_reduced_70.table
jrip_rules.zip

Nodes in the workflow:
Association Rule Learner (Borgelt)
Table Creator
GroupBy
HotSpot (3.7)
GeneralizedSequentialPatterns (3.7)
Column Filter
PredictiveApriori (3.7)
FilteredAssociator (3.7)
Tertius (3.7)
Column Filter
Column Filter
Yacaree
JRip (3.7)
Table Reader
Weka Classifier Writer (3.7)
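The conf, lift, lev and conv figures in the Hot Spot listing above can be approximated from the counts alone. The following Python sketch uses the common textbook definitions of these measures; the function rule_metrics is hypothetical, and Weka's own implementation may differ in detail (the Apriori listing, for example, reports finite conviction values for rules with confidence 1, which the textbook formula does not).

```python
def rule_metrics(total, consequent_count, premise_count, both_count):
    """Textbook interestingness measures for a rule 'premise ==> consequent'.

    total            -- number of instances in the population
    consequent_count -- instances matching the consequent (e.g. Target=1)
    premise_count    -- instances matching the premise (e.g. v31=B)
    both_count       -- instances matching premise and consequent
    """
    p_premise = premise_count / total
    p_consequent = consequent_count / total
    p_both = both_count / total

    confidence = both_count / premise_count
    lift = confidence / p_consequent
    leverage = p_both - p_premise * p_consequent
    # Textbook conviction is undefined (infinite) when confidence is exactly 1.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - p_consequent) / (1 - confidence)

    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Counts taken from the Hot Spot rule [v31=B]: 13362 ==> [Target=1]: 12207
m = rule_metrics(total=80024, consequent_count=60987,
                 premise_count=13362, both_count=12207)
print({name: round(value, 2) for name, value in m.items()})
# → {'confidence': 0.91, 'lift': 1.2, 'leverage': 0.03, 'conviction': 2.75}
```

The rounded values agree with the conf, lift, lev and conv shown for that rule, which helps when reading the listing: a lift of 1.2 means Target=1 is about 20% more frequent within v31=B than in the total population.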
