Rule-based Classifier
Edited by: Arooba Saeed (SP20-RSE-006)
ComSens Lab, Department of Computer Science
Under the Supervision of: Dr. Nadeem Javaid
Professor, Department of Computer Science
COMSATS University Islamabad, Islamabad, Pakistan
Rule-based Classifier
Classify records by using a collection of "if...then..." rules.
Rule: (Condition) → y, where
Condition is a conjunction of attribute tests
y is the class label
LHS: rule antecedent or precondition
RHS: rule consequent
Rule set: R = (R1 ∨ R2 ∨ … ∨ Rk)
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
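A minimal sketch of this representation in code (illustrative, not from the slides; the dict-based record encoding and attribute names are assumptions):

```python
# A rule is a pair (antecedent, class label); the antecedent is a
# conjunction of attribute tests. Records are dicts keyed by attribute.

def matches(record, antecedent):
    """True if the record satisfies every test in the antecedent."""
    return all(record.get(attr) == value for attr, value in antecedent)

# Rule set R = (R1 v R2 v ... v Rk)
rules = [
    ([("Blood Type", "Warm"), ("Lay Eggs", "Yes")], "Birds"),
]

record = {"Blood Type": "Warm", "Lay Eggs": "Yes"}
for antecedent, label in rules:
    if matches(record, antecedent):
        print(label)  # prints: Birds
```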
Source: https://slideplayer.com/slide/3333694/
Example of Rule-based Classifier
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Figure 1: Vertebrates Dataset
Application of Rule-Based Classifier
A rule r covers a record x if the attributes of the record satisfy the condition of the rule. Rule r is also said to be triggered or fired whenever it covers a given record.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk => Birds
The rule R3 covers a grizzly bear => Mammals
Rule Coverage and Accuracy
Coverage of a rule:
Fraction of all records that satisfy the antecedent of the rule
Accuracy of a rule:
Fraction of the records covered by the rule that also satisfy the consequent
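In symbols (a standard formulation, assuming D is the training set, A the set of records that satisfy the antecedent, and A ∩ y those of them whose class label matches the consequent):

```latex
\mathrm{Coverage}(r) = \frac{|A|}{|D|}, \qquad
\mathrm{Accuracy}(r) = \frac{|A \cap y|}{|A|}
```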
Example: (Status = Single) → No
Coverage = 40%, Accuracy = 50%
Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Figure 2: Taxable Income Dataset
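These numbers can be checked directly against Figure 2 (a quick sketch; the list literal just transcribes the table):

```python
# Records of Figure 2 as (refund, marital_status, income_k, cls) tuples.
records = [
    ("Yes", "Single", 125, "No"),   ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),     ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),  ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),    ("No", "Single", 90, "Yes"),
]

covered = [r for r in records if r[1] == "Single"]  # antecedent holds
correct = [r for r in covered if r[3] == "No"]      # consequent also holds
print(len(covered) / len(records))  # coverage = 0.4
print(len(correct) / len(covered))  # accuracy = 0.5
```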
How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
Characteristics of Rule-Based Classifier
Mutually exclusive rules
Classifier contains mutually exclusive rules if no two rules are triggered by the same record
Every record is covered by at most one rule
Exhaustive rules
Classifier has exhaustive coverage if it accounts for every possible combination of attribute values
Each record is covered by at least one rule
From Decision Trees To Rules
Rules are mutually exclusive and exhaustive.
Rule set contains as much information as the tree.
Classification Rules
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
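One way to see how a tree yields such rules is to walk a fitted tree and emit one rule per leaf. A sketch using scikit-learn (assumed installed; the toy data re-encodes Figure 2 with Refund as 0/1 and, for brevity, keeps only the Refund and Taxable Income attributes):

```python
# Sketch: turn each root-to-leaf path of a fitted decision tree into a rule.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 125], [0, 100], [0, 70], [1, 120], [0, 95],
     [0, 60], [1, 220], [0, 85], [0, 75], [0, 90]]   # [Refund, Income]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
t = clf.tree_

def emit_rules(node=0, conds=()):
    if t.children_left[node] == -1:                    # leaf: output a rule
        label = clf.classes_[t.value[node][0].argmax()]
        print(" AND ".join(conds) or "TRUE", "==>", label)
        return
    f, thr = t.feature[node], t.threshold[node]
    emit_rules(t.children_left[node],  conds + (f"x{f} <= {thr:.1f}",))
    emit_rules(t.children_right[node], conds + (f"x{f} > {thr:.1f}",))

emit_rules()
```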
Rules Can Be Simplified
Initial rule: (Refund = No) ∧ (Status = Married) → No
Simplified rule: (Status = Married) → No

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Figure 2: Taxable Income Dataset
Effect of Rule Simplification
Rules are no longer mutually exclusive
A record may trigger more than one rule
Solution?
Ordered rule set
Unordered rule set – use voting schemes
Rules are no longer exhaustive
A record may not trigger any rules
Solution?
Use a default class
Ordered Rule Set
Rules are rank ordered according to their priority
An ordered rule set is known as a decision list
When a test record is presented to the classifier:
It is assigned the class label of the highest-ranked rule it triggers
If none of the rules fire, it is assigned the default class
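First-match semantics are simple to implement (a sketch; the rule encoding follows the earlier examples, and the default class label is an assumption):

```python
def classify(record, ordered_rules, default="Unknown"):
    """Decision list: return the consequent of the first rule that fires."""
    for antecedent, label in ordered_rules:
        if all(record.get(a) == v for a, v in antecedent):
            return label           # highest-ranked triggered rule wins
    return default                 # no rule fired -> default class

rules = [
    ([("Give Birth", "no"), ("Can Fly", "yes")], "Birds"),         # R1
    ([("Give Birth", "no"), ("Live in Water", "yes")], "Fishes"),  # R2
    ([("Give Birth", "yes"), ("Blood Type", "warm")], "Mammals"),  # R3
    ([("Give Birth", "no"), ("Can Fly", "no")], "Reptiles"),       # R4
    ([("Live in Water", "sometimes")], "Amphibians"),              # R5
]

turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, rules))  # R4 fires before R5 -> Reptiles
```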
Rule Ordering Schemes
Rule-based ordering
Individual rules are ranked based on their quality
Class-based ordering
Rules that belong to the same class appear together
Rule-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Class-based Ordering
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced},
Taxable Income>80K) ==> Yes
Building Classification Rules
Direct Method:
Extract rules directly from data
e.g., RIPPER, CN2, Holte's 1R
Indirect Method:
Extract rules from other classification models (e.g., decision trees, neural networks, etc.)
e.g., C4.5rules
Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat steps (2) and (3) until the stopping criterion is met (a sketch of this loop follows Figure 3)
Figure 3: Sequential covering example: (i) original data, (ii) step 1
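The loop itself is short. A sketch, assuming a hypothetical learn_one_rule function that returns the best single rule (with a matches method) for the remaining records:

```python
def sequential_covering(records, learn_one_rule, min_coverage=1):
    """Sketch of sequential covering; learn_one_rule is assumed given."""
    rule_list = []
    remaining = list(records)
    while remaining:
        rule = learn_one_rule(remaining)            # step (2): grow a rule
        covered = [r for r in remaining if rule.matches(r)]
        if len(covered) < min_coverage:             # stopping criterion
            break
        rule_list.append(rule)
        remaining = [r for r in remaining           # step (3): remove covered
                     if not rule.matches(r)]
    return rule_list
```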
Example of Sequential Covering
Figure 4: Sequential covering example (continued): (iii) step 2, (iv) step 3
Aspects of Sequential Covering
There are many aspects of sequential covering:
Rule Growing
Instance Elimination
Rule Evaluation
Stopping Criterion
Rule Pruning
Rule Growing:
In rule growing, there are two common strategies:
(a) General-to-specific (b) Specific-to-general
Figure 5: General-to-specific rule growing
Rule Growing (Examples)
CN2 Algorithm:
Start from an empty conjunct: {}
Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
Determine the rule consequent by taking the majority class of instances covered by the rule
RIPPER Algorithm:
Start from an empty rule: {} => class
Add conjuncts that maximize FOIL's information gain measure:
R0: {} => class (initial rule)
R1: {A} => class (rule after adding conjunct)
Gain(R0, R1) = t [ log (p1/(p1+n1)) - log (p0/(p0+n0)) ] (1)
where
t: number of positive instances covered by both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1
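Equation (1) translates directly into code (a sketch; the counts follow the definitions above, and the example numbers are illustrative):

```python
from math import log2

def foil_gain(t, p0, n0, p1, n1):
    """FOIL's information gain for extending rule R0 to R1, Eq. (1)."""
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Illustrative counts: R0 covers 100 pos / 400 neg; adding a conjunct
# gives R1 covering 30 pos / 10 neg, all 30 also covered by R0 (t = 30).
print(foil_gain(t=30, p0=100, n0=400, p1=30, n1=10))  # positive -> good conjunct
```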
Instance Elimination
Why do we need to eliminate instances?
Otherwise, the next rule extracted is identical to the previous rule
Why do we remove positive instances?
To ensure that the next rule is different
Why do we remove negative instances?
To prevent underestimating the accuracy of the next rule
Compare rules R2 and R3 in Figure 6
Figure 6: Instance Elimination
Rule Evaluation
Metrics:
Accuracy = nc / n (2)
Laplace = (nc + 1) / (n + k) (3)
M-estimate = (nc + k p) / (n + k) (4)
n: number of instances covered by the rule
nc: number of covered instances that belong to the class predicted by the rule
k: number of classes
p: prior probability of the predicted class
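The three metrics in code (a sketch; argument names follow the definitions above):

```python
def accuracy(nc, n):
    return nc / n                  # Eq. (2)

def laplace(nc, n, k):
    return (nc + 1) / (n + k)      # Eq. (3); equals the m-estimate with p = 1/k

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)  # Eq. (4)

# A rule covering 60 instances, 50 of them in the predicted class, 2 classes:
print(accuracy(50, 60), laplace(50, 60, 2), m_estimate(50, 60, 2, 0.5))
```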
Stopping Criterion and Rule Pruning
Stopping criterion:
Compute the gain of the new rule
If the gain is not significant, discard the new rule
Rule Pruning:
Similar to post-pruning of decision trees
Reduced Error Pruning:
Remove one of the conjuncts in the rule
Compare the error rate on the validation set before and after pruning
If the error rate improves, prune the conjunct
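Reduced error pruning in sketch form (assumes a hypothetical error_rate(conjuncts, val_set) helper that evaluates the rule on validation data):

```python
def prune_rule(conjuncts, error_rate, val_set):
    """Greedily drop conjuncts while the validation error improves."""
    improved = True
    while improved and len(conjuncts) > 1:
        improved = False
        baseline = error_rate(conjuncts, val_set)
        for c in conjuncts:
            candidate = [x for x in conjuncts if x != c]   # drop one conjunct
            if error_rate(candidate, val_set) < baseline:  # error improved
                conjuncts, improved = candidate, True
                break
    return conjuncts
```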
Direct Method: RIPPER (1/3)
For a 2-class problem, choose one class as the positive class and the other as the negative class
Learn rules for the positive class
The negative class is the default class
For a multi-class problem:
Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)
Learn the rule set for the smallest class first, treating the rest as the negative class
Repeat with the next smallest class as the positive class
Growing a rule:
Start from an empty rule
Add conjuncts as long as they improve FOIL's information gain
Stop when the rule no longer covers negative examples
Prune the rule immediately using incremental reduced error pruning
Direct Method: RIPPER (2/3)
Measure for pruning:
v = (p-n)/(p+n) (5)
p: number of positive examples covered by the rule in the validation set
n: number of negative examples covered by the rule in the validation set
Pruning method: delete any final sequence of conditions that maximizes v
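Deleting a "final sequence of conditions" means keeping some prefix of the rule, so the pruning step can be sketched as choosing the prefix that maximizes v (covered_counts is a hypothetical helper returning (p, n) on the validation set when only the first i conditions are kept):

```python
def v_metric(p, n):
    """RIPPER's pruning measure, Eq. (5)."""
    return (p - n) / (p + n)

def prune_final_sequence(conditions, covered_counts):
    """Keep the prefix of conditions whose v on the validation set is maximal."""
    best_i = max(range(1, len(conditions) + 1),
                 key=lambda i: v_metric(*covered_counts(i)))
    return conditions[:best_i]
```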
Building a Rule Set:
Use the sequential covering algorithm
Find the best rule that covers the current set of positive examples
Eliminate both positive and negative examples covered by the rule
Each time a rule is added to the rule set, compute the new description length
Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far
Direct Method: RIPPER (3/3)
Optimize the rule set:
For each rule r in the rule set R:
Consider 2 alternative rules:
Replacement rule (r*): grow a new rule from scratch
Revised rule (r'): add conjuncts to extend the rule r
Compare the rule set containing r against the rule sets containing r* and r'
Choose the rule set that minimizes the description length (MDL principle)
Repeat rule generation and rule optimization for the remaining positive examples
Indirect Methods
Extract rules from an unpruned decision tree
Indirect Method: C4.5rules
For each rule r: A → y,
consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A
Compare the pessimistic error rate of r against that of every r'
Prune if one of the r' rules has a lower pessimistic error rate
Repeat until we can no longer improve the generalization error
Instead of ordering the rules, order subsets of rules (class ordering)
Each subset is a collection of rules with the same rule consequent (class)
Compute the description length of each subset
Description length = L(error) + g L(model) (6)
g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
Advantages
As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees
Thank You !!!