
Pushing Constraints by Rule-Driven Pruning Techniques in Non-Uniform Minimum Support for Predicting Obstructive Sleep Apnea

SIM Doreen Ying Ying1,a*, TEH Chee Siong1,b and ISMAIL Ahmad Izuanuddin2,c

1Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kuching, Sarawak, Malaysia

2Respiratory Medicine Unit, Department of Respiratory Medicine, UiTM Medical Specialist Centre, Faculty of Medicine, Universiti Teknologi MARA, Selangor, Malaysia

adsdoreenyy@gmail.com, bcsteh@unimas.my, cizuanuddin@salam.uitm.edu.my

Keywords: Boosted Association-Ruled Pruned Decision Tree; pushed minimum support; minimum confidence constraints; association rules; subtree replacement and subtree raising; frequent itemset(s); Adaptive Apriori.

Abstract. The Boosted Association-Ruled Pruned Decision Tree (ARP-DT), an improved version of the Boosted Decision Tree algorithm, was developed by applying association-ruled pre- and post-pruning techniques that refer to pushed minimum support and minimum confidence constraints as well as the association rules applied. The novelty of the association-ruled pruning techniques lies mainly in the pre-pruning techniques, which investigate the maximum number of decision tree splits, and in the post-pruning techniques of subtree replacement and subtree raising. The applied association rules (ARs) augment the mining of frequent or interesting itemsets so that appropriate pre-pruning or subtree pruning can be applied before the AdaBoost ensemble is implemented. The ARs involve the Adaptive Apriori (AA) augmented rule definitions and theorem stated in this research, which focus on the characteristics of the datasets accessed so as to streamline the rule-driven pruning techniques in the boosting algorithms developed for predicting Obstructive Sleep Apnea (OSA). A significant improvement in prediction accuracy is obtained when comparing the classical boosting algorithms with Boosted ARP-DT on the OSA datasets and on online databases from the University of California Irvine (UCI) data repositories.

Introduction

Strong rules are not necessarily interesting [6-8], and interesting rules are often not supported by frequent itemsets. Mining frequent itemsets and using association rules to construct a prediction system for Obstructive Sleep Apnea (OSA) based on raw data collected in Malaysia has yet to be researched [1-4]. The pruning techniques applied in this research mainly involve the association rules applied, especially those augmented by Adaptive Apriori (AA). These include pushing constraints on the minimum coverage-driven and accuracy-driven limits of the decision trees developed for the OSA datasets and for other online databases in the UCI data repositories. These pruning techniques were applied during or after decision tree development, but before AdaBoosting. The improved prediction algorithm developed (i.e., Boosted ARP-DT) embarks on pre- and/or post-pruning techniques augmented by the AA properties of the datasets accessed in this research.

Problem Statements

The problem statements of this research are elaborated below. They show why pushing constraints by rule-driven pruning techniques based on a non-uniform minimum support is implemented in this research to solve them.

Applied Mechanics and Materials, ISSN 1662-7482, Vol. 892, pp. 210-218. Online: 2019-06-10. doi:10.4028/www.scientific.net/AMM.892.210. © 2019 Trans Tech Publications Ltd, Switzerland.


(1) In classical Boosted Decision Tree algorithms, the default setting is applied: the maximum number of decision splits ('MaxNumSplit') defaults to n - 1, i.e., size(X, 1) - 1, the number of training tree levels minus one. This default is based on a uniform minimum support, which either misses interesting patterns that have low support or suffers from the bottleneck of itemset generation caused by a low minimum support.
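As a minimal sketch of the pre-pruning idea behind this problem statement (an illustrative toy splitter, not the paper's implementation), the following grows a one-feature decision tree greedily on Gini impurity but stops once a global split budget ('MaxNumSplit') is exhausted, instead of allowing the default of up to n - 1 splits:

```python
# Toy sketch: cap the total number of tree splits ('MaxNumSplit') instead of
# the default n - 1. One-dimensional feature, binary 0/1 labels; all names
# are illustrative, not from the paper.
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def grow(data, budget):
    """data: list of (x, y); budget: one-element list of remaining splits."""
    labels = [y for _, y in data]
    if budget[0] == 0 or gini(labels) == 0.0:   # pre-pruning: split cap hit
        majority = 1 if 2 * sum(labels) > len(labels) else 0
        return {"leaf": majority}
    best = None
    for t in sorted({x for x, _ in data}):       # candidate thresholds
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if left and right:
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, t)
    if best is None:
        return {"leaf": 1 if 2 * sum(labels) > len(labels) else 0}
    budget[0] -= 1                               # spend one split from the cap
    t = best[1]
    return {"split": t,
            "left": grow([(x, y) for x, y in data if x <= t], budget),
            "right": grow([(x, y) for x, y in data if x > t], budget)}

def count_splits(node):
    if "leaf" in node:
        return 0
    return 1 + count_splits(node["left"]) + count_splits(node["right"])

data = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1)]
tree = grow(data, budget=[2])    # cap at 2 splits rather than n - 1 = 5
print(count_splits(tree))        # 2
```

The shared budget list enforces a global cap across all branches, which is the behavior a 'MaxNumSplit'-style constraint implies.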

(2) Classical association mining is based on a uniform minimum support; this is the Apriori property. Again, the problem is that the Apriori property misses interesting patterns, which often occur at varied levels of support. The novelty of this research is to apply Adaptive Apriori (AA), which violates the Apriori property by defining a pushed minimum support, i.e., Pminsup, for each schema with respect to the preservation of uniform minimum support.

(3) Support constraint (SC) specification is usually user-defined. Again, the problem is that it is based on a minimum support or confidence range [0..1], which specifies only general constraints on minimum support or confidence [1, 7-9, 12-14]. Frequent itemsets, whether under the AA or the Apriori property, estimate the joint probabilities of event recognition [5-9, 12-13], so mining frequent itemsets has to be better augmented by Association Rule Mining (ARM). Pushing constraints by rule-driven pruning techniques based on a non-uniform minimum support is therefore implemented in this research to solve this problem.

Pushed Minimum Constraints and Support Constraint Specifications

AA, like Apriori, serves as a strategy to discover monotonous pattern(s) from dataset(s). AA is the key to mining frequent itemsets, grouping early the common monotonous itemsets generated [1, 3, 9-10]. AA is adopted to prevent decision trees from splitting unnecessarily many times and thereby over-fitting. This research adopts ARM definitions and theorems [7-11] based on the characteristics of the dataset(s) before applying the AdaBoost ensemble.

Table 1. Meaning of each notation used in this research to describe the schema(s) of a decision tree

Notation       Meaning
s              a node or a schema named s
s'             set of left siblings of s
minsup(s)      minimum support of s
σ(s)           set of Support Constraints (SCs)
Pminsup(s)     pushed minimum support of s
Sminsup(s)     lowest pushed minimum support

Below is a series of ARM definitions illustrating how the AA properties are contemplated.

Definition 1 (Association Rules) [9]: For each pair of frequent itemsets FIS and FIS', such that FIS ⊂ FIS', the following two association rules (ARs) apply:

I. If sup(FIS')/sup(FIS) ≥ minconf, then FIS → FIS' − FIS, or a Type I AR, is constructed. Monotonicity or upward closure properties exist where subtree raising is applied.

II. If sup(FIS')/sup(FIS' − FIS) ≥ minconf, then FIS' − FIS → FIS, or a Type II AR, is constructed. Anti-monotonicity or downward closure properties exist where subtree replacement is usually applied.
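Definition 1 can be illustrated with a minimal sketch (the helper function and the numeric supports are hypothetical, not from the paper) that decides which of the two rule types is constructed from a pair of frequent itemsets:

```python
# Sketch of Definition 1: given sup(FIS), sup(FIS') and sup(FIS' - FIS) for
# frequent itemsets FIS ⊂ FIS', construct Type I / Type II ARs when the
# confidence threshold minconf is met. Illustrative helper, not the paper's code.
def association_rules(sup_fis, sup_fis_prime, sup_diff, minconf):
    """sup_diff is the support of FIS' - FIS."""
    rules = []
    # Type I: FIS -> FIS' - FIS (upward closure; subtree raising)
    if sup_fis_prime / sup_fis >= minconf:
        rules.append("Type I: FIS -> FIS'-FIS")
    # Type II: FIS' - FIS -> FIS (downward closure; subtree replacement)
    if sup_fis_prime / sup_diff >= minconf:
        rules.append("Type II: FIS'-FIS -> FIS")
    return rules

# illustrative supports: sup(FIS)=0.40, sup(FIS')=0.30, sup(FIS'-FIS)=0.35
print(association_rules(0.40, 0.30, 0.35, minconf=0.7))
```

With these example supports, both confidence ratios (0.75 and ≈0.86) clear minconf = 0.7, so both rule types are produced.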

Definition 2 (Redundant Rules) [7-10]: A rule rr in a dataset S is removable if there is some rule rr' in S that is more universal (or less specific) and tiered at a higher level than rr; in notation, rr U rr' and rr T rr'.

Definition 3 (Pushed Minimum Support) [1, 9, 14]: Let Pminsup be a function from the schemas of a schema enumerated tree T to [0..1] satisfying the following:

(1) Total Coverage: for every schema s in T such that minsup(s) is defined, Pminsup(s) ≤ minsup(s);

(2) Uniform-minimum-support-like: for a schema s and its generating schemas s1 and s2, whenever an itemset I of s is frequent (Pminsup), so are the I1 of s1 and I2 of s2;

(3) Maximal Pushed Minimum Support: Pminsup is maximal with respect to Total Coverage and Uniform-minimum-support-like.
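One simple way to realize a function with the Total Coverage property is sketched below. This is an assumption for illustration, not the paper's exact construction: it takes Pminsup(s) as the smallest minsup found in the subtree rooted at s, which guarantees Pminsup(s) ≤ minsup(s) and Pminsup(parent) ≤ Pminsup(child); the bin minsup values come from Table 2.

```python
# Hedged sketch (assumed construction, not the paper's): Pminsup(s) = the
# minimum of minsup over the subtree rooted at s. This satisfies Total
# Coverage and is non-decreasing from parent to child.
def pushed_minsup(tree, minsup, node):
    """tree: dict node -> list of children; minsup: dict node -> float."""
    best = minsup[node]
    for child in tree.get(node, []):
        best = min(best, pushed_minsup(tree, minsup, child))
    return best

# illustrative schema tree using the bin minsup values from Table 2
tree = {"root": ["B1", "B2"], "B1": ["B3"]}
minsup = {"root": 0.86, "B1": 0.12, "B2": 0.47, "B3": 0.17}
print(pushed_minsup(tree, minsup, "root"))  # 0.12
```

Note that the Theorem's strict inequality Pminsup(p) < Pminsup(c) only arises here when the subtrees actually differ in their lowest minsup; this sketch gives the non-strict version.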

Theorem (Pushed Minimum Support) [1, 7-9, 14]: Consider an ancestor or parent node p and its offspring node c in any dataset where the AA properties are to be contemplated. The following properties [7-9] are considered equivalent while applying rule-driven pruning techniques:

I. p has a left same-tree-level node p' such that Sminsup(p') < Sminsup(p);

II. p has a left same-tree-level node p' such that Sminsup(p') is pruned in subtree(p);

III. Pminsup(p) < Pminsup(c).

Research Hypotheses. This research hypothesizes that if a dataset has most or all of its attributes exhibiting non-uniform minimum support, frequent itemsets can be mined by assessing the pushed minimum constraints through Association Rule Mining (ARM), so that over-fitting or under-fitting of decision trees can be avoided by applying pre-pruning and/or post-pruning techniques, all of which can be augmented by the AA properties.

Table 2 shows the minimum support in each of the Support Constraints (SCs) at each bin assigned for the investigated OSA variables: (1) bilateral Tonsils' Size (TS); (2) Mallampati Score (MP); (3) Neck Circumference (NC); (4) Epworth Sleepiness Scale (ESS); (5) Morbid Obesity (MO); (6) Diabetes Mellitus (DM).

Table 2. Pushing minimum support (minsup) at each SC and the attributes in each bin of the OSA dataset

Bin(s) in the SCs     minsup(s) specification at SC     Attribute(s) in each assigned bin
1. SC0()              ≥ 0.86                            ESS for B0
2. SC1(B1, B3)        ≥ 0.12                            TS & MP for B1 or NC & MO for B3
3. SC2(B3)            ≥ 0.17                            NC & MO for B3
4. SC3(B2)            ≥ 0.47                            DM for B2

Table 2 shows the pushing minimum support at each support constraint (SC) and the OSA

attributes in each assigned bin (i.e. B0, B1, B2 and B3) at the specified SC.

Enumeration-Based Specification on the Improved Algorithms Developed. Decision trees were developed under the category of Classification and Regression Trees (CART). These trees were then boosted by GentleBoost (for two-class datasets) or AdaBoostM2 (for datasets with more than two classes, such as the Iris dataset), with 200 iterations and 15-fold cross-validation, using MATLAB (R2016a). By applying pushed minimum constraints, Eq. 1 is adopted to compute the support count, σ(X), of an itemset X. This important property refers to the number of transactions that contain a particular itemset. The symbol |•| in Eq. 1 denotes the number of elements in a set [11-14, 16].

σ(X) = |{ti | X ⊆ ti, ti ∈ T}| (1)

An association rule is an implication expression of the form X → Y, where X and Y are disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an association rule can then be measured in terms of its support (Eq. 2) [1-3, 5-6] and confidence (Eq. 3) [1-3, 5-6, 15-16].

Support, s(X → Y) = σ(X ∪ Y) / N (2)

Confidence, c(X → Y) = σ(X ∪ Y) / σ(X) (3)
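Eqs. 1-3 translate directly into code. The sketch below computes the support count, support and confidence over a small illustrative transaction list (the transactions and items are made up for illustration, using the Table 2 attribute abbreviations):

```python
# Direct transcription of Eqs. 1-3: support count sigma(X), and the support
# and confidence of a rule X -> Y over a transaction list T. The example
# transactions are illustrative only.
def sigma(itemset, transactions):
    # Eq. 1: number of transactions t_i that contain the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    # Eq. 2: sigma(X ∪ Y) / N
    return sigma(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    # Eq. 3: sigma(X ∪ Y) / sigma(X)
    return sigma(X | Y, transactions) / sigma(X, transactions)

T = [{"NC", "MO"}, {"NC", "MO", "DM"}, {"ESS"}, {"NC", "DM"}]
X, Y = {"NC"}, {"MO"}
print(support(X, Y, T), confidence(X, Y, T))  # 0.5 and 2/3
```

Here {NC, MO} appears in 2 of 4 transactions, giving support 0.5, while NC alone appears in 3, giving confidence 2/3.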

Fig. 1. A schema enumeration tree for one of the OSA datasets collected, marked with Sminsup/Pminsup

The AA rule-driven pre- and post-pruning techniques work to improve the prediction accuracies of the algorithms developed. Post-pruning includes weight vectors derived from (i) the minimum instance count of a node; and (ii) the Sminsup and Pminsup of each feature, as assessed from those instances that are part of the common pattern(s) derived from the dataset(s). Fig. 1 shows the schema enumerated tree with the four bins illustrated as yellow circles marked with Sminsup/Pminsup, i.e., each specified minimum support and pushed minimum support, where each support constraint specification tallies with the one listed in Table 2.

Fig. 2. Subtree replacement and subtree raising as post-pruning for the algorithms developed

Fig. 2 shows that by applying ARM Definition 1 with Type I and II ARs, post-pruning techniques are applied to prune the subtrees. A narrative description of Boosted ARP-DT is given below.


Improved Algorithms: Boosted ARP-DT (overall flow):

OSA datasets (or datasets from UCI online databases) → AA properties to assess the characteristics of the dataset(s) → Decision Tree (DT) development and completion → AdaBoost ensemble → Boosted ARP-DT

Improved Algorithms: Boosted ARP-DT (in narrative pseudo-code):

Inputs: (pruning augmented by Association Rules or ARs)

Method Portion I (early tree-stopping techniques):
(1) Create a root node labeled N;
(2) IF instances D are all of the same class C, THEN return N as a leaf node labeled with that class (i.e., pruning by applying ARM Definition 1); ELSE proceed growing the tree;
(3) IF the feature-list is empty THEN return N as a leaf node labeled with the majority class in D; //majority voting
(4) Apply pre-pruning (D, feature-list) to determine the optimal features to be selected;
(5) Label node N with the Gini index splitting criterion;
(6) IF Dj is empty THEN attach a leaf with the majority class to node N; ELSE attach the node returned by Step (5) above (Dj, feature-list) to node N;
(7) return N;

Product I: pruned tree with number of instances prePN.

Method Portion II (post tree development techniques):
Partition the data into a training dataset and a testing dataset. Test the completed decision tree by measuring the error reduction after pruning leaf node(s).

Do while (number of errors in testing dataset <= number of errors in training dataset)
  Do for (each leaf node l in the tree)
    (1) temporarily replace the subtree below node l with a leaf node according to the current majority class at that node (this increases errors, since the initial tree has no error);
    (2) measure the average error reduced per leaf in the testing dataset;
    (3) calculate the number of errors for each node if collapsed to a leaf or leaves;
    (4) permanently remove the leaf node(s) that produce(s) the maximum reduction in errors in the testing dataset.
  End For
End Do

Notes on the flow above: characteristics of the datasets are assessed based on Adaptive Apriori. Early tree-stopping techniques are applied by estimating the maximum number of tree splits, i.e., ARM Def. 1(II) and 3. Post tree development techniques are applied mainly by subtree replacement and/or subtree raising, i.e., ARM Def. 1(I), 2 and 3.
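The reduced-error loop of Method Portion II can be sketched in code as follows. This is a minimal illustration on a toy binary tree (the Node class and the toy data are hypothetical, not the paper's implementation): it repeatedly collapses the internal node whose replacement by a majority-class leaf removes the most errors on the testing set, and stops when no collapse helps.

```python
# Hedged sketch of reduced-error post-pruning (subtree replacement) as
# described in Method Portion II. Toy Node class and data; illustrative only.
from collections import Counter

class Node:
    def __init__(self, feature=None, left=None, right=None, label=None):
        self.feature, self.left, self.right, self.label = feature, left, right, label
    def is_leaf(self):
        return self.label is not None
    def predict(self, x):
        if self.is_leaf():
            return self.label
        return (self.left if x[self.feature] == 0 else self.right).predict(x)

def errors(tree, data):
    return sum(tree.predict(x) != y for x, y in data)

def candidates(node, data, acc=None):
    """Collect (internal node, training subset reaching it)."""
    acc = [] if acc is None else acc
    if not node.is_leaf():
        acc.append((node, data))
        candidates(node.left, [(x, y) for x, y in data if x[node.feature] == 0], acc)
        candidates(node.right, [(x, y) for x, y in data if x[node.feature] == 1], acc)
    return acc

def collapse(node, label):
    node.feature, node.left, node.right, node.label = None, None, None, label

def reduced_error_prune(root, train, test):
    while not root.is_leaf():
        best, best_err = None, errors(root, test)
        for node, subset in candidates(root, train):
            if not subset:
                continue
            saved = (node.feature, node.left, node.right, node.label)
            majority = Counter(y for _, y in subset).most_common(1)[0][0]
            collapse(node, majority)                 # temporary replacement
            err = errors(root, test)
            node.feature, node.left, node.right, node.label = saved  # undo
            if err < best_err:
                best, best_err = (node, majority), err
        if best is None:
            break                                    # no collapse reduces error
        collapse(*best)                              # make it permanent
    return root

# toy tree: a useful split on feature 0, then a noisy split on feature 1
noisy = Node(feature=1, left=Node(label=1), right=Node(label=0))
root = Node(feature=0, left=Node(label=0), right=noisy)
train = [({0: 0, 1: 0}, 0), ({0: 1, 1: 0}, 1), ({0: 1, 1: 1}, 1)]
test = [({0: 1, 1: 1}, 1), ({0: 0, 1: 1}, 0)]
reduced_error_prune(root, train, test)   # collapses the noisy subtree
```

After pruning, the noisy split is replaced by a majority-class leaf and the test error drops to zero on this toy example.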


Product II: pruned tree with number of instances posPN.

Explanation of Method Portions I and II: the pruned tree(s) from Products I and II are input into the AdaBoost ensemble. The finalized number of instances after pruning, derived from prePN and posPN, is labeled FN. Eventually, the pruned decision tree(s) are input to the boosting algorithms.

1. Attributes and features: input dataset with variables {(x1, y1), …, (xFN, yFN)}, where xi ∈ X and yi ∈ Y = {-1, +1}. For tree splitting, the starting default is NumSplit = 1, up to the maximum number of tree splits MaxNumSplit, with NumSplit incremented by 1 per tree level.

2. Weight setting: the weights of the training dataset are wi(1) = 1/FN, where i = 1, 2, …, FN.

3. Do for t = 1, 2, …, T

(a) Train the ARP-DT component classifier ht;

(b) Compute the re-substitution error of ht: εt = Σi=1..FN wi(t) [yi ≠ ht(xi)];

(c) If εt > 0.5, exit the looping cycle; else, NumSplit = NumSplit + 1, and go to (d);

(d) Set the weight of the ARP-DT component classifier ht: αt = (1/2) ln((1 − εt)/εt);

(e) Update the weights of the OSA training samples: wi(t+1) = (wi(t)/Ct) exp(−αt yi ht(xi)), where Ct is the normalization constant such that Σi=1..FN wi(t+1) = 1.

End Do

4. Product executed: the largest weighted classifier from the boosted pruned trees is chosen, f(x) = sign(Σt=1..T αt ht(x)).

END
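The boosting loop in steps 1-4 can be sketched as follows. As an illustrative substitution, simple threshold stumps stand in for the ARP-DT component classifiers; the error εt, weight αt, and the weight update follow the formulas above.

```python
# Hedged sketch of the AdaBoost loop in steps 1-4, with threshold stumps
# standing in for ARP-DT component classifiers (illustrative substitution).
import math

def stump_predict(x, feature, thresh, sign):
    return sign if x[feature] <= thresh else -sign

def best_stump(X, y, w):
    """Pick the weighted-error-minimising threshold stump."""
    best = None
    for f in range(len(X[0])):
        for thresh in sorted({x[f] for x in X}):
            for sign in (-1, 1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, thresh, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thresh, sign)
    return best

def adaboost(X, y, T=10):
    n = len(X)
    w = [1.0 / n] * n                            # step 2: uniform weights
    ensemble = []
    for _ in range(T):
        eps, f, thresh, sign = best_stump(X, y, w)   # steps 3(a)-(b)
        if eps > 0.5:                            # step 3(c): exit the loop
            break
        eps = max(eps, 1e-10)                    # guard against eps = 0
        alpha = 0.5 * math.log((1 - eps) / eps)  # step 3(d)
        ensemble.append((alpha, f, thresh, sign))
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thresh, sign))
             for xi, yi, wi in zip(X, y, w)]
        C = sum(w)                               # step 3(e): normalisation C_t
        w = [wi / C for wi in w]

    def predict(x):                              # step 4: sign of weighted vote
        s = sum(a * stump_predict(x, f, t, sgn) for a, f, t, sgn in ensemble)
        return 1 if s >= 0 else -1
    return predict

# tiny demo: no single stump classifies this 1-D set, but 3 rounds do
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, 1, -1, -1, 1]
clf = adaboost(X, y, T=3)
print([clf(x) for x in X])  # [1, 1, 1, -1, -1, 1]
```

The demo shows the point of boosting: the best single stump misclassifies the last sample, but after three reweighted rounds the weighted vote recovers all six labels.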

Experimental Results

All the tables in this section show the experimental results of the ARM-augmented pruning techniques applied to three OSA datasets and seven UCI online databases. Table 3 shows the ARM definition(s) and/or theorem applied for each dataset. Table 4 shows the prediction accuracies of Boosted DT and Boosted ARP-DT as well as the improvements achieved. Table 5 shows the particular type(s) of pre- and/or post-pruning technique(s) applied to each dataset and the p-value for each improvement achieved. The experimental results for each dataset, as well as the statistical significance of the different types of pre-pruning and/or post-pruning techniques applied, are shown in Tables 4 and 5.


Table 3. OSA datasets and UCI online databases: number of instances, number of features, and the ARM definitions or theorem applied

OSA datasets / UCI online databases   Instances   Features   ARM Definitions/Theorem applied
1. OSA (dataset 1)                    200         12         Def. 1(II) & 2
2. OSA (dataset 2)                    270         8          Def. 1(I) & 3
3. OSA (dataset 3)                    270         9          Def. 1(I), 2 & 3, ARM Theorem
4. Diabetes (Pima)                    768         8          Def. 1(II) & 2
5. Heart disease                      270         13         Def. 1(I) & 3
6. Liver disorder                     345         6          Def. 1(I, II) & 3
7. Titanic                            1309        5          Def. 1(I), 2 & 3, ARM Theorem
8. Ionosphere                         351         32         Def. 1(I) & 3
9. Hepatitis                          155         19         Def. 1(I, II) & 3, ARM Theorem
10. Iris                              150         4          Def. 1(I), 2 & 3, ARM Theorem

Table 4. Prediction accuracies of Boosted DT and Boosted ARP-DT as well as improvement achieved

OSA datasets / UCI online databases   Boosted DT   Boosted ARP-DT   Improvement achieved (%)
1. OSA (dataset 1)                    0.9400       0.9850           4.50% b
2. OSA (dataset 2)                    0.7704       0.8259           5.55% b
3. OSA (dataset 3)                    0.8185       0.9519           13.34% c
4. Diabetes (Pima)                    0.6970       0.7591           6.21% b
5. Heart disease                      0.7700       0.8074           3.74% a
6. Liver disorder                     0.6782       0.7159           3.77% a
7. Titanic                            0.7815       0.8121           3.06% a
8. Ionosphere                         0.8720       0.9430           7.10% b
9. Hepatitis                          0.8323       0.8710           3.87% a
10. Iris                              0.9067       0.9600           5.33% b

a significant at P < 0.05; b significant at P < 0.001; c significant at P < 0.0001

Table 5. Pre- and post-pruning technique(s) applied and each p-value for the improved accuracies achieved

OSA datasets / UCI online databases   ARM pre-pruning (Max Number of Tree Splitting,   p-value for the improved
                                      MaxNumSplit or MNS) and/or ARM weight vector     prediction by Boosted ARP-DT
                                      for post-pruning
1. OSA (dataset 1)                    MNS, ARM weight vector                           0.0002
2. OSA (dataset 2)                    MNS, ARM weight vector                           0.0001
3. OSA (dataset 3)                    MNS, ARM weight vector                           <0.0001
4. Diabetes (Pima)                    MNS, ARM weight vector                           0.0001
5. Heart disease                      MNS, ARM weight vector                           0.0004
6. Liver disorder                     MNS, ARM weight vector                           0.0004
7. Titanic                            MNS, ARM weight vector                           0.0005
8. Ionosphere                         MNS, ARM weight vector                           0.0001
9. Hepatitis                          MNS, ARM weight vector                           0.0004
10. Iris                              MNS                                              0.0001

To summarize the results stipulated in Tables 4 and 5, Fig. 3 shows that a profound prediction improvement is achieved by the proposed ARM-augmented algorithm, i.e., Boosted ARP-DT, as opposed to the classical Boosted DT.


Fig. 3. ARM augments pre- and post-pruning on the decision trees developed in Boosted ARP-DT (95% confidence intervals; p-values from one-tailed, right-tailed t-tests)

Conclusion

The research hypotheses as stated have been proven valid, since ARM and AA can be used to augment the association-ruled pre- and post-pruning techniques on decision trees developed for datasets whose attributes do not have a prominent uniform minimum support. The tabulated research findings in Tables 4 and 5 show significant improvements achieved by Boosted ARP-DT, as evidenced by one-tailed t-tests, p-values and levels of statistical significance. Whether using AA pre- and/or post-pruning or the AA-augmented weighting vector, the prediction accuracies of Boosted ARP-DT are better than those of Boosted DT. As analyzed in Tables 4 and 5, the Boosted ARP-DT algorithm promotes subtree raising and subtree replacement by analyzing the monotonicity properties as well as the AA features of the datasets, which avoids unnecessary splitting of decision trees. Therefore, Boosted ARP-DT gives better prediction accuracies than Boosted DT on the OSA datasets, and also on other datasets whose attributes do not have a prominent uniform minimum support (as shown above).



Acknowledgment

To obtain the raw data of OSA patients’ records, formal Research and Ethics Committee Approval

on medical ground was acquired from Universiti Teknologi MARA, Selangor, Malaysia. This

research is supported by research grant scheme from MOHE, Universiti Malaysia Sarawak

(Reference: FRGS/ICT02(01)/1077/2016(23)).

References

[1] D.Y.Y. Sim, C.S. Teh, A. I. Izuanuddin, Improved boosting algorithms by pre-pruning and

associative rule mining on decision trees for predicting obstructive sleep apnea, Adv. Sci. Lett.

23(11) (2017) 11593-11598.

[2] D.Y.Y. Sim, C.S. Teh, P.K. Banerjee, Prediction model by using Bayesian and cognition-

driven techniques: a study in the context of obstructive sleep apnea, Proceeding of the 9th

International Conference on Cognitive Sciences, Malaysia, Procedia – Social and Behavioral

Sciences. 97 (2013) 528-537.

[3] D.Y.Y. Sim, C.S. Teh, A.I. Izuanuddin, Adaptive apriori and weighted association rule mining

on visual inspected variables for predicting Obstructive Sleep Apnea (OSA), Aus. J.

Intelligent Inform. Process. Sys. 14(2) (2014) 39-45.

[4] D.Y.Y. Sim, C.S. Teh, A.I. Izuanuddin, Improved boosted decision tree algorithms by

adaptive apriori and post-pruning for predicting obstructive sleep apnea, Adv. Sci. Lett. 24(3)

(2018) 1680-1684.

[5] J. Furnkranz, Pruning algorithms for rule learning, Machine Learning. 27(1) (1997) 139-172.

[6] A. K. Das, Mining rare item sets using both top down and bottom up approach, Int. J. Comp.

Sci. Inform. Technol. 7(3) (2016) 1607-1614.

[7] J. Han, M. Kamber, J. Pei, Data mining concepts and techniques, third ed., Elsevier,

Morgan Kaufmann, USA (2012) 17-27, 248-273, 461-488.

[8] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, New York, USA, ACM Press, (2000) 1-12.

[9] K. Wang, Y. He, J. Han, Pushing support constraints into association rules mining. IEEE

Trans. Knowledge Data Eng. 15(3) (2003) 642-658.

[10] K. Wang, S. Zhou, S. Liew, Building hierarchical classifiers using class proximity, Proceeding

of the 25th International Conference on Very Large Data Bases, San Francisco, USA, Morgan

Kaufmann, (1999) 363-374.

[11] A.S. Galathiya, A.P. Ganatra, C.K. Bhensdadia, Improved decision tree induction algorithm

with feature selection, cross validation, model complexity and reduced error pruning, Int. J.

Comp. Sci. Inform. Technol. 3(2) (2012) 3427-3431.

[12] S. K. Pal, P. Mitra, Pattern recognition algorithms for data mining, Chapman & Hall, Florida,

USA, CRC Press LLC, 2004, pp. 165-168, 170-174.

[13] G. Hari Prasad, J. Nagamuneiah, A strategy for initiate support check into frequent itemset mining, Int. J. Adv. Res. Comp. Sci. Software Eng. 2(7) (2012) 43-48.

[14] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large

databases, Proceeding of the 1993 ACM SIGMOD International Conference on Management

of Data, Washington, USA, ACM (1993) 207-216.

[15] L. Breiman, Population theory for boosting ensembles, Annals of Statistics. 32(1) (2004) 1-11.

[16] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and regression trees,

Wadsworth, Belmont, USA, 1984, pp. 5-12, 22-39.
