Uni-class pattern-based classification model
ABSTRACT This paper presents a supervised machine learning model for classifying a dataset. The model extracts a set of patterns common to a single class from the training dataset according to the rules of the pattern-based subspace clustering technique. These extracted patterns are used to classify the objects of that class in the testing dataset. The proposed model removes the dependence on a user-defined threshold found in this clustering technique. It also addresses the curse of dimensionality without requiring a separate dimensionality reduction method. Another distinguishing point of this model is its dependence on the variation of the values of relative features among different objects. Experimental results on synthetic and real-life datasets show that this approach is more efficient and effective than the existing techniques.
Uni-Class Pattern-based Classification Model
Mostafa A. Salama
Faculty of Computers and Information
British University in Egypt, Cairo, Egypt
Email: mostafa.salama@gmail.com
Aboul Ella Hassanien, Aly A. Fahmy
Faculty of Computers and Information
Cairo University, Cairo, Egypt
Email: aboitcairo,aly.fahmy@gmail.com
Abstract—This paper presents a model of a supervised
machine learning approach for classification of a dataset. The
model extracts a set of patterns from the training dataset
according to the rules of the pattern-based subspace clustering
technique. These extracted patterns are used to classify the
objects in the testing dataset. The proposed model removes the
dependence on a user-defined threshold found in this clustering
technique. It also addresses the curse of dimensionality
without requiring a separate dimensionality reduction method.
Another distinguishing point of this model is its dependence on
the variation of the values of relative features among
different objects. Experimental results on synthetic and
real-life datasets show that this approach is more efficient
and effective than the existing techniques.
I. INTRODUCTION
Data mining tasks can be classified into two categories:
descriptive, like clustering techniques, and predictive, like
classification techniques. Many attempts have been made in
the last decades to design hybrid systems for pattern
classification by combining different individual classification
techniques [1]. Recently, a number of approaches have adopted
a semi-supervised model for classification [2]. A move from
clustering to classification also appears in supervised
learning vector quantization (LVQ), which is based on the
standard, unsupervised self-organizing map (SOM) [3], [4].
The approach proposed in this paper is a classification model
based on a standard clustering technique, namely pattern-based
clustering. The new factor is that the input training data have
associated class information (labels). Many techniques are
based on clustering, but they fail to support some of the
requirements of classification, which this paper summarizes
in the following seven requirements.
1) Continuous Data: Some classification techniques, like
decision trees, tend to perform better when dealing with
discrete/categorical features [5]. It is often necessary
to transform a continuous attribute into a categorical
attribute using a discretization process [6].
2) The curse of dimensionality: A large number of features
may include features that are irrelevant or redundant
for the classification [7].
3) Multivariate feature selection problem: Univariate
models assume that the features are independent [8]. Bayes
belief models, as an example, are computationally
intractable unless an assumption of independence among
features (often not true) is imposed [9].
4) The dependence on threshold: An inappropriate
user-defined threshold value may result in too many or too
few patterns, with no coverage guarantees [10].
5) Missing Data: Many techniques exist to manage
data with missing items, but none is absolutely better
than the others. As Allison says, "the only really good
solution to the missing data problem is not to have any"
[11].
6) Normality of input: PCA and Bayesian networks assume
the normality of the input, which is a limitation for
many datasets.
7) Rule Extraction: This may be difficult in some
classification models, such as neural networks.
The proposed model depends on pattern-based subspace
clustering [12] to handle most of the requirements of
classification.
The rest of this paper is organized as follows. Section II
explains the pattern-based subspace clustering model.
Section III presents the proposed model. Results and comparisons
with other classifiers are given in Section IV. Conclusions
and future work are discussed in Section V.
II. PATTERN-BASED SUBSPACE CLUSTERING MODEL
Pattern-based clustering is a kind of subspace clustering
algorithm that extracts subsets of the input dataset that
have similar patterns. Most subspace clustering algorithms
rely on a certain distance function to capture the
similarity among objects. In high-dimensional space, however,
the distance between any pair of objects is nearly the same,
so distance functions lose their discriminating power,
whereas pattern-based clustering remains effective in
discovering such clusters.
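The loss of contrast between distances in high-dimensional space can be
observed with a small experiment. The following Python sketch is only an
illustration, assuming uniformly random points and the Euclidean distance;
the function name relative_contrast, the seed and the chosen sizes are
hypothetical and are not part of the original study.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min over all pairwise Euclidean distances."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(points[a], points[b])
        for a in range(n_points)
        for b in range(a + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast shrinks as the number of dimensions grows, i.e. the
    # nearest and the farthest pair become almost equally distant.
    for dims in (2, 20, 200, 2000):
        print(dims, round(relative_contrast(30, dims), 3))
```

A small contrast means that a distance threshold can hardly separate
similar from dissimilar objects, which is the motivation for comparing
patterns of change instead of distances.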
To illustrate this clustering algorithm, we give an example
in Figure (1). Table I is a dataset that consists of five
objects with five attributes. Figure (1a) shows the values of
the objects in full space (five attributes), where no obvious
pattern is visible. However, if we select only attributes
{A, B, D, E}, as in Figure (1b), for objects 2, 3 and 5, we can
observe the following pattern: for all three objects, from
attribute A to attributes B, D and E, the values first go down,
then up and finally down. We can assign these three objects
to the same subspace cluster as they show a similar pattern.
Likewise, similar patterns may exist with other objects in
other subspaces [12].

Table I
DATASET OF 5 OBJECTS
Attribute  Obj1  Obj2  Obj3  Obj4  Obj5
A            80    35    98    63    56
B            36    18    84    86    40
C            55    26    45    72    50
D            38    38   100    55    63
E            42    17    80    83    40
Figure 1. An example of pattern-based clustering: (a) data in full
space; (b) pattern in subspace
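The down/up/down observation can be checked directly on the values of
Table I. The following Python sketch is an illustration only; the helper
names trend and group_by_trend are hypothetical, and comparing the sign
of consecutive changes is a simplification of the coherence definition
given next.

```python
# Table I: one entry per object, values keyed by attribute.
TABLE_I = {
    "Obj1": {"A": 80, "B": 36, "C": 55, "D": 38, "E": 42},
    "Obj2": {"A": 35, "B": 18, "C": 26, "D": 38, "E": 17},
    "Obj3": {"A": 98, "B": 84, "C": 45, "D": 100, "E": 80},
    "Obj4": {"A": 63, "B": 86, "C": 72, "D": 55, "E": 83},
    "Obj5": {"A": 56, "B": 40, "C": 50, "D": 63, "E": 40},
}


def trend(obj, attributes):
    """Return the up/down/flat signature of obj along the attribute order."""
    values = [obj[a] for a in attributes]
    signs = []
    for prev, curr in zip(values, values[1:]):
        signs.append("up" if curr > prev else "down" if curr < prev else "flat")
    return tuple(signs)


def group_by_trend(table, attributes):
    """Group object names by their trend signature on the given attributes."""
    groups = {}
    for name, obj in table.items():
        groups.setdefault(trend(obj, attributes), []).append(name)
    return groups


if __name__ == "__main__":
    print(group_by_trend(TABLE_I, ["A", "B", "D", "E"]))
    print(group_by_trend(TABLE_I, ["A", "B", "C", "D", "E"]))
```

On the subspace {A, B, D, E} objects 2, 3 and 5 fall into one group,
while in the full space object 3 no longer shares the signature of
objects 2 and 5.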
To tell whether two objects in an input domain of objects
D exhibit a coherent pattern in a given subspace S, it is
essential to describe how coherent the objects are on these
attributes. The following definition serves this purpose.
Given two objects u and v ∈ D, we say that there exists
a coherent pattern between u and v in subspace S if formulas
(1) and (2) are true.
\[ \forall i, j \in S: \quad d_{ij} = (u_i - u_j) - (v_i - v_j) \le \delta \tag{1} \]
\[ \forall i, j \notin S: \quad d_{ij} = (u_i - u_j) - (v_i - v_j) > \delta \tag{2} \]
Subspace S is defined by the set of bounded dimensions
(attributes) in which objects u and v have a similar
shifting pattern. That is to say, if the difference d_ij
between the two objects on any two attributes i and j in S
does not exceed a user-defined threshold δ (delta), we say
that the two objects have a coherent pattern, as illustrated
in Figure (2). The minimal variation of object v on
attributes i and j is ∆, while the maximal variation of u is
∆ + δ, so the difference is at most δ. If all pairs of
attributes in S satisfy this condition of variation, then
objects u and v have a coherent pattern.
Figure 2. A coherent pattern between two objects
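As an illustration of formula (1), the following Python sketch checks
whether two objects of Table I are coherent on a given subspace. The
function names pattern_distance and is_coherent are hypothetical. The
sketch compares the absolute value of d_ij with δ; this is an assumption,
justified by the fact that swapping i and j flips the sign of d_ij, so
requiring d_ij ≤ δ for every ordered pair bounds the absolute value.

```python
from itertools import combinations

# Objects 1, 2 and 3 of Table I.
OBJ1 = {"A": 80, "B": 36, "C": 55, "D": 38, "E": 42}
OBJ2 = {"A": 35, "B": 18, "C": 26, "D": 38, "E": 17}
OBJ3 = {"A": 98, "B": 84, "C": 45, "D": 100, "E": 80}


def pattern_distance(u, v, i, j):
    """d_ij = (u_i - u_j) - (v_i - v_j), as in formula (1)."""
    return (u[i] - u[j]) - (v[i] - v[j])


def is_coherent(u, v, subspace, delta):
    """True if |d_ij| <= delta for every pair of attributes in the subspace."""
    # Assumption: the absolute value is used (see the note above).
    return all(
        abs(pattern_distance(u, v, i, j)) <= delta
        for i, j in combinations(subspace, 2)
    )


if __name__ == "__main__":
    print(is_coherent(OBJ2, OBJ3, "ABDE", 4))
    print(is_coherent(OBJ2, OBJ3, "ABCDE", 4))
    print(is_coherent(OBJ1, OBJ2, "ABDE", 4))
```

For objects 2 and 3 the largest |d_ij| on {A, B, D, E} is 4 (for the
pair B, D), so they are coherent for δ = 4, whereas adding attribute C
gives |d_AC| = 44 and breaks the pattern.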
III. PATTERN BASED CLASSIFICATION PROPOSED MODEL
The model is a classification technique that detects the
objects of a certain class in a group of objects according
to specific patterns. It is based on the frequent pattern-based
subspace clustering technique [12]. The selected dataset is
of a given, known class, so we use the definition of the
pattern-based subspace cluster in reverse. Since the objects
are in the same class, we are sure that there is a coherent
pattern in this dataset, and we use the definition to extract
the set of attributes S (the subspace) that contains the
coherent pattern, i.e. the patterns [i,j] for which the
change from feature i to feature j is nearly the same for
all objects in the same class. This change should be less
than a specific value δ, which is defined as the maximum
difference between two objects of the same class in the
value of the change between two features i and j. New
objects are classified with each candidate value of delta,
and the delta with the minimum classification error is taken
as the threshold. The steps of the proposed model in detail:
1) Use the training dataset to evaluate the patterns that
exist in a group of objects of the same class. In
the frequent pattern-based subspace cluster, the set of
objects is unknown and the set of attributes is given,
while in this definition the set of attributes is unknown
and the set of objects is given [12]. The definition is
changed as follows: given two objects u and v in D that
both belong to class 1, and two attributes i and j, d_ij
is calculated as follows:
\[ \forall u, v \in D: \quad d_{ij} = |u_i - u_j| - |v_i - v_j| \tag{3} \]
If the trend of change from u_i to u_j is opposite to that
from v_i to v_j, for example if u_i > u_j while v_i < v_j,
then this ij pattern is excluded by setting d_ij to -1.
2) Find the maximum d_ij value for each attribute pair, in a
matrix of size i x j, from the values calculated in step (1),
where there is a d_ij value for each pair of objects in the
same class. Then sort these d_ij values in ascending order
for all patterns ij in a one-dimensional array, after
removing the d_ij values of -1. Figure (3a) shows the sorted
d_ij values (patterns of class 1). For the pattern 9-15,
d_ij is nearly zero, which means that all objects in class
1 show almost the same change between attribute [9]
and attribute [15].
Figure 3. Pattern selection of a Gene dataset: (a) sorted delta values
of the extracted patterns; (b) error % of the delta values of each
pattern
3) In the testing phase, the model starts by selecting the
first d_ij in the sorted list. It then uses the original
definition of the pattern-based subspace cluster in
equation (1) to classify the testing objects according to
the pattern of the features i and j that corresponds to
this d_ij value, and calculates the error percentage of
the classified objects.
4) The model then repeats the test for the first two patterns,
using the second delta d_ij in the sorted list as the
threshold, and again calculates the error percentage
of the classified objects. The test is repeated until
all the delta values in the sorted list have been used or
the error percentage is zero. After this step, the δ that
leads to the minimum error percentage is taken as the
threshold used for classification, as shown in Figure (3b).
The selected patterns ([ij] features) are used for
classification rather than all features.
5) In the classification phase, one object from a single
class x in the training dataset is selected as object
u. Then, for each object v in the testing dataset, if this
object satisfies the rule defined in equations (1) and (2),
it is classified as belonging to class x; otherwise
it is discarded.
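The five steps above can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the implementation
used in the experiments: the function names and the toy data are
invented, the absolute value of formula (3) is taken so that the result
does not depend on the order of u and v, the first training object
plays the role of u, and only formula (1) is checked on the selected
patterns. As in the description, the threshold is chosen on the testing
objects.

```python
from itertools import combinations


def pair_delta(objects, i, j):
    """Steps 1-2: largest d_ij over all same-class pairs, or -1.0 if excluded."""
    worst = 0.0
    for u, v in combinations(objects, 2):
        change_u, change_v = u[i] - u[j], v[i] - v[j]
        if change_u * change_v < 0:
            return -1.0  # opposite trends: the pattern [i, j] is excluded
        # Assumption: absolute value of formula (3), independent of pair order.
        worst = max(worst, float(abs(abs(change_u) - abs(change_v))))
    return worst


def extract_patterns(train, n_attributes):
    """Step 2: (delta, (i, j)) entries in ascending delta order, -1 removed."""
    ranked = []
    for i, j in combinations(range(n_attributes), 2):
        delta = pair_delta(train, i, j)
        if delta >= 0:
            ranked.append((delta, (i, j)))
    return sorted(ranked)


def matches(u, v, patterns, delta):
    """Formula (1) on the selected patterns only (formula (2) is not checked)."""
    return all(abs((u[i] - u[j]) - (v[i] - v[j])) <= delta for i, j in patterns)


def select_threshold(train, test, is_in_class):
    """Steps 3-4: grow the pattern list, keep the delta with the lowest error."""
    ranked = extract_patterns(train, len(train[0]))
    reference = train[0]  # assumption: first training object is used as u
    best = None
    for k in range(1, len(ranked) + 1):
        delta = ranked[k - 1][0]
        patterns = [pair for _, pair in ranked[:k]]
        predicted = [matches(reference, v, patterns, delta) for v in test]
        wrong = sum(p != y for p, y in zip(predicted, is_in_class))
        error = 100.0 * wrong / len(test)
        if best is None or error < best[2]:
            best = (delta, patterns, error)
        if error == 0.0:
            break
    return best  # (threshold delta, selected patterns, error percentage)


# Invented toy data: three attributes, three training objects of one class.
TRAIN = [[10, 5, 8], [20, 15, 18], [31, 25, 29]]
TEST = [[40, 35, 38], [12, 30, 3], [50, 44, 47]]
LABELS = [True, False, True]  # whether each test object is in the class

if __name__ == "__main__":
    print(extract_patterns(TRAIN, 3))
    print(select_threshold(TRAIN, TEST, LABELS))
```

On the toy data the first pattern alone (δ = 0) misclassifies one of
the three test objects, while the first two patterns with δ = 1
classify all of them correctly, so the search stops there.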
IV. EXPERIMENTAL RESULTS AND DISCUSSIONS
The model is applied to two different datasets to perform
gene classification and thrombosis disease classification.
A. Gene Classification
Data mining methods have been widely applied in
bioinformatics to analyze gene functions, gene regulations,
cellular processes, and subtypes of cells. Gene classification
is an important issue in gene expression data analysis because
it is a basis for predicting the function of unknown genes,
and much work on gene microarray analysis indicates that
a high correlation may exist between gene expression patterns
and disease patterns [13]. In fact, the expression levels of
two closely related genes may rise and fall synchronously
in response to some environmental stimuli. Unfortunately,
conventional distance functions cannot model this similarity
effectively, since the expression levels may not be close in
most cases. Thus it is natural to apply pattern-based
clustering analysis to microarray data [14]. We perform an
experimental study of the efficiency and effectiveness of
our model on a yeast gene expression matrix with 17
conditions (attributes or features) and 40 genes; this
dataset is part of a bigger matrix in [14]. We know
in advance 22 genes that have similar characteristics; ten
of this set are used as the training dataset. The
testing dataset contains 31 objects, including 12 objects
of the class to be tested. The sorted patterns resulting
from the training phase are shown in Figure (3a). Only the
first 35 patterns ([ij] features) are selected out of 161
patterns, as the error reaches a minimum value (16.6%) when
using only these patterns, as shown in Figure (3b). The
threshold in this case is the delta value (110.0) of the
pattern {2,15}. The results of our model are as follows:
• Minimum Error Percentage is 16.6% for the following
patterns of a delta = 110.0: [9, 15] [14, 15] [9, 14] [5,
14] [5, 15] [5, 9] [15, 16] [14, 16] [10, 15] [10, 14] [9,
16] [9, 10] [3, 15] [3, 14] [3, 10] [3, 9] [3, 5] [4, 14]
[4, 5] [6, 14] [4, 6] [6, 10] [5, 16] [5, 10] [4, 15] [4,
9] [13, 15] [11, 15] [10, 12] [9, 13] [9, 11] [4, 11] [3,
6] [3, 4] [2, 15]
• Correlated objects in class 1 are: 0, 1, 3, 5, 8, 9, 2, 4,
6, 23, 10, 7.
• The classification time is 0.02 seconds.
B. Thrombosis disease Classification
We also test our model on a database
collected at Chiba University hospital from the outpatient
clinic of the hospital on collagen diseases (auto-immune
diseases). Thrombosis is one of the most important and
severe complications in collagen diseases, and it is important
to detect and predict the possibility of its occurrence.
Domain experts are very much interested in discovering
regularities behind patients’ observations [15]. Thrombosis
has four main levels or degrees, which are 0 (negative or
no thrombosis), 1 (positive and the most severe), 2
(positive and severe) and 3 (positive and mild). We perform an
experimental study on only two sets of data, those of degrees
0 and 1 of thrombosis. The training phase is applied
to 12 cases, with 16 attributes, of patients with thrombosis
of degree 1 to extract the patterns. The testing phase is
applied to 5 cases, including 3 cases of the class to be
tested (level one). The sorted patterns resulting from
the training phase are shown in Figure (4a). Only the first
17 patterns ([ij] features) are selected, as the error
reaches a minimum value (33.3%) when using only these
patterns, as shown in Figure (4b). The threshold in this
case is the delta value (7.3) of the pattern {5, 7}. The
results of our model are as follows:
• Minimum Error Percentage is 33.3% for the following
patterns of a delta = 7.3: [9, 12] [8, 12] [5, 9] [5, 8] [4,
5] [4, 9] [4, 12] [4, 8] [6, 13] [12, 13] [6, 8] [8, 13]
[6, 9] [9, 13] [5, 13] [4, 13] [5, 7]
• Correlated objects in class 1 are: 0, 2.
• The classification time is 0.02 seconds.
Figure 4. Pattern selection of a Thrombosis dataset: (a) sorted delta
values of the extracted patterns; (b) error % of the delta values of
each pattern
C. Comparison with other classification models
The goal of any classification model is to generate more
certain, precise and accurate results with good performance.
Table II compares the prediction accuracy (error percentage)
and the classification time of the proposed model with those
of six other models. The results of these models are then
analyzed according to the requirements discussed in this
paper. Weka software [16] was used to run the models in this
comparison and gave the following error percentages and
running times.
Table II
COMPARISON BETWEEN DIFFERENT CLASSIFICATION MODELS
ACCORDING TO ERROR PERCENTAGES AND RUNNING TIME IN SECONDS
Model: Gene Error% | Thromb. Error% | Gene Time | Thromb. Time
Proposed Model: 16.6 | 33 | 0.012 | 0.02
Classification Via Clustering
IB1
BFTree
Decision Table
Naïve Bayes
Bayes Network
920 0.01 0.01
66
100
66
75
100
80
80
100
0.0
0.01
0.04
0.0
0.0
0.01
0.04
0.0
75 800.010.0
The results are analyzed with respect to the classification
requirements discussed above in the following items:
• Decision trees (BFTree and Decision Table) show
large error percentages (between 66% and 100%),
as they tend to perform better when dealing with
discrete/categorical features. The proposed model, on the
other hand, is able to deal with both continuous and
discrete features.
• No feature reduction method is required before
classifying data using the proposed model. The model uses
only the correlated features in the classification.
• Bayes belief models like Bayesian Network (BN) and
Naïve Bayesian Network show large error percentages
(100%, 75%) in thrombosis and gene classification
respectively, due to the "independence assumption"
discussed above: the absence of correlation among
features is assumed, while correlation exists in the data
in both cases, as shown before. The proposed model has a
better performance due to its ability to deal with
continuous multivariate (correlated) features.
• The proposed model uses a forward feature selection
method to determine the threshold value, without
depending on a user-defined threshold. This was one of
the challenges facing the model, since this dependence
exists in the pattern-based subspace clustering
technique.
• The model uses a complete dataset in the training phase,
while in testing the classification depends only on the
selected (informative) features. If the missing
data is not informative (not among the selected features),
it does not affect the classification and is
simply ignored.
• Principal Component Analysis (PCA) is one of the
multivariate feature reduction methods, but it has a
limiting assumption, namely the normality of the variables
[9]. No such prior assumption is required in the proposed
model. Also, the Instance-Based Learner (IB1) normalizes
its attribute ranges and processes instances incrementally
using a simple normalized Euclidean distance (similarity)
function. IB1 is a lazy model-based algorithm that focuses
its effort on classifying the particular event in question;
it shows a high error percentage in both cases [17].
An interesting conclusion that appears in this paper and
in [18] is that the classification error percentage
decreases until a certain number of features (the most
relevant features) is used, then starts to increase again,
as shown in Figure 5. This trend appears experimentally in
this paper in Figures (3b) and (4b).
Figure 5. The change of the error rate in classification according to
the number of features selected
V. CONCLUSIONS AND FUTURE WORKS
There is an extremely large body of literature on
classification learning, including the use of clustering to
augment classification, but most of these methods have
deficiencies in handling all of these requirements together.
Although, or perhaps because, many methods of ensemble
creation have been proposed, there is as yet no clear picture
of which method is better. The proposed model handles most of
the discussed requirements of data classification techniques
in a single model. Some models use a clustering algorithm,
like the pattern-based clustering model, to select a part of
the features in high-dimensional data and then use the
selected features in a known classification model like
decision trees or Bayesian networks. The proposed model,
however, uses a concept from the clustering model itself to
classify the data. A comparison is made with six different
classification models according to the accuracy of
performance and the time of learning. Our model proves its
efficiency and competitiveness according to these criteria.
Missing or incomplete data is a common drawback in many
real-world applications of pattern classification. Our
planned future work is to find an appropriate missing-data
imputation method for informative data in our model. Rule
extraction is also a challenge in this model that should be
addressed in future work.
REFERENCES
[1] Ashish Ghosh, B. Uma Shankar and Saroj K. Meher, ”A
novel approach to neuro-fuzzy classification”, Neural Net-
works, vol. 22, no. 1, pp. 100-109, 2009.
[2] Yun-Chao Gong, Chuan-Liang Chen, ”Semi-supervised
method for gene expression data classification with gaussian
and harmonic Functions”, Pattern Recognition ICPR 2008.
19th International Conference, pp. 1-4, 2008.
[3] T. Padma, Madhavi Latha and K. Jayakumar, ”Decision
making algorithm through LVQ neural network for ECG
Arrhythmias”, ICBME 2008, Proceedings 23, pp. 85-88,
2008.
[4] Abderrahmane Boubezoul, Sebastien Paris and Mustapha
Ouladsine, ”Application of the cross entropy method to the
GLVQ algorithm”, Pattern Recognition, vol. 41, pp. 3173-
3178, 2008.
[5] Thair Nu phyu, ”Survey of classification techniques in data
mining”, Proceedings of the International MultiConference
of Engineers and Computer Scientists, IMECS 2009, Hong
Kong, vol. 1, pp 727-731, 2009.
[6] Nikolaos Mastrogiannis, Basilis Boutsinas and Ioannis Gi-
annikos, ”A method for improving the accuracy of data
mining classification algorithms”, Computers & Operations
Research, vol. 36, no. 10, pp. 2829-2839, 2009.
[7] C. Shang and Q. Shen, ”Aiding classification of gene ex-
pression data with feature selection: a comparative study”,
Computational Intelligence Research, vol. 1, pp. 68-76, 2006.
[8] Carmen Lai, Marcel J. T. Reinders and Lodewyk F. A.
Wessels, ”Random subspace method for multivariate feature
selection”, Pattern Recognition Letters, vol. 10, pp.1067-
1076, 2006.
[9] Pierre Geurts, ”Pattern extraction for time series classifi-
cation”, Proceedings of the 5th European Conference on
Principles of Data Mining and Knowledge Discovery, pp.
115 - 127, 2001.
[10] Vanling Li and Li Song, ”Threshold determining method for
feature selection”, Proceedings of the 2009 Second Inter-
national Symposium on Electronic Commerce and Security,
vol. 2, pp. 273-277, 2009.
[11] Pedro J. Garcia-Laencina, Jose-Luis Sancho-Gomez and Ani-
bal R. Figueiras-Vidal, ”Pattern classification with missing
data: a review”, Neural Computing and Applications, vol.
19, pp. 263-282, 2009.
[12] Jihong Guan, Yanglan Gan, Hao Wang. ”Discovering pattern-
based subspace clusters by pattern tree”, Knowledge-Based
Systems, vol. 22, pp. 569-579, 2009.
[13] X. Zhang and W. Wang, ”Mining coherent patterns from
heterogeneous microarray”, Proceedings of the 15th ACM
International Conference on Information and Knowledge
Management, vol. 17, pp. 838-839, 2006.
[14] Biclustering of Expression Data,
http://arep.med.harvard.edu/biclustering/
[15] Chiba University hospital DataBase http://lisp.vse.cz/pkdd99/
[16] Weka: Data Mining Software in Java,
http://www.cs.waikato.ac.nz/ml/weka/
[17] Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques, 2nd Edition, Morgan
Kaufmann, San Francisco, 2005.
[18] A.G.K. Janecek, W.N. Gansterer, M. Demel and G.F. Ecker,
”On the relationship between feature selection and classifi-
cation accuracy”, JMLR Workshop and Conference Proceed-
ings, vol. 4, pp. 90-105, 2008.