Foundations of Rough Biclustering
Marcin Michalak
Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
Marcin.Michalak@polsl.pl
Abstract. None of the existing biclustering algorithms that include
rough-set-based steps uses a formal concept of a rough bicluster with its
lower and upper approximation. This short article describes new
foundations of rough biclustering. A new relation β generates
β-description classes, which build the rough bicluster defined by its
lower and upper approximation.
Keywords: rough sets, biclustering, upper and lower approximation.
1 Introduction
Two significant branches of data analysis deal with the problem of finding values
of an unknown function on the basis of known values for some training data points.
If the function is continuous, the task is called regression; if it is discrete,
the task is called classification. In the latter case we need to know the finite
set of values taken by the unknown dependence. Sometimes this set, and even its
cardinality, is unknown, and that is when cluster analysis is performed.
Typical one-dimensional cluster analysis answers two questions: how many groups
there are in the data and which training object belongs to which group (category,
class, cluster). We may obtain a complete division of objects into classes (for
example with the k-means algorithm [7]), or some objects may remain ungrouped and
be treated as a kind of background or noise (DBSCAN [5]). If the noise is
considered a class of its own, both kinds of results are equivalent.
Biclustering is an extension of the clustering notion. The problem was introduced
in the early 1970s [6] and concerns grouping subsets of rows and subsets of
columns of a two-dimensional matrix at the same time. Since then many biclustering
algorithms have been developed, especially for microarray data analysis [13,11,1].
Almost ten years after the beginnings of biclustering, Pawlak introduced rough
set theory as a different formal description of inexactness [9,10]. In this
approach every set may be described by its lower and upper approximation. The
lower approximation indicates which objects surely belong to the considered set,
and the complement of the upper approximation indicates which objects surely do
not belong to it. In the language of rough set theory a set is exact iff its
lower and upper approximations are equal.
L. Rutkowski et al. (Eds.): ICAISC 2012, Part II, LNCS 7268, pp. 144–151, 2012.
© Springer-Verlag Berlin Heidelberg 2012
There have been attempts to apply rough set theory to biclustering. Some of
them are based on a generalisation of the k-means algorithm [12,4]. However, there
is no complete formal definition of a rough bicluster on the basis of its
approximations.
In this article a new comprehensive rough approach to biclustering is proposed.
It starts with a generalisation (unification) of the object and attribute
notation. Then a special relation between matrix cells, the β-relation, is
defined. This relation generates a set of β-description classes whose unions and
intersections give, respectively, the upper and lower approximations of rough
biclusters, which satisfies Pawlak's definition of a rough set. A short example
of finding rough biclusters in a discrete value matrix is also shown.
2 Rough Sets Based Description of Biclustering
Let us consider a binary matrix $M$ with $r$ rows and $c$ columns. As the matrix
can be rotated by 90 degrees without any loss of information, the notions of
"row" and "column" are subjective. From this point of view two other notions
(already introduced in a previous paper [8]) are more useful: feature and
co-feature. If rows are taken as features, then every column is a co-feature,
and conversely: if columns are taken as features, then every row is a
co-feature. The set of features is denoted by $\mathcal{F}$ and a single feature
by $f$. The set of co-features is denoted by $\mathcal{F}^*$ and a single
co-feature by $f^*$. In general, notions without an asterisk refer to features
and the analogous notions for co-features are marked with an asterisk.
$M\{f, f^*\}$ is a value in the matrix $M$, and it depends on the user's
assumption whether $M\{f, f^*\} = M(f, f^*)$ or $M\{f, f^*\} = M(f^*, f)$.
Original table          Rows as features          Rows as co-features

      c1  c2  c3              f*1  f*2  f*3              f1  f2  f3
r1     1   2   3        f1      1    2    3        f*1    1   2   3
r2     4   5   6        f2      4    5    6        f*2    4   5   6
Fig. 1. Illustration of features and co-features
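The two readings of $M\{f, f^*\}$ can be illustrated with a minimal Python sketch. The function name `cell`, the flag `rows_are_features` and the 0-based indices are assumptions made for this illustration only; they do not come from the article.

```python
def cell(M, f, f_star, rows_are_features=True):
    """Return M{f, f*}, the value shared by feature f and co-feature f_star.

    Indices are 0-based positions. The hypothetical flag rows_are_features
    fixes the orientation chosen by the user:
      True  -> M{f, f*} = M(f, f*)   (rows are features)
      False -> M{f, f*} = M(f*, f)   (rows are co-features)
    """
    return M[f][f_star] if rows_are_features else M[f_star][f]


# The 2 x 3 table of Fig. 1, stored row by row.
M = [[1, 2, 3],
     [4, 5, 6]]

# Rows as features: feature f2 and co-feature f*3 address row 2, column 3.
as_features = cell(M, 1, 2, rows_are_features=True)

# Rows as co-features: feature f3 and co-feature f*2 address the same cell.
as_co_features = cell(M, 2, 1, rows_are_features=False)

print(as_features, as_co_features)  # 6 6
```

Only the labelling changes between the two calls; the underlying table stays the same.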
2.1 β−Relation
Let us consider a draft of the β-relation
$\beta \subseteq (\mathcal{F} \times \mathcal{F}^*)^2$ that joins cells with the
same value. Informally and intuitively it may be written as:
\[ (f, f^*)\,\beta_v\,(g, g^*) \Leftrightarrow M\{f, f^*\} = M\{g, g^*\} = v \]
where $v \in V$ and $V$ is the set of values in the matrix $M$. If $M$ is a
binary matrix, then $v = 0$ or $v = 1$.
Now it is time to make this definition precise. We want the relation to be
$v$-reflexive, symmetric and $v$-transitive. The $v$-reflexivity follows from
the equality $M\{f, f^*\} = M\{f, f^*\} = v$ and means that a cell is in the
β-relation with itself. The symmetry follows from the symmetry of the equality
$M\{f, f^*\} = M\{g, g^*\}$. The definition of $v$-transitivity is more
complicated but also intuitive. This property is the basis of the biclustering
relation.
Let us start from a single cell of the matrix, $M\{f, f^*\} = v$. For every
$g^* \in \mathcal{F}^*$ such that $M\{f, g^*\} = v$ we claim that
$(f, f^*)\,\beta_v\,(f, g^*)$. Likewise, for every $g \in \mathcal{F}$ such that
$M\{g, f^*\} = v$ we claim that $(f, f^*)\,\beta_v\,(g, f^*)$.
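If we want two pairs $(f, f^*)$ and $(g, g^*)$, where $f \neq g$ and
$f^* \neq g^*$ (two different cells from the matrix $M$), to be in the relation
$\beta \subseteq (\mathcal{F} \times \mathcal{F}^*)^2$, the following condition
has to be satisfied:
\[ (f, f^*)\,\beta_v\,(g, g^*)\,\big|_{f \neq g,\ f^* \neq g^*} \Leftrightarrow
   (f, f^*)\,\beta_v\,(g, f^*) \wedge (f, f^*)\,\beta_v\,(f, g^*) \wedge
   (f, g^*)\,\beta_v\,(g, g^*) \wedge (g, f^*)\,\beta_v\,(g, g^*) \]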
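As a minimal Python sketch (an illustration, not part of the original formalism), the conditions above collapse into one test: two cells are $\beta_v$-related when every corner of the rectangle they span holds $v$. The name `beta_related`, the indexing `M[f][f_star]` and the small matrix are assumptions made for this example.

```python
def beta_related(M, v, a, b):
    """Check (f, f*) beta_v (g, g*) for cells a = (f, f*) and b = (g, g*).

    Assumption: features are the first index, so a cell value is M[f][f*].
    The cells tested are the corners of the rectangle spanned by a and b.
    When a and b share a feature or a co-feature the rectangle degenerates
    to two cells, and when a == b it degenerates to one cell.
    """
    (f, fs), (g, gs) = a, b
    corners = {(f, fs), (f, gs), (g, fs), (g, gs)}
    return all(M[x][xs] == v for x, xs in corners)


# Hypothetical 3 x 3 matrix used only for this illustration.
M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]

print(beta_related(M, 1, (0, 0), (1, 1)))  # True: all four corners hold 1
print(beta_related(M, 1, (0, 0), (2, 2)))  # False: corners (0, 2), (2, 0) hold 0
```

The second call shows how the precise relation differs from the draft one: both cells hold the value 1, yet they are not related because the rectangle between them is not filled with ones.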
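Now let us extend the definition of the relation $\beta_v$ to subsets of features
and co-features. A subset of features $F \subseteq \mathcal{F}$ and a subset of
co-features $F^* \subseteq \mathcal{F}^*$ are in the relation
$\beta \subseteq (2^{\mathcal{F}} \times 2^{\mathcal{F}^*})^2$ iff every feature
$f \in F$ is in the relation $\beta_v$ with every co-feature $f^* \in F^*$:
\[ F\,\beta_v\,F^* \Leftrightarrow \forall f \in F\ \forall f^* \in F^*\quad f\,\beta_v\,f^* \]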
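So, if there is a subset of features $F \subseteq \mathcal{F}$ and a subset of
co-features $F^* \subseteq \mathcal{F}^*$ such that $F\,\beta_v\,F^*$, then the
cell $C = M\{k, k^*\}$ is in the relation $\beta_v$ with $F \times F^*$ iff:
\[ \forall P \in F \times F^*\quad P\,\beta_v\,C \]
which will be written in short as:
\[ (F \times F^*)\,\beta_v\,C \]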
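The block-to-cell relation can be checked mechanically. The sketch below, in Python, applies it to the two matrices of Table 1; the function names, the row-wise storage (rows as co-features) and the 0-based indices are assumptions made for illustration.

```python
def beta_related(M, v, a, b):
    """(f, f*) beta_v (g, g*): every corner of the spanned rectangle holds v.

    Cells are (feature, co-feature) pairs. M is stored row-wise with rows as
    co-features, so a cell value is read as M[co_feature][feature].
    """
    (f, fs), (g, gs) = a, b
    return all(M[y][x] == v for x in {f, g} for y in {fs, gs})


def block_related(M, v, F, F_star, C):
    """(F x F*) beta_v C: every cell P of the block is beta_v-related to C."""
    return all(beta_related(M, v, (f, fs), C) for f in F for fs in F_star)


# Left and right matrices of Table 1 (8 co-feature rows, 7 feature columns).
LEFT = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
RIGHT = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

F = [1, 2, 3]          # features f2, f3, f4 (0-based positions)
F_STAR = [1, 2, 3, 4]  # co-features f*2, ..., f*5
C = (5, 6)             # the cell {f6, f*7}

print(block_related(LEFT, 1, F, F_STAR, C))   # False
print(block_related(RIGHT, 1, F, F_STAR, C))  # True
```

In the left matrix the cell $\{f_6, f^*_7\}$ holds a one but is isolated from the block, so the relation fails; in the right matrix the connecting row and column are filled with ones and the relation holds.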
The final definition of $v$-transitivity has the following form:
\[ (a, a^*)\,\beta_v\,(b, b^*) \wedge (b, b^*)\,\beta_v\,(c, c^*) \wedge
   (\{a, b\} \times \{a^*, b^*\})\,\beta_v\,(c, c^*) \Rightarrow (a, a^*)\,\beta_v\,(c, c^*) \]
2.2 β−Description Class
Apart from symmetry, $v$-reflexivity and $v$-transitivity, the β-relation has
another notion analogous to one known from equivalence relations: the
β-description class is defined similarly to the equivalence class. The one thing
that distinguishes a β-description class from an equivalence class is that every
cell has at least one class but may have more than one. A β-description class is
defined as an ordered pair of subsets of $\mathcal{F}$ and $\mathcal{F}^*$ as
follows:
\[ [(f, f^*)]_{\beta_v} = (F, F^*),\quad F \subseteq \mathcal{F},\ F^* \subseteq \mathcal{F}^* \]
iff:
i) $(f, f^*)\,\beta_v\,(F \times F^*)$
ii) $\forall e \notin F\ \forall e^* \notin F^*\quad \neg[(f, f^*)\,\beta_v\,(F \cup \{e\}) \times (F^* \cup \{e^*\})]$
Table 1. Left: $\{f_2, f_3, f_4\} \times \{f^*_2, f^*_3, f^*_4, f^*_5\}$ is not in
the relation $\beta_1$ with $\{f_6, f^*_7\}$.
Right: $\{f_2, f_3, f_4\} \times \{f^*_2, f^*_3, f^*_4, f^*_5\}\,\beta_1\,\{f_6, f^*_7\}$.

Left:
       f1  f2  f3  f4  f5  f6  f7
f*1     0   0   0   0   0   0   0
f*2     0   1   1   1   0   0   0
f*3     0   1   1   1   0   0   0
f*4     0   1   1   1   0   0   0
f*5     0   1   1   1   0   0   0
f*6     0   0   0   0   0   0   0
f*7     0   0   0   0   0   1   0
f*8     0   0   0   0   0   0   0

Right:
       f1  f2  f3  f4  f5  f6  f7
f*1     0   0   0   0   0   0   0
f*2     0   1   1   1   0   1   0
f*3     0   1   1   1   0   1   0
f*4     0   1   1   1   0   1   0
f*5     0   1   1   1   0   1   0
f*6     0   0   0   0   0   0   0
f*7     0   1   1   1   0   1   0
f*8     0   0   0   0   0   0   0
In other words, a β-description class is the largest (in the sense of inclusion)
pair of subsets of features and co-features whose Cartesian product consists of
cells that are all in the $\beta_v$ relation with the given cell. Now we see why
a cell may have more than one β-description class. The set of all β-description
classes of the matrix $M$ will be called the dictionary $D_M$; its part for a
given value $v$ is denoted by $D^v_M$.
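The article leaves an algorithm for generating β-description classes to further work, so the following Python code is only a brute-force sketch under one stated assumption: a β-description class is read as a maximal rectangle of cells holding $v$ (no feature and no co-feature can be added without including a cell with another value). The function names and the 0-based indices are hypothetical; the matrix is the one from Table 2 in Section 3.

```python
from itertools import combinations


def description_classes(M, v):
    """Enumerate maximal all-v rectangles (F, F*) of M by brute force.

    M is stored row-wise with rows as co-features: M[co_feature][feature].
    Returns a set of (frozenset of features, frozenset of co-features).
    The cost is exponential in the number of rows: toy matrices only.
    """
    n_rows, n_cols = len(M), len(M[0])
    classes = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Features that hold v in every chosen co-feature.
            F = frozenset(f for f in range(n_cols)
                          if all(M[r][f] == v for r in rows))
            if not F:
                continue
            # Close the row set: every co-feature holding v on all of F.
            F_star = frozenset(r for r in range(n_rows)
                               if all(M[r][f] == v for f in F))
            classes.add((F, F_star))
    return classes


def labels(cls):
    """Convert 0-based positions to the 1-based labels used in the article."""
    F, F_star = cls
    return sorted(f + 1 for f in F), sorted(r + 1 for r in F_star)


# Matrix M of Table 2 (rows are co-features f*1..f*10, columns f1..f10).
M = [
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    [2, 2, 2, 2, 2, 0, 2, 2, 2, 0],
    [2, 2, 2, 0, 0, 0, 2, 2, 2, 0],
    [2, 2, 2, 0, 0, 0, 2, 2, 2, 0],
]

ones = sorted(labels(c) for c in description_classes(M, 1))
print(ones)
# [([6, 7, 8, 9, 10], [1, 2, 3]), ([8, 9, 10], [1, 2, 3, 4, 5])]
```

For the value 1 the sketch returns the two classes listed for $D^1_M$ in the case study. For other values the number of rectangles it reports need not match the number of classes given there, since the formal definition may be stricter than the maximal-rectangle reading assumed here.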
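Let us consider the following relation $R(D^v_M) \subseteq D^v_M \times D^v_M$.
Two β-description classes $d_1 = (F_1, F^*_1)$ and $d_2 = (F_2, F^*_2)$ are in the
relation $R(D^v_M)$ when at least one of the following conditions is satisfied:
i) $F_1 \cap F_2 \neq \emptyset \wedge F^*_1 \cap F^*_2 \neq \emptyset$
ii) $\exists d_3 \in D^v_M\quad d_1\,R(D^v_M)\,d_3 \wedge d_3\,R(D^v_M)\,d_2$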
This relation is an equivalence relation: symmetry and reflexivity follow from
the first condition and transitivity follows from the second one. The partition
of the dictionary $D^v_M$ induced by this relation will be denoted as:
\[ \Pi^v_M = \{\pi_1, \pi_2, \ldots, \pi_p\} \]
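This means that each $\pi_i$ is a set of β-description classes. For every $\pi_i$
two sets may be defined, collecting the predecessors and the successors of the
pairs that are elements of $\pi_i$:
\[ p(\pi_i) = \{F \subseteq \mathcal{F} : \exists F^* \subseteq \mathcal{F}^*\ (F, F^*) \in \pi_i\} \]
\[ s(\pi_i) = \{F^* \subseteq \mathcal{F}^* : \exists F \subseteq \mathcal{F}\ (F, F^*) \in \pi_i\} \]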
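The partition can be sketched in Python as the connected components of an overlap graph. The sketch reads condition i) as requiring non-empty intersections of both the feature sets and the co-feature sets, as the case study does; the function names and the four example classes are hypothetical.

```python
def overlaps(d1, d2):
    """Condition i): the feature sets and the co-feature sets both intersect."""
    return bool(d1[0] & d2[0]) and bool(d1[1] & d2[1])


def partition(dictionary):
    """Split a list of classes (F, F*) into the blocks pi_1, ..., pi_p.

    Condition ii) makes R the transitive closure of `overlaps`, so the
    blocks are the connected components of the overlap graph.
    """
    blocks = []
    for d in dictionary:
        # Blocks touched by d are glued together; the rest stay unchanged.
        touching = [b for b in blocks if any(overlaps(d, e) for e in b)]
        rest = [b for b in blocks if not any(overlaps(d, e) for e in b)]
        blocks = rest + [[d] + [e for b in touching for e in b]]
    return blocks


def p(block):
    """Predecessors: the feature sets occurring in the block."""
    return {F for F, _ in block}


def s(block):
    """Successors: the co-feature sets occurring in the block."""
    return {F_star for _, F_star in block}


# Hypothetical classes: a and c do not overlap directly, b links them.
a = (frozenset({1, 2}), frozenset({1}))
b = (frozenset({2, 3}), frozenset({1}))
c = (frozenset({3, 4}), frozenset({1}))
d = (frozenset({7}), frozenset({5}))

blocks = partition([a, c, b, d])
print(sorted(len(block) for block in blocks))  # [1, 3]
```

Classes `a` and `c` end up in the same block only through `b`, which is exactly the role of condition ii).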
2.3 Rough Biclusters
Now we are able to define the rough-set-based approach to the biclustering
problem. Every $\pi_i$ generates one rough bicluster $b_i$ in the following way:
i) lower bound: $\underline{b_i} = (\bigcap p(\pi_i), \bigcap s(\pi_i))$
ii) upper bound: $\overline{b_i} = (\bigcup p(\pi_i), \bigcup s(\pi_i))$
The bicluster $b_i$ is rough when the union and the intersection of the elements
of $\pi_i$ differ, and exact otherwise. The only possibility for $b_i$ to be
exact is that $\mathrm{card}(\pi_i) = 1$.
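A short Python sketch of the two bounds follows, taking the lower bound as component-wise intersections and the upper bound as component-wise unions of the classes in one block. The function name `rough_bicluster` is an assumption; the input is the dictionary $D^1_M$ from the case study in Section 3, written with the 1-based labels of the article.

```python
def rough_bicluster(block):
    """Return (lower, upper) approximations built from one block pi_i.

    block is a non-empty list of classes (F, F*) given as frozensets.
    lower = (intersection of all F, intersection of all F*)
    upper = (union of all F,        union of all F*)
    """
    features = [F for F, _ in block]
    co_features = [F_star for _, F_star in block]
    lower = (frozenset.intersection(*features),
             frozenset.intersection(*co_features))
    upper = (frozenset.union(*features),
             frozenset.union(*co_features))
    return lower, upper


# The two beta_1-description classes of D^1_M (feature and co-feature labels).
d1 = (frozenset({8, 9, 10}), frozenset({1, 2, 3, 4, 5}))
d2 = (frozenset({6, 7, 8, 9, 10}), frozenset({1, 2, 3}))

lower, upper = rough_bicluster([d1, d2])
print(sorted(lower[0]), sorted(lower[1]))  # [8, 9, 10] [1, 2, 3]
print(sorted(upper[0]), sorted(upper[1]))  # [6, 7, 8, 9, 10] [1, 2, 3, 4, 5]

# A block with a single class yields equal bounds, i.e. an exact bicluster.
single_lower, single_upper = rough_bicluster([d1])
print(single_lower == single_upper)  # True
```

The printed bounds agree with the lower and upper approximation of $b_1$ reported in the case study.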
3 Case Study
Let us consider the discrete value matrix $M$ presented in Table 2. It contains
ten rows and ten columns. Arbitrarily, rows are treated as co-features (that is
why their labels carry asterisks) and columns are treated as features.
Table 2. Matrix $M$

        f1  f2  f3  f4  f5  f6  f7  f8  f9  f10
f*1      0   0   0   0   0   1   1   1   1   1
f*2      0   0   0   0   0   1   1   1   1   1
f*3      0   0   0   0   0   1   1   1   1   1
f*4      0   0   0   0   0   0   0   1   1   1
f*5      0   0   0   0   0   0   0   1   1   1
f*6      0   0   2   2   2   0   0   0   0   0
f*7      0   0   2   2   2   0   0   0   0   0
f*8      2   2   2   2   2   0   2   2   2   0
f*9      2   2   2   0   0   0   2   2   2   0
f*10     2   2   2   0   0   0   2   2   2   0
We are interested in finding biclusters for a subset of the matrix values:
biclusters of ones and of twos. In the first step we look for
$\beta_1$-description classes. We obtain the dictionary $D^1_M$, which contains
two classes:
\[ D^1_M = \{(\{f_8, f_9, f_{10}\}, \{f^*_1, f^*_2, f^*_3, f^*_4, f^*_5\}),\
   (\{f_6, f_7, f_8, f_9, f_{10}\}, \{f^*_1, f^*_2, f^*_3\})\} \]
Both classes are shown in Tables 3(a) and 3(b).
As we see, the partition induced by $R(D^1_M)$ has only one element (the two
$\beta_1$-description classes have a non-empty intersection), so we obtain just
one rough bicluster, of the form:
\[ \underline{b_1} = (\{f_8, f_9, f_{10}\}, \{f^*_1, f^*_2, f^*_3\}) \qquad
   \overline{b_1} = (\{f_6, f_7, f_8, f_9, f_{10}\}, \{f^*_1, f^*_2, f^*_3, f^*_4, f^*_5\}) \]
Table 3. The dictionary $D^1_M$. Panels (a) and (b) show the matrix $M$ of Table 2
with the cells of one class of $D^1_M$ highlighted.
(a) First $\beta_1$-description class.
(b) Second $\beta_1$-description class.
Now let us look at the dictionary $D^2_M$. It also has two β-description classes
(Tables 4(a) and 4(b)).
Table 4. The dictionary $D^2_M$. Panels (a) and (b) show the matrix $M$ of Table 2
with the cells of one class of $D^2_M$ highlighted.
(a) First $\beta_2$-description class.
(b) Second $\beta_2$-description class.
The partition induced by $R(D^2_M)$ also has only one element, and the rough
bicluster has the form:
\[ \underline{b_2} = (\{f_3\}, \{f^*_8\}) \qquad
   \overline{b_2} = (\{f_1, f_2, f_3, f_4, f_5, f_7, f_8, f_9\}, \{f^*_6, f^*_7, f^*_8, f^*_9, f^*_{10}\}) \]
The last table (Table 5) shows all rough biclusters. Lower approximations are
marked with a darker background and upper approximations with a lighter
background.
Table 5. Rough biclusters (the matrix $M$ of Table 2 with the lower and upper
approximations of $b_1$ and $b_2$ shaded).
4 Rough and Exact Biclusters
From the formal point of view every β-description class may also be considered a
bicluster. Building rough biclusters from the partition of the dictionary offers
generalisation and limits the number of biclusters. It is up to the user whether
to combine β-description classes into rough biclusters or to use the exact ones
directly. It should also be noted that if a $\pi_i$ in the partition induced by
the relation $R(D^v_M)$ has only one element, the bicluster generated from this
element is also exact.
5 Conclusions
This article offers a new look at the rough description of the biclustering
problem. In contrast to other biclustering algorithms that refer to rough set
theory, this rough biclustering approach gives a formal definition of a rough
bicluster. The short example described in this article also shows two levels of
interpreting biclusters in the data: on the one hand we may use the definition of
a rough bicluster and the possibility of generalisation (biclusters with their
lower and upper approximations), and on the other hand we may stop the analysis
at the step where the β-description classes are generated. This is a choice
between a smaller number of more general, inexact biclusters and a larger number
of small, exact ones.
Further work will focus on an algorithm for generating β-description classes,
which will make it possible to apply the rough biclustering approach to real
data sets. Given the wide applicability of biclustering algorithms, especially in
medicine and bioinformatics, the rough bicluster has considerable potential.
Acknowledgements. This work was supported by the European Community
from the European Social Fund.
References
1. Cheng, Y., Church, G.M.: Biclustering of expression data. In: Proc. of the 8th Int.
Conf. on Intell. Syst. for Mol. Biol., pp. 93–103 (2000)
2. Emilyn, J.J., Ramar, K.: Rough Set Based Clustering of Gene Expression Data: A
Survey. Int. J. of Eng. Sci. and Technol. 2(12), 7160–7164 (2010)
3. Emilyn, J.J., Ramar, K.: A Rough Set Based Gene Expression Clustering
Algorithm. J. of Comput. Sci. 7(7), 986–990 (2011)
4. Emilyn, J.J., Ramar, K.: A Rough Set Based Novel Biclustering Algorithm for
Gene Expression Data. In: Int. Conf. on Electron. Comput. Techn., pp. 284–288
(2011)
5. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for
Discovering Clusters in Large Spatial Databases with Noise. In: Proc. of 2nd
Int. Conf. on Knowl. Discov. and Data Min., pp. 226–231 (1996)
6. Hartigan, J.A.: Direct Clustering of a Data Matrix. J. Am. Stat. Assoc. 67(337),
123–129 (1972)
7. MacQueen, J.: Some Methods for Classification and Analysis of Multivariate
Observations. In: Proc. Fifth Berkeley Symp. on Math. Statist. and Prob.,
pp. 281–297 (1967)
8. Michalak, M., Stawarz, M.: Generating and Postprocessing of Biclusters from
Discrete Value Matrices. In: Jedrzejowicz, P., Nguyen, N.T., Hoang, K. (eds.)
ICCCI 2011, Part I. LNCS, vol. 6922, pp. 103–112. Springer, Heidelberg (2011)
9. Pawlak, Z.: Rough Sets. J. of Comput. and Inf. Sci. 5(11), 341–356 (1982)
10. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer
Academic Publishing (1991)
11. Pensa, R., Boulicaut, J.F.: Constrained Co-clustering of Gene Expression Data. In:
Proc. SIAM Int. Conf. on Data Min., SDM 2008, pp. 25–36 (2008)
12. Wang, R., Miao, D., Li, G., Zhang, H.: Rough Overlapping Biclustering of Gene
Expression Data. In: Proc. of the 7th IEEE Int. Conf. on Bioinforma. and Bioeng.
(2007)
13. Yang, E., Foteinou, P.T., King, K.R., Yarmush, M.L., Androulakis, I.P.: A Novel
Non-overlapping biclustering Algorithm for Network Generation Using Living Cell
Array Data. Bioinforma. 17(23), 2306–2313 (2007)