Foundations of Rough Biclustering
Marcin Michalak
Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
Marcin.Michalak@polsl.pl
Abstract. Amongst the algorithms for biclustering that use some rough
sets based steps, none uses the formal concept of a rough bicluster
with its lower and upper approximation. In this short article new
foundations of rough biclustering are described. The new relation β
generates β-description classes that build the rough bicluster defined
with its lower and upper approximation.
Keywords: rough sets, biclustering, upper and lower approximation.
1 Introduction
Two significant branches of data analysis deal with the problem of finding values
of an unknown function on the basis of known values for some training data points.
In the case of a continuous function we say that it is the regression task, and in the
case of a discrete function the problem is known as the classification task. In this
second case we need to know the finite set of values taken by the unknown
dependence. Sometimes the cardinality of this set is unknown, and that is when
cluster analysis is performed.
Typical one-dimensional cluster analysis answers two questions: how many
groups are in the data, and which training object belongs to which group
(category, class, cluster). We may obtain a complete division of objects into
classes (for example, the k-means algorithm [7]), or we may also have some
ungrouped objects considered as a kind of background or noise (DBSCAN [5]).
But if we consider the noise as a class, both kinds of results are equivalent.
Biclustering is an extension of the clustering notion. The problem was intro-
duced in the early 1970s [6] and deals with grouping both subsets of rows
and subsets of columns of a two-dimensional matrix. Since then many
biclustering algorithms have been developed, especially for the purpose of
microarray data analysis [13,11,1].
Almost ten years after the beginnings of biclustering, Pawlak introduced
rough sets theory as a different formal description of inexactness [9,10]. In this
approach every set may be described with its lower and upper approximation.
The lower approximation points out which objects surely belong to the considered
set, and the complement of the upper approximation points out which objects
surely do not belong to it. In the language of rough sets theory a set is exact
iff its lower and upper approximations are equal.
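Pawlak's approximations can be computed directly from the classes of an indiscernibility relation. A minimal sketch (the universe, classes and target set below are illustrative, not taken from the paper):

```python
def approximations(classes, target):
    """Pawlak's lower/upper approximation of `target` with respect to a
    partition of the universe into indiscernibility `classes`."""
    target = set(target)
    lower, upper = set(), set()
    for cls in classes:
        cls = set(cls)
        if cls <= target:       # class surely inside the target
            lower |= cls
        if cls & target:        # class possibly inside the target
            upper |= cls
    return lower, upper

# A set is exact iff both approximations coincide.
classes = [{1, 2}, {3, 4}, {5}]
lower, upper = approximations(classes, {1, 2, 3})
# lower == {1, 2}, upper == {1, 2, 3, 4} -> the set {1, 2, 3} is rough
```

The class {3, 4} straddles the target's boundary, which is exactly what makes the set rough.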
L. Rutkowski et al. (Eds.): ICAISC 2012, Part II, LNCS 7268, pp. 144–151, 2012.
© Springer-Verlag Berlin Heidelberg 2012
There have been attempts to apply rough sets theory to biclustering. Some of
them are based on a generalisation of the k-means algorithm [12,4]. But there is no
complete formal definition of a rough bicluster on the basis of its approximations.
In this article a new, comprehensive rough approach to biclustering is pro-
posed. It starts with a generalisation (unification) of the object and attribute no-
tions. Then a special relation between matrix cells is defined (the β relation).
This relation generates a set of β-description classes whose unions and inter-
sections give (respectively) the upper and lower approximations of rough biclusters,
which fulfils the Pawlak definition of the rough set. A short example of finding
rough biclusters in a discrete value matrix is also shown.
2 Rough Sets Based Description of Biclustering
Let us consider a binary matrix M with r rows and c columns. As the matrix
can be rotated by 90 degrees without any loss of information, the notions
of "row" and "column" are subjective. From this point of view two other notions
(already introduced in previous papers [8]) are more useful: feature and co-
feature. If we consider a feature to be a row, then every column is called a co-feature,
and conversely: if we consider a feature to be a column, then every row is
called a co-feature. The set of features will be denoted as F and features will be
denoted as f. The set of co-features will be denoted as F* and co-features will
be denoted as f*. Generally, all notions without the asterisk will be connected
with features and all analogous notions for co-features will be marked with
the asterisk. M{f, f*} is the value in the matrix M, but it depends on the user's
assumption whether M{f, f*} = M(f, f*) or M{f, f*} = M(f*, f).
         c1 c2 c3        f*1 f*2 f*3          f1 f2 f3
r1        1  2  3    f1    1   2   3    f*1    1  2  3
r2        4  5  6    f2    4   5   6    f*2    4  5  6

Original table      Rows as features    Rows as co-features

Fig. 1. Illustration of features and co-features
2.1 The β Relation

Let us consider a draft of the β relation, β ⊆ (F × F*)², that joins cells with the
same value. It may be written informally, in an intuitive way, as:

(f, f*) β_v (g, g*) ⇔ M{f, f*} = M{g, g*} = v

where v ∈ V and V is the set of values from the matrix M. In the case where
M is a binary one, v = 0 or v = 1.
Now it is time to make this definition precise. We want this relation to be
v-reflexive, symmetric and v-transitive. The v-reflexivity of this relation is
implied by the equality M{f, f*} = M{f, f*} = v and means that a cell is in the
β relation with itself. The symmetry follows from the symmetry of the equality
M{f, f*} = M{g, g*}. The definition of v-transitivity is more complicated but
is also intuitive. This property is the basis of the biclustering relation.
Let us start with a single cell of the matrix, M{f, f*} = v. For every
g* ∈ F* such that M{f, g*} = v we will claim that (f, f*) β_v (f, g*). Also, for every
g ∈ F such that M{g, f*} = v we will claim that (f, f*) β_v (g, f*).
If we want two pairs (f, f*), (g, g*), where f ≠ g and f* ≠ g* (two different
cells of the matrix M), to be in the relation β ⊆ (F × F*)², we have to satisfy
the following condition:

(f, f*) β_v (g, g*)|f≠g, f*≠g* ⇔ (f, f*) β_v (g, f*) ∧ (f, f*) β_v (f, g*)
                                ∧ (f, g*) β_v (g, g*) ∧ (g, f*) β_v (g, g*)
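For two cells in different rows and columns, the four-part condition above amounts to checking that every corner of the rectangle spanned by the cells carries the value v. A hedged sketch (indexing cells as (row, column) pairs is an assumption of this sketch, not the paper's notation):

```python
def beta_v(M, v, cell1, cell2):
    """Check (f, f*) beta_v (g, g*) for cells given as (row, col) pairs.

    For distinct rows and columns this is the four-part condition above:
    every corner of the spanned rectangle must hold the value v. The same
    expression also covers the shared-row/column and reflexive cases from
    the single-cell discussion.
    """
    (r1, c1), (r2, c2) = cell1, cell2
    return all(M[r][c] == v for r in {r1, r2} for c in {c1, c2})

M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
# The ones at (0, 0) and (1, 1) span an all-ones rectangle:
#   beta_v(M, 1, (0, 0), (1, 1)) is True
# but (0, 0) and (2, 2) do not:
#   beta_v(M, 1, (0, 0), (2, 2)) is False
```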
Now let us extend the definition of the relation β_v to subsets of features
and co-features. A subset of features F ⊆ F and a subset of co-features F* ⊆ F*
are in the relation β ⊆ (2^F × 2^F*)² iff every feature f ∈ F is in the relation β_v
with every co-feature f* ∈ F*:

F β_v F* ⇔ ∀ f ∈ F ∀ f* ∈ F*  f β_v f*

So, if there is a subset of features F ⊆ F and a subset of co-features F* ⊆ F*
and F β_v F*, then a cell C = M{k, k*} will be in the relation β_v with F × F* iff:

∀ P ∈ F × F*  P β_v C

which will be written in a shortened way as:

(F × F*) β_v C

The final definition of v-transitivity has the following form:

(a, a*) β_v (b, b*) ∧ (b, b*) β_v (c, c*) ∧ ({a, b} × {a*, b*}) β_v (c, c*) ⇒ (a, a*) β_v (c, c*)
2.2 The β-Description Class

Apart from v-reflexivity, symmetry and v-transitivity, the β relation has an-
other notion analogous to the equivalence relation: the β-description class,
defined similarly to the equivalence class. The one thing that distinguishes a β-description
class from an equivalence class is that every cell may have at least one, but not
necessarily exactly one, class. A β-description class is defined as an ordered pair of subsets of
F and F* as follows:

[(f, f*)]_β_v = (F, F*),  F ⊆ F, F* ⊆ F*

iff:
i) (f, f*) β_v (F × F*)
ii) ∀ e ∉ F ∀ e* ∉ F*  ¬[(f, f*) β_v ((F ∪ {e}) × (F* ∪ {e*}))]
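The definition can be checked by brute force on small matrices: a β-description class of a cell is a row/column pair whose Cartesian product is constant-v, contains the cell, and cannot be extended. The enumeration below (fix a row subset, take the largest column set that works, keep only non-extendable pairs) is an illustrative sketch, not the paper's algorithm:

```python
from itertools import combinations

def description_classes(M, v, cell):
    """All beta_v-description classes of `cell` = (row, col): maximal
    (rows, cols) pairs whose Cartesian product holds only the value v."""
    r0, c0 = cell
    n_rows, n_cols = len(M), len(M[0])
    if M[r0][c0] != v:
        return set()
    classes = set()
    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            if r0 not in rows:
                continue
            # Largest column set on which all chosen rows equal v.
            cols = frozenset(c for c in range(n_cols)
                             if all(M[r][c] == v for r in rows))
            if c0 not in cols:
                continue
            # (rows, cols) is maximal iff no further row fits all of cols.
            full_rows = frozenset(r for r in range(n_rows)
                                  if all(M[r][c] == v for c in cols))
            if full_rows == frozenset(rows):
                classes.add((full_rows, cols))
    return classes

M = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
# The centre cell (1, 1) belongs to four different description classes,
# among them ({0, 1}, {0, 1}) and ({1, 2}, {1, 2}).
```

The example illustrates the point made above: unlike an equivalence class, one cell may generate several description classes.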
Table 1. Left: {f2, f3, f4} × {f*2, f*3, f*4, f*5} is not in the relation β1 with the cell (f6, f*7).
Right: {f2, f3, f4} × {f*2, f*3, f*4, f*5} β1 (f6, f*7).

       f1 f2 f3 f4 f5 f6 f7        f1 f2 f3 f4 f5 f6 f7
f*1     0  0  0  0  0  0  0   f*1   0  0  0  0  0  0  0
f*2     0  1  1  1  0  0  0   f*2   0  1  1  1  0  1  0
f*3     0  1  1  1  0  0  0   f*3   0  1  1  1  0  1  0
f*4     0  1  1  1  0  0  0   f*4   0  1  1  1  0  1  0
f*5     0  1  1  1  0  0  0   f*5   0  1  1  1  0  1  0
f*6     0  0  0  0  0  0  0   f*6   0  0  0  0  0  0  0
f*7     0  0  0  0  0  1  0   f*7   0  1  1  1  0  1  0
f*8     0  0  0  0  0  0  0   f*8   0  0  0  0  0  0  0
In other words, it may be said that a β-description class is a largest (in the
sense of inclusion) pair of subsets of features and co-features whose Cartesian product
gives cells that are all in the β_v relation with the given cell. Now we see why
it is possible for a cell to have more than one β-description class. The set of
all β-description classes from the matrix M will be called the dictionary: D_M.
Let us consider the following relation R(D^v_M) ⊆ D^v_M × D^v_M. Two β-description
classes d1 = (F1, F*1), d2 = (F2, F*2) are in the relation R(D^v_M) when at least one
of the following conditions is satisfied:

i) F1 ∩ F2 ≠ ∅ ∧ F*1 ∩ F*2 ≠ ∅
ii) ∃ d3 ∈ D_M  d1 R(D^v_M) d3 ∧ d3 R(D^v_M) d2

This relation is an equivalence relation: the symmetry and the reflexivity are
given by the first condition and the transitivity is given by the second condition.
The partition of D^v_M introduced by this relation will be denoted as:

Π^v_M = {π1, π2, ..., πp}

This means that each πi is a set of β-description classes. For every πi two sets
may be defined, connected with predecessors and successors of the pairs that are
elements of πi:

p(πi) = {F ⊆ F : ∃ F* ⊆ F*  (F, F*) ∈ πi}
s(πi) = {F* ⊆ F* : ∃ F ⊆ F  (F, F*) ∈ πi}
2.3 Rough Biclusters
Now we are able to define the rough sets based approach to the biclustering
problem. Every πi generates one rough bicluster bi in the following way:

i) lower approximation: b̲i = (⋂ p(πi), ⋂ s(πi))
ii) upper approximation: b̄i = (⋃ p(πi), ⋃ s(πi))

The bicluster bi will be rough when these unions and intersections differ,
and exact otherwise. The only possibility for bi to be exact is that
card(πi) = 1.
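Given a block πi of the partition as a collection of description classes, the approximations are plain set intersections and unions. A sketch (classes are (features, co-features) pairs; the example classes mirror the β1-dictionary of the case study in Section 3):

```python
def rough_bicluster(pi):
    """Lower/upper approximation of the rough bicluster generated by a
    partition block `pi`: a non-empty list of (features, co_features)
    description classes."""
    p = [set(F) for F, _ in pi]     # p(pi): the feature sets
    s = [set(Fs) for _, Fs in pi]   # s(pi): the co-feature sets
    lower = (set.intersection(*p), set.intersection(*s))
    upper = (set.union(*p), set.union(*s))
    return lower, upper

# The two beta_1-description classes of the case study:
pi = [({'f8', 'f9', 'f10'}, {'f*1', 'f*2', 'f*3', 'f*4', 'f*5'}),
      ({'f6', 'f7', 'f8', 'f9', 'f10'}, {'f*1', 'f*2', 'f*3'})]
lower, upper = rough_bicluster(pi)
# lower == ({'f8', 'f9', 'f10'}, {'f*1', 'f*2', 'f*3'})
# upper == ({'f6', ..., 'f10'}, {'f*1', ..., 'f*5'})
```

With a single-class block the intersection and the union coincide, which is exactly the card(πi) = 1 exactness condition above.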
3 Case Study
Let us consider the discrete value matrix M presented in Table 2. It contains
ten rows and ten columns. Arbitrarily, rows are considered as co-features
(that is why their labels carry asterisks) and columns are considered as
features.
Table 2. Matrix M

       f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
f*1     0  0  0  0  0  1  1  1  1  1
f*2     0  0  0  0  0  1  1  1  1  1
f*3     0  0  0  0  0  1  1  1  1  1
f*4     0  0  0  0  0  0  0  1  1  1
f*5     0  0  0  0  0  0  0  1  1  1
f*6     0  0  2  2  2  0  0  0  0  0
f*7     0  0  2  2  2  0  0  0  0  0
f*8     2  2  2  2  2  0  2  2  2  0
f*9     2  2  2  0  0  0  2  2  2  0
f*10    2  2  2  0  0  0  2  2  2  0
We are interested in finding biclusters of some subset of matrix values: biclus-
ters of ones and of twos. In the first step we look for β1-description classes.
We obtain the dictionary D^1_M that contains two classes:

D^1_M = {({f8, f9, f10}, {f*1, f*2, f*3, f*4, f*5}), ({f6, f7, f8, f9, f10}, {f*1, f*2, f*3})}
Both classes are shown in Tables 3(a) and 3(b).
As we can see, the partition of R(D^1_M) has only one element (both β1-description
classes have a non-empty intersection), so we obtain just one rough bicluster, and
its form is:

b̲1 = ({f8, f9, f10}, {f*1, f*2, f*3})
b̄1 = ({f6, f7, f8, f9, f10}, {f*1, f*2, f*3, f*4, f*5})
Table 3. The dictionary D^1_M. (a) First β1-description class: {f8, f9, f10} × {f*1, f*2, f*3, f*4, f*5}. (b) Second β1-description class: {f6, f7, f8, f9, f10} × {f*1, f*2, f*3}. Each subtable repeats the matrix M with the cells of the class highlighted.
Now let us look at the dictionary D^2_M; it also has two β-description classes
(Tables 4(a) and 4(b)).
Table 4. The dictionary D^2_M. (a) First β2-description class. (b) Second β2-description class. Each subtable repeats the matrix M with the cells of the class highlighted.
The partition of R(D^2_M) also has only one element, and the rough bicluster
has the form:

b̲2 = ({f3}, {f*8})
b̄2 = ({f1, f2, f3, f4, f5, f7, f8, f9}, {f*6, f*7, f*8, f*9, f*10})
The last table (Table 5) shows all rough biclusters. Lower approximations
are marked with a darker background and upper approximations are marked
with a lighter background.
Table 5. Rough biclusters (the matrix M with the lower approximations of b1 and b2 shaded darker and their upper approximations shaded lighter).
4 Rough and Exact Biclusters
We may see that from the formal point of view every β-description class may
also be considered a bicluster. Building rough biclusters from the partition of
the dictionary gives the opportunity to generalise and to limit the number
of biclusters. It should be up to the user whether to combine β-description
classes into rough biclusters or just to use exact ones. It should also be noted
that if πi in the partition of the relation R(D^v_M) has only one element, the
bicluster generated from this element will also be exact.
5 Conclusions
This article presents a new perspective on the rough description of the biclustering
problem. In contrast to other biclustering algorithms referring to rough
sets theory, this rough biclustering approach gives a formal definition of the rough
bicluster. The short example described in this article also shows two levels of
interpreting biclusters in the data: on the one hand we may use the definition of
the rough bicluster and the possibility of generalisation (biclusters with their lower
and upper approximations), and on the other hand we may stop the analysis
at the step where the β-description classes are generated. This is the choice
between a smaller number of more general inexact biclusters and a bigger number of
small exact ones.
Further works will focus on finding an algorithm for generating β-description
classes, which will make it possible to apply the rough biclustering approach to
real data sets. Considering the wide applicability of biclustering algorithms,
especially in medical and bioinformatical analyses, the potential of the rough bicluster
becomes really impressive.
Acknowledgements. This work was supported by the European Community
from the European Social Fund.
References
1. Cheng, Y., Church, G.M.: Biclustering of expression data. In: Proc. of the 8th Int.
Conf. on Intell. Syst. for Mol. Biol., pp. 93–103 (2000)
2. Emilyn, J.J., Ramar, K.: Rough Set Based Clustering of Gene Expression Data: A
Survey. Int. J. of Eng. Sci. and Technol. 2(12), 7160–7164 (2010)
3. Emilyn, J.J., Ramar, K.: A Rough Set Based Gene Expression Clustering Algo-
rithm. J. of Comput. Sci. 7(7), 986–990 (2011)
4. Emilyn, J.J., Ramar, K.: A Rough Set Based Novel Biclustering Algorithm for
Gene Expression Data. In: Int. Conf. on Electron. Comput. Techn., pp. 284–288
(2011)
5. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for Dis-
covering Clusters in Large Spatial Databases with Noise. In: Proc. of 2nd Int. Conf.
on Knowl. Discov. and Data Min., pp. 226–231 (1996)
6. Hartigan, J.A.: Direct Clustering of a Data Matrix. J. Am. Stat. Assoc. 67(337),
123–129 (1972)
7. MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Ob-
servations. In: Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., pp. 281–297
(1967)
8. Michalak, M., Stawarz, M.: Generating and Postprocessing of Biclusters from
Discrete Value Matrices. In: Jedrzejowicz, P., Nguyen, N.T., Hoang, K. (eds.)
ICCCI 2011, Part I. LNCS, vol. 6922, pp. 103–112. Springer, Heidelberg (2011)
9. Pawlak, Z.: Rough Sets. Int. J. of Comput. and Inf. Sci. 11(5), 341–356 (1982)
10. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer
Academic Publishing (1991)
11. Pensa, R., Boulicaut, J.F.: Constrained Co-clustering of Gene Expression Data. In:
Proc. SIAM Int. Conf. on Data Min., SDM 2008, pp. 25–36 (2008)
12. Wang, R., Miao, D., Li, G., Zhang, H.: Rough Overlapping Biclustering of Gene
Expression Data. In: Proc. of the 7th IEEE Int. Conf. on Bioinforma. and Bioeng.
(2007)
13. Yang, E., Foteinou, P.T., King, K.R., Yarmush, M.L., Androulakis, I.P.: A Novel
Non-overlapping Biclustering Algorithm for Network Generation Using Living Cell
Array Data. Bioinformatics 23(17), 2306–2313 (2007)