Applied Artificial Intelligence, 28:220–242, 2014
Copyright © 2014 Taylor & Francis Group, LLC
ISSN: 0883-9514 print/1087-6545 online
DOI: 10.1080/08839514.2014.883902
ROUGH SET METHODS FOR ATTRIBUTE CLUSTERING AND
SELECTION
Andrzej Janusz¹ and Dominik Ślęzak¹,²
¹Institute of Mathematics, University of Warsaw, Warsaw, Poland
²Infobright Inc., Warsaw, Poland
Address correspondence to Dominik Ślęzak, Institute of Mathematics, University of Warsaw, ul. Banacha 2, Warsaw, 02-097, Poland. E-mail: slezak@mimu.edu.pl
In this study we investigate methods for attribute clustering and their possible applications to
the task of computation of decision reducts from information systems. We focus on high-dimensional
datasets, that is, microarray data. For this type of data, the traditional reduct construction techniques
either can be extremely computationally intensive or can yield poor performance in terms of the size
of the resulting reducts. We propose two reduct computation heuristics that combine the greedy search
with a diverse selection of candidate attributes. Our experiments confirm that by proper grouping of
similar—in some sense interchangeable—attributes, it is possible to significantly decrease computa-
tion time, as well as to increase the quality of the obtained reducts (i.e., to decrease their average size).
We examine several criteria for attribute clustering, and we also identify so-called garbage clusters,
which contain attributes that can be regarded as irrelevant.
INTRODUCTION
In many applications, available information about objects from a con-
sidered universe has to be reduced. This reduction might be required in
order to limit resources that are needed by algorithms analyzing the data
or to prevent crippling their performance by noisy or irrelevant attributes
(Kohavi and John 1997; Mitchell 1997). Many of the popular attribute subset
selection methods are derived from the theory of rough sets (Pawlak 1991;
Świniarski and Skowron 2003).
In the rough set approach, the reduction of an object description is usu-
ally done by following the notion of a reduct—a minimal set of attributes that
sufficiently preserves information allowing the discernment of objects with
different properties, for example, belonging to different decision classes.
The techniques for computation of decision reducts have been widely dis-
cussed in literature related to data analysis and knowledge discovery. Their
practical significance for tasks such as attribute selection, rule induction,
and data visualization is unquestionable (Janusz and Stawicki 2011; Widz
and Ślęzak 2012).
The discussed approach to attribute reduction can be used for a wide
spectrum of high-dimensional data types. In this article, we focus on gene
expressions. There is a great deal of literature showing how to apply the
rough set approach to microarray data analysis (Fang and Grzymała-Busse
2006; Janusz and Stawicki 2011; Midelfart et al. 2002). In Grużdź, Ihnatowicz,
and Ślęzak (2006), Janusz and Ślęzak (2012), and Ślęzak (2007), it was dis-
cussed that, given such a huge number of attributes in microarray datasets,
it is indeed better to combine the standard computation mechanisms with
some elements of attribute clustering. Therefore, this work aims at exper-
imental verification of these ideas by combining rough set algorithms for
attribute reduction with rough set inspired methods for attribute clustering.
This study is a continuation of research described in Janusz and Ślęzak
(2012), in which we focused on identification of attribute dissimilarity mea-
sures that are appropriate for finding groups of interchangeable attributes.
We extend this work by an in-depth investigation of the selected gene-
clustering results. We also propose two algorithms for computation of
multiple decision reducts. Those algorithms combine the greedy heuristic
approach and attribute clustering results in order to obtain a set of diverse
and short reducts. We evaluate the proposed methods in a series of experi-
ments, and we discuss the impact of attribute clustering on the performance
of greedy reduct computation heuristics.
This article is organized as follows: “Rough Set-Based Attribute Selection
and Clustering” discusses basic notions from the rough set theory that
are related to the attribute reduction problem and recalls some popular
algorithms for the computation of reducts. It also outlines our intuition
behind combining attribute reduction with attribute clustering. “Framework
for Experimental Validation” reports our experimental framework for utiliz-
ing gene-clustering methods for computation of reducts, and “Experiments
with Dissimilarity Functions” presents the evaluation of the proposed mod-
ification to the permutation-based reduct computation algorithm. “Analysis
of Selected Gene-Clustering Results” investigates selected gene-clustering
results. “Randomized Greedy Computation of Decision Reducts” shows how
those observations can be used to overcome major issues related to the
greedy computation of reducts for high-dimensional datasets. “Concluding
Remarks” concludes the paper.
ROUGH SET-BASED ATTRIBUTE SELECTION AND CLUSTERING
In the rough set theory, the available information about the considered
universe is usually represented in a decision system understood as a tuple
S_d = (U, A, d), where U is a set of objects, A is a set of attributes, and d
is a distinguished attribute called a decision. By a decision reduct of S_d, we
usually mean a compact yet informative subset of available attributes. The
most basic type of decision reduct is a subset of attributes DR ⊆ A satisfying
the following conditions:
1. For any pair u, u′ ∈ U of objects belonging to different decision classes
(i.e., d(u) ≠ d(u′)), if u and u′ are discerned by A (i.e., there exists a ∈ A such that
a(u) ≠ a(u′)), then they are also discerned by DR.
2. There is no proper subset DR′ ⊊ DR for which the first condition holds.
A decision reduct is a set of attributes that are sufficient to discern objects
from different decision classes. At the same time, this set has to be minimal,
in the sense that no further attributes can be removed from DR without los-
ing the discernibility property. For example, {a3, a5} and {a3, a6} are decision
reducts of the decision system S_d from Table 1. The first listed condition for a
decision reduct is often replaced by some other requirements for preserving
information about the decision while reducing attributes. In this article, for
simplicity, we restrict ourselves to the aforementioned discernibility-based
criterion, which is well documented in the rough set literature (Bazan et al.
2000; Nguyen 2006).
Many algorithms for attribute reduction are described, utilizing various
greedy or randomized search approaches (Janusz and Stawicki 2011). Most
of them refer to the search for an optimal (shortest, generating minimum
number of rules, etc.) decision reduct or some larger ensembles of deci-
sion reducts that constitute efficient classification models (Janusz 2012).
We can also consider their approximate versions, which are especially use-
ful for noisy datasets (Ślęzak 2000). For instance, we can require that only a
percentage of object pairs satisfies the first condition for a decision reduct.
Moreover, we may extend the discernibility notion toward the criteria of a
TABLE 1 An Exemplary Decision Table S_d with a Binary Decision

     a1  a2  a3  a4  a5  a6  a7  a8   d
u1    1   2   2   0   0   1   0   1   1
u2    0   1   1   1   1   0   1   0   1
u3    1   2   0   1   0   2   1   0   1
u4    0   1   0   0   1   0   0   1   0
u5    2   0   1   0   2   1   0   0   1
u6    1   0   2   0   2   0   0   2   0
u7    0   1   1   2   0   2   1   0   1
u8    0   0   0   2   1   1   1   1   0
u9    2   1   0   0   1   1   0   0   0
sufficient dissimilarity or a discernibility in a degree, which are useful in the
case of numeric data (Jensen and Shen 2009).
A commonly used technique for the computation of decision reducts is
the greedy approach explained by Algorithm 1. In this algorithm, Q_d : A ×
2^A → R⁺ ∪ {0} corresponds to an attribute quality measure that is mono-
tonic, in the sense that it decreases with an increasing size of the set given as its
second argument. This function also needs the property that it equals 0 if the
second argument is a superreduct, that is, a set of attributes that discerns
all objects from different decision classes. A number of such functions were
adopted for the purposes of reduct computation (Janusz and Stawicki 2011;
Ślęzak 2000).
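Since the listing of Algorithm 1 is not reproduced in this text, the following R sketch illustrates the general greedy scheme under simplifying assumptions of our own: the decision table is given as a data frame X of discretized attributes plus a decision vector d, and attribute quality is measured by the number of conflicting object pairs that remain indiscerned, which is just one example of a monotonic measure Q_d. All function names are ours.

```r
# A minimal sketch of a greedy reduct heuristic in the spirit of Algorithm 1.
# Assumptions (ours): X is a data.frame of discretized attribute values, d is the
# decision vector, and quality is measured by how many conflicting
# (different-decision) object pairs remain indiscerned after adding an attribute.

n_conflicts <- function(X, d, B) {
  # number of object pairs from different decision classes NOT discerned by B
  keys <- if (length(B) == 0) rep("", nrow(X)) else do.call(paste, c(X[B], sep = "\r"))
  sum(tapply(seq_along(d), keys, function(idx) {
    counts <- table(d[idx])
    (sum(counts)^2 - sum(counts^2)) / 2  # conflicting pairs within one indiscernibility block
  }))
}

greedy_reduct <- function(X, d) {
  target <- n_conflicts(X, d, names(X))  # pairs that even the full attribute set cannot discern
  B <- character(0)
  remaining <- names(X)
  while (n_conflicts(X, d, B) > target && length(remaining) > 0) {
    scores <- sapply(remaining, function(a) n_conflicts(X, d, c(B, a)))
    best <- remaining[which.min(scores)] # greedy choice: largest drop in conflicts
    B <- c(B, best)
    remaining <- setdiff(remaining, best)
  }
  for (a in rev(B)) {                    # backward elimination keeps the result minimal
    if (n_conflicts(X, d, setdiff(B, a)) == target) B <- setdiff(B, a)
  }
  B
}
```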
The utilization of the greedy heuristic usually leads to a short reduct (in
terms of its cardinality). However, two major disadvantages of this approach
are its high computational complexity with regard to the total number of
attributes in the data and the fact that it can be used to construct only
a single reduct. A viable alternative that can overcome those issues is the
permutation-based method, called ordered reducts (Janusz and Ślęzak 2012;
Wróblewski 2001), explained by Algorithm 2.
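The listing of Algorithm 2 is likewise not reproduced here; the sketch below shows one common permutation-based (ordered reducts) scheme, reusing the n_conflicts() helper defined above. The exact procedure used by the authors may differ in details such as the ordering of the reduction phase.

```r
# A sketch of a permutation-based ("ordered reducts") procedure in the spirit of
# Algorithm 2, reusing n_conflicts() from the previous listing.
ordered_reduct <- function(X, d, perm = sample(names(X))) {
  target <- n_conflicts(X, d, names(X))
  B <- character(0)
  for (a in perm) {                      # forward phase: add attributes until a superreduct is reached
    B <- c(B, a)
    if (n_conflicts(X, d, B) == target) break
  }
  for (a in rev(B)) {                    # backward phase: drop attributes that became redundant
    if (n_conflicts(X, d, setdiff(B, a)) == target) B <- setdiff(B, a)
  }
  B
}

# Many reducts can be obtained cheaply by drawing many random permutations, e.g.:
# reducts <- replicate(100, ordered_reduct(X, d), simplify = FALSE)
```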
Appropriately extended classical notions of the rough set theory can
be successfully applied as an attribute selection framework for the analysis
of large and complex data sets, such as microarray data. However, there is
yet another possibility for scaling the original rough set notions with regard
to the number of attributes. The basic idea is to utilize additional informa-
tion about groups of attributes that can potentially replace each other while
constructing reducts from the data.
For a moment, let us imagine that some miraculous oracle could iden-
tify such groups of attributes in the data. In Janusz and Ślęzak (2012), we
proposed a method that allows us to incorporate this additional knowledge
into the reduct computation process by influencing the generation of per-
mutations in the ordered reducts algorithm. This process is explained by
Algorithm 3. We refer to a fusion of Algorithm 3 into the permutation-based
method as ordered reducts with diverse attribute drawing (OR-DAD).
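The exact drawing scheme is given in Algorithm 3 of the cited papers and is not reproduced in this text; the R sketch below implements one plausible reading of diverse attribute drawing: clusters are visited in a random cyclic order and one not-yet-used attribute is drawn from each, so that consecutive positions of the permutation tend to come from different groups. The function name and the representation of the clustering (a vector of cluster ids named with the attribute names) are our assumptions.

```r
# A sketch of diverse attribute drawing for permutation generation (OR-DAD).
# clusters: a vector of cluster ids named with the attribute names.
diverse_permutation <- function(clusters) {
  pools <- split(names(clusters), clusters)  # attribute names grouped by cluster
  pools <- lapply(pools, sample)             # shuffle within each cluster
  perm <- character(0)
  while (length(pools) > 0) {
    for (g in sample(names(pools))) {        # visit the remaining clusters in random order
      perm <- c(perm, pools[[g]][1])
      pools[[g]] <- pools[[g]][-1]
    }
    pools <- pools[lengths(pools) > 0]       # drop exhausted clusters
  }
  perm
}

# The permutation can then be passed to the ordered reducts procedure, e.g.:
# reduct <- ordered_reduct(X, d, perm = diverse_permutation(clusters))
```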
The knowledge regarding groups of interchangeable attributes can often
be acquired from domain experts or external knowledge bases such as
domain ontologies (e.g., the Gene Ontology). It can also be obtained auto-
matically by utilizing attribute clustering methods (Jain, Murty, and Flynn
1999). From a perspective of the microarray data analysis, such an idea
refers to a task of gene clustering (Baldi and Hatfield 2002; McKinney
et al. 2006). In Grużdź, Ihnatowicz, and Ślęzak (2006), we reported that
the gene-clustering outcomes may meet expert expectations to a greater extent
when they are based on information-theoretic measures, rather than on
standard numeric and rank-based correlations. In other words, interpreting
genes as attributes with some approximate dependencies between them may
bring better results than treating them as numeric vectors. In Ślęzak (2007),
we suggested that attribute clustering can be conducted also by means of
dissimilarity functions based on discernibility between objects, utilized as a
form of measuring degrees of functional dependencies between attributes.
We also proposed a mechanism wherein reducts could be searched in a
data table consisting only of the previously computed cluster representa-
tives, with their occurrence in reducts used as feedback for the clustering
refinements.
It is reasonable to use analogous criteria for preserving information
about a decision while reducing attributes and measuring distances between
them. As an example, let us compare the attributes a5 and a6 in Table 1.
For most pairs of objects, a5 discerns them if and only if a6 does. This may
indicate either that there are relatively many pairs of reducts of the form
B ∪ {a5} and B ∪ {a6}, B ⊆ A \ {a5, a6}, or that the attributes a5 and a6 do not
occur in reducts at all. Reducts {a3, a5} and {a3, a6} are an illustration of this
kind of replaceability. The attributes that are likely to be interchangeable
can be easily noticed by studying a dendrogram generated by a hierarchical
clustering algorithm. An example of such a tree generated for the decision
system from Table 1 is presented in Figure 1. As expected, the attributes a5
and a6 are merged into a single cluster as the second pair.
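As an illustration, a tree like the one in Figure 1 can be obtained with the agnes implementation from the R package cluster, assuming a precomputed symmetric matrix D of attribute dissimilarities (e.g., values of the direct discernibility function defined in the next section). The choice of average linkage below is our assumption, not a detail stated in the article.

```r
# A sketch of hierarchical attribute clustering with agnes, assuming D is a
# symmetric attribute-by-attribute dissimilarity matrix computed beforehand.
library(cluster)

tree <- agnes(as.dist(D), diss = TRUE, method = "average")  # linkage method is an assumption
plot(tree, which.plots = 2)                 # dendrogram analogous to Figure 1
groups <- cutree(as.hclust(tree), k = 10)   # e.g., cut into 10 attribute groups
```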
The methods of attribute reduction and grouping can be put together
in many different ways. As an example, in Abeel and colleagues (2010),
it is noted that so-called signatures (irreducible subsets of genes provid-
ing enough information about probabilities of specific types of cancer—
the reader may notice an interesting correspondence of this notion to a
probabilistic version of a decision reduct in Ślęzak 2000) can contain genes
that are interchangeable with the others because of data correlations or
multiple explanations of some biomedical phenomena. Moreover, such an
interchangeability can be observed not only for single elements but also for
whole sets of attributes.

FIGURE 1 An attribute-clustering tree for the decision table from Table 1, obtained by applying the
agglomerative nesting algorithm in combination with the direct discernibility dissimilarity function.
FRAMEWORK FOR EXPERIMENTAL VALIDATION
We conducted a series of experiments to verify the usefulness of attribute
clustering for scalable computation of decision reducts. We wanted to find
answers to two main questions. The first was whether the attribute group-
ing can speed up searching for reducts. The second question was related
to a quality of reducts generated using different clustering methods—we
wanted to check if such reducts are more concise. The minimal number of
attributes is not the only possible optimization criterion for decision reducts
(Ślęzak 2000; Wróblewski 2001). However, it is indeed the most straightfor-
ward idea to rely on minimal reducts in order to clearly visualize the data
dependencies.
In the experiments, we use a microarray dataset from the Rough Sets
and Current Trends in Computing (RSCTC) 2010 conference competi-
tion aimed at constructing classifiers with the highest possible accuracy
(Wojnarski et al. 2010). We focus on this specific dataset because—although,
currently, we do not evaluate the obtained reducts by means of accuracy
of classifiers that they are yielding—this will be the next step of our inves-
tigation, leading toward the ability to compare reduct-based classification
models with the competition winners.
Microarrays are usually described by many thousands of attributes whose
values correspond to expression levels of genes. The considered dataset is
related to the investigation of the role of chronic hepatitis C virus in the patho-
genesis of HCV-associated hepatocellular carcinoma. It contains data on
124 tissue samples described by 22,277 numeric attributes (genes). It was
obtained from the ArrayExpress repository (Parkinson et al. 2009; dataset
accession number: E-GEOD-14323). The gene expression levels in this
dataset were obtained using Affymetrix GeneChip Human Genome U133A
2.0 microarrays.
We preprocessed the data by discretizing attributes using an unsuper-
vised method. Every expression-level value of a given gene was replaced by
one of the three labels: over_expressed, normal, or under_expressed. A label
for an attribute a and a sample u is decided as follows:

ā(u) = over_expressed   if a(u) > mean_a + sd_a,
ā(u) = under_expressed  if a(u) < mean_a − sd_a,
ā(u) = normal           otherwise,

where mean_a and sd_a denote the mean and the standard deviation of
expression-level values of a in the whole dataset. We proceed with such
discretization for the sake of simplicity. One might also apply other dis-
cretization techniques (Bazan et al. 2000, Janusz and Stawicki 2011) or utilize
some rough set-based approaches that do not require explicit discretization
at all (Jensen and Shen 2009; Ślęzak 2007).
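A direct R transcription of this discretization rule might look as follows; the function name and the way the expression matrix is stored (samples in rows, genes in columns) are our assumptions.

```r
# Unsupervised three-level discretization of a single gene (one numeric column).
discretize_gene <- function(x) {
  m <- mean(x); s <- sd(x)
  ifelse(x > m + s, "over_expressed",
         ifelse(x < m - s, "under_expressed", "normal"))
}

# Applied to every gene of a numeric expression matrix `expr` (samples x genes):
# X <- as.data.frame(lapply(as.data.frame(expr), discretize_gene),
#                    stringsAsFactors = FALSE)
```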
We operate on relatively simple rough set-motivated dissimilarity func-
tions that refer to the comparison of attributes’ abilities to discern important
pairs of objects. The first considered function, called a direct discernibility
function, is the ratio of the number of pairs of objects from different deci-
sion classes that are discerned by exactly one of the compared attributes to the
number of such pairs discerned by at least one of them. It can be written
down in a way that emphasizes its analogy to some standard measures used
in data clustering (Jain, Murty, and Flynn 1999; Kaufman and Rousseeuw
1990).
direct(a,b)=1|{(u,u):d(u)= d(u)a(u)= a(u)b(u)= b(u)}|
|{(u,u):d(u)= d(u)(a(u)= a(u)b(u)= b(u))}| .
We also verified the usefulness of two other discernibility-based dissimilarity
functions. The relative discernibility, described in more detail in Janusz and
Ślęzak (2012), takes into account the fact that some pairs of objects belong-
ing to different decision classes are more difficult to discern than the others.
It assigns higher weights to pairs of objects from different decision classes,
which are discerned by a lower number of attributes. The full discernibility
function does not take into account decision classes. Instead, it measures the
ratio of the total number of object pairs discerned by exactly one of the compared attributes to
the total number of object pairs discerned by at least one of them. These
definitions should be regarded as just a few of many possible mathemati-
cal formulations of the basic intuition that an attribute dissimilarity measure
should help to identify groups of attributes that are interchangeable within
the same reducts.
In order to assess the impact of different attribute-clustering methods on
the computation of reducts, in our experiments we clustered the genes using several
techniques. We combined the discernibility-based functions with the agglom-
erative nesting (agnes), which is a hierarchical grouping algorithm (Jain,
Murty, and Flynn 1999; Kaufman and Rousseeuw 1990). We compared it with
k-means and agnes algorithms, working on dissimilarities computed using
Euclidean distance on nondiscretized data. We also checked clusterings
based on correlations between values of attributes, coupled with the agnes
algorithm. As a reference, we took results obtained for a random clustering,
which is actually equivalent to no clustering at all. We additionally checked
the worst-case scenario in which the attributes are grouped so that the most
dissimilar genes (according to the direct discernibility function) are in the
same clusters.
In each repetition of the experiment, we generated 100 reducts for
all the compared clustering methods. For the reduct computation we
used Algorithm 2. The permutations for each run of the algorithm were
generated based on the clusterings corresponding to the tested group-
ing methods. Algorithm 3 explains the permutation construction process.
In practice, there is no need to pregenerate a permutation for the reduct
computation, because it might be an integral part of the algorithm. However,
in experiments, we explicitly generated the permutations for the sake of
reproducibility of the results.
EXPERIMENTS WITH DISSIMILARITY FUNCTIONS
Table 2 summarizes measurements of computation times. For each clus-
tering method, the mean and standard deviations of 20 independent repe-
titions of the experiment are given. The results clearly show the advantage
of using the direct discernibility function in combination with a hierarchi-
cal clustering algorithm to speed up the generation of decision reducts.
Times obtained by this method are significantly lower than those of all other
approaches. The significance was measured using a t-test (Demšar 2006),
and the p-values obtained at 0.95 confidence level were all lower than 10⁻¹⁰.
For instance, the times obtained by this method when grouping was made
TABL E 2 Average Computation Times of 100 Reducts for Permutations Produced Using Different
Clusterings
Clustering Method: 10 Clusters 100 Clusters 1000 Clusters
agnes & direct 3.536 ±0.112 3.151 ±0.097 3.015 ±0.117
agnes & relative 4.680 ±0.156 4.164 ±0.161 3.705 ±0.134
agnes & full disc. 5.350 ±0.157 5.287 ±0.244 5.018 ±0.280
agnes & correlation 4.443 ±0.154 3.999 ±0.189 3.805 ±0.157
agnes & Euclidean 3.965 ±0.158 4.430 ±0.251 4.839 ±0.199
k-means & Euclidean 4.872 ±0.239 4.434 ±0.229 4.545 ±0.148
random 4.597 ±0.155 4.665 ±0.190 4.543 ±0.147
worst 5.485 ±0.219 9.901 ±0.753 11.929 ±0.628
into 1000 clusters are, on average, lower by 34% than the corresponding
times for the random method. Moreover, the robustness of the previously dis-
cussed tendency is confirmed in Table 2 by the stability with regard to the
number of considered clusters.
The results obtained for the relative discernibility function may be
regarded as disappointing. The tested weighting schema seems to degrade
the performance of the reduct computation algorithm, especially when
a low number of gene clusters is considered. The explanation of this
behavior will be within the scope of our future research. The experi-
ments show that distinguishing between the cases that are easier or more
difficult to discern might not be necessary; however, a better-adjusted math-
ematical formula for such distinguishing may lead to more promising
outcomes.
The results from Table 2 obtained for the two Euclidean distance-based
clusterings also show a clear advantage of using hierarchical methods for
grouping genes in microarray data. Actually, the times for the k-means clus-
tering with the Euclidean settings cannot be regarded as statistically different
from the results of random clusterings at the level of 1000 generated clusters.
For each clustering method, we also measured the average size of the
generated reducts. This statistic reflects the quality of reducts, both in terms
of data-based knowledge representation and the ability to construct efficient
classification models. These results are displayed in Table 3. The standard
deviations given in this table are not computed directly from the sizes
of the reducts but from the average sizes of 100 reducts in each of the
20 experiment runs. This explains such low values of this statistic.
The direct discernibility method significantly outperformed other
approaches also in terms of the reduct size. As before, the significance was
checked using a t-test. On average, decision reducts generated by using the
hierarchical clustering based on the direct discernibility function are shorter
than those computed from the random clusterings by nearly 1.5 genes. They
were also shorter than the reducts computed for the agnes algorithm and for
TABL E 3 Average Sizes of 100 Reducts Computed for Different Clusterings
Clustering method: 10 Clusters 100 Clusters 1000 Clusters
agnes & direct 11.209 ±0.099 11.095 ±0.087 11.103 ±0.093
agnes & relative 12.102 ±0.132 11.790 ±0.134 11.638 ±0.114
agnes & full disc. 12.808 ±0.101 12.747±0.122 12.684 ±0.116
agnes & correlation 12.449 ±0.104 12.236 ±0.112 12.175 ±0.092
agnes & Euclidean 11.709 ±0.123 11.860 ±0.118 12.198 ±0.114
k-means & Euclidean 12.590 ±0.089 12.228 ±0.069 12.283 ±0.130
random 12.519 ±0.127 12.470 ±0.092 12.471 ±0.128
worst 12.731 ±0.133 14.800 ±0.159 15.624 ±0.180
TABL E 4 Average Minimal Sizes among 100 Reducts Computed for Different Clusterings
Clustering Method: 10 Clusters 100 Clusters 1000 Clusters
agnes & direct 8.900 ±0.307 8.950 ±0.223 9.200 ±0.410
agnes & relative 9.600 ±0.502 9.250 ±0.444 9.550 ±0.510
agnes & full disc. 10.250 ±0.550 10.200±0.523 10.150 ±0.489
agnes & correlation 10.100 ±0.447 9.700 ±0.470 9.700 ±0.571
agnes & Euclidean 9.500 ±0.512 9.250 ±0.444 9.600 ±0.502
k-means & Euclidean 10.000 ±0.458 9.650 ±0.489 9.600 ±0.502
random 9.850 ±0.489 9.900 ±0.447 10.000 ±0.324
worst 9.950 ±0.394 10.900 ±0.640 11.550 ±0.604
the Euclidean distances by over 0.5 gene. It confirms that a proper attribute
clustering increases the efficiency of the reduct computation methods.
Because reducts are often computed in order to create a concise repre-
sentation of data (e.g., for a convenient visualization, see Widz and Ślęzak
2012), we also measure sizes of the shortest reducts computed in each of
the 20 repetitions of the experiment. These results are shown in Table 4.
They additionally confirm the importance of considering a specific decision
problem as a context when forming groups of genes. The attribute dissim-
ilarity functions that do not refer to a given decision task perform worse
than those taking the decision attribute into account. The best illustration
of this fact is given by the results obtained for the full discernibility function, which
are significantly worse than random. The full discernibility is a measure that
is similar to the direct discernibility measure, but it neglects the decision
attribute. This leads to a radical change in the obtained results—from the
best to worse-than-random.
ANALYSIS OF THE SELECTED GENE-CLUSTERING RESULTS
We manually investigated results of different clusterings in order to
gain some insights on factors that influence the reduct computation effi-
ciency. We noticed that the most successful clusterings, which are based on
TABL E 5 Distribution of Attributes in Clusterings into 10 Groups Using the Agnes Algorithm
Method: Gr.1 Gr.2 Gr.3 Gr.4 Gr.5 Gr.6 Gr.7 Gr.8 Gr.9 Gr.10
direct 13855 1814 1601 609 1010 1040 1248 1052 44 4
relative 6557 1923 3096 6536 1071 1279 991 565 226 33
full disc. 21565 356 186 106 7 18 18 15 5 1
correlation 1288 1750 2533 2127 801 1902 2867 3286 3266 2457
Euclidean 3818 1575 1161 290 9545 404 5452 28 1 3
the direct discernibility method combined with the agnes algorithm, are
significantly imbalanced with regard to the number of attributes in each
group. For instance, the distribution of attributes for the clustering into
10 groups is shown in Table 5. Analogous distributions obtained for the
relative discernibility-, full discernibility-, correlation-, and Euclidean-based
clusterings are given for reference.
The first group in the direct discernibility clustering is highly overrep-
resented, whereas the distribution of genes for the correlation measure is
quite uniform. The distributions for the relative and Euclidean measures can
be placed between the distributions of the direct discernibility and correla-
tion groupings. Finally, the distribution of the full discernibility clustering
is the most imbalanced—nearly 95% of genes are placed in a single group.
This result confirms that the full discernibility function is unable to capture
different roles played by particular genes in the decision problem.
When we compared the clustering trees obtained for those measures,
we found that the direct discernibility measure leads to a skewed outcome,
whereas the trees for the other functions (apart from the full discernibility
measure) are well balanced (see Figure 2). However, the dissimilarities
between the clusters—corresponding to relative differences in height of the
tree nodes—are usually larger for the direct discernibility measure.
The presence of a majority cluster among groups of genes may have a
very intuitive explanation. It is common that only a small portion of genes in
data is truly related to a problem indicated by a decision attribute. A major-
ity of genes do not bring any important information, hence, intuitively, a
good gene-clustering algorithm should place them in a separate cluster.
We decided to perform an additional series of tests in order to verify whether
this hypothesis is true for the direct discernibility clustering.
We checked the performance of the permutation-based reduct computa-
tion heuristic (the OR-DAD algorithm) in a case when we drop the attributes
from the majority cluster of the direct discernibility clustering into 10 groups
(using agnes). We modified the permutation generation process so that it
does not include attributes from the majority cluster. The results of the
comparison are shown in Table 6 . By removing attributes from the major-
ity cluster, we decreased average computation time of 100 reducts by 0.162s
FIGURE 2 A visualization of the clustering trees for five different gene dissimilarity measures, which
were cut at a height corresponding to the division into 10 groups: (a) direct discernibility, (b) relative
discernibility, (c) full discernibility, (d) correlation, (e) Euclidean.
TABL E 6 Performance of the Ordered Reducts (OR) and Ordered Reducts with Diverse Attribute
Drawing (OR-DAD) Algorithms with and without Attributes from the Group 1 of the Direct Discernibility
Clustering into 10 Groups (Results of 20 Independent Repetitions of the Experiment)
Clustering: Ave. Time Ave. Reduct Size Ave. Minimal Reduct Size
OR-DAD (direct disc.) 3.536 ±0.112 11.209 ±0.099 8.900 ±0.307
OR-DAD without gr.1 3.374 ±0.107 11.067 ±0.083 8.900 ±0.308
OR (random) 4.597 ±0.155 12.519 ±0.127 9.850 ±0.489
OR without gr.1 3.845 ±0.114 11.812 ±0.087 9.550 ±0.604
OR within gr.1 5.318 ±0.201 13.310 ±0.112 10.700 ±0.571
and their average size by 0.142. Although such an improvement is not very
large, its statistical significance was confirmed by the t-test at 0.95 confidence
level. It is worth mentioning that the time complexity of the permutation-
based heuristic is constant with regard to the number of attributes in the
data, so this difference in performance was solely a result of the omission of
unnecessary (uninformative) attributes from the majority cluster. We addi-
tionally computed reducts from random permutations of genes from the
majority cluster and from random permutations of the remaining attributes.
The reducts constructed from attributes placed in the majority cluster were,
on average, longer by nearly 13%, and their construction was over 38% more
time-consuming. In fact, their statistics were comparable or even worse than
those obtained for the worst-case clustering scenario (see Tables 2, 3, and
4). We may use this observation to improve the existing reduct computation
heuristics.
RANDOMIZED-GREEDY COMPUTATION OF DECISION REDUCTS
Our experiments described in previous sections showed that a proper
attribute-clustering method can significantly improve the permutation-based
reduct algorithm by indicating groups of attributes that are potentially inter-
changeable in many reducts. In our research, we were interested in whether
this observation holds also for the greedy reduct computation methods. The
greedy heuristic often allows us to find a much shorter reduct than those
obtained from the randomized algorithms. For instance, Algorithm 1 com-
bined with the gini gain measure for evaluation of attribute quality, applied
to the hepatitis C data, generates a reduct consisting of only six genes. When
we compare this result to the minimal sizes of the reducts constructed using
the OR-DAD algorithm (see Table 4 ), it is clearly visible that the greedy
reduct is, on average, smaller by two to four attributes. The gini gain mea-
sure could be used also to reformulate constraints in the decision reduct
definition, as proposed in Ślęzak (2000). However, in this study we keep our
focus on standard decision reducts and we treat gini gain just as an example
of a greedy evaluation function.
Two major disadvantages of the greedy heuristic are its computational
inefficiency for datasets with a significantly large number of attributes and
the fact that it can be used to generate only a single reduct. For example,
in the described experiment, the computation time needed to construct the
greedy reduct was 544 seconds, which is over 10,000 times slower than in the
case of the permutation-based algorithm.
The above observation motivated us to measure an impact of attribute
grouping on a computation time of the greedy reduct. We introduced con-
straints to the greedy algorithm that allow selection of only a single attribute
from each cluster. The selection itself was still done in the greedy fashion.
This modification resulted in a significant decrease of time needed for com-
putation of a single greedy reduct—it took 392 seconds when grouped into
10 clusters with the direct discernibility measure (about 28% less). The size
of a reduct obtained in this way was six, which is equal to the classical case.
However, those two greedy reducts differed on two out of six attributes.
In particular, this shows that searching for a single decision reduct provides
highly incomplete knowledge about dependencies in the data, especially for
such large numbers of attributes. Hence, the approaches aimed at extrac-
tion of larger families of reducts should be preferred (Widz and Ślęzak 2012;
Wróblewski 2001).
Following the previous study, we wanted to check if a more significant
improvement could be obtained. We decided to investigate the possibility
of introducing some intelligent attribute search strategies into the greedy
algorithm to accelerate its execution. We also wanted to check whether our
previous observation regarding the majority cluster can bring some benefit
for the greedy computation of decision reducts.
First, we repeated the execution of the greedy algorithm combined with
the clustering into 10 groups, based on the direct discernibility clustering
but without consideration of genes from the majority cluster (i.e., gr.1). The
reduct was generated in 152 s, which is over 3.5 times faster than in the case
of the standard greedy algorithm. The obtained reduct had the same size
as the original one (i.e., it consisted of six attributes). It differed, however,
on three out of six genes. Interestingly, it also differed on three genes from
the reduct obtained by application of the clustering but with the majority
cluster included.
In the second experiment, we verified the efficiency of two reduct genera-
tion heuristics that combine the greedy approach with some randomization
techniques and the utilization of the attribute clustering results. They
were motivated by the random forest algorithm (Breiman 2001), which
constructs an ensemble of decision trees generated from randomized sub-
sets of attributes. Analogously, at each step of the reduct computation
algorithm, only a small subset of randomly chosen attributes can be con-
sidered. This approach is sometimes called the random reducts algorithm
(Algorithm 4) and it was already used in a slightly modified version in, for
example, Janusz (2012) and Janusz and Stawicki (2011).
By the utilization of attribute-clustering results, we may try to bias the
attribute-sampling process and improve the efficiency of the reduct con-
struction. For this purpose we propose two heuristics. In the first, which we
call random reducts with diverse attribute sampling (RR-DAS), attributes are
uniformly sampled from all the clusters. At each step of the algorithm, the
set of attributes to be evaluated contains approximately the same number of
elements from every group. This guarantees maximal diversity of attributes
considered at every step of the algorithm. The search for the best attribute
in every iteration is performed using the greedy approach. This heuristic is
outlined in Algorithm 5.
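The listing of Algorithm 5 is not reproduced in this text; the R sketch below captures the idea of RR-DAS under our own assumptions, reusing the n_conflicts() helper from the greedy sketch given earlier. The parameter names and the default sample size are ours; the experiments reported in Figure 3 use samples of 100, 30, and 10 attributes.

```r
# A sketch of random reducts with diverse attribute sampling (RR-DAS).
# clusters: cluster ids named with the attribute names; at every greedy step the
# candidate sample contains (roughly) the same number of attributes per cluster.
rr_das_reduct <- function(X, d, clusters, sample_size = 30) {
  target <- n_conflicts(X, d, names(X))
  per_cluster <- ceiling(sample_size / length(unique(clusters)))
  B <- character(0)
  while (n_conflicts(X, d, B) > target) {
    free <- setdiff(names(X), B)
    pools <- split(free, clusters[free])
    candidates <- unlist(lapply(pools, function(p) sample(p, min(per_cluster, length(p)))),
                         use.names = FALSE)
    scores <- sapply(candidates, function(a) n_conflicts(X, d, c(B, a)))
    B <- c(B, candidates[which.min(scores)])   # greedy choice within the diverse sample
  }
  for (a in rev(B)) {                          # final reduction phase keeps the result minimal
    if (n_conflicts(X, d, setdiff(B, a)) == target) B <- setdiff(B, a)
  }
  B
}
```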
The second of the proposed heuristics aims to diversify sets of attributes
considered during different steps of the reduct computation. In this approach,
called ordered reducts with diverse attribute search (OR-DAS), groups
of attributes are permuted and during each iteration the best attribute
is searched within an attribute sample drawn from a single cluster (see
Algorithm 6).
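Similarly, Algorithm 6 is not reproduced here; the sketch below shows one plausible reading of OR-DAS, in which the clusters are visited in a random cyclic order and, in each iteration, the greedy choice is restricted to a random sample drawn from the single cluster visited at that step. It again reuses n_conflicts(), and the details are our assumptions rather than the authors' exact procedure.

```r
# A sketch of ordered reducts with diverse attribute search (OR-DAS).
or_das_reduct <- function(X, d, clusters, sample_size = 30) {
  target <- n_conflicts(X, d, names(X))
  cluster_ids <- unique(clusters)
  cluster_order <- if (length(cluster_ids) > 1) sample(cluster_ids) else cluster_ids
  B <- character(0)
  step <- 0
  while (n_conflicts(X, d, B) > target) {
    g <- cluster_order[step %% length(cluster_order) + 1]  # cluster visited in this iteration
    step <- step + 1
    free <- setdiff(names(clusters)[clusters == g], B)
    if (length(free) == 0) next                            # this cluster is already exhausted
    candidates <- sample(free, min(sample_size, length(free)))
    scores <- sapply(candidates, function(a) n_conflicts(X, d, c(B, a)))
    B <- c(B, candidates[which.min(scores)])
  }
  for (a in rev(B)) {
    if (n_conflicts(X, d, setdiff(B, a)) == target) B <- setdiff(B, a)
  }
  B
}
```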
Performances of the aforementioned heuristics were compared in
a series of tests on the hepatitis C data. The plots shown in Figure 3
present average results for computation of 100 reducts using the com-
pared algorithms. Average computation times, average reduct sizes, average
minimal reduct sizes, and average maximal overlaps are displayed. The last statistic—average max-
imal overlap—reflects the homogeneity of a set of reducts. For each reduct
DR from a set RS, it computes its maximal percentage of common attributes
with other reducts in the set and takes the mean of those values:
aveMaxOverlap(RS) = (1 / |RS|) Σ_{DR ∈ RS} max_{DR′ ∈ RS \ {DR}} |DR ∩ DR′| / |DR|
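In R, this statistic can be transcribed directly, assuming the set of reducts is represented as a list of character vectors of attribute names (the representation and the function name are our assumptions).

```r
# Average maximal overlap of a set of reducts, given as a list of character vectors.
ave_max_overlap <- function(reducts) {
  mean(sapply(seq_along(reducts), function(i) {
    others <- reducts[-i]
    if (length(others) == 0) return(0)
    max(sapply(others, function(r) length(intersect(reducts[[i]], r)))) / length(reducts[[i]])
  }))
}
```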
In the plots, the approach that does not use attribute clusterings (the
random reducts algorithm) is compared with both of the proposed heuris-
tics combined with the direct discernibility clustering (see “Framework for
Experimental Validation”) into 10 groups (labels RR-DAS and OR-DAS).
The impact of removing the majority cluster prior to computation of reducts
(the bars with the label w/o gr.1) is also assessed.
The advantage of using the attribute clustering results for computing
the randomized greedy reducts is clearly visible. The usage of the direct
discernibility clustering not only speeds up the computations, on aver-
age, by approximately 18%, but also decreases the average and minimal
size of the obtained reducts (on average, by approximately 10%). The
reducts computed with the use of the gene-clustering results were often as
small as the reduct generated using the classical greedy heuristic. The com-
bination of nondeterminism and clustering, however, allowed us to obtain
several different short reducts in a much shorter time. The removal of the
majority cluster brought a further improvement of those results, but in most
of the cases the difference was statistically insignificant.
In all cases, the employment of the clustering results decreased the diver-
sity of the obtained sets of reducts. This can be an issue if the reducts are to
FIGURE 3 Average computation times, minimal and average sizes, and average maximal overlap of
reducts computed using the RR-DAS and OR-DAS algorithms based on direct discernibility (with and
without the attributes from gr.1), compared with the random reducts baseline. Plots correspond to
different settings of the attribute sample size (100, 30, and 10 attributes) used in every iteration of the
algorithms.
be utilized for constructing an ensemble of classification models. This prob-
lem is less conspicuous for the second of the proposed heuristics (OR-DAS),
hence, it might be preferable for constructing diverse ensembles based on
short decision reducts.
One should remember that the sets of reducts can be searched also for
other reasons. As an example, let us consider the task of robust attribute
selection (Abeel et al. 2010; Świniarski and Skowron 2003). In such a case,
one is interested in a single subset of attributes selected from the union
of reducts, which would be stable over multiple runs of a given algorithm.
In practice, it is often accomplished by choosing the attributes that most
frequently occur in the obtained attribute subsets. Already, several attribute
filtering techniques that derive from the rough set theory have investigated
attributes from a union of multiple reducts (Błaszczyński, Słowiński, and
Susmaga 2011; Janusz and Stawicki 2011). In order to better reflect stabil-
ity of such an attribute subset, the aforementioned way of evaluating sets of
reducts may need a revision.
In what follows, we propose several methods for measuring the stability of
attribute subsets retrieved from families of reducts. We decided to compare
reducts obtained in 20 repetitions of previous experiments. We measured
the average maximum overlap between the unions of reducts from each of
the runs (aveMaxOverlap) and we counted how many attributes were present
in the intersection of all the unions (common attrs). For each execution of
this experiment we additionally checked how many attributes were present
in at least 5 out of 100 reducts (frequent attrs) and we measured the aver-
age maximum overlap of those attribute sets (freqMaxOverlap). All of those
statistics are presented in Table 7 .
Attribute sets returned by the proposed algorithms turned out to be
much more stable than those obtained without the use of clustering.
Interestingly, the permutation-based method (OR-DAS) achieved slightly
better results than the RR-DAS algorithm. Moreover, as expected, the
average maximal overlap of unions of reducts increased when the attributes
from the majority group were removed. However, we need to remember
TABL E 7 Stability of an Attribute Selection Using Different Reduct Computation Algorithms
Algorithm AveMaxOverlap Common Attrs Frequent Attrs FreqMaxOverlap
Random Reducts 0.221 9 4.40 0.584
RR-DAS 0.269 25 12.05 0.688
OR-DAS 0.277 28 17.60 0.719
RR-DAS w/o gr.1 0.307 22 13.65 0.661
OR-DAS w/o gr.1 0.315 28 19.30 0.656
that the parameters considered are designed to investigate stability of results
with respect to specific goals of an attribute selection. Some other measures
should be introduced in order to study the stability of results in a form of
sets of attribute sets, optimized for the purposes of the representation of
data dependencies or the construction of classifier ensembles.
CONCLUDING REMARKS
In this article, we presented an investigation of a possibility of combining
the greedy and permutation-based heuristics to facilitate fast computation
of representative ensembles of short decision reducts. The choice of param-
eters responsible for generation of permutations and the greedy heuristic
measures may have a significant influence on the results. However, in all
such scenarios it is expected that attribute clustering can improve the
computations and the interpretation of reducts.
We proposed a new approach to attribute clustering and its application
to a task of computation of short decision reducts from datasets with a large
number of attributes. We showed that by utilization of clustering results, it is
possible to significantly speed up the search for decision reducts and that the
obtained reducts tend to be smaller than those reached without the cluster-
ing. We also proposed a discernibility-based attribute dissimilarity measure
that is particularly useful for identifying groups of attributes that are likely
to be interchangeable in many reducts.
We intend to combine our methods with other knowledge-discovery
approaches that involve attribute grouping and selection (Abeel et al. 2010;
Grużdź, Ihnatowicz, and Ślęzak 2006). One may also consider an idea of full
integration of the algorithms for attribute clustering and selection, so they
can provide feedback to each other within the same learning process. Such a
new process may be performed separately for particular microarray datasets
or over their larger unions (Janusz and Ślęzak 2012).
The integration of the attribute clustering and selection procedures may
bring not only significant performance improvements but may also provide
a new meaning with regard to the attribute selection outcomes. Instead of
subsets of individuals chosen from thousands of attributes, it may be better to
deal with subsets of representatives selected from much more robust clusters
of interchangeable attributes. Moreover, the outcomes of attribute clustering
may help to identify truly irrelevant attributes.
NOTE
All the algorithms and experiments were implemented and conducted in the R System (http://www.r-project.org/).
FUNDING
This research was partly supported by the Polish National Science Centre (NCN) grants DEC-2011/01/B/ST6/03867 and DEC-2012/05/B/ST6/03215.
REFERENCES
Abeel, T., T. Helleputte, Y. V. de Peer, P. Dupont, and Y. Saeys. 2010. Robust biomarker identification for
cancer diagnosis with ensemble feature selection methods. Bioinformatics 26(3):392–398.
Baldi, P., and G. W. Hatfield. 2002. DNA microarrays and gene expression: From experiments to data analysis and
modeling. Cambridge, UK: Cambridge University Press.
Bazan, J. G., H. S. Nguyen, S. H. Nguyen, P. Synak, and J. Wróblewski. 2000. Rough set algorithms in
classification problem. In Rough set methods and applications: New developments in knowledge discovery
in information systems, Studies in Fuzziness and Soft Computing 56:49–88, L. Polkowski, S. Tsumoto,
and T. Y. Lin. ed. Heidelberg: Physica-Verlag.
Błaszczyński, J., R. Słowiński, and R. Susmaga. 2011. Rule-based estimation of attribute relevance. In Rough
sets and knowledge technology, Lecture Notes in Computer Science 6954:36–44. Berlin, Heidelberg:
Springer.
Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.
Demšar, J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning
Research 7:1–30.
Fang, J., and J. W. Grzymała-Busse. 2006. Leukemia prediction from gene expression data – A rough
set approach. In International conference on artificial intelligence and soft computing , Lecture Notes in
Computer Science 4029:899–908. Berlin, Heidelberg: Springer.
Grużdź, A., A. Ihnatowicz, and D. Ślęzak. 2006. Interactive gene clustering – A case study of breast cancer
microarray data. Information Systems Frontiers 8(1):21–27.
Jain, A. K., M. N. Murty, and P. J. Flynn. 1999. Data clustering: A review. ACM Computing Surveys
31(3):264–323.
Janusz, A. 2012. Dynamic rule-based similarity model for DNA microarray data. In Transactions on rough
sets XV , Lecture Notes in Computer Science 7255:1–25. Berlin, Heidelberg: Springer.
Janusz, A., and D. Ślęzak. 2012. Utilization of attribute clustering methods for scalable computation of
reducts from high-dimensional data. In Federated conference on computer science and information systems,
295–302. Washington, D.C.: IEEE.
Janusz, A., and S. Stawicki. 2011. Applications of approximate reducts to the feature selection prob-
lem. In Rough sets and knowledge technology, Lecture Notes in Computer Science 6954:45–50. Berlin,
Heidelberg: Springer.
Jensen, R., and Q. Shen. 2009. New approaches to fuzzy-rough feature selection. IEEE Transactions on
Fuzzy Systems 17(4):824–838.
Kaufman, L., and P. Rousseeuw. 1990. Finding groups in data: An introduction to cluster analysis. New York,
NY: Wiley Interscience.
Kohavi, R., and G. H. John. 1997, December. Wrappers for feature subset selection. Artificial Intelligence
97:273–324.
McKinney, B. A., D. M. Reif, M. D. Ritchie, and J. H. Moore. 2006. Machine learning for detecting gene-
gene interactions: A review. Applied Bioinformatics 5(2):77–88.
Midelfart, H., H. J. Komorowski, K. Nørsett, F. Yadetie, A. K. Sandvik, and A. Lægreid. 2002.
Learning rough set classifiers from gene expressions and clinical data. Fundamenta Informaticae
53(2):155–183.
Mitchell, T. M. 1997. Machine learning. New York, NY: McGraw-Hill.
Nguyen, H. S. 2006. Approximate Boolean reasoning: Foundations and applications in data mining. In
Transactions on rough sets V, Lecture Notes in Computer Science 4100:334–506. Berlin, Heidelberg:
Springer.
Parkinson, H. E., M. Kapushesky, N. Kolesnikov, G. Rustici, M. Shojatalab, N. Abeygunawardena, H.
Berube, M. Dylag, I. Emam, A. Farne, E. Holloway, M. Lukk, J. Malone, R. Mani, E. Pilicheva,
T. F. Rayner, F. I. Rezwan, A. Sharma, E. Williams, X. Z. Bradley, T. Adamusiak, M. Brandizi, T.
Burdett, R. Coulson, M. Krestyaninova, P. Kurnosov, E. Maguire, S. G. Neogi, P. Rocca-Serra, S.-A.
Sansone, N. Sklyar, M. Zhao, U. Sarkans, and A. Brazma. 2009. ArrayExpress update – From an
archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Research
37(Database-Issue):868–872.
Pawlak, Z. 1991. Rough sets – Theoretical aspects of reasoning about data. Boston, MA: Kluwer Academic
Publishers.
Ślęzak, D. 2000. Normalized decision functions and measures for inconsistent decision tables analysis.
Fundamenta Informaticae 44(3):291–319.
Ślęzak, D. 2007. Rough sets and few-objects-many-attributes problem: The case study of analysis of gene
expression data sets. In Frontiers in the convergence of bioscience and information technologies, 437–442.
Washington, D.C.: IEEE.
Świniarski, R. W., and A. Skowron. 2003. Rough set methods in feature selection and recognition. Pattern
Recognition Letters 24(6):833–849.
Widz, S., and D. Ślęzak. 2012. Rough set based decision support – Models easy to interpret. In Selected meth-
ods and applications of rough sets in management and engineering, Advanced Information and Knowledge
Processing, 95–112, G. Peters, P. Lingras, D. Ślęzak, and Y. Yao, ed. Berlin: Springer.
Wojnarski, M., A. Janusz, H. S. Nguyen, J. G. Bazan, C. Luo, Z. Chen, F. Hu, G. Wang, L. Guan, and H.
Luo. 2010. RSCTC 2010 discovery challenge: Mining DNA microarray data for medical diagnosis
and treatment. In Rough sets and current trends in computing, Lecture Notes in Computer Science
6086:4–19. Berlin, Heidelberg: Springer.
Wróblewski, J. 2001. Ensembles of classifiers based on approximate reducts. Fundamenta Informaticae
47(3–4):351–360.
Chapter
Information bireducts are useful tools in Rough Set Theory in order to simultaneously reduce the sets of objects and attributes in a dataset. Specifically, information bireducts provide non-redundant subtables preserving the original discernibilities. This paper presents different properties of information bireducts taking special interest in its relationship with reducts.KeywordsRough Set TheoryInformation tablesReductsBireducts
Article
In the fields of rough set and machine learning, attribute reduction has been demonstrated to be effective in removing redundant attributes with clear explanations. Therefore, not only the generalization performances of the derived reducts, but also the efficiencies of searching reducts have drawn much attention. Immediately, various accelerators for quickly deriving reducts have been designed. However, most of the existing solutions merely speed up the procedure of searching reduct from one and only one perspective, it follows that the efficiencies of those accelerators may be further improved with a fusion view. For such a reason, a framework called Fusing Attribute Reduction Accelerators (FARA) is developed. Our framework is specifically characterized by the following three aspects: (1) sample based accelerator, which is realized by gradually reducing the volume of samples based on the mechanism of positive approximation; (2) attribute based accelerator, which is realized by adding multiple qualified attributes into the potential reduct for each iteration; (3) granularity based accelerator, which is realized by ignoring the candidate attributes within coarser granularity. By examining both the efficiencies of the searchings and the effectiveness of the searched reducts, comprehensive experiments over 20 public datasets fairly validated the superiorities of our framework against 5 popular accelerators.
Article
In this study, a granular ball based selector was developed for reducing the dimensions of data from the perspective of attribute reduction. The granular ball theory offers a data-adaptive strategy for realizing information granulation process. It follows that the obtained granular balls can be regarded as the fundamental units of sampling and thereafter, the procedure of deriving the reduct(s) can be redesigned from a novel perspective. Firstly, the set of all granular balls is sorted based on their purities, following which each granular ball is considered as a group of samples, this is actually a process of sampling. Secondly, a potential reduct is derived over the first granular ball. Thereafter, a reduct over the subsequent granular ball can be obtained through correcting this potential reduct. Repeat this process until the reduct over the last granular ball is generated. Finally, the last reduct will be further corrected for deriving the final result over the whole universe. By considering both the efficiency of searching the reduct(s) and the effectiveness of the obtained reduct(s), comprehensive experiments over a total of 20 UCI datasets clearly validated the superiority of our approach against six well-established algorithms.
Chapter
We introduce a new rough-set-inspired binary feature selection framework, whereby it is preferred to choose attributes which let us distinguish between objects (cases, rows, examples) having different decision values according to the following mechanism: for objects u1 and u2 with decision values \(dec(u1)=0\) and \(dec(u2)=1\), it is preferred to select attributes a such that \(a(u1)=0\) and \(a(u2)=1\), with the secondary option – if the first one is impossible – to select a such that \(a(u1)=1\) and \(a(u2)=0\). We discuss the background for this approach, originally inspired by the needs of the genetic data analysis. We show how to derive the sets of such attributes – called positive-correlation-promoting reducts (PCP reducts in short) – using standard calculations over appropriately modified rough-set-based discernibility matrices. The proposed framework is implemented within the RoughSets R package which is widely used for the data exploration and knowledge discovery purposes.
Chapter
Full-text available
Rapid evolution of technology allows people to record more data than ever. Gathered information is intensively used by data analysts and domain experts. Collections of patterns extracted from data compose models (compact representations of discovered knowledge), which are at the heart of each decision support system. Models based on mathematically sophisticated methods may achieve high accuracy but they are hardly understandable by decision-makers. Models relying on symbolic, e.g. rule based methods can be less accurate but more intuitive. In both cases, feature subset selection leads to an increase of interpretability and practical usefulness of decision support systems. In this chapter, we discuss how rough sets can contribute in this respect.
Conference Paper
Full-text available
We investigate methods for attribute clustering and their possible applications to a task of computation of decision reducts from information systems. We focus on high-dimensional data sets, for which the problem of selecting attributes that constitute a reduct can be extremely computationally intensive. We apply an attribute clustering method to facilitate construction of reducts from microarray data. Our experiments confirm that by proper grouping of similar, in some sense replaceable attributes it is possible to significantly decrease a computation time, as well as increase a quality of resulting reducts (i.e. decrease their average size).
Article
Full-text available
Preface 1. A brief history of genomics 2. DNA array formats 3. DNA array readout methods 4. Gene expression profiling experiments: problems, pitfalls and solutions 5. Statistical analysis of array data: inferring changes 6. Statistical analysis of array data: dimensionality reduction, clustering, and regulatory regions 7. Survey of current DNA array applications 8. Systems biology: overview of regulatory, metabolic and signaling networks.
Article
p>ArrayExpress http://www.ebi.ac.uk/arrayexpress consists of three components: the ArrayExpress Repository--a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse--a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas--a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200,000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently-ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.</p
Conference Paper
Rules-based Similarity (RBS) is a framework in which concepts from rough set theory are used for learning a similarity relation from data. This paper presents an extension of RBS called Dynamic Rules-based Similarity model (DRBS) which is designed to boost the quality of the learned relation in case of highly dimensional data. Rules-based Similarity utilizes a notion of a reduct to construct new features which can be interpreted as important aspects of a similarity in the classification context. Having defined such features it is possible to utilize the idea of Tversky's feature contrast similarity model in order to design an accurate and psychologically plausible similarity relation for a given domain of objects. DRBS tries to incorporate a broader array of aspects of the similarity into the model by constructing many heterogeneous sets of features from multiple decision reducts. To ensure diversity, the reducts are computed on random subsets of objects and attributes. This approach is particularly well-suited for dealing with "few-objects-many-attributes" problem, such as mining of DNA microarray data. The induced similarity relation and the resulting similarity function can be used to perform an accurate classification of previously unseen objects in a case-based fashion. Experiments, whose results are also presented in the paper, show that the proposed model can successfully compete with other state-of-the-art algorithms such as Random Forest or SVM.
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.