
Applied Artificial Intelligence, 28:220–242, 2014
Copyright © 2014 Taylor & Francis Group, LLC
ISSN: 0883-9514 print/1087-6545 online
DOI: 10.1080/08839514.2014.883902

ROUGH SET METHODS FOR ATTRIBUTE CLUSTERING AND SELECTION

Andrzej Janusz (1) and Dominik Ślęzak (1, 2)

(1) Institute of Mathematics, University of Warsaw, Warsaw, Poland
(2) Infobright Inc., Warsaw, Poland

Address correspondence to Dominik Ślęzak, Institute of Mathematics, University of Warsaw, ul. Banacha 2, Warsaw, 02-097, Poland. E-mail: slezak@mimuw.edu.pl

In this study we investigate methods for attribute clustering and their possible applications to the task of computation of decision reducts from information systems. We focus on high-dimensional datasets, that is, microarray data. For this type of data, the traditional reduct construction techniques either can be extremely computationally intensive or can yield poor performance in terms of the size of the resulting reducts. We propose two reduct computation heuristics that combine the greedy search with a diverse selection of candidate attributes. Our experiments confirm that by a proper grouping of similar—in some sense interchangeable—attributes, it is possible to significantly decrease the computation time, as well as to increase the quality of the obtained reducts (i.e., to decrease their average size). We examine several criteria for attribute clustering, and we also identify so-called garbage clusters, which contain attributes that can be regarded as irrelevant.

INTRODUCTION

In many applications, the available information about objects from a considered universe has to be reduced. This reduction might be required in order to limit the resources that are needed by the algorithms analyzing the data or to prevent crippling their performance with noisy or irrelevant attributes (Kohavi and John 1997; Mitchell 1997). Many of the popular attribute subset selection methods are derived from the theory of rough sets (Pawlak 1991; Świniarski and Skowron 2003).

In the rough set approach, the reduction of an object description is usually done by following the notion of a reduct—a minimal set of attributes that sufficiently preserves information allowing the discernment of objects with different properties, for example, belonging to different decision classes. The techniques for the computation of decision reducts have been widely discussed in the literature related to data analysis and knowledge discovery. Their practical significance for tasks such as attribute selection, rule induction, and data visualization is unquestionable (Janusz and Stawicki 2011; Widz and Ślęzak 2012).

The discussed approach to attribute reduction can be used for a wide spectrum of high-dimensional data types. In this article, we focus on gene expressions. There is a great deal of literature showing how to apply the rough set approach to microarray data analysis (Fang and Grzymała-Busse 2006; Janusz and Stawicki 2011; Midelfart et al. 2002). In Grużdź, Ihnatowicz, and Ślęzak (2006), Janusz and Ślęzak (2012), and Ślęzak (2007), it was discussed that, given the huge number of attributes in microarray datasets, it is indeed better to combine the standard computation mechanisms with some elements of attribute clustering. Therefore, this work aims at an experimental verification of these ideas by combining rough set algorithms for attribute reduction with rough set-inspired methods for attribute clustering.

This study is a continuation of the research described in Janusz and Ślęzak (2012), in which we focused on the identification of attribute dissimilarity measures that are appropriate for finding groups of interchangeable attributes. We extend this work by an in-depth investigation of the selected gene-clustering results. We also propose two algorithms for the computation of multiple decision reducts. Those algorithms combine the greedy heuristic approach and attribute clustering results in order to obtain a set of diverse and short reducts. We evaluate the proposed methods in a series of experiments, and we discuss the impact of attribute clustering on the performance of greedy reduct computation heuristics.

This article is organized as follows: "Rough Set-Based Attribute Selection and Clustering" discusses basic notions of rough set theory that are related to the attribute reduction problem and recalls some popular algorithms for the computation of reducts. It also outlines our intuition behind combining attribute reduction with attribute clustering. "Framework for Experimental Validation" describes our experimental framework for utilizing gene-clustering methods for the computation of reducts, and "Experiments with Dissimilarity Functions" presents the evaluation of the proposed modification to the permutation-based reduct computation algorithm. "Analysis of the Selected Gene-Clustering Results" investigates selected gene-clustering results. "Randomized-Greedy Computation of Decision Reducts" shows how those observations can be used to overcome major issues related to the greedy computation of reducts for high-dimensional datasets. "Concluding Remarks" concludes the article.

ROUGH SET-BASED ATTRIBUTE SELECTION AND CLUSTERING

In rough set theory, the available information about the considered universe is usually represented in a decision system understood as a tuple S_d = (U, A, d), where U is a set of objects, A is a set of attributes, and d is a distinguished attribute called a decision. By a decision reduct of S_d, we usually mean a compact yet informative subset of the available attributes. The most basic type of decision reduct is a subset of attributes DR ⊆ A satisfying the following conditions:

1. For any pair u, u′ ∈ U of objects belonging to different decision classes (i.e., d(u) ≠ d(u′)), if u and u′ are discerned by A (i.e., ∃a ∈ A such that a(u) ≠ a(u′)), then they are also discerned by DR.
2. There is no proper subset DR′ ⊂ DR for which the first condition holds.

A decision reduct is a set of attributes that are sufficient to discern objects from different decision classes. At the same time, this set has to be minimal, in the sense that no further attributes can be removed from DR without losing the discernibility property. For example, {a3, a5} and {a3, a6} are decision reducts of the decision system S_d from Table 1. The first listed condition for a decision reduct is often replaced by some other requirements for preserving information about the decision while reducing attributes. In this article, for simplicity, we restrict ourselves to the aforementioned discernibility-based criterion, which is well documented in the rough set literature (Bazan et al. 2000; Nguyen 2006).
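As an illustration of the definition, the following is a minimal sketch in R (the language used for all experiments in this article; see the Note at the end) of how the two conditions can be tested on a small decision table such as Table 1. The data-frame representation and the function names are our own illustrative assumptions, not the authors' original code.

```r
# A sketch, not the authors' implementation. The decision table is assumed
# to be a data frame whose last column is the decision d.

# Condition 1: every pair of objects from different decision classes that is
# discerned by the full attribute set A must also be discerned by B.
preserves <- function(tab, B) {
  A <- names(tab)[-ncol(tab)]
  d <- tab[[ncol(tab)]]
  n <- nrow(tab)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    if (d[i] != d[j] &&
        any(tab[i, A] != tab[j, A]) &&                    # discerned by A
        (length(B) == 0 || all(tab[i, B] == tab[j, B])))  # but not by B
      return(FALSE)
  }
  TRUE
}

# Condition 2 adds minimality: no attribute of B can be dropped.
is_decision_reduct <- function(tab, B) {
  preserves(tab, B) &&
    all(sapply(B, function(a) !preserves(tab, setdiff(B, a))))
}

# For the table from Table 1 stored in a data frame Sd,
# is_decision_reduct(Sd, c("a3", "a5")) should return TRUE.
```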

Many algorithms for attribute reduction have been described, utilizing various greedy or randomized search approaches (Janusz and Stawicki 2011). Most of them refer to the search for an optimal (shortest, generating a minimum number of rules, etc.) decision reduct or some larger ensembles of decision reducts that constitute efficient classification models (Janusz 2012). We can also consider their approximate versions, which are especially useful for noisy datasets (Ślęzak 2000). For instance, we can require that only a percentage of object pairs satisfies the first condition for a decision reduct.

Moreover, we may extend the discernibility notion toward the criteria of a sufficient dissimilarity or a discernibility in a degree, which are useful in the case of numeric data (Jensen and Shen 2009).

TABLE 1 An Exemplary Decision Table S_d with a Binary Decision

      a1  a2  a3  a4  a5  a6  a7  a8   d
u1     1   2   2   0   0   1   0   1   1
u2     0   1   1   1   1   0   1   0   1
u3     1   2   0   1   0   2   1   0   1
u4     0   1   0   0   1   0   0   1   0
u5     2   0   1   0   2   1   0   0   1
u6     1   0   2   0   2   0   0   2   0
u7     0   1   1   2   0   2   1   0   1
u8     0   0   0   2   1   1   1   1   0
u9     2   1   0   0   1   1   0   0   0

A commonly used technique for the computation of decision reducts is the greedy approach explained by Algorithm 1. In this algorithm, Q_d : A × 2^A → R+ ∪ {0} corresponds to an attribute quality measure that is monotonic, in the sense that it decreases with an increasing size of the set given as its second argument. This function also needs the property that it equals 0 if the second argument is a superreduct, that is, a set of attributes that discerns all objects from different decision classes. A number of such functions have been adopted for the purposes of reduct computation (Janusz and Stawicki 2011; Ślęzak 2000).
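To make the greedy scheme concrete, below is a minimal R sketch of one plausible reading of Algorithm 1 (whose body is not reproduced in this text). It uses the number of still-undiscerned object pairs as a simple monotonic quality criterion; the gini gain measure mentioned later in the article could be plugged in instead.

```r
# A sketch under the stated assumptions, not the authors' implementation.
# It assumes a consistent decision table (every pair from different decision
# classes is discernible by the full attribute set).

# Number of object pairs from different decision classes not yet discerned
# by the attribute subset B; it reaches 0 exactly when B is a superreduct.
undiscerned_pairs <- function(tab, B) {
  d <- tab[[ncol(tab)]]
  n <- nrow(tab)
  cnt <- 0
  for (i in 1:(n - 1)) for (j in (i + 1):n)
    if (d[i] != d[j] &&
        (length(B) == 0 || all(tab[i, B] == tab[j, B])))
      cnt <- cnt + 1
  cnt
}

greedy_reduct <- function(tab) {
  attrs <- names(tab)[-ncol(tab)]
  B <- character(0)
  # forward phase: always add the attribute that discerns most new pairs
  while (undiscerned_pairs(tab, B) > 0) {
    left <- sapply(setdiff(attrs, B),
                   function(a) undiscerned_pairs(tab, c(B, a)))
    B <- c(B, names(which.min(left)))
  }
  # backward phase: enforce minimality by dropping redundant attributes
  for (a in B)
    if (undiscerned_pairs(tab, setdiff(B, a)) == 0) B <- setdiff(B, a)
  B
}
```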

The utilization of the greedy heuristic usually leads to a short reduct (in terms of its cardinality). However, two major disadvantages of this approach are its high computational complexity with regard to the total number of attributes in the data and the fact that it can be used to construct only a single reduct. A viable alternative that can overcome those issues is the permutation-based method, called ordered reducts (Janusz and Ślęzak 2012; Wróblewski 2001), explained by Algorithm 2.
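A minimal sketch of the permutation-based scheme follows, assuming the standard reading of ordered reducts: start from all attributes and try to remove them one by one in the order given by a permutation, keeping condition 1 satisfied (the preserves() helper comes from the earlier sketch). This is our illustrative rendering, not a verbatim copy of Algorithm 2.

```r
# A sketch under the stated assumption. Each random permutation yields one
# (possibly different) decision reduct, so repeated calls with different
# permutations produce an ensemble of reducts.
ordered_reduct <- function(tab, perm = sample(names(tab)[-ncol(tab)])) {
  B <- names(tab)[-ncol(tab)]
  for (a in perm)
    if (preserves(tab, setdiff(B, a))) B <- setdiff(B, a)
  B
}
```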


Appropriately extended classical notions of rough set theory can be successfully applied as an attribute selection framework for the analysis of large and complex data sets, such as microarray data. However, there is yet another possibility for scaling the original rough set notions with regard to the number of attributes. The basic idea is to utilize additional information about groups of attributes that can potentially replace each other while constructing reducts from the data.

For a moment, let us imagine that some miraculous oracle could identify such groups of attributes in the data. In Janusz and Ślęzak (2012), we proposed a method that allows us to incorporate this additional knowledge into the reduct computation process by influencing the generation of permutations in the ordered reducts algorithm. This process is explained by Algorithm 3. We refer to a fusion of Algorithm 3 into the permutation-based method as ordered reducts with diverse attribute drawing (OR-DAD).
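A minimal sketch of the diverse drawing idea follows, assuming Algorithm 3 matches the intuition described above: consecutive positions of a permutation are drawn from different attribute groups, so that similar attributes are spread far apart. This is our illustrative rendering, not a verbatim copy of Algorithm 3.

```r
# A sketch under the stated assumption. 'clusters' is a list of character
# vectors of attribute names, one vector per attribute group.
diverse_permutation <- function(clusters) {
  perm <- character(0)
  while (length(clusters) > 0) {
    # visit the remaining clusters in random order, drawing one attribute
    # from each, so that consecutive positions come from different groups
    for (k in sample(seq_along(clusters))) {
      a <- sample(clusters[[k]], 1)
      perm <- c(perm, a)
      clusters[[k]] <- setdiff(clusters[[k]], a)
    }
    clusters <- Filter(length, clusters)  # drop exhausted clusters
  }
  perm
}
```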

The knowledge regarding groups of interchangeable attributes can often be acquired from domain experts or external knowledge bases such as domain ontologies (e.g., the Gene Ontology). It can also be obtained automatically by utilizing attribute clustering methods (Jain, Murty, and Flynn 1999). From the perspective of microarray data analysis, such an idea refers to the task of gene clustering (Baldi and Hatfield 2002; McKinney et al. 2006). In Grużdź, Ihnatowicz, and Ślęzak (2006), we reported that the gene-clustering outcomes may meet expert expectations to a greater extent when they are based on information-theoretic measures, rather than on standard numeric and rank-based correlations. In other words, interpreting genes as attributes with some approximate dependencies between them may bring better results than treating them as numeric vectors. In Ślęzak (2007), we suggested that attribute clustering can also be conducted by means of dissimilarity functions based on discernibility between objects, utilized as a form of measuring degrees of functional dependencies between attributes. We also proposed a mechanism wherein reducts could be searched in a data table consisting only of the previously computed cluster representatives, with their occurrence in reducts used as feedback for the clustering refinements.

It is reasonable to use analogous criteria for preserving information about a decision while reducing attributes and for measuring distances between them. As an example, let us compare the attributes a5 and a6 in Table 1. For most pairs of objects, a5 discerns them if and only if a6 does. This may indicate either that there are relatively many pairs of reducts of the form B ∪ {a5} and B ∪ {a6}, B ⊆ A \ {a5, a6}, or that the attributes a5 and a6 do not occur in reducts at all. The reducts {a3, a5} and {a3, a6} are an illustration of this kind of replaceability. The attributes that are likely to be interchangeable can be easily noticed by studying a dendrogram generated by a hierarchical clustering algorithm. An example of such a tree generated for the decision system from Table 1 is presented in Figure 1. As expected, the attributes a5 and a6 are merged into a single cluster as the second pair.

FIGURE 1 An attribute-clustering tree for the decision table from Table 1, obtained by applying the agglomerative nesting algorithm in combination with the direct discernibility dissimilarity function.

The methods of attribute reduction and grouping can be put together in many different ways. As an example, in Abeel and colleagues (2010), it is noted that so-called signatures (irreducible subsets of genes providing enough information about probabilities of specific types of cancer—the reader may notice an interesting correspondence of this notion to a probabilistic version of a decision reduct in Ślęzak 2000) can contain genes that are interchangeable with the others because of data correlations or multiple explanations of some biomedical phenomena. Moreover, such an interchangeability can be observed not only for single elements but also for whole sets of attributes.

FRAMEWORK FOR EXPERIMENTAL VALIDATION

We conducted a series of experiments to verify the usefulness of attribute clustering for scalable computation of decision reducts. We wanted to find answers to two main questions. The first was whether the attribute grouping can speed up searching for reducts. The second question was related to the quality of reducts generated using different clustering methods—we wanted to check whether such reducts are more concise. The minimal number of attributes is not the only possible optimization criterion for decision reducts (Ślęzak 2000; Wróblewski 2001). However, it is indeed the most straightforward idea to rely on minimal reducts in order to clearly visualize the data dependencies.

In the experiments, we use a microarray dataset from the Rough Sets and Current Trends in Computing (RSCTC) 2010 conference competition aimed at constructing classifiers with the highest possible accuracy (Wojnarski et al. 2010). We focus on this specific dataset because—although, currently, we do not evaluate the obtained reducts by means of the accuracy of the classifiers that they yield—this will be the next step of our investigation, leading toward the ability to compare reduct-based classification models with the competition winners.

Microarrays are usually described by many thousands of attributes whose values correspond to expression levels of genes. The considered dataset is related to the investigation of the role of chronic hepatitis C virus in the pathogenesis of HCV-associated hepatocellular carcinoma. It contains data on 124 tissue samples described by 22,277 numeric attributes (genes). It was obtained from the ArrayExpress repository (Parkinson et al. 2009; dataset accession number: E-GEOD-14323). The gene expression levels in this dataset were obtained using Affymetrix GeneChip Human Genome U133A 2.0 microarrays.

We preprocessed the data by discretizing attributes using an unsupervised method. Every expression-level value of a given gene was replaced by one of three labels: over_expressed, normal, or under_expressed. A label for an attribute a and a sample u is decided as follows:

$$\bar{a}(u) = \begin{cases} \text{over\_expressed} & \text{if } a(u) > \mathit{mean}_a + \mathit{sd}_a, \\ \text{under\_expressed} & \text{if } a(u) < \mathit{mean}_a - \mathit{sd}_a, \\ \text{normal} & \text{otherwise}, \end{cases}$$

where mean_a and sd_a denote the mean and the standard deviation of the expression-level values of a in the whole dataset. We proceed with such a discretization for the sake of simplicity. One might also apply other discretization techniques (Bazan et al. 2000; Janusz and Stawicki 2011) or utilize some rough set-based approaches that do not require explicit discretization at all (Jensen and Shen 2009; Ślęzak 2007).
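A minimal R sketch of this discretization follows, assuming the expression data are held in a numeric matrix or data frame X with samples in rows and genes in columns (the names here are illustrative):

```r
# A sketch of the three-label discretization described above; boundary
# values fall into the closest lower interval according to cut()'s defaults.
discretize_gene <- function(x) {
  m <- mean(x); s <- sd(x)
  cut(x, breaks = c(-Inf, m - s, m + s, Inf),
      labels = c("under_expressed", "normal", "over_expressed"))
}

X_disc <- as.data.frame(lapply(as.data.frame(X), discretize_gene))
```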

We operate on relatively simple rough set-motivated dissimilarity functions that refer to the comparison of attributes' abilities to discern important pairs of objects. The first considered function, called the direct discernibility function, is the ratio of the number of pairs of objects from different decision classes that are discerned by exactly one attribute to the number of such pairs discerned by at least one of the compared attributes. It can be written down in a way that emphasizes its analogy to some standard measures used in data clustering (Jain, Murty, and Flynn 1999; Kaufman and Rousseeuw 1990):

$$\mathit{direct}(a, b) = 1 - \frac{|\{(u, u') : d(u) \neq d(u') \wedge a(u) \neq a(u') \wedge b(u) \neq b(u')\}|}{|\{(u, u') : d(u) \neq d(u') \wedge (a(u) \neq a(u') \vee b(u) \neq b(u'))\}|}.$$
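A minimal R sketch of this function for two discretized attribute vectors a and b and a decision vector d follows (a direct transcription of the formula, quadratic in the number of objects; the implementation actually used in the experiments is not shown in the article):

```r
# A sketch, not the authors' implementation.
direct_dissim <- function(a, b, d) {
  n <- length(d)
  both <- 0; either <- 0
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    if (d[i] != d[j]) {               # only pairs from different classes
      da <- a[i] != a[j]
      db <- b[i] != b[j]
      if (da && db) both <- both + 1  # discerned by both attributes
      if (da || db) either <- either + 1
    }
  }
  1 - both / either  # equals (exactly one) / (at least one)
}
```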

We also verified the usefulness of two other discernibility-based dissimilarity functions. The relative discernibility, described in more detail in Janusz and Ślęzak (2012), takes into account the fact that some pairs of objects belonging to different decision classes are more difficult to discern than others. It assigns higher weights to pairs of objects from different decision classes that are discerned by a lower number of attributes. The full discernibility function does not take decision classes into account. Instead, it measures the ratio of the total number of object pairs discerned by exactly one attribute to the total number of pairs discerned by at least one of the attributes. These definitions should be regarded as just a few of many possible mathematical formulations of the basic intuition that an attribute dissimilarity measure should help to identify groups of attributes that are interchangeable within the same reducts.

In order to assess the impact of different attribute-clustering methods on the computation of reducts, in the experiments we clustered the genes using several techniques. We combined the discernibility-based functions with agglomerative nesting (agnes), which is a hierarchical grouping algorithm (Jain, Murty, and Flynn 1999; Kaufman and Rousseeuw 1990). We compared it with the k-means and agnes algorithms working on dissimilarities computed using the Euclidean distance on nondiscretized data. We also checked clusterings based on correlations between values of attributes, coupled with the agnes algorithm. As a reference, we took results obtained for a random clustering, which is actually equivalent to no clustering at all. We additionally checked the worst-case scenario, in which the attributes are grouped so that the most dissimilar genes (according to the direct discernibility function) are in the same clusters.
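A minimal sketch of the grouping step follows, assuming a precomputed attribute dissimilarity matrix D (e.g., filled with direct_dissim values) and the agnes implementation from the R package cluster:

```r
library(cluster)

# A sketch: hierarchical grouping of attributes from a dissimilarity matrix.
tree     <- agnes(as.dist(D), diss = TRUE)   # agglomerative nesting
groups   <- cutree(as.hclust(tree), k = 10)  # cut the tree into 10 groups
clusters <- split(colnames(D), groups)       # attribute names per cluster
```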

In each repetition of the experiment, we generated 100 reducts for all the compared clustering methods. For the reduct computation we used Algorithm 2. The permutations for each run of the algorithm were generated based on the clusterings corresponding to the tested grouping methods. Algorithm 3 explains the permutation construction process. In practice, there is no need to pregenerate a permutation for the reduct computation, because it might be an integral part of the algorithm. However, in the experiments, we explicitly generated the permutations for the sake of reproducibility of the results.

EXPERIMENTS WITH DISSIMILARITY FUNCTIONS

Table 2 summarizes measurements of computation times. For each clustering method, the mean and standard deviation of 20 independent repetitions of the experiment are given. The results clearly show the advantage of using the direct discernibility function in combination with a hierarchical clustering algorithm to speed up the generation of decision reducts. Times obtained by this method are significantly lower than those of all other approaches. The significance was measured using a t-test (Demšar 2006), and the p-values obtained at the 0.95 confidence level were all lower than 10^-10. For instance, the times obtained by this method when grouping was made into 1000 clusters are, on average, lower by 34% than the corresponding times for the random method. Moreover, the robustness of the previously discussed tendency is confirmed in Table 2 by the stability with regard to the number of considered clusters.

TABLE 2 Average Computation Times (in Seconds) of 100 Reducts for Permutations Produced Using Different Clusterings

Clustering Method       10 Clusters       100 Clusters      1000 Clusters
agnes & direct          3.536 ± 0.112     3.151 ± 0.097      3.015 ± 0.117
agnes & relative        4.680 ± 0.156     4.164 ± 0.161      3.705 ± 0.134
agnes & full disc.      5.350 ± 0.157     5.287 ± 0.244      5.018 ± 0.280
agnes & correlation     4.443 ± 0.154     3.999 ± 0.189      3.805 ± 0.157
agnes & Euclidean       3.965 ± 0.158     4.430 ± 0.251      4.839 ± 0.199
k-means & Euclidean     4.872 ± 0.239     4.434 ± 0.229      4.545 ± 0.148
random                  4.597 ± 0.155     4.665 ± 0.190      4.543 ± 0.147
worst                   5.485 ± 0.219     9.901 ± 0.753     11.929 ± 0.628

The results obtained for the relative discernibility function may be regarded as disappointing. The tested weighting schema seems to degrade the performance of the reduct computation algorithm, especially when a low number of gene clusters is considered. The explanation of this behavior will be within the scope of our future research. The experiments show that distinguishing between the cases that are easier or more difficult to discern might not be necessary; however, a better-adjusted mathematical formula for such distinguishing may lead to more promising outcomes.

The results from Table 2 obtained for the two Euclidean distance-based clusterings also show a clear advantage of using hierarchical methods for grouping genes in microarray data. Actually, the times for the k-means clustering with the Euclidean settings cannot be regarded as statistically different from the results of random clusterings at the level of 1000 generated clusters.

For each clustering method, we also measured the average size of the generated reducts. This statistic reflects the quality of reducts, both in terms of data-based knowledge representation and the ability to construct efficient classification models. These results are displayed in Table 3. The standard deviations given in this table are not computed directly from the sizes of the reducts but from the average sizes of 100 reducts in each of the 20 experiment runs. This explains the low values of this statistic.

The direct discernibility method significantly outperformed the other approaches also in terms of the reduct size. As before, the significance was checked using a t-test. On average, decision reducts generated by using the hierarchical clustering based on the direct discernibility function are shorter than those computed from the random clusterings by nearly 1.5 genes. They were also shorter than the reducts computed for the agnes algorithm and for the Euclidean distances by over 0.5 gene. This confirms that a proper attribute clustering increases the efficiency of the reduct computation methods.

TABLE 3 Average Sizes of 100 Reducts Computed for Different Clusterings

Clustering Method       10 Clusters        100 Clusters       1000 Clusters
agnes & direct          11.209 ± 0.099     11.095 ± 0.087     11.103 ± 0.093
agnes & relative        12.102 ± 0.132     11.790 ± 0.134     11.638 ± 0.114
agnes & full disc.      12.808 ± 0.101     12.747 ± 0.122     12.684 ± 0.116
agnes & correlation     12.449 ± 0.104     12.236 ± 0.112     12.175 ± 0.092
agnes & Euclidean       11.709 ± 0.123     11.860 ± 0.118     12.198 ± 0.114
k-means & Euclidean     12.590 ± 0.089     12.228 ± 0.069     12.283 ± 0.130
random                  12.519 ± 0.127     12.470 ± 0.092     12.471 ± 0.128
worst                   12.731 ± 0.133     14.800 ± 0.159     15.624 ± 0.180

TABLE 4 Average Minimal Sizes among 100 Reducts Computed for Different Clusterings

Clustering Method       10 Clusters        100 Clusters       1000 Clusters
agnes & direct           8.900 ± 0.307      8.950 ± 0.223      9.200 ± 0.410
agnes & relative         9.600 ± 0.502      9.250 ± 0.444      9.550 ± 0.510
agnes & full disc.      10.250 ± 0.550     10.200 ± 0.523     10.150 ± 0.489
agnes & correlation     10.100 ± 0.447      9.700 ± 0.470      9.700 ± 0.571
agnes & Euclidean        9.500 ± 0.512      9.250 ± 0.444      9.600 ± 0.502
k-means & Euclidean     10.000 ± 0.458      9.650 ± 0.489      9.600 ± 0.502
random                   9.850 ± 0.489      9.900 ± 0.447     10.000 ± 0.324
worst                    9.950 ± 0.394     10.900 ± 0.640     11.550 ± 0.604

Because reducts are often computed in order to create a concise representation of data (e.g., for a convenient visualization; see Widz and Ślęzak 2012), we also measured the sizes of the shortest reducts computed in each of the 20 repetitions of the experiment. These results are shown in Table 4. They additionally confirm the importance of considering a specific decision problem as a context when forming groups of genes. The attribute dissimilarity functions that do not refer to a given decision task perform worse than those taking the decision attribute into account. The best illustration of this fact is provided by the results obtained for the full discernibility function, which are significantly worse than random. The full discernibility measure is similar to the direct discernibility measure, but it neglects the decision attribute. This leads to a radical change in the obtained results—from the best to worse-than-random.

ANALYSIS OF THE SELECTED GENE-CLUSTERING RESULTS

We manually investigated the results of different clusterings in order to gain some insight into the factors that influence the reduct computation efficiency. We noticed that the most successful clusterings, which are based on the direct discernibility method combined with the agnes algorithm, are significantly imbalanced with regard to the number of attributes in each group. For instance, the distribution of attributes for the clustering into 10 groups is shown in Table 5. Analogous distributions obtained for the relative discernibility-, full discernibility-, correlation-, and Euclidean-based clusterings are given for reference.

TABLE 5 Distribution of Attributes in Clusterings into 10 Groups Using the Agnes Algorithm

Method        Gr.1   Gr.2   Gr.3   Gr.4   Gr.5   Gr.6   Gr.7   Gr.8   Gr.9   Gr.10
direct       13855   1814   1601    609   1010   1040   1248   1052     44       4
relative      6557   1923   3096   6536   1071   1279    991    565    226      33
full disc.   21565    356    186    106      7     18     18     15      5       1
correlation   1288   1750   2533   2127    801   1902   2867   3286   3266    2457
Euclidean     3818   1575   1161    290   9545    404   5452     28      1       3

The first group in the direct discernibility clustering is highly overrepresented, whereas the distribution of genes for the correlation measure is quite uniform. The distributions for the relative and Euclidean measures can be placed between the distributions of the direct discernibility and correlation groupings. Finally, the distribution of the full discernibility clustering is the most imbalanced—nearly 95% of the genes are placed in a single group. This result confirms that the full discernibility function is unable to capture the different roles played by particular genes in the decision problem.

When we compared the clustering trees obtained for those measures, we found that the direct discernibility measure leads to a skewed outcome, whereas the trees for the other functions (apart from the full discernibility measure) are well balanced (see Figure 2). However, the dissimilarities between the clusters—corresponding to relative differences in the height of the tree nodes—are usually larger for the direct discernibility measure.

The presence of a majority cluster among groups of genes may have a very intuitive explanation. It is common that only a small portion of the genes in the data is truly related to the problem indicated by a decision attribute. A majority of genes do not bring any important information; hence, intuitively, a good gene-clustering algorithm should place them in a separate cluster. We decided to perform an additional series of tests in order to verify whether this hypothesis is true for the direct discernibility clustering.

FIGURE 2 A visualization of the clustering trees for five different gene dissimilarity measures, cut at a height corresponding to the division into 10 groups: (a) direct discernibility; (b) relative discernibility; (c) full discernibility; (d) correlation; (e) Euclidean.

TABLE 6 Performance of the Ordered Reducts (OR) and Ordered Reducts with Diverse Attribute Drawing (OR-DAD) Algorithms with and without Attributes from Group 1 of the Direct Discernibility Clustering into 10 Groups (Results of 20 Independent Repetitions of the Experiment)

Clustering               Ave. Time        Ave. Reduct Size   Ave. Minimal Reduct Size
OR-DAD (direct disc.)    3.536 ± 0.112    11.209 ± 0.099      8.900 ± 0.307
OR-DAD without gr.1      3.374 ± 0.107    11.067 ± 0.083      8.900 ± 0.308
OR (random)              4.597 ± 0.155    12.519 ± 0.127      9.850 ± 0.489
OR without gr.1          3.845 ± 0.114    11.812 ± 0.087      9.550 ± 0.604
OR within gr.1           5.318 ± 0.201    13.310 ± 0.112     10.700 ± 0.571

We checked the performance of the permutation-based reduct computation heuristic (the OR-DAD algorithm) in the case when we drop the attributes from the majority cluster of the direct discernibility clustering into 10 groups (using agnes). We modified the permutation generation process so that it does not include attributes from the majority cluster. The results of the comparison are shown in Table 6. By removing attributes from the majority cluster, we decreased the average computation time of 100 reducts by 0.162 s and their average size by 0.142. Although such an improvement is not very large, its statistical significance was confirmed by the t-test at the 0.95 confidence level. It is worth mentioning that the time complexity of the permutation-based heuristic is constant with regard to the number of attributes in the data, so this difference in performance was solely a result of the omission of unnecessary (uninformative) attributes from the majority cluster. We additionally computed reducts from random permutations of genes from the majority cluster and from random permutations of the remaining attributes. The reducts constructed from attributes placed in the majority cluster were, on average, longer by nearly 13%, and their construction was over 38% more time-consuming. In fact, their statistics were comparable to or even worse than those obtained for the worst-case clustering scenario (see Tables 2, 3, and 4). We may use this observation to improve the existing reduct computation heuristics.
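For instance, a minimal sketch of such a pruning step, assuming the clusters list from the earlier agnes sketch, simply removes the largest ("garbage") group before any permutations are generated:

```r
# A sketch: drop the majority cluster before generating permutations.
sizes <- sapply(clusters, length)
clusters_pruned <- clusters[-which.max(sizes)]
perm <- diverse_permutation(clusters_pruned)  # see the earlier sketch
```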

RANDOMIZED-GREEDY COMPUTATION OF DECISION REDUCTS

Our experiments described in the previous sections showed that a proper attribute-clustering method can significantly improve the permutation-based reduct algorithm by indicating groups of attributes that are potentially interchangeable in many reducts. In our research, we were interested in whether this observation also holds for the greedy reduct computation methods. The greedy heuristic often allows us to find a much shorter reduct than those obtained from the randomized algorithms. For instance, Algorithm 1 combined with the gini gain measure for the evaluation of attribute quality, applied to the hepatitis C data, generates a reduct consisting of only six genes. When we compare this result with the minimal sizes of the reducts constructed using the OR-DAD algorithm (see Table 4), it is clearly visible that the greedy reduct is, on average, smaller by two to four attributes. The gini gain measure could also be used to reformulate constraints in the decision reduct definition, as proposed in Ślęzak (2000). However, in this study we keep our focus on standard decision reducts, and we treat gini gain just as an example of a greedy evaluation function.

Two major disadvantages of the greedy heuristic are its computational inefficiency for datasets with a very large number of attributes and the fact that it can be used to generate only a single reduct. For example, in the described experiment, the computation time needed to construct the greedy reduct was 544 seconds, which is over 10,000 times slower than in the case of the permutation-based algorithm.

The above observation motivated us to measure the impact of attribute grouping on the computation time of the greedy reduct. We introduced constraints to the greedy algorithm that allow the selection of only a single attribute from each cluster. The selection itself was still done in the greedy fashion. This modification resulted in a significant decrease of the time needed for the computation of a single greedy reduct—it took 392 seconds when the grouping into 10 clusters with the direct discernibility measure was used (about 28% less). The size of a reduct obtained in this way was six, which is equal to the classical case. However, those two greedy reducts differed on two out of six attributes. In particular, this shows that searching for a single decision reduct provides highly incomplete knowledge about dependencies in the data, especially for such large numbers of attributes. Hence, the approaches aimed at the extraction of larger families of reducts should be preferred (Widz and Ślęzak 2012; Wróblewski 2001).

Following the previous study, we wanted to check whether a more significant improvement could be obtained. We decided to investigate the possibility of introducing some intelligent attribute search strategies into the greedy algorithm to accelerate its execution. We also wanted to check whether our previous observation regarding the majority cluster can bring some benefit for the greedy computation of decision reducts.

First, we repeated the execution of the greedy algorithm combined with the clustering into 10 groups, based on the direct discernibility clustering but without consideration of genes from the majority cluster (i.e., gr.1). The reduct was generated in 152 s, which is over 3.5 times faster than in the case of the standard greedy algorithm. The obtained reduct had the same size as the original one (i.e., it consisted of six attributes). It differed, however, on three out of six genes. Interestingly, it also differed on three genes from the reduct obtained by application of the clustering but with the majority cluster included.

In the second experiment, we verified the efficiency of two reduct generation heuristics that combine the greedy approach with some randomization techniques and the utilization of the attribute clustering results. They were motivated by the random forest algorithm (Breiman 2001), which constructs an ensemble of decision trees generated from randomized subsets of attributes. Analogously, at each step of the reduct computation algorithm, only a small subset of randomly chosen attributes can be considered. This approach is sometimes called the random reducts algorithm (Algorithm 4), and it was already used in a slightly modified version in, for example, Janusz (2012) and Janusz and Stawicki (2011).

By the utilization of attribute-clustering results, we may try to bias the attribute-sampling process and improve the efficiency of the reduct construction. For this purpose we propose two heuristics. In the first, which we call random reducts with diverse attribute sampling (RR-DAS), attributes are uniformly sampled from all the clusters. At each step of the algorithm, the set of attributes to be evaluated contains approximately the same number of elements from every group. This guarantees maximal diversity of the attributes considered at every step of the algorithm. The search for the best attribute in every iteration is performed using the greedy approach. This heuristic is outlined in Algorithm 5.
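A minimal sketch of the sampling step we assume behind RR-DAS follows (illustrative code, not a verbatim copy of Algorithm 5): every evaluation round draws roughly the same number of candidate attributes from each cluster, and the greedy step then picks the best attribute from this balanced sample instead of from all attributes.

```r
# A sketch: draw an evaluation sample that is balanced across clusters.
sample_diverse <- function(clusters, size) {
  per_group <- ceiling(size / length(clusters))
  unlist(lapply(clusters, function(cl)
    sample(cl, min(per_group, length(cl)))))
}
```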

The second of the proposed heuristics aims to diversify the sets of attributes considered during different steps of the reduct computation. In this approach, called ordered reducts with diverse attribute search (OR-DAS), groups of attributes are permuted, and during each iteration the best attribute is searched for within an attribute sample drawn from a single cluster (see Algorithm 6).
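A minimal sketch of the iteration order we assume behind OR-DAS (again illustrative, not Algorithm 6 itself): clusters are permuted, and each greedy step searches only a sample drawn from the current cluster.

```r
# A sketch: one candidate sample per iteration, each from a single cluster.
or_das_samples <- function(clusters, sample_size) {
  lapply(sample(clusters), function(cl)        # random cluster order
    sample(cl, min(sample_size, length(cl))))
}
```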

The performances of the aforementioned heuristics were compared in a series of tests on the hepatitis C data. The plots shown in Figure 3 present average results for the computation of 100 reducts using the compared algorithms. Average computation times, average reduct sizes, average minimal reduct sizes, and average maximal overlaps are displayed. The last statistic reflects the homogeneity of a set of reducts. For each reduct DR from a set RS, it computes its maximal percentage of common attributes with the other reducts in the set and takes the mean of those values:

$$\mathit{aveMaxOverlap}(RS) = \frac{1}{|RS|} \sum_{DR \in RS} \max_{DR' \in RS,\ DR' \neq DR} \frac{|DR \cap DR'|}{|DR|}$$
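A minimal R sketch of this statistic follows, with a set of reducts represented as a list of character vectors (assuming at least two reducts in the set):

```r
# A sketch, not the authors' implementation.
ave_max_overlap <- function(RS) {
  mean(sapply(seq_along(RS), function(i)
    max(sapply(RS[-i], function(other)
      length(intersect(RS[[i]], other)) / length(RS[[i]])))))
}
```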

In the plots, the approach that does not use attribute clusterings (the random reducts algorithm) is compared with both of the proposed heuristics combined with the direct discernibility clustering (see "Framework for Experimental Validation") into 10 groups (labels RR-DAS and OR-DAS). The impact of removing the majority cluster prior to the computation of reducts (the bars with the label w/o gr.1) is also assessed.

The advantage of using the attribute clustering results for computing the randomized greedy reducts is clearly visible. The usage of the direct discernibility clustering not only speeds up the computations, on average, by approximately 18%, but it also decreases the average and minimal size of the obtained reducts (on average, by approximately 10%). The reducts computed with the use of the gene-clustering results were often as small as the reduct generated using the classical greedy heuristic. The combination of nondeterminism and clustering, however, allowed us to obtain several different short reducts in a much shorter time. The removal of the majority cluster brought a further improvement of those results, but in most of the cases the difference was statistically insignificant.

FIGURE 3 Average computation times, minimal and average sizes, and average maximal overlap of reducts computed using the RR-DAS and OR-DAS algorithms based on direct discernibility. Plots correspond to different settings of the attribute sample size (100, 30, and 10 attributes) used in every iteration of the algorithms; the compared methods are random reducts, RR-DAS, RR-DAS w/o gr.1, OR-DAS, and OR-DAS w/o gr.1.

In all cases, the employment of the clustering results decreased the diversity of the obtained sets of reducts. This can be an issue if the reducts are to be utilized for constructing an ensemble of classification models. This problem is less conspicuous for the second of the proposed heuristics (OR-DAS); hence, it might be preferable for constructing diverse ensembles based on short decision reducts.

One should remember that sets of reducts can be searched for other reasons as well. As an example, let us consider the task of robust attribute selection (Abeel et al. 2010; Świniarski and Skowron 2003). In such a case, one is interested in a single subset of attributes selected from the union of reducts, which would be stable over multiple runs of a given algorithm. In practice, it is often accomplished by choosing the attributes that most frequently occur in the obtained attribute subsets. Already, several attribute filtering techniques that derive from the rough set theory have investigated attributes from a union of multiple reducts (Błaszczyński, Słowiński, and Susmaga 2011; Janusz and Stawicki 2011). In order to better reflect the stability of such an attribute subset, the aforementioned way of evaluating sets of reducts may need a revision.

In what follows, we propose several methods for measuring the stability of attribute subsets retrieved from families of reducts. We decided to compare reducts obtained in 20 repetitions of the previous experiments. We measured the average maximum overlap between the unions of reducts from each of the runs (aveMaxOverlap), and we counted how many attributes were present in the intersection of all the unions (common attrs). For each execution of this experiment, we additionally checked how many attributes were present in at least 5 out of 100 reducts (frequent attrs), and we measured the average maximum overlap of those attribute sets (freqMaxOverlap). All of those statistics are presented in Table 7.
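A minimal sketch of the "frequent attrs" statistic for one run, with the 100 reducts again represented as a list of character vectors (the helper name is illustrative):

```r
# A sketch: attributes that occur in at least min_count of the reducts.
frequent_attrs <- function(RS, min_count = 5) {
  counts <- table(unlist(RS))
  names(counts)[counts >= min_count]
}
```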

Attribute sets returned by the proposed algorithms turned out to be much more stable than those obtained without the use of clustering. Interestingly, the permutation-based method (OR-DAS) achieved slightly better results than the RR-DAS algorithm. Moreover, as expected, the average maximal overlap of the unions of reducts increased when the attributes from the majority group were removed. However, we need to remember that the parameters considered are designed to investigate the stability of results with respect to specific goals of an attribute selection. Some other measures should be introduced in order to study the stability of results in the form of sets of attribute sets, optimized for the purposes of the representation of data dependencies or the construction of classifier ensembles.

TABLE 7 Stability of an Attribute Selection Using Different Reduct Computation Algorithms

Algorithm          AveMaxOverlap   Common Attrs   Frequent Attrs   FreqMaxOverlap
Random Reducts     0.221            9              4.40            0.584
RR-DAS             0.269           25             12.05            0.688
OR-DAS             0.277           28             17.60            0.719
RR-DAS w/o gr.1    0.307           22             13.65            0.661
OR-DAS w/o gr.1    0.315           28             19.30            0.656

CONCLUDING REMARKS

In this article, we presented an investigation of the possibility of combining the greedy and permutation-based heuristics to facilitate fast computation of representative ensembles of short decision reducts. The choice of the parameters responsible for the generation of permutations and the greedy heuristic measures may have a significant influence on the results. However, in all such scenarios it is expected that attribute clustering can improve the computations and the interpretation of reducts.

We proposed a new approach to attribute clustering and its application to the task of computation of short decision reducts from datasets with a large number of attributes. We showed that by utilization of clustering results, it is possible to significantly speed up the search for decision reducts and that the obtained reducts tend to be smaller than those reached without the clustering. We also proposed a discernibility-based attribute dissimilarity measure that is particularly useful for identifying groups of attributes that are likely to be interchangeable in many reducts.

We intend to combine our methods with other knowledge-discovery approaches that involve attribute grouping and selection (Abeel et al. 2010; Grużdź, Ihnatowicz, and Ślęzak 2006). One may also consider an idea of full integration of the algorithms for attribute clustering and selection, so they can provide feedback to each other within the same learning process. Such a new process may be performed separately for particular microarray datasets or over their larger unions (Janusz and Ślęzak 2012).

The integration of the attribute clustering and selection procedures may bring not only significant performance improvements but may also provide a new meaning with regard to the attribute selection outcomes. Instead of subsets of individuals chosen from thousands of attributes, it may be better to deal with subsets of representatives selected from much more robust clusters of interchangeable attributes. Moreover, the outcomes of attribute clustering may help to identify truly irrelevant attributes.

NOTE

All the algorithms and experiments were implemented and conducted in R System (http://www.r-project.org/).

FUNDING

This research was partly supported by the Polish National Science Centre (NCN) grants DEC-2011/01/B/ST6/03867 and DEC-2012/05/B/ST6/03215.

REFERENCES

Abeel, T., T. Helleputte, Y. V. de Peer, P. Dupont, and Y. Saeys. 2010. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26(3):392–398.

Baldi, P., and G. W. Hatfield. 2002. DNA microarrays and gene expression: From experiments to data analysis and modeling. Cambridge, UK: Cambridge University Press.

Bazan, J. G., H. S. Nguyen, S. H. Nguyen, P. Synak, and J. Wróblewski. 2000. Rough set algorithms in classification problem. In Rough set methods and applications: New developments in knowledge discovery in information systems, Studies in Fuzziness and Soft Computing 56:49–88, L. Polkowski, S. Tsumoto, and T. Y. Lin, ed. Heidelberg: Physica-Verlag.

Błaszczyński, J., R. Słowiński, and R. Susmaga. 2011. Rule-based estimation of attribute relevance. In Rough sets and knowledge technology, Lecture Notes in Computer Science 6954:36–44. Berlin, Heidelberg: Springer.

Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.

Demšar, J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7:1–30.

Fang, J., and J. W. Grzymała-Busse. 2006. Leukemia prediction from gene expression data – A rough set approach. In International conference on artificial intelligence and soft computing, Lecture Notes in Computer Science 4029:899–908. Berlin, Heidelberg: Springer.

Grużdź, A., A. Ihnatowicz, and D. Ślęzak. 2006. Interactive gene clustering – A case study of breast cancer microarray data. Information Systems Frontiers 8(1):21–27.

Jain, A. K., M. N. Murty, and P. J. Flynn. 1999. Data clustering: A review. ACM Computing Surveys 31(3):264–323.

Janusz, A. 2012. Dynamic rule-based similarity model for DNA microarray data. In Transactions on rough sets XV, Lecture Notes in Computer Science 7255:1–25. Berlin, Heidelberg: Springer.

Janusz, A., and D. Ślęzak. 2012. Utilization of attribute clustering methods for scalable computation of reducts from high-dimensional data. In Federated conference on computer science and information systems, 295–302. Washington, D.C.: IEEE.

Janusz, A., and S. Stawicki. 2011. Applications of approximate reducts to the feature selection problem. In Rough sets and knowledge technology, Lecture Notes in Computer Science 6954:45–50. Berlin, Heidelberg: Springer.

Jensen, R., and Q. Shen. 2009. New approaches to fuzzy-rough feature selection. IEEE Transactions on Fuzzy Systems 17(4):824–838.

Kaufman, L., and P. Rousseeuw. 1990. Finding groups in data: An introduction to cluster analysis. New York, NY: Wiley Interscience.

Kohavi, R., and G. H. John. 1997. Wrappers for feature subset selection. Artificial Intelligence 97:273–324.

McKinney, B. A., D. M. Reif, M. D. Ritchie, and J. H. Moore. 2006. Machine learning for detecting gene-gene interactions: A review. Applied Bioinformatics 5(2):77–88.

Midelfart, H., H. J. Komorowski, K. Nørsett, F. Yadetie, A. K. Sandvik, and A. Lægreid. 2002. Learning rough set classifiers from gene expressions and clinical data. Fundamenta Informaticae 53(2):155–183.

Mitchell, T. M. 1997. Machine learning. New York, NY: McGraw-Hill.

Nguyen, H. S. 2006. Approximate Boolean reasoning: Foundations and applications in data mining. In Transactions on rough sets V, Lecture Notes in Computer Science 4100:334–506. Berlin, Heidelberg: Springer.

Parkinson, H. E., M. Kapushesky, N. Kolesnikov, G. Rustici, M. Shojatalab, N. Abeygunawardena, H. Berube, M. Dylag, I. Emam, A. Farne, E. Holloway, M. Lukk, J. Malone, R. Mani, E. Pilicheva, T. F. Rayner, F. I. Rezwan, A. Sharma, E. Williams, X. Z. Bradley, T. Adamusiak, M. Brandizi, T. Burdett, R. Coulson, M. Krestyaninova, P. Kurnosov, E. Maguire, S. G. Neogi, P. Rocca-Serra, S.-A. Sansone, N. Sklyar, M. Zhao, U. Sarkans, and A. Brazma. 2009. ArrayExpress update – From an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Research 37(Database issue):868–872.

Pawlak, Z. 1991. Rough sets – Theoretical aspects of reasoning about data. Boston, MA: Kluwer Academic Publishers.

Ślęzak, D. 2000. Normalized decision functions and measures for inconsistent decision tables analysis. Fundamenta Informaticae 44(3):291–319.

Ślęzak, D. 2007. Rough sets and few-objects-many-attributes problem: The case study of analysis of gene expression data sets. In Frontiers in the convergence of bioscience and information technologies, 437–442. Washington, D.C.: IEEE.

Świniarski, R. W., and A. Skowron. 2003. Rough set methods in feature selection and recognition. Pattern Recognition Letters 24(6):833–849.

Widz, S., and D. Ślęzak. 2012. Rough set based decision support – Models easy to interpret. In Selected methods and applications of rough sets in management and engineering, Advanced Information and Knowledge Processing, 95–112, G. Peters, P. Lingras, D. Ślęzak, and Y. Yao, ed. Berlin: Springer.

Wojnarski, M., A. Janusz, H. S. Nguyen, J. G. Bazan, C. Luo, Z. Chen, F. Hu, G. Wang, L. Guan, and H. Luo. 2010. RSCTC 2010 discovery challenge: Mining DNA microarray data for medical diagnosis and treatment. In Rough sets and current trends in computing, Lecture Notes in Computer Science 6086:4–19. Berlin, Heidelberg: Springer.

Wróblewski, J. 2001. Ensembles of classifiers based on approximate reducts. Fundamenta Informaticae 47(3–4):351–360.