
Chris Cornelis
Ghent University | UGhent · Department of Applied Mathematics, Computer Science and Statistics
PhD in Computer Science
About
228 Publications · 30,762 Reads
6,973 Citations (since 2017)
Introduction
Chris Cornelis currently works at the Department of Applied Mathematics, Computer Science and Statistics, Ghent University. His current project is 'fuzzy sets, rough sets and their applications in machine learning.'
Additional affiliations
October 2011 - present
October 2000 - present
Publications (228)
One of the weaknesses of classical (fuzzy) rough sets is their sensitivity to noise, which is particularly undesirable for machine learning applications. One approach to solve this issue is by making use of fuzzy quantifiers, as done by the vaguely quantified fuzzy rough set (VQFRS) model. While this idea is intuitive, the VQFRS model suffers from...
Fuzzy rough sets are well-suited for working with vague, imprecise or uncertain information and have been successfully applied in real-world classification problems. One of the prominent representatives of this theory is fuzzy-rough nearest neighbours (FRNN), a classification algorithm based on the classical k-nearest neighbours algorithm. The crux...
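The FRNN idea sketched above (classifying a test instance by its membership in the fuzzy lower and upper approximations of each decision class, computed over its nearest neighbours) can be illustrated with a toy implementation. This is an illustrative sketch only, not the published algorithm: the similarity measure, the Łukasiewicz connectives and the averaging of the two approximations are my own assumptions.

```python
import numpy as np

def frnn_classify(X_train, y_train, x, k=5):
    """Toy FRNN-style classifier (illustrative sketch, not the canonical
    algorithm). Similarity is 1 minus normalised Euclidean distance; the
    lower approximation uses the Lukasiewicz implicator
    I(a, b) = min(1, 1 - a + b), the upper approximation the Lukasiewicz
    t-norm T(a, b) = max(0, a + b - 1)."""
    d = np.linalg.norm(X_train - x, axis=1)
    sim = 1.0 - d / (d.max() + 1e-12)        # similarity in [0, 1]
    nn = np.argsort(-sim)[:k]                # k most similar instances
    scores = {}
    for c in np.unique(y_train):
        member = (y_train[nn] == c).astype(float)
        # lower: how certainly the neighbourhood lies inside class c
        lower = np.min(np.minimum(1.0, 1.0 - sim[nn] + member))
        # upper: how possibly the neighbourhood overlaps class c
        upper = np.max(np.maximum(0.0, sim[nn] + member - 1.0))
        scores[c] = (lower + upper) / 2
    return max(scores, key=scores.get)
```

A test instance near one cluster then receives that cluster's label, since both its lower and upper class memberships dominate.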
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values that preserves the information encoded in the distribution of the missing values. Unlike the existing missing-indicator approach, this does not require imputation. We support our proposal with three different arguments. Firstly, po...
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost through imputation. The missing-indicator approach can be used in combination with imputation to instead represent...
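The missing-indicator approach mentioned above (keeping the information carried by *where* values are missing, alongside imputed values) can be sketched in a few lines of NumPy. The function name and the choice of mean imputation are my own assumptions for illustration.

```python
import numpy as np

def impute_with_indicators(X):
    """Mean-impute a numeric array and append one binary missing-indicator
    column per feature that contains NaNs (a minimal sketch of the
    missing-indicator approach combined with imputation)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    means = np.nanmean(X, axis=0)            # per-feature mean over observed values
    X_imp = np.where(mask, means, X)         # fill NaNs with the feature mean
    has_missing = mask.any(axis=0)           # only features that ever had NaNs
    return np.hstack([X_imp, mask[:, has_missing].astype(float)])
```

Downstream classifiers then see both the imputed value and the fact that it was imputed.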
In this article, a new Fuzzy Granular Approximation Classifier (FGAC) is introduced. The classifier is based on the previously introduced concept of the granular approximation and its multi-class classification case. The classifier is instance-based and its biggest advantage is its local transparency, i.e., the ability to explain every individual pr...
In this short communication, we refute the conjecture by Fodor and Yager from [5] that the class of inclusion measures proposed by Kitainik coincides with that of inclusion measures based on contrapositive fuzzy implications. In particular, we show that the conjecture only holds when the considered universe of discourse is finite.
Fuzzy rough set theory can be used as a tool for dealing with inconsistent data when there is a gradual notion of indiscernibility between objects. It does this by providing lower and upper approximations of concepts. In classical fuzzy rough sets, the lower and upper approximations are determined using the minimum and maximum operators, respective...
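The minimum in the classical lower approximation described above is often replaced by an Ordered Weighted Average (OWA) operator to soften its sensitivity to a single outlying value. A minimal sketch of that replacement follows; the implicator choice and weight vectors are my own assumptions.

```python
import numpy as np

def owa(values, weights):
    """OWA aggregation: sort the values in descending order, then take the
    weighted sum with the given weight vector."""
    return float(np.sort(values)[::-1] @ weights)

def lower_approx(R_y, A, weights):
    """Lower approximation membership of object y in fuzzy set A.
    Classically this is inf_x I(R(y, x), A(x)); here the infimum is
    replaced by an OWA operator whose weights put most mass on the
    smallest values (a softened minimum). Uses the Lukasiewicz
    implicator I(a, b) = min(1, 1 - a + b)."""
    vals = np.minimum(1.0, 1.0 - R_y + A)
    return owa(vals, weights)
```

With the weight vector (0, 1) the OWA operator picks the smallest sorted value and recovers the classical minimum; shifting some weight to the second-smallest value, e.g. (0.2, 0.8), yields a more noise-tolerant approximation.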
In granular computing, fuzzy sets can be approximated by granularly representable sets that are as close as possible to the original fuzzy set w.r.t. a given closeness measure. Such sets are called granular approximations. In this article, we introduce the concepts of disjoint and adjacent granules and we examine how the new definitions affect the...
Inconsistency in prediction problems occurs when instances that relate in a certain way on the condition attributes do not follow the same relation on the decision attribute. For example, in ordinal classification with monotonicity constraints, it occurs when an instance dominating another instance on condition attributes has been assigned to a worse...
One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, and no counter-examples. A number of data descriptors that have been shown to perform well in previous studies of one-class classification, like the Sup...
We propose an adaptation of fuzzy rough sets to model concepts in datasets with missing values. Upper and lower approximations are replaced by interval-valued fuzzy sets that express the uncertainty caused by incomplete information. Each of these interval-valued fuzzy sets is delineated by a pair of optimistic and pessimistic approximations. We sho...
This paper presents some functional dependency relations defined on the attribute set of an information system. We establish some basic relationships between functional dependency relations, attribute reduction, and closure operators. We use the partial order for dependencies to show that reducts of an information system can be obtained from the ma...
Emotion detection is an important task that can be applied to social media data to discover new knowledge. While the use of deep learning methods for this task has been prevalent, they are black-box models, making their decisions hard to interpret for a human operator. Therefore, in this paper, we propose an approach using weighted $k$ Nearest Neig...
Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets. Specifically, we develop an approach f...
Granular representations of crisp and fuzzy sets play an important role in rule induction algorithms based on rough set theory. In particular, arbitrary fuzzy sets can be approximated using unions of simple fuzzy sets called granules. These granules, in turn, have a straightforward interpretation in terms of human-readable fuzzy “if..., then...” ru...
We provide a thorough treatment of hyperparameter optimisation for three data descriptors with a good track-record in the literature: Support Vector Machine (SVM), Nearest Neighbour Distance (NND) and Average Localised Proximity (ALP). The hyperparameters of SVM have to be optimised through cross-validation, while NND and ALP allow the reuse of a s...
In this paper, we first review existing fuzzy extensions of the dominance-based rough set approach (DRSA), and advance the theory considering additional properties. Moreover, we examine the application of Ordered Weighted Average (OWA) operators to fuzzy DRSA. OWA operators have shown a lot of potential in handling outliers and noisy data in decisi...
We present fuzzy-rough-learn, the first Python library of fuzzy rough set machine learning algorithms. It contains three algorithms previously implemented in R and Java, as well as two new algorithms from the recent literature. We briefly discuss the use cases of fuzzy-rough-learn and the design philosophy guiding its development, before providing...
In this paper, we present a new view on how the concept of rough sets may be interpreted in terms of statistics and used for reasoning about numerical data. We show that under specific assumptions, neighborhood based rough approximations may be seen as statistical estimations of certain and possible events. We propose a way of choosing the optimal...
In this paper, we present a new closure operator defined on the set of attributes of an information system that satisfies the conditions for defining a matroid. We establish some basic relationships between equivalence classes and approximation operators where different sets of attributes are used. It is shown that the reducts of an information sys...
Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. Howeve...
This paper presents the concept of lower and upper rough matroids based on approximation operators for covering-based rough sets. This concept is a generalization of lower and upper rough matroids based on coverings. A new definition of lower and upper definable sets related to an approximation operator is presented and these definable sets are u...
Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists to design a Big Data fuzzy rough sets based classifier. Existing Big Data approaches use fu...
Fuzzy rough set theory models both vagueness and indiscernibility in data, which makes it a very useful tool for application to various machine learning tasks. In this paper, we focus on one of its robust generalisations, namely ordered weighted average based fuzzy rough sets. This model uses a weighted approach in the definition of the fuzzy rough...
In this paper, we discuss the relationship between different types of reduction and set definability. We recall the definition of a decision reduct, a γ-decision reduct, a decision bireduct and a γ-decision bireduct in a Pawlak approximation space and the notion of set definability both in a Pawlak and a covering approximation space. We extend the...
Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set th...
A multi-label dataset consists of observations associated with one or more outcomes. The traditional classification task generalizes to the prediction of several class labels simultaneously. In this paper, we propose a new nearest neighbor based multi-label method. The nearest neighbor approach remains an intuitive and effective way to solve classi...
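The nearest-neighbour route to multi-label prediction described above can be illustrated with a frequency-threshold rule over the neighbours' label sets. This is a generic sketch in the spirit of ML-kNN, not the method proposed in the paper; the function name and threshold are my own assumptions.

```python
import numpy as np

def ml_knn_predict(X_train, Y_train, x, k=3, threshold=0.5):
    """Multi-label nearest-neighbour sketch: a label is predicted when more
    than `threshold` of the k nearest neighbours carry it.
    Y_train is a binary matrix with one column per label."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]                   # indices of the k closest instances
    freq = Y_train[nn].mean(axis=0)          # per-label neighbour frequency
    return (freq > threshold).astype(int)
```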
In the recent article "On some types of covering rough sets from topological points of view" [14], the author develops a topological approach to covering-based rough sets. In this context, a number of corresponding approximation operators are introduced, their inclusion relationships are verified, and various conditions under which the operators c...
Fuzzy covering-based rough set models are hybrid models using both rough set and fuzzy set theory. The former is often used to deal with uncertain and incomplete information, while the latter is used to describe vague concepts. The study of fuzzy rough set models has provided very good tools for machine learning algorithms such as feature and insta...
In the machine-learning community, the most widely used MIL paradigm is Multi-Instance Classification (MIC). Most contributions in MIL are related to this predictive task and a considerable number of problems have been solved successfully. In Sects. 3.1 and 3.2, we introduce the MIC problem, give a formal definition, and describe the evaluation met...
Instance-based classification algorithms perform their main learning process at the instance level. They try to approximate a function that assigns class labels to instances. The instance classifier is combined with an underlying MI assumption, which links the class label of instances inside a bag with the bag class label. Many strategies have been...
Class imbalance is widely studied in single-instance learning and refers to the situation where the data observations are unevenly distributed among the possible classes. This phenomenon can present itself in MIL as well. Section 9.1 presents a general introduction to the topic of class imbalance, lists the types of solutions to deal with it, and th...
In bag-based multi-instance methods, the main learning process occurs at the level of bags. In this chapter, we analyze two important subcategories of bag-based MIL classifiers. On the one hand, in Sect. 5.2, we examine classifiers that define a distance or similarity measure between bags to work directly in the original bag space. On the other han...
As applications grow more complex, proper data representation becomes more relevant. Experience shows that a representation accurately reflecting existing relations and interactions in the data renders the learning task easier to solve. In this context, multiple instance multiple label learning (MIMLL) appears as a flexible learning framework. The...
Unsupervised MIL is a descriptive task where the learning process is carried out without information about the labels of bags. This is a common setting when it is hard or costly to obtain labeled data or when the objective is to find inherent or unknown relations in data. As for supervised learning techniques studied in this book, unsupervised MIL...
An increase in dataset dimensionality and size implies a large computational complexity and possible estimation errors. In this context, data reduction methods try to construct a new and more compact data subset. This subset should maintain the most representative information and remove redundant, irrelevant, and/or noisy information. The inherent...
This chapter provides a general introduction to the main subject matter of this work: multiple instance or multi-instance learning. The two terms are used interchangeably in the literature and they both convey the crucial point of difference with traditional (single-instance) learning. A formal description of multiple instance learning is provided...
Regression is a popular machine learning task that aims to predict a numerical outcome. In multi-instance regression (MIR), each observation can be described by several instances. After a brief introduction to this topic in Sect. 6.1, we present a formal definition of MIR and its appropriate evaluation measures in Sect. 6.2. We organize the MIR met...
This book provides a general overview of multiple instance learning (MIL), defining the framework and covering the central paradigms. The authors discuss the most important algorithms for MIL such as classification, regression and clustering. With a focus on classification, a taxonomy is set and the most relevant proposals are specified. Efficient...
Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored...
Semi-supervised learning incorporates aspects of both supervised and unsupervised learning. In semi-supervised classification, only some data instances have associated class labels, while others are unlabelled. One particular group of semi-supervised classification approaches are those known as self-labelling techniques, which attempt to assign cla...
In this paper, we discuss a semantically sound approach to covering-based rough sets. We recall and elaborate on a conceptual approach to Pawlak's rough set model, in which we consider a two-part descriptive language. The first part of the language is used to describe conjunctive concepts, while in the second part disjunctions are allowed as well....
There exist two formulations of rough sets: the conceptual and computational one. The conceptual or semantical approach of rough set theory focuses on the meaning and interpretation of concepts, while algorithms to compute those concepts are studied in the computational formulation. However, the research on the former is rather limited. In this pap...
Evolutionary approaches are among the most accurate prototype selection algorithms: preprocessing techniques that select a subset of instances from the data before applying nearest neighbor classification to it. These algorithms achieve very high accuracy and reduction rates, but unfortunately come at a substantial computational cost....
In many data mining processes, neighborhood operators play an important role as they are generalizations of equivalence classes which were used in the original rough set model of Pawlak. In this article, we introduce the notion of fuzzy neighborhood system of an object based on a given fuzzy covering, as well as the notion of the fuzzy minimal and...
For any electric power system, it is crucial to guarantee a reliable performance of its High Voltage Circuit Breaker (HCVB). Determining when the HCVB needs maintenance is an important and non-trivial problem, since these devices are used over extensive periods of time. In this paper, we propose the use of data mining techniques in order to predict...
This book reviews the multiple instance learning paradigm. This concept was introduced as a type of supervised learning, dealing with datasets that are more complex than traditionally encountered and presented. Before formally describing multiple instance learning, its methods, developments and applications, this introductory chapter first recalls...
With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough s...
Data used in machine learning applications is prone to contain both vague and incomplete information. Many authors have proposed to use fuzzy rough set theory in the development of new techniques tackling these characteristics. Fuzzy sets deal with vague data, while rough sets allow us to model incomplete information. As such, the hybrid setting of th...
Covering-based rough sets are important generalizations of the classical rough sets of Pawlak. A common way to shape lower and upper approximations within this framework is by means of a neighborhood operator. In this article, we study 24 such neighborhood operators that can be derived from a single covering. We verify equalities between them, redu...
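One of the classical neighborhood operators derivable from a covering, as discussed above, takes the intersection of all covering blocks containing an object. A minimal sketch, with the function name my own assumption:

```python
def minimal_neighbourhood(x, covering):
    """Neighbourhood operator N(x) = intersection of all covering blocks
    that contain x (one of the classical operators derived from a
    covering of the universe)."""
    blocks = [K for K in covering if x in K]
    nbhd = set(blocks[0])
    for K in blocks[1:]:
        nbhd &= K                            # intersect the remaining blocks
    return nbhd
```

Different choices (unions of blocks containing x, unions of minimal blocks, etc.) give the other operators whose orderings and equalities such studies catalogue.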
In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class...
Neighborhood based rough sets are important generalizations of the classical rough sets of Pawlak, as neighborhood operators generalize equivalence classes. In this article, we introduce nine neighborhood based operators and we study the partial order relations between twenty-two different neighborhood operators obtained from one covering. Seven ne...
Multi-instance learning is a setting in supervised learning where the data consists of bags of instances. Samples in the dataset are groups of individual instances. In classification problems, a decision value is assigned to the entire bag and the classification of an unseen bag involves the prediction of the decision value based on the instances i...
Imbalanced classification deals with learning from data with a disproportional number of samples in its classes. Traditional classifiers exhibit poor behavior when facing this kind of data because they do not take into account the imbalanced class distribution. Four main kinds of solutions exist to solve this problem: modifying the data distribut...
One of the most powerful, popular and accurate classification techniques is support vector machines (SVMs). In this work, we want to evaluate whether the accuracy of SVMs can be further improved using training set selection (TSS), where only a subset of training instances is used to build the SVM model. By contrast to existing approaches, we focus...
Instance selection methods are a class of preprocessing techniques that have been widely studied in machine learning to remove redundant or noisy instances from a training set. The main focus of such prior efforts has been on the selection of suitable training instances to perform a classification task for crisp class labels. In this paper, we prop...
The size and complexity of Big Data require advances in machine learning algorithms to adequately learn from such data. While distributed shared-nothing architectures (Hadoop/Spark) are becoming increasingly popular to develop such new algorithms, it is quite challenging to adapt existing machine learning algorithms. In this paper, we propose a soluti...
Classification techniques in the big data scenario are in high demand in a wide variety of applications. The huge increment of available data may limit the applicability of most of the standard techniques. This problem becomes even more difficult when the class distribution is skewed, the topic known as imbalanced big data classification. Evolution...
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory...
Both rough and fuzzy set theories offer interesting tools for dealing with imperfect data: while the former allows us to work with uncertain and incomplete information, the latter provides a formal setting for vague concepts. The two theories are highly compatible, and since the late 1980s many researchers have studied their hybridization. In this...
Covering based rough sets are a generalization of classical rough sets, in which the traditional partition of the universe induced by an equivalence relation is replaced by a covering. Many definitions have been proposed for the lower and upper approximations within this setting. In this paper, we recall the most important ones and organize them in...
Data dimensionality has become a pervasive problem in many areas that require the learning of interpretable models. This has become particularly pronounced in recent years with the seemingly relentless growth in the size of datasets. Indeed, as the number of dimensions increases, the number of data instances required in order to generate accurate m...
The Synthetic Minority Over-sampling Technique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that...
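The core SMOTE step that such improvements build on is simple: create a synthetic minority instance by interpolating between a minority instance and one of its k nearest minority neighbours. A minimal sketch follows (function name and defaults are my own assumptions, and no noise handling is included).

```python
import numpy as np

def smote_sample(X_min, n_new, k=2, rng=None):
    """Minimal SMOTE sketch: generate n_new synthetic minority instances by
    linear interpolation between a randomly chosen minority instance and
    one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # nearest neighbours, skipping the point itself
        j = rng.choice(nn)
        t = rng.random()                     # interpolation factor in [0, 1)
        out.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because each synthetic point lies on a segment between two real minority points, noisy minority instances propagate into the synthetic data, which is precisely the problem noise-aware SMOTE variants address.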
Web index recommendation systems are designed to help internet users with suggestions for finding relevant information. One way to develop such systems is using the multi-instance learning (MIL) approach: a generalization of the traditional supervised learning where each example is a labeled bag that is composed of unlabeled instances, and the task...
Many different proposals exist for the definition of lower and upper approximation operators in covering-based rough sets. In this paper, we establish relationships between the most commonly used operators, using especially concepts of duality, conjugacy and adjointness (also referred to as Galois connection). We highlight the importance of the adj...
This paper introduces a flexible extension of rough set theory: multi-adjoint fuzzy rough sets, in which a family of adjoint pairs are considered to compute the lower and upper approximations. This new setting increases the number of applications in which rough set theory can be used. An important feature of the presented framework is that the user...
Imbalanced classification deals with learning from data with a disproportional number of samples in its classes. Traditional classifiers exhibit poor behavior when facing this kind of data because they do not take into account the imbalanced class distribution. Four main kinds of solutions exist to solve this problem: modifying the data distributio...
This book constitutes the refereed proceedings of the 9th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2014, held in Granada and Madrid, Spain, in July 2014. RSCTC 2014 together with the Conference on Rough Sets and Emerging Intelligent Systems Paradigms (RSEISP 2014) was held as a major part of the 2014 Joint Rough...
Within the study field of machine learning, multi-instance classification aims to build a mathematical model from a set of examples to classify objects described by multiple attribute vectors. A problem that hampers multi-instance classification is class imbalance, which occurs when class sizes are quite different, causing the learni...
Ever since the first hybrid fuzzy rough set model was proposed in the early 1990’s, many researchers have focused on the definition of the lower and upper approximation of a fuzzy set by means of a fuzzy relation. In this paper, we review those proposals which generalize the logical connectives and quantifiers present in the rough set approximation...
The Nearest Neighbor (NN) algorithm is a well-known and effective classification algorithm. Prototype Selection (PS), which provides NN with a good training set to pick its neighbors from, is an important topic as NN is highly susceptible to noisy data. Accurate state-of-the-art PS methods are generally slow, which motivates us to propose a new PS...
Due to the increasing number of conferences, researchers need to spend more and more time browsing through the respective calls for papers (CFPs) to identify those conferences which might be of interest to them. In this paper we study several content-based techniques to filter CFPs retrieved from the web. To this end, we explore how to exploit the...
The k Nearest Neighbour (kNN) method is a widely used classification method that has proven to be very effective. The accuracy of kNN can be improved by means of Prototype Selection (PS), that is, we provide kNN with a reduced but reinforced dataset to pick its neighbours from. We use fuzzy rough set theory to express the quality of the instance...
This paper proposes an approach based on fuzzy rough set theory to improve nearest neighbor based classification. Six measures are introduced to evaluate the quality of the nearest neighbors. This quality is combined with the frequency at which classes occur among the nearest neighbors and the similarity w.r.t. the nearest neighbor, to decide which...
When a Web application with a built-in recommender offers a social networking component which enables its users to form a trust network, it can generate more personalized recommendations by combining user ratings with information from the trust network. These are the so-called trust-enhanced recommendation systems. While research on the incorporati...
The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers’ abstract, and some additional features such as their authors, keywords, and the journals in...