Chris Cornelis

Ghent University | UGhent · Department of Applied Mathematics, Computer Science and Statistics

PhD in Computer Science

228 Publications
30,762 Reads
6,973 Citations
Citations since 2017: 3,191 (38 research items)
Introduction
Chris Cornelis currently works at the Department of Applied Mathematics, Computer Science and Statistics, Ghent University. His current project is 'fuzzy sets, rough sets and their applications in machine learning.'
Additional affiliations
October 2011 - present
University of Granada
Position
  • Ramón y Cajal postdoctoral researcher
October 2000 - present
Ghent University
Position
  • Guest professor

Publications (228)
Preprint
Full-text available
One of the weaknesses of classical (fuzzy) rough sets is their sensitivity to noise, which is particularly undesirable for machine learning applications. One approach to solve this issue is by making use of fuzzy quantifiers, as done by the vaguely quantified fuzzy rough set (VQFRS) model. While this idea is intuitive, the VQFRS model suffers from...
Preprint
Full-text available
Fuzzy rough sets are well-suited for working with vague, imprecise or uncertain information and have been successfully applied in real-world classification problems. One of the prominent representatives of this theory is fuzzy-rough nearest neighbours (FRNN), a classification algorithm based on the classical k-nearest neighbours algorithm. The crux...
Preprint
Full-text available
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values that preserves the information encoded in the distribution of the missing values. Unlike the existing missing-indicator approach, this does not require imputation. We support our proposal with three different arguments. Firstly, po...
Preprint
Full-text available
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost through imputation. The missing-indicator approach can be used in combination with imputation to instead represent...
Preprint
In this article, a new Fuzzy Granular Approximation Classifier (FGAC) is introduced. The classifier is based on the previously introduced concept of the granular approximation and its multi-class classification case. The classifier is instance-based and its biggest advantage is its local transparency, i.e., the ability to explain every individual pr...
Article
Full-text available
In this short communication, we refute the conjecture by Fodor and Yager from [5] that the class of inclusion measures proposed by Kitainik coincides with that of inclusion measures based on contrapositive fuzzy implications. In particular, we show that the conjecture only holds when the considered universe of discourse is finite.
Article
Fuzzy rough set theory can be used as a tool for dealing with inconsistent data when there is a gradual notion of indiscernibility between objects. It does this by providing lower and upper approximations of concepts. In classical fuzzy rough sets, the lower and upper approximations are determined using the minimum and maximum operators, respective...
Preprint
Fuzzy rough set theory can be used as a tool for dealing with inconsistent data when there is a gradual notion of indiscernibility between objects. It does this by providing lower and upper approximations of concepts. In classical fuzzy rough sets, the lower and upper approximations are determined using the minimum and maximum operators, respective...
Preprint
Full-text available
In granular computing, fuzzy sets can be approximated by granularly representable sets that are as close as possible to the original fuzzy set w.r.t. a given closeness measure. Such sets are called granular approximations. In this article, we introduce the concepts of disjoint and adjacent granules and we examine how the new definitions affect the...
Preprint
Full-text available
Inconsistency in prediction problems occurs when instances that relate in a certain way on condition attributes do not follow the same relation on the decision attribute. For example, in ordinal classification with monotonicity constraints, it occurs when an instance dominating another instance on condition attributes has been assigned to a worse...
Article
One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, and no counter-examples. A number of data descriptors that have been shown to perform well in previous studies of one-class classification, like the Sup...
Chapter
We propose an adaptation of fuzzy rough sets to model concepts in datasets with missing values. Upper and lower approximations are replaced by interval-valued fuzzy sets that express the uncertainty caused by incomplete information. Each of these interval-valued fuzzy sets is delineated by a pair of optimistic and pessimistic approximations. We sho...
Chapter
This paper presents some functional dependency relations defined on the attribute set of an information system. We establish some basic relationships between functional dependency relations, attribute reduction, and closure operators. We use the partial order for dependencies to show that reducts of an information system can be obtained from the ma...
Chapter
Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets.
Preprint
Full-text available
Emotion detection is an important task that can be applied to social media data to discover new knowledge. While the use of deep learning methods for this task has been prevalent, they are black-box models, making their decisions hard to interpret for a human operator. Therefore, in this paper, we propose an approach using weighted $k$ Nearest Neig...
Preprint
Full-text available
Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets. Specifically, we develop an approach f...
Article
Granular representations of crisp and fuzzy sets play an important role in rule induction algorithms based on rough set theory. In particular, arbitrary fuzzy sets can be approximated using unions of simple fuzzy sets called granules. These granules, in turn, have a straightforward interpretation in terms of human-readable fuzzy “if..., then...” ru...
Preprint
Full-text available
We provide a thorough treatment of hyperparameter optimisation for three data descriptors with a good track-record in the literature: Support Vector Machine (SVM), Nearest Neighbour Distance (NND) and Average Localised Proximity (ALP). The hyperparameters of SVM have to be optimised through cross-validation, while NND and ALP allow the reuse of a s...
Article
In this paper, we first review existing fuzzy extensions of the dominance-based rough set approach (DRSA), and advance the theory considering additional properties. Moreover, we examine the application of Ordered Weighted Average (OWA) operators to fuzzy DRSA. OWA operators have shown a lot of potential in handling outliers and noisy data in decisi...
Preprint
Full-text available
One-class classification is a challenging subfield of machine learning in which so-called data descriptors are used to predict membership of a class based solely on positive examples of that class, and no counter-examples. A number of data descriptors that have been shown to perform well in previous studies of one-class classification, like the Sup...
Chapter
We present fuzzy-rough-learn, the first Python library of fuzzy rough set machine learning algorithms. It contains three algorithms previously implemented in R and Java, as well as two new algorithms from the recent literature. We briefly discuss the use cases of fuzzy-rough-learn and the design philosophy guiding its development, before providing...
Chapter
Full-text available
In this paper, we present a new view on how the concept of rough sets may be interpreted in terms of statistics and used for reasoning about numerical data. We show that under specific assumptions, neighborhood based rough approximations may be seen as statistical estimations of certain and possible events. We propose a way of choosing the optimal...
Chapter
In this paper, we present a new closure operator defined on the set of attributes of an information system that satisfies the conditions for defining a matroid. We establish some basic relationships between equivalence classes and approximation operators where different sets of attributes are used. It is shown that the reducts of an information sys...
Article
Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. Howeve...
Chapter
Full-text available
This paper presents the concept of lower and upper rough matroids based on approximation operators for covering-based rough sets. This concept is a generalization of lower and upper rough matroids based on coverings. A new definition of lower and upper definable sets related to an approximation operator is presented and these definable sets are u...
Chapter
Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists to design a Big Data fuzzy rough sets based classifier. Existing Big Data approaches use fu...
Article
Fuzzy rough set theory models both vagueness and indiscernibility in data, which makes it a very useful tool for application to various machine learning tasks. In this paper, we focus on one of its robust generalisations, namely ordered weighted average based fuzzy rough sets. This model uses a weighted approach in the definition of the fuzzy rough...
Article
In this paper, we discuss the relationship between different types of reduction and set definability. We recall the definition of a decision reduct, a γ-decision reduct, a decision bireduct and a γ-decision bireduct in a Pawlak approximation space and the notion of set definability both in a Pawlak and a covering approximation space. We extend the...
Article
Full-text available
Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set th...
Article
A multi-label dataset consists of observations associated with one or more outcomes. The traditional classification task generalizes to the prediction of several class labels simultaneously. In this paper, we propose a new nearest neighbor based multi-label method. The nearest neighbor approach remains an intuitive and effective way to solve classi...
Article
In the recent article “On some types of covering rough sets from topological points of view” [14], the author develops a topological approach to covering-based rough sets. In this context, a number of corresponding approximation operators are introduced, their inclusion relationships are verified, and various conditions under which the operators c...
Article
Fuzzy covering-based rough set models are hybrid models using both rough set and fuzzy set theory. The former is often used to deal with uncertain and incomplete information, while the latter is used to describe vague concepts. The study of fuzzy rough set models has provided very good tools for machine learning algorithms such as feature and insta...
Chapter
In the machine-learning community, the most widely used MIL paradigm is Multi-Instance Classification (MIC). Most contributions in MIL are related to this predictive task and a considerable number of problems have been solved successfully. In Sects. 3.1 and 3.2, we introduce the MIC problem, give a formal definition, and describe the evaluation met...
Chapter
Instance-based classification algorithms perform their main learning process at the instance level. They try to approximate a function that assigns class labels to instances. The instance classifier is combined with an underlying MI assumption, which links the class label of instances inside a bag with the bag class label. Many strategies have been...
Chapter
Class imbalance is widely studied in single-instance learning and refers to the situation where the data observations are unevenly distributed among the possible classes. This phenomenon can present itself in MIL as well. Section 9.1 presents a general introduction to the topic of class imbalance, lists the types of solutions to deal with it, and th...
Chapter
In bag-based multi-instance methods, the main learning process occurs at the level of bags. In this chapter, we analyze two important subcategories of bag-based MIL classifiers. On the one hand, in Sect. 5.2, we examine classifiers that define a distance or similarity measure between bags to work directly in the original bag space. On the other han...
Chapter
As applications grow more complex, proper data representation becomes more relevant. Experience shows that a representation accurately reflecting existing relations and interactions in the data renders the learning task easier to solve. In this context, multiple instance multiple label learning (MIMLL) appears as a flexible learning framework. The...
Chapter
Unsupervised MIL is a descriptive task where the learning process is carried out without information about the labels of bags. This is a common setting when it is hard or costly to obtain labeled data or when the objective is to find inherent or unknown relations in data. As for supervised learning techniques studied in this book, unsupervised MIL...
Chapter
An increase in dataset dimensionality and size implies a large computational complexity and possible estimation errors. In this context, data reduction methods try to construct a new and more compact data subset. This subset should maintain the most representative information and remove redundant, irrelevant, and/or noisy information. The inherent...
Chapter
This chapter provides a general introduction to the main subject matter of this work: multiple instance or multi-instance learning. The two terms are used interchangeably in the literature and they both convey the crucial point of difference with traditional (single-instance) learning. A formal description of multiple instance learning is provided...
Chapter
Regression is a popular machine learning task that aims to predict a numerical outcome. In multi-instance regression (MIR), each observation can be described by several instances. After a brief introduction to this topic in Sect. 6.1, we present a formal definition of MIR and its appropriate evaluation measures in Sect. 6.2. We organize the MIR met...
Book
This book provides a general overview of multiple instance learning (MIL), defining the framework and covering the central paradigms. The authors discuss the most important algorithms for MIL such as classification, regression and clustering. With a focus on classification, a taxonomy is set and the most relevant proposals are specified. Efficient...
Article
Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored...
Conference Paper
Semi-supervised learning incorporates aspects of both supervised and unsupervised learning. In semi-supervised classification, only some data instances have associated class labels, while others are unlabelled. One particular group of semi-supervised classification approaches are those known as self-labelling techniques, which attempt to assign cla...
Article
In this paper, we discuss a semantically sound approach to covering-based rough sets. We recall and elaborate on a conceptual approach to Pawlak's rough set model, in which we consider a two-part descriptive language. The first part of the language is used to describe conjunctive concepts, while in the second part disjunctions are allowed as well....
Conference Paper
There exist two formulations of rough sets: the conceptual one and the computational one. The conceptual or semantical approach to rough set theory focuses on the meaning and interpretation of concepts, while algorithms to compute those concepts are studied in the computational formulation. However, the research on the former is rather limited. In this pap...
Article
Evolutionary approaches are among the most accurate types of prototype selection algorithms, i.e., preprocessing techniques that select a subset of instances from the data before nearest neighbor classification is applied to it. These algorithms result in very high accuracy and reduction rates, but unfortunately come at a substantial computational cost....
Article
In many data mining processes, neighborhood operators play an important role as they are generalizations of equivalence classes which were used in the original rough set model of Pawlak. In this article, we introduce the notion of fuzzy neighborhood system of an object based on a given fuzzy covering, as well as the notion of the fuzzy minimal and...
Article
For any electric power system, it is crucial to guarantee a reliable performance of its High Voltage Circuit Breaker (HVCB). Determining when the HVCB needs maintenance is an important and non-trivial problem, since these devices are used over extensive periods of time. In this paper, we propose the use of data mining techniques in order to predict...
Chapter
This book reviews the multiple instance learning paradigm. This concept was introduced as a type of supervised learning, dealing with datasets that are more complex than traditionally encountered and presented. Before formally describing multiple instance learning, its methods, developments and applications, this introductory chapter first recalls...
Chapter
Full-text available
With the continued and relentless growth in dataset sizes in recent times, feature or attribute selection has become a necessary step in tackling the resultant intractability. Indeed, as the number of dimensions increases, the number of corresponding data instances required in order to generate accurate models increases exponentially. Fuzzy-rough s...
Article
Data used in machine learning applications is prone to contain both vague and incomplete information. Many authors have proposed to use fuzzy rough set theory in the development of new techniques tackling these characteristics. Fuzzy sets deal with vague data, while rough sets allow us to model incomplete information. As such, the hybrid setting of th...
Article
Covering-based rough sets are important generalizations of the classical rough sets of Pawlak. A common way to shape lower and upper approximations within this framework is by means of a neighborhood operator. In this article, we study 24 such neighborhood operators that can be derived from a single covering. We verify equalities between them, redu...
Article
In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class...
Conference Paper
Neighborhood based rough sets are important generalizations of the classical rough sets of Pawlak, as neighborhood operators generalize equivalence classes. In this article, we introduce nine neighborhood based operators and we study the partial order relations between twenty-two different neighborhood operators obtained from one covering. Seven ne...
Article
Full-text available
Multi-instance learning is a setting in supervised learning where the data consists of bags of instances. Samples in the dataset are groups of individual instances. In classification problems, a decision value is assigned to the entire bag and the classification of an unseen bag involves the prediction of the decision value based on the instances i...
Article
Full-text available
Imbalanced classification deals with learning from data with a disproportional number of samples in its classes. Traditional classifiers exhibit poor behavior when facing this kind of data because they do not take into account the imbalanced class distribution. Four main kinds of solutions exist to solve this problem: modifying the data distribut...
Article
One of the most powerful, popular and accurate classification techniques is support vector machines (SVMs). In this work, we want to evaluate whether the accuracy of SVMs can be further improved using training set selection (TSS), where only a subset of training instances is used to build the SVM model. By contrast to existing approaches, we focus...
Conference Paper
Instance selection methods are a class of preprocessing techniques that have been widely studied in machine learning to remove redundant or noisy instances from a training set. The main focus of such prior efforts has been on the selection of suitable training instances to perform a classification task for crisp class labels. In this paper, we prop...
Conference Paper
The size and complexity of Big Data require advances in machine learning algorithms to adequately learn from such data. While distributed shared-nothing architectures (Hadoop/Spark) are becoming increasingly popular to develop such new algorithms, it is quite challenging to adapt existing machine learning algorithms. In this paper, we propose a soluti...
Conference Paper
Classification techniques in the big data scenario are in high demand in a wide variety of applications. The huge increase in available data may limit the applicability of most of the standard techniques. This problem becomes even more difficult when the class distribution is skewed, a topic known as imbalanced big data classification. Evolution...
Article
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory...
Article
Both rough and fuzzy set theories offer interesting tools for dealing with imperfect data: while the former allows us to work with uncertain and incomplete information, the latter provides a formal setting for vague concepts. The two theories are highly compatible, and since the late 1980s many researchers have studied their hybridization. In this...
Article
Covering based rough sets are a generalization of classical rough sets, in which the traditional partition of the universe induced by an equivalence relation is replaced by a covering. Many definitions have been proposed for the lower and upper approximations within this setting. In this paper, we recall the most important ones and organize them in...
Article
Data dimensionality has become a pervasive problem in many areas that require the learning of interpretable models. This has become particularly pronounced in recent years with the seemingly relentless growth in the size of datasets. Indeed, as the number of dimensions increases, the number of data instances required in order to generate accurate m...
Article
The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that...
Article
Web index recommendation systems are designed to help internet users with suggestions for finding relevant information. One way to develop such systems is using the multi-instance learning (MIL) approach: a generalization of the traditional supervised learning where each example is a labeled bag that is composed of unlabeled instances, and the task...
Article
Many different proposals exist for the definition of lower and upper approximation operators in covering-based rough sets. In this paper, we establish relationships between the most commonly used operators, using especially concepts of duality, conjugacy and adjointness (also referred to as Galois connection). We highlight the importance of the adj...
Article
This paper introduces a flexible extension of rough set theory: multi-adjoint fuzzy rough sets, in which a family of adjoint pairs are considered to compute the lower and upper approximations. This new setting increases the number of applications in which rough set theory can be used. An important feature of the presented framework is that the user...
Article
Imbalanced classification deals with learning from data with a disproportional number of samples in its classes. Traditional classifiers exhibit poor behavior when facing this kind of data because they do not take into account the imbalanced class distribution. Four main kinds of solutions exist to solve this problem: modifying the data distributio...
Book
This book constitutes the refereed proceedings of the 9th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2014, held in Granada and Madrid, Spain, in July 2014. RSCTC 2014 together with the Conference on Rough Sets and Emerging Intelligent Systems Paradigms (RSEISP 2014) was held as a major part of the 2014 Joint Rough...
Technical Report
Full-text available
Within the study field of machine learning, multi-instance classification aims to build a mathematical model from a set of examples to classify objects described by multiple attribute vectors. A problem that hampers multi-instance classification is the class imbalance problem, which occurs when class sizes are quite different, causing the learni...
Conference Paper
Ever since the first hybrid fuzzy rough set model was proposed in the early 1990s, many researchers have focused on the definition of the lower and upper approximation of a fuzzy set by means of a fuzzy relation. In this paper, we review those proposals which generalize the logical connectives and quantifiers present in the rough set approximation...
Conference Paper
The Nearest Neighbor (NN) algorithm is a well-known and effective classification algorithm. Prototype Selection (PS), which provides NN with a good training set to pick its neighbors from, is an important topic as NN is highly susceptible to noisy data. Accurate state-of-the-art PS methods are generally slow, which motivates us to propose a new PS...
Conference Paper
Due to the increasing number of conferences, researchers need to spend more and more time browsing through the respective calls for papers (CFPs) to identify those conferences which might be of interest to them. In this paper we study several content-based techniques to filter CFPs retrieved from the web. To this end, we explore how to exploit the...
Article
The k Nearest Neighbour (kNN) method is a widely used classification method that has proven to be very effective. The accuracy of kNN can be improved by means of Prototype Selection (PS), that is, we provide kNN with a reduced but reinforced dataset to pick its neighbours from. We use fuzzy rough set theory to express the quality of the instance...
Conference Paper
Full-text available
This paper proposes an approach based on fuzzy rough set theory to improve nearest neighbor based classification. Six measures are introduced to evaluate the quality of the nearest neighbors. This quality is combined with the frequency at which classes occur among the nearest neighbors and the similarity w.r.t. the nearest neighbor, to decide which...
Article
When a Web application with a built-in recommender offers a social networking component which enables its users to form a trust network, it can generate more personalized recommendations by combining user ratings with information from the trust network. These are the so-called trust-enhanced recommendation systems. While research on the incorporati...
Article
The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers’ abstract, and some additional features such as their authors, keywords, and the journals in...