Clarisse Dhaenens

  • Professor (Full) at University of Lille

About

Publications: 163
Reads: 13,669
Citations: 2,033
Introduction
Current institution: University of Lille
Current position: Professor (Full)

Publications (163)
Chapter
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a...
Preprint
Full-text available
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a...
Article
Full-text available
Biclustering is an unsupervised machine-learning technique that simultaneously clusters rows and columns in a data matrix. Over the past two decades, the field of biclustering has emerged and grown significantly, and currently plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, find...
Chapter
Neural architecture search (NAS) is a subdomain of AutoML that consists of automating the design of neural networks. NAS has become a hot topic in the last few years, and many methods are being developed in this area. Local search (LS), on the other hand, is a well-known heuristic that has been around for many years. It is extensively used for...
Chapter
Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, finding significant biclusters is an NP-hard problem t...
Chapter
Electronic health records (EHRs) involve heterogeneous data types such as binary, numeric and categorical attributes. As traditional clustering approaches require the definition of a single proximity measure, different data types are typically transformed into a common format or amalgamated through a single distance function. Unfortunately, this ea...
Article
Full-text available
In the context of big data, many scientific communities aim to provide efficient approaches to accommodate large-scale datasets. This is the case of the machine-learning community, and more generally, the artificial intelligence community. The aim of this article is to explain how data mining problems can be considered as combinatorial optimization...
Preprint
Full-text available
Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, finding significant biclusters is an NP-hard problem t...
Article
Background Multi-drug resistant (MDR) bacteria are a major health concern. In this retrospective study, a rule-based classification algorithm, MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data) is used to identify hospitalized patients at risk of testing positive for multidrug-resistant (MDR) bacteria, including Methicillin-resis...
Article
We address the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, symbolic, temporal). We propose a new method, HBC-t (Heterogeneous BiClustering for temporal data), designed to extract biclusters from heterogeneous, temporal, large-scale, sparse data matrices. HBC-t is based on HBC, using similar mechan...
Article
In the context of big data, many scientific communities aim to provide efficient approaches to accommodate large-scale datasets. This is the case of the machine-learning community, and more generally, the artificial intelligence community. The aim of this article is to explain how data mining problems can be considered as combinatorial optimization...
Preprint
Full-text available
The no-wait flowshop scheduling problem is a variant of the classical permutation flowshop problem, with the additional constraint that jobs have to be processed by the successive machines without waiting time. To efficiently address this NP-hard combinatorial optimization problem we conduct an analysis of the structure of good quality solutions. T...
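The no-wait constraint described above determines how a job sequence is evaluated. As a minimal Python sketch (not the method of the preprint), assuming a processing-time matrix `p[job][machine]` and hypothetical helper names `delay` and `makespan`, the makespan of a permutation can be computed from the minimum start-time offset between consecutive jobs on the first machine:

```python
from itertools import accumulate


def delay(p_i, p_j):
    """Minimum gap between the start of job i and the start of job j on machine 1,
    so that job j never waits between machines and never overlaps job i."""
    finish_i = list(accumulate(p_i))              # job i completes machine k at this offset
    start_j = [0] + list(accumulate(p_j))[:-1]    # job j starts machine k at this offset
    return max(f - s for f, s in zip(finish_i, start_j))


def makespan(sequence, p):
    """Completion time of the last job when jobs are processed in `sequence` without waiting."""
    start = 0
    for prev, nxt in zip(sequence, sequence[1:]):
        start += delay(p[prev], p[nxt])
    return start + sum(p[sequence[-1]])


p = [[3, 2], [1, 4]]  # hypothetical instance: 2 jobs x 2 machines
print(makespan([0, 1], p), makespan([1, 0], p))
```

With this instance, the sequence `[0, 1]` gives a makespan of 9 and `[1, 0]` gives 7, so job order matters even with two machines.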
Conference Paper
A similarity measure or distance is a key component of the data mining and knowledge discovery process, including biclustering. Many measures have been proposed for specific data types. However, real-world data come in many different types and with different characteristics. In this paper, we study the performance of a variety of measures on different kinds of data and...
Article
Full-text available
Context: A better understanding of the "patient pathway" through data analysis can lead to better treatments for patients. The ClinMine project, supported by the French National Research Agency (ANR), aims to propose, from various case studies, algorithmic and statistical models able to handle this type of pathway data, focusing primarily on hospit...
Conference Paper
Constructive heuristics are of great interest because they find solutions of relatively good quality in a very short time. Such solutions may be used, for example, as initial solutions for metaheuristics. In this work, we propose a new efficient constructive heuristic for the No-Wait Flowshop Scheduling Problem. This proposed heuristic is based...
Conference Paper
In multi-objective optimization approaches, considering neutral neighbors during the exploration has already proved its efficiency. The aim of this article is to deepen the understanding of neutrality. In particular, we propose a definition of the most promising neutral neighbors and study in detail their distribution within neutral neighbo...
Article
This study focuses on the problem of supervised classification on heterogeneous temporal data featuring a mixture of attribute types (numeric, binary, symbolic, temporal). We present a model for classification rules designed to use both non-temporal attributes and sequences of temporal events as predicates. We also propose an efficient local search...
Chapter
Supervised classification and mathematical optimization have strong links. This chapter first provides a description of the classification task and a presentation of standard classification methods. Later, it discusses the use of metaheuristics to optimize those standard classification methods. The chapter focuses on the use of metaheuristics for t...
Chapter
Frameworks are useful tools that can speed up the development of optimization-based problem-solving projects, reducing their development time and costs. They may also be used by non-expert users, extending the user base and the application scope of metaheuristic techniques. The objective of this chapter is to briefly present the tools...
Chapter
This chapter first provides some basic information on combinatorial optimization problems and their resolution methods. Solving a combinatorial optimization problem requires three main points: definition of the set of feasible solutions, determination of the objective function to optimize, and choice of the optimization method. The chapter then foc...
Chapter
Clustering is one of the most commonly used descriptive tasks in data mining. In partition-based clustering, the objective is to partition a dataset into K disjoint sets of points, such that the points of each set are as homogeneous as possible. There exist two distinct types of hierarchical methods: agglomerative ones, which start with singleton clust...
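Partition-based clustering into K disjoint sets is classically illustrated by Lloyd's k-means algorithm, which alternates an assignment step and an update step. The following is a minimal Python sketch of that textbook baseline, not one of the metaheuristic approaches discussed in the chapter; the deterministic initialization (the first K points) and the function name `kmeans` are illustrative assumptions.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means: returns (labels, centers) for a list of coordinate tuples."""
    # Assumed initialization for reproducibility: the first k points act as centers.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members (unchanged if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
print(labels, centers)
```

Each iteration can only decrease the within-cluster sum of squared distances, but the result depends on the initial centers, which is one reason metaheuristics are applied to this problem.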
Chapter
This chapter explains how feature selection can be realized with metaheuristics and why metaheuristics can help realize the feature selection task in a Big Data context. Feature selection, also known as variable selection, attribute selection or variable subset selection, aims at selecting an optimum relevant set of features or attributes that are...
Chapter
Association rule mining is a widely used approach for discovering interesting relationships between columns of large databases. The first applications of this approach dealt with the market-basket analysis problem, where the aim is to identify sets of items frequently purchased together in the same transaction. Since this first applicat...
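In market-basket analysis, a rule X → Y is classically assessed by its support (the fraction of transactions containing both X and Y) and its confidence (the support of X ∪ Y divided by the support of X). A minimal Python sketch of these two textbook measures follows; the baskets are made-up illustrative data, and the chapter may rely on other interestingness measures.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies; treated here as zero confidence
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical baskets, for illustration only.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread", "milk"}, baskets))       # 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # about 0.667
```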
Chapter
The growth of data and the need for performance create a need for computing power. In computer science, such power is commonly obtained through parallelism and parallel computation. Parallelism is often associated with Big Data. It is also very common for metaheuristics, as parallelized metaheuristics allow us not only to solve larger problems, but also to o...
Chapter
This chapter provides definitions of Big Data, the main challenges induced by this context, and focuses on Big Data analytics. Big Data enables organizations to store, manage and manipulate vast amounts of data at the right speed and at the right time to get the right insights. Many companies use one or several relational database management system...
Conference Paper
We define the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, etc.). This problem has not yet been investigated in the biclustering literature. We propose a new method, HBC (Heterogeneous BiClustering), designed to extract biclusters from heterogeneous, large-scale, sparse data matrices. The goal of t...
Book
Big Data is a new field, with many technological challenges to be understood in order to use it to its full potential. These challenges arise at all stages of working with Big Data, beginning with data generation and acquisition. The storage and management phase presents two critical challenges: infrastructure, for storage and transportation, and c...
Conference Paper
There is a significant body of research on neutrality and its effects in single-objective optimization. Particularly, the neutrality concept has been precisely defined and the neutrality between neighboring solutions efficiently exploited in local search algorithms. The extension of neutrality to multi-objective optimization is not straightforward...
Conference Paper
Feature selection in classification can be modeled as a combinatorial optimization problem. One of the main particularities of this problem is the large amount of time that may be needed to evaluate the quality of a subset of features. In this paper, we propose to solve this problem with a tabu search algorithm integrating a learning mechanism. To...
Article
Business games are educational training tools that create virtual economic environments in which several teams of participants compete. At each turn of the game, participants must make a set of economic decisions in order to develop the firm with the best financial performance. These decisions deal with production aspects as well as distrib...
Article
Full-text available
Classification on medical data raises several problems, such as class imbalance, the double meaning of missing data, data volume, and the need for highly interpretable results. In this paper, a new algorithm is proposed: MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), a multi-objective local search algorithm designed to deal with t...
Conference Paper
This work extends the concept of neutrality used in single-objective optimization to the multi-objective context and investigates its effects on the performance of multi-objective dominance-based local search methods. We discuss neutrality in single-objective optimization and fitness assignment in multi-objective algorithms to provide a general def...
Conference Paper
This article presents \(MO-Mine_{clust}\), the first package of \(MO-Mine\), a platform under development. This platform aims to provide optimization algorithms, and in particular multi-objective approaches, to deal with classical data mining tasks (classification, association rules...). The \(MO-Mine_{clust}\) package is dedicated to clustering. Inde...
Article
Full-text available
Background: In the context of genomic selection in animal breeding, an important objective is to look for explanatory markers for a phenotype under study. The challenge of this study was to propose a model, based on a small number of markers, to predict a quantitative trait. To deal with a high number of markers, we propose using combinatorial opti...
Article
Biomedical research is progressing rapidly, in particular in genomic and postgenomic research. Hence, many challenges arise for biostatistics and bioinformatics in dealing with the large amount of data generated. After presenting some of these challenges, this chapter aims to present evolutionary combinatorial optimization approaches propo...
Article
Full-text available
In train operations, a timetable is used to establish the departure and arrival times for the trains at the stations or other relevant locations in the rail network or a subset of this network. The elaboration of a timetable responds to the commercial needs of the customers, for both passenger and freight traffic, but also, it must respect some sec...
Conference Paper
A large number of rule interestingness measures have been used as objectives in multi-objective classification rule mining algorithms. Aggregation or Pareto dominance are commonly used to deal with these multiple objectives. This paper compares these approaches on a partial classification problem over discrete and imbalanced data. After performing...
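The two ways of handling several objectives mentioned above, aggregation and Pareto dominance, can be contrasted on a toy example. The Python sketch below assumes that every objective (for instance, two rule interestingness measures) is to be maximized; the scores and the equal weights are hypothetical, and the paper's actual measures and algorithms are not reproduced here.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Keep the score vectors that no other vector dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


def weighted_sum(score, weights):
    """Aggregate several objectives into a single value with fixed weights."""
    return sum(w * x for w, x in zip(weights, score))


# Hypothetical (measure_1, measure_2) scores for four candidate rules.
rules = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
front = pareto_front(rules)                                    # all non-dominated trade-offs
best = max(rules, key=lambda s: weighted_sum(s, (0.5, 0.5)))  # one rule, weight-dependent
print(front)
print(best)
```

Aggregation returns a single rule whose identity depends on the chosen weights, whereas Pareto dominance keeps the whole set of non-dominated trade-offs for the decision maker.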
Article
Full-text available
In the context of genomic selection in animal breeding, an important objective is to look for explanatory markers for a phenotype under study. To deal with a high number of markers, we propose using combinatorial optimization to perform variable selection. Results show that our approach outperforms some classical and widely used...
Article
Full-text available
The structure of the search space explains the behavior of multiobjective search algorithms and helps in designing well-performing approaches. In this work, we analyze the properties of multiobjective combinatorial search spaces, and we pay particular attention to the correlation between the objective functions. To do so, we extend the multiobjecti...
Article
Landscape analysis has been identified as a promising way to develop efficient optimization methods. Nevertheless, the links between the properties of the landscape and the efficiency of methods are not easy to understand. In this article, we propose a contribution to this field, using a vehicle routing problem as an illustration. Metaheuristics use...
Conference Paper
Full-text available
The graph coloring problem has been widely investigated in the literature. It is often observed that many neighboring solutions share the same fitness value but, as far as we know, no in-depth analysis of this neutrality has ever been conducted in the literature. In this paper, we quantify the neutrality of some hard instances of the graph coloring problem...
Conference Paper
Full-text available
This paper focuses on modeling a Pittsburgh classification rule mining algorithm, adapted to large and imbalanced datasets such as those encountered in hospital data, as a multi-objective optimization problem, and on its implementation. We associate with this algorithm an original post-processing method based on the ROC curve to help the decision maker choose...
Conference Paper
Full-text available
This abstract presents a model of the classification rule mining problem as a dominance-based multi-objective local search, with Pittsburgh solution encoding, using accuracy and the number of terms as objectives. This approach is then compared with results reported in the literature for 22 classification rule mining algorithms.
Article
Operations research and data mining already have a long-established common history. Indeed, with the growing size of databases and the amount of data available, data mining has become crucial in modern science and industry. Data mining problems raise interesting challenges for several research domains, and in particular for operations research, as...
Preprint
Efficiently solving complex problems using metaheuristics, and in particular local search, requires incorporating knowledge about the problem to be solved. In this paper, the permutation flowshop problem is studied. It is well known that in such problems, several solutions may have the same fitness value. As this neutrality property is an important o...
Preprint
In multiobjective combinatorial optimization, there exist two main classes of metaheuristics, based either on multiple aggregations or on a dominance relation. As in the single-objective case, the structure of the search space can explain the difficulty for multiobjective metaheuristics, and guide the design of such methods. In this work we analy...
Preprint
Recently, the property of connectedness has been claimed to provide strong motivation for the design of local search techniques for multiobjective combinatorial optimization (MOCO). Indeed, when connectedness holds, a basic Pareto local search, initialized with at least one non-dominated solution, can identify the efficient set exhaustively. Ho...
Preprint
VEGAS (Varying Evolvability-Guided Adaptive Search) is a new methodology proposed to deal with the neutrality property of some optimization problems. Its main feature is to consider the whole neutral network rather than an arbitrary solution. Moreover, VEGAS is designed to escape from plateaus based on the evolvability of solutions and a multi-armed...
Article
Full-text available
This paper presents a new methodology that exploits specific characteristics of the fitness landscape. In particular, we are interested in the property of neutrality, which refers to the fact that the same fitness value is assigned to numerous solutions from the search space. Many combinatorial optimization problems share this property, that is g...
Preprint
In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension t...
Preprint
Fitness landscape analysis aims to understand the geometry of a given optimization problem in order to design more efficient search algorithms. However, very little is known about the landscape of multiobjective problems. In this work, following a recent proposal by Zitzler et al. (2010), we consider multiobjective optimization as a set pr...
Article
Full-text available
Fitness landscape analysis aims to understand the geometry of a given optimization problem in order to design more efficient search algorithms. However, very little is known about the landscape of multiobjective problems. In this work, following a recent proposal by Zitzler et al. (2010), we consider multiobjective optimization as a set pr...
Article
Full-text available
Genomics has evolved considerably with the recent development of high-throughput technologies, first in sequencing and then in genotyping. In animal science, we are now able to read genomic information on nearly 800,000 markers for increasingly large sets of individuals (from 3,000 to 10,000). These data can give rise...
Article
Demand responsive transport allows customers to be carried to their destination as with a taxi service, provided that the customers are grouped in the same vehicles in order to reduce operational costs. This kind of service is related to the dial-a-ride problem. However, in order to improve the quality of service, demand responsive transport needs...
Article
Full-text available
In this article, we model a linkage disequilibrium study (genomic study) as an optimization problem in which a given objective function has to be optimized. The objective of the study is to discover haplotypes (associations of genetic markers) that are candidates for explaining multi-factorial diseases such as diabetes or obesity. To determine what kind of algorith...
Article
Full-text available
VEGAS (Varying Evolvability-Guided Adaptive Search) is a new methodology proposed to deal with the neutrality property of some optimization problems. Its main feature is to consider the whole neutral network rather than an arbitrary solution. Moreover, VEGAS is designed to escape from plateaus based on the evolvability of solutions and a multi-armed...
Conference Paper
Full-text available
This paper presents a new methodology that exploits specific characteristics of the fitness landscape. In particular, we are interested in the property of neutrality, which refers to the fact that the same fitness value is assigned to numerous solutions from the search space. Many combinatorial optimization problems share this property, that is g...
Conference Paper
Full-text available
In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension t...
Conference Paper
Full-text available
This communication presents the problem of recruitment (inclusion) in clinical trials. After a brief presentation of the data available in the hospital system, a scenario with an inclusion criterion is presented. The contribution of data mining and combinatorial optimization in this case is then presented, along with part of...
Conference Paper
Full-text available
To solve a multiobjective combinatorial problem, two main classes of metaheuristics are used, based either on the aggregation of criteria or on the dominance relation between solutions. As in single-objective optimization, the structure of the search space can explain the optimization difficulty for multiobjective metaheuristics, and gu...
Conference Paper
Full-text available
Recently, the property of connectedness has been claimed to provide strong motivation for the design of local search techniques for multiobjective combinatorial optimization (MOCO). Indeed, when connectedness holds, a basic Pareto local search, initialized with at least one non-dominated solution, can identify the efficient set exhaustively. Ho...
Conference Paper
Full-text available
Efficiently solving complex problems using metaheuristics, and in particular local search, requires incorporating knowledge about the problem to be solved. In this paper, the permutation flowshop problem is studied. It is well known that in such problems, several solutions may have the same fitness value. As this neutrality property is an important o...
Conference Paper
Full-text available
This paper deals with a dial-a-ride problem with time windows applied to a demand responsive transport service. An evolutionary approach as well as new original representation and variation operators are proposed and detailed. Such mechanisms are used with three state-of-the-art multi-objective evolutionary algorithms: NSGA-II, IBEA and SPEA2. Afte...
Conference Paper
Vehicle Routing Problems (VRP) are widely studied as they represent challenges for the future. However, most of the routing problems encountered in the literature are quite far from real-life problems. Therefore, this work is dedicated to the Heterogeneous Fixed Fleet Asymmetric Vehicle Routing Problem (HFF-AVRP), a variant of the VRP, which i...
Article
Full-text available
The Heterogeneous Fixed Fleet Asymmetric Vehicle Routing Problem (HFF-AVRP) is an NP-hard optimization problem. Instance analysis, and in particular fitness landscape analysis, may help problem solving. Such analyses require the definition of a distance between feasible solutions. Such a distance does not exist for the HFF-AVRP and this repo...
Article
One of the most commonly-used metaphors to describe the process of heuristic search methods in solving combinatorial optimization problems is the Fitness Landscape (FiL). This landscape metaphor appears most commonly in works related to single-objective optimization: the search space can then be regarded as a spatial structure where each point (sol...
Article
Full-text available
The capacitated vehicle routing problem (CVRP) aims to satisfy the demand of a set of customers with a fleet of vehicles, each having a limited capacity, while minimizing the total distance traveled. The asymmetric capacitated vehicle routing problem (ACVRP) is a case...
Chapter
Contents: Introduction; Classical optimization problems; Complexity; Solution of hard problems; Conclusion; Bibliography
Article
Full-text available
In this paper we propose an exact method able to solve multi-objective combinatorial optimization problems. This method is an extension, for any number of objectives, of the 2-Parallel Partitioning Method (2-PPM) we previously proposed. Like 2-PPM, this method is based on splitting of the search space into several areas, leading to elementary searc...
Article
The present chapter aims to serve as a brief introduction for the rest of the chapters in this volume. The main goal is to provide a general overview of multi-objective combinatorial optimization, including its main basic definitions and some notions regarding the incorporation of user's preferences. Additionally, we also present short descriptions...
Article
Peptide microarray technology requires bioinformatics and statistical tools to manage, store, and analyze the large amount of data produced. To address these needs, we developed a system called protein array software environment (PASE) that provides an integrated framework to manage and analyze microarray information from polypeptide chip technolog...
Book
The purpose of this book is to collect contributions that deal with the use of nature inspired metaheuristics for solving multi-objective combinatorial optimization problems. Such a collection intends to provide an overview of the state-of-the-art developments in this field, with the aim of motivating more researchers in operations research, engine...
Conference Paper
The integration of information provided by an a priori landscape analysis as a guiding tool for interactive EMO methods is proposed. For this purpose, a new type of a priori landscape analysis is introduced, namely ellipse enclosure of the feasible solutions set in the solution space. The interaction takes place in the solution space, the user havi...
Article
Full-text available
An important task of knowledge discovery is discovering association rules. This very general model has been widely studied and efficient algorithms have been proposed. But most of the time, only frequent rules are sought. Here we propose to consider this problem as a multi-objective combinatorial optimization problem in order to be able to...
