
Clarisse Dhaenens - Professor (Full) at University of Lille
163 publications
13,669 reads
2,033 citations
Publications (163)
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a...
Biclustering is an unsupervised machine-learning technique that simultaneously clusters rows and columns in a data matrix. Over the past two decades, the field of biclustering has emerged and grown significantly, and currently plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, find...
Neural architecture search (NAS) is a subdomain of AutoML that consists of automating the design of neural networks. NAS has become a hot topic in the last few years. As a result, many methods are being developed in this area. Local search (LS), on the other hand, is a famous heuristic that has been around for many years. It is extensively used for...
Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, finding significant biclusters is an NP-hard problem t...
Electronic health records (EHRs) involve heterogeneous data types such as binary, numeric and categorical attributes. As traditional clustering approaches require the definition of a single proximity measure, different data types are typically transformed into a common format or amalgamated through a single distance function. Unfortunately, this ea...
In the context of big data, many scientific communities aim to provide efficient approaches to accommodate large-scale datasets. This is the case of the machine-learning community, and more generally, the artificial intelligence community. The aim of this article is to explain how data mining problems can be considered as combinatorial optimization...
Background: Multi-drug resistant (MDR) bacteria are a major health concern. In this retrospective study, a rule-based classification algorithm, MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), is used to identify hospitalized patients at risk of testing positive for multidrug-resistant (MDR) bacteria, including Methicillin-resis...
We address the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, symbolic, temporal). We propose a new method, HBC-t (Heterogeneous BiClustering for temporal data), designed to extract biclusters from heterogeneous, temporal, large-scale, sparse data matrices. HBC-t is based on HBC, using similar mechan...
The no-wait flowshop scheduling problem is a variant of the classical permutation flowshop problem, with the additional constraint that jobs have to be processed by the successive machines without waiting time. To efficiently address this NP-hard combinatorial optimization problem we conduct an analysis of the structure of good quality solutions. T...
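As an illustration of the no-wait constraint described in this abstract, the sketch below computes the makespan of a job permutation under the standard no-wait delay formula. It is not taken from the paper: the function names `start_delay` and `makespan`, and the processing-time layout `p[job][machine]`, are assumptions made for this example, and the permutation is assumed to be non-empty.

```python
from typing import List, Sequence


def start_delay(p_i: Sequence[int], p_j: Sequence[int]) -> int:
    """Smallest gap between the start of job i and the start of the next job j
    on the first machine, so that j never waits between machines and never
    reaches a machine before i has left it."""
    delay = 0
    done_i = 0   # time (from i's start) at which job i leaves machine k
    start_j = 0  # time (from j's start) at which job j reaches machine k
    for k in range(len(p_i)):
        done_i += p_i[k]
        delay = max(delay, done_i - start_j)
        start_j += p_j[k]
    return delay


def makespan(perm: Sequence[int], p: List[Sequence[int]]) -> int:
    """Makespan of a non-empty permutation: sum of the delays between
    consecutive jobs plus the total processing time of the last job."""
    gaps = sum(start_delay(p[a], p[b]) for a, b in zip(perm, perm[1:]))
    return gaps + sum(p[perm[-1]])


# Hypothetical instance: two jobs, two machines.
times = [[2, 3], [1, 4]]
print(makespan([0, 1], times), makespan([1, 0], times))
```

Because the delay between two consecutive jobs depends only on that pair, the makespan of any permutation is a sum of pairwise terms, which is what makes the structure of good solutions amenable to analysis.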
A similarity measure or distance is a key to success in data mining and knowledge discovery, including biclustering. Many measures have been proposed for specific data types. However, real-world data come in different types and with different characteristics. In this paper, we study the performance of a variety of measures on different kinds of data and...
Context: A better understanding of "patient pathway" thanks to data analysis can lead to better treatments for patients. The ClinMine project, supported by the French National Research Agency (ANR), aims at proposing, from various case studies, algorithmic and statistical models able to handle this type of pathway data, focusing primarily on hospit...
Constructive heuristics are of great interest because they find solutions of relatively good quality in a very short time. Such solutions may be used, for example, as initial solutions for metaheuristics. In this work, we propose a new efficient constructive heuristic for the No-Wait Flowshop Scheduling Problem. This proposed heuristic is based...
In multi-objective optimization approaches, considering neutral neighbors during the exploration has already proved its efficiency. The aim of this article is to deepen the understanding of neutrality. In particular, we propose a definition of the most promising neutral neighbors and study in detail their distribution within neutral neighbo...
This study focuses on the problem of supervised classification on heterogeneous temporal data featuring a mixture of attribute types (numeric, binary, symbolic, temporal). We present a model for classification rules designed to use both non-temporal attributes and sequences of temporal events as predicates. We also propose an efficient local search...
Supervised classification and mathematical optimization have strong links. This chapter first provides a description of the classification task and a presentation of standard classification methods. Later, it discusses the use of metaheuristics to optimize those standard classification methods. The chapter focuses on the use of metaheuristics for t...
Frameworks are useful tools that can speed up the development of optimization-based problem solving projects, reducing their development time and costs. They may also be used by non-expert users, extending the user base and the application scope of metaheuristic techniques. The objective of this chapter is to briefly present the tools...
This chapter first provides some basic information on combinatorial optimization problems and their resolution methods. Solving a combinatorial optimization problem requires three main points: definition of the set of feasible solutions, determination of the objective function to optimize, and choice of the optimization method. The chapter then foc...
Clustering is one of the most commonly used descriptive tasks in data mining. In partition-based clustering, the objective is to partition a dataset into K disjoint sets of points, such that points of a set are as homogeneous as possible. There exist two distinct types of hierarchical methods: the agglomerative ones, which start with singleton clust...
This chapter explains how feature selection can be realized with metaheuristics and why metaheuristics can help realize the feature selection task in a Big Data context. Feature selection, also known as variable selection, attribute selection or variable subset selection, aims at selecting an optimum relevant set of features or attributes that are...
Association rules mining is a widely used approach for discovering interesting relationships between columns of large databases. The first applications for such an approach dealt with the market-basket analysis problem, where the aim is to identify a set of items frequently purchased simultaneously in the same transaction. Since this first applicat...
The growth of data and the need for performance lead to the need for power. In computer science, to obtain power, it is common to use parallelism and parallel computation. Parallelism is often associated with Big Data. For metaheuristics, it is also very common as parallelized metaheuristics allow us not only to solve larger problems, but also to o...
This chapter provides definitions of Big Data, the main challenges induced by this context, and focuses on Big Data analytics. Big Data enables organizations to store, manage and manipulate vast amounts of data at the right speed and at the right time to get the right insights. Many companies use one or several relational database management system...
We define the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, etc.). This problem has not yet been investigated in the biclustering literature. We propose a new method, HBC (Heterogeneous BiClustering), designed to extract biclusters from heterogeneous, large-scale, sparse data matrices. The goal of t...
Big Data is a new field, with many technological challenges to be understood in order to use it to its full potential. These challenges arise at all stages of working with Big Data, beginning with data generation and acquisition. The storage and management phase presents two critical challenges: infrastructure, for storage and transportation, and c...
There is a significant body of research on neutrality and its effects in single-objective optimization. Particularly, the neutrality concept has been precisely defined and the neutrality between neighboring solutions efficiently exploited in local search algorithms. The extension of neutrality to multi-objective optimization is not straightforward...
Feature selection in classification can be modeled as a combinatorial optimization problem. One of the main particularities of this problem is the large amount of time that may be needed to evaluate the quality of a subset of features. In this paper, we propose to solve this problem with a tabu search algorithm integrating a learning mechanism. To...
Business games are educational training tools that create virtual economic environments in which several teams of participants compete. At each turn of the game, participants must make a set of economic decisions in order to develop the firm with the best financial performance. These decisions deal with production aspects as well as distrib...
Classification on medical data raises several problems such as class imbalance, double meaning of missing data, volumetry or need of highly interpretable results. In this paper a new algorithm is proposed: MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), a multi-objective local search algorithm that is conceived to deal with t...
This work extends the concept of neutrality used in single-objective optimization to the multi-objective context and investigates its effects on the performance of multi-objective dominance-based local search methods. We discuss neutrality in single-objective optimization and fitness assignment in multi-objective algorithms to provide a general def...
This article presents \(MO-Mine_{clust}\), a first package of the \(MO-Mine\) platform under development. This platform aims at providing optimization algorithms, and in particular multi-objective approaches, to deal with classical data mining tasks (classification, association rules...). The \(MO-Mine_{clust}\) package is dedicated to clustering. Inde...
Background: In the context of genomic selection in animal breeding, an important objective is to look for explicative markers for a phenotype under study. The challenge of this study was to propose a model, based on a small number of markers, to predict a quantitative trait. To deal with a high number of markers, we propose using combinatorial opti...
Biomedical research progresses rapidly, in particular in the area of genomic and postgenomic research. Hence many challenges appear for bio-statistics and bioinformatics to deal with the large amount of data generated. After presenting some of these challenges, this chapter aims at presenting evolutionary combinatorial optimization approaches propo...
In train operations, a timetable is used to establish the departure and arrival times for the trains at the stations or other relevant locations in the rail network or a subset of this network. The elaboration of a timetable responds to the commercial needs of the customers, for both passenger and freight traffic, but also, it must respect some sec...
A large number of rule interestingness measures have been used as objectives in multi-objective classification rule mining algorithms. Aggregation or Pareto dominance are commonly used to deal with these multiple objectives. This paper compares these approaches on a partial classification problem over discrete and imbalanced data. After performing...
In the context of genomic selection in animal breeding, an important objective consists in looking for explicative markers for a phenotype under study. In order to deal with a high number of markers, we propose to use combinatorial optimization to perform variable selection. Results show that our approach outperforms some classical and widely used...
The structure of the search space explains the behavior of multiobjective search algorithms, and helps to design well-performing approaches. In this work, we analyze the properties of multiobjective combinatorial search spaces, and we pay a particular attention to the correlation between the objective functions. To do so, we extend the multiobjecti...
Landscape analysis has been identified as a promising way to develop efficient optimization methods. Nevertheless, the links between properties of the landscape and efficiency of methods are not easy to understand. In this article, we propose to contribute to this field using a vehicle routing problem as an illustration. Metaheuristics use...
The graph coloring problem is often investigated in the literature. Many insights about many neighboring solutions with the same fitness value are raised but as far as we know, no deep analysis of this neutrality has ever been conducted in the literature. In this paper, we quantify the neutrality of some hard instances of the graph coloring problem...
This paper focuses on the modeling and the implementation as a multi-objective optimization problem of a Pittsburgh classification rule mining algorithm adapted to large and imbalanced datasets, as encountered in hospital data. We associate to this algorithm an original post-processing method based on ROC curve to help the decision maker to choose...
This abstract presents a modeling of the classification rule mining problem as a dominance-based multi-objective local search, with Pittsburgh solution encoding, using accuracy and the number of terms as objectives. This solution is then compared to results from literature of 22 rule mining classification algorithms.
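The two objectives named in this abstract (accuracy, to be maximized, and number of terms, to be minimized) can be compared with a Pareto dominance test. The sketch below is a generic illustration, not the authors' implementation: the `(accuracy, n_terms)` tuple encoding and the function names `dominates` and `non_dominated` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate rule set: (accuracy, number of terms).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better
    on at least one (accuracy is maximized, number of terms is minimized)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical candidates for illustration only.
candidates = [(0.90, 5), (0.85, 3), (0.80, 4), (0.90, 7)]
print(non_dominated(candidates))
```

A dominance-based local search would keep such a non-dominated archive and explore the neighbors of its members, rather than collapsing the two objectives into a single weighted score.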
Operations research and data mining already have a long-established common history. Indeed, with the growing size of databases and the amount of data available, data mining has become crucial in modern science and industry. Data mining problems raise interesting challenges for several research domains, and in particular for operations research, as...
Solving efficiently complex problems using metaheuristics, and in particular local searches, requires incorporating knowledge about the problem to solve. In this paper, the permutation flowshop problem is studied. It is well known that in such problems, several solutions may have the same fitness value. As this neutrality property is an important o...
In multiobjective combinatorial optimization, there exist two main classes of metaheuristics, based either on multiple aggregations, or on a dominance relation. As in the single objective case, the structure of the search space can explain the difficulty for multiobjective metaheuristics, and guide the design of such methods. In this work we analy...
Recently, the property of connectedness has been claimed to provide a strong motivation for the design of local search techniques for multiobjective combinatorial optimization (MOCO). Indeed, when connectedness holds, a basic Pareto local search, initialized with at least one non-dominated solution, makes it possible to identify the efficient set exhaustively. Ho...
VEGAS (Varying Evolvability-Guided Adaptive Search) is a new methodology proposed to deal with the neutrality property of some optimization problems. Its main feature is to consider the whole neutral network rather than an arbitrary solution. Moreover, VEGAS is designed to escape from plateaus based on the evolvability of solution and a multi-armed...
This paper presents a new methodology that exploits specific characteristics from the fitness landscape. In particular, we are interested in the property of neutrality, that deals with the fact that the same fitness value is assigned to numerous solutions from the search space. Many combinatorial optimization problems share this property, that is g...
In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension t...
Fitness landscape analysis aims to understand the geometry of a given optimization problem in order to design more efficient search algorithms. However, there is very little knowledge on the landscape of multiobjective problems. In this work, following a recent proposal by Zitzler et al. (2010), we consider multiobjective optimization as a set pr...
Genomics has advanced greatly with the recent development of high-throughput technologies, first in sequencing and then in genotyping. In the animal domain, we can now read genomic information on nearly 800,000 markers across increasingly large sets of individuals (from 3,000 to 10,000). These data can give rise...
Demand responsive transport allows customers to be carried to their destination as with a taxi service, provided that the customers are grouped in the same vehicles in order to reduce operational costs. This kind of service is related to the dial-a-ride problem. However, in order to improve the quality of service, demand responsive transport needs...
In this article, we model a linkage disequilibrium study (genomic study) as an optimization problem where a given objective function has to be optimized. The objective of the study is to discover haplotypes (associations of genetic markers) candidate to explain multi-factorial diseases such as diabetes or obesity. To determine what kind of algorith...
This communication presents the problem of recruitment (inclusion) in clinical trials. After a brief presentation of the data available in the hospital system, a scenario with one inclusion criterion is presented. The contribution of data mining and combinatorial optimization in this case is then presented, along with part of...
To solve a multiobjective combinatorial problem, two main classes of metaheuristics are used which are based either on the aggregation of criteria, or on the dominance relation between solutions. Like in single-objective optimization, the structure of the search space can explain the optimization difficulty for multiobjective metaheuristics, and gu...
This paper deals with a dial-a-ride problem with time windows applied to a demand responsive transport service. An evolutionary approach as well as new original representation and variation operators are proposed and detailed. Such mechanisms are used with three state-of-the-art multi-objective evolutionary algorithms: NSGA-II, IBEA and SPEA2. Afte...
Vehicle Routing Problems (VRP) are widely studied as they represent challenges for the future. However, most of the routing problems encountered in the literature are quite far from real life problems. Therefore, this work will be dedicated to the Heterogeneous Fixed Fleet Asymmetric Vehicle Routing Problem (HFF-AVRP), a variant of the VRP, which i...
The Heterogeneous Fixed Fleet Asymmetric Vehicle Routing Problem (HFF-AVRP) is an NP-hard optimization problem. Instance analysis, and in particular fitness landscape analysis, may help problem solving. Such analysis requires the definition of a distance between feasible solutions. Such a distance does not exist for the HFF-AVRP and this repo...
One of the most commonly-used metaphors to describe the process of heuristic search methods in solving combinatorial optimization problems is the Fitness Landscape (FiL). This landscape metaphor appears most commonly in works related to single-objective optimization: the search space can then be regarded as a spatial structure where each point (sol...
The capacitated vehicle routing problem (CVRP) aims to satisfy the demand of a set of customers using a fleet of vehicles, each with limited capacity, while minimizing the total distance traveled. The asymmetric capacitated vehicle routing problem (ACVRP) is a case...
Contents: Introduction; Classical optimization problems; Complexity; Solution of hard problems; Conclusion; Bibliography
In this paper we propose an exact method able to solve multi-objective combinatorial optimization problems. This method is an extension, for any number of objectives, of the 2-Parallel Partitioning Method (2-PPM) we previously proposed. Like 2-PPM, this method is based on splitting of the search space into several areas, leading to elementary searc...
The present chapter aims to serve as a brief introduction for the rest of the chapters in this volume. The main goal is to provide a general overview of multi-objective combinatorial optimization, including its main basic definitions and some notions regarding the incorporation of user's preferences. Additionally, we also present short descriptions...
Peptide microarray technology requires bioinformatics and statistical tools to manage, store, and analyze the large amount of data produced. To address these needs, we developed a system called protein array software environment (PASE) that provides an integrated framework to manage and analyze microarray information from polypeptide chip technolog...
The purpose of this book is to collect contributions that deal with the use of nature inspired metaheuristics for solving multi-objective combinatorial optimization problems. Such a collection intends to provide an overview of the state-of-the-art developments in this field, with the aim of motivating more researchers in operations research, engine...
The integration of information provided by an a priori landscape analysis as a guiding tool for interactive EMO methods is proposed. For this purpose, a new type of a priori landscape analysis is introduced, namely ellipse enclosure of the feasible solutions set in the solution space. The interaction takes place in the solution space, the user havi...
In: JOBIM: Journées Ouvertes Biologie Informatique Mathématiques, 2008
An important task of knowledge discovery deals with discovering association rules. This very general model has been widely studied and efficient algorithms have been proposed. But most of the time, only frequent rules are sought. Here we propose to consider this problem as a multi-objective combinatorial optimization problem in order to be able to...