Vincent Lemaire

Orange Labs · Orange Labs Research

PhD - HDR

About

219
Publications
96,418
Reads
1,399
Citations
Introduction
http://www.vincentlemaire-labs.fr/ Main topics: Machine Learning, Data Science, Neural Networks, Time Series Classification, Model Interpretation, ...
Additional affiliations
January 2004 - December 2018
Orange Labs
Position
  • Orange Labs
Education
December 2008 - December 2008
University of Paris-Sud
Field of study
  • Computer Science
October 1996 - September 1999
Sorbonne University
Field of study
  • Computer Science


Publications (219)
Conference Paper
Full-text available
The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of “supervision deficiencies”. In WSL use cases, a variety of situations exists where the collected “information” is imperfect. The paradigm of WSL attempts to list and cover these problems with associated solutions...
Preprint
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Preprint
Full-text available
Novel Class Discovery (NCD) is a growing field where we are given during training a labeled set of known classes and an unlabeled set of different classes that must be discovered. In recent years, many methods have been proposed to address this problem, and the field has begun to mature. In this paper, we provide a comprehensive survey of the state...
Article
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Preprint
Full-text available
(This paper is now published in TMLR 2024). Mislabeled examples are ubiquitous in real-world machine learning datasets, advocating the development of techniques for automatic detection. We show that most mislabeled detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framewor...
Preprint
Full-text available
Recent research in machine learning has given rise to a flourishing literature on the quantification and decomposition of model uncertainty. This information can be very useful during interactions with the learner, such as in active learning or adaptive learning, and especially in uncertainty sampling. To allow a simple representation of these tota...
Preprint
Full-text available
ml_edm is a Python 3 library designed for early decision making in any learning task involving temporal/sequential data. The package is also modular, providing researchers an easy way to implement their own triggering strategy for classification, regression or any machine learning task. As of now, many Early Classification of Time Serie...
Preprint
Full-text available
Quantitative systems pharmacology (QSP) models of cancer immunity offer a mechanistic understanding of cellular dynamics and drug effects that are often challenging to investigate clinically. Despite their success, these models are limited by their inability to mechanistically represent patient survival as an output, which restricts their utility i...
Article
Full-text available
Recent studies in active learning, particularly in uncertainty sampling, have focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, the aim is to simplify the computational process while eliminating the dependence on observations. Crucially, the inherent uncertainty in the labels is considere...
Preprint
Full-text available
In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known...
Article
Full-text available
The problem of novel class discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number o...
Preprint
Full-text available
(Paper Accepted at IJCNN 2024) - There are now many comprehension algorithms for understanding the decisions of a machine learning algorithm. Among these are those based on the generation of counterfactual examples. This article proposes to view this generation process as a source of creating a certain amount of knowledge that can be stored to be u...
Chapter
Time series segmentation (TSS) is a research problem that focuses on dividing long multivariate sensor data into smaller, homogeneous subsequences. This task is critical for various real-world data analysis applications, such as energy consumption monitoring, climate change assessment, and human activity recognition (HAR). Despite its importance, e...
Preprint
Full-text available
The problem of Novel Class Discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number o...
Preprint
Full-text available
Recent research in active learning, and more precisely in uncertainty sampling, has focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, we propose to simplify the computational phase and remove the dependence on observations, but more importantly to take into account the uncertainty already...
Article
Full-text available
Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms when these two situations arise has not been explored much, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such a...
Conference Paper
Full-text available
(Preprint version) - Variable selection or importance measurement of input variables to a machine learning model has become the focus of much research. It is no longer enough to have a good model, one also must explain its decisions. This is why there are so many intelligibility algorithms available today. Among them, Shapley value estimation algor...
Conference Paper
Full-text available
Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes. The majority of NCD methods proposed so far only deal with image data, despite tabular data being among the most widely used type of data in practical applications. To interpret the results of clu...
Preprint
Full-text available
Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms when these two situations arise has not been explored much, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such a...
Preprint
Full-text available
The democratization of Data Mining has been widely successful thanks in part to powerful and easy-to-use Machine Learning libraries. These libraries have been particularly tailored to tackle Supervised Learning. However, strong supervision signals are scarce in practice, and practitioners must resort to weak supervision. In addition to weaknesses o...
Preprint
Full-text available
This paper has been accepted at the workshop AIMLAI of ECML-PKDD 2023 - "Variable selection or importance measurement of input variables to a machine learning model has become the focus of much research. It is no longer enough to have a good model, one also must explain its decisions. This is why there are so many intelligibility algorithms availab...
Preprint
Full-text available
Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes. The majority of NCD methods proposed so far only deal with image data, despite tabular data being among the most widely used type of data in practical applications. To interpret the results of clu...
Conference Paper
Full-text available
In this paper we show that the combination of a Contrastive representation with a label noise-robust classification head requires fine-tuning the representation in order to achieve state-of-the-art performances. Since fine-tuned representations are shown to outperform frozen ones, one can conclude that noise-robust classification heads are indeed a...
Preprint
Full-text available
This paper has been accepted at IJCNN 2023 - Time Series Classification (TSC) has received much attention in the past two decades and is still a crucial and challenging problem in data science and knowledge engineering. Indeed, along with the increasing availability of time series data, many TSC algorithms have been suggested by the research commun...
Book
Full-text available
Proceedings of the TextMine 2023 workshop. The aim of this workshop is to bring together researchers working on the broad topic of text mining. It offers a meeting place for academics and industry practitioners from the various communities of artificial intelligence, machine learning, the processing...
Conference Paper
Full-text available
In the field of Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set when a labeled set of known but different classes is available. Although NCD has recently attracted the attention of the scientific community, no solution has yet been proposed for tabular data,...
Conference Paper
Hospitals face high occupation rates resulting in a longer boarding time and more complex bed management. This task could be facilitated by anticipating the unscheduled admissions. We study the capability of information from French electronic health records of an emergency department (ED) to predict patient disposition decisions. We compare the per...
Preprint
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Presentation
Full-text available
Presentation: a brief introduction to "weakly supervised learning", followed by a focus on "active learning". For more details, see: https://www.researchgate.net/publication/354719650_From_Weakly_Supervised_Learning_to_Biquality_Learning_an_Introduction
Conference Paper
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Chapter
In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists in forecasting a probability distribution function for each future position. The proposed framework is...
Preprint
Full-text available
This paper has been published in SIGKDD Newsletter exploration (December 2022). More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Suc...
Preprint
Full-text available
Learning to predict ahead of time events in open time series is challenging. Early Classification of Time Series (ECTS) tackles the problem of balancing online the accuracy of the prediction with the cost of delaying the decision when the individuals are time series of finite length with a unique label for the whole time series. Surprisingly,...
Conference Paper
Full-text available
This article proposes an original and global view of Weakly Supervised Learning, leading to the design of generic approaches capable of handling any type of supervision weakness. A new framework called "Biquality Data" is introduced, which assumes that a small, trusted dataset of correctly labeled examples is available...
Conference Paper
Full-text available
This article proposes a method for automatically creating variables (for regression) that complement the information contained in the initial vector of explanatory variables. Our method works as a preprocessing step in which the continuous values of the variable to be regressed are discretized into a set of in...
Conference Paper
Due to an ever-increasing demand for analyzing the large volumes of information issuing from high-speed data streams, multi-label stream classification is replacing the traditional offline multi-label classification system and has thus become a focal point in recent years. In this paper, we propose a new algorithm for multi-label stream classificat...
Preprint
Full-text available
This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value t...
Conference Paper
In machine learning, the performance of a supervised model often depends on the volume of labeled data. Training a model on a large amount of data therefore requires labeling many observations and often calls for expertise that is costly in time and money. One solution is then to outsource the work of lab...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. Until now, the early classification problem has been dealt with by considering only irrevocable decisions. This paper introduces a new problem called early and revoca...
Conference Paper
Full-text available
Active learning is a subfield of machine learning that makes it possible to reduce the amount of data necessary to train a classifier. The training set is built iteratively so that only the most significant and informative data are used and labeled by an external person called the oracle. It is furthermore possible to use active learning with the theor...
Book
Full-text available
Science, technology, and commerce increasingly recognise the importance of machine learning approaches for data-intensive, evidence-based decision making. This is accompanied by increasing numbers of machine learning applications and volumes of data. Nevertheless, the capacities of processing systems or human supervisors or domain experts remai...
Preprint
Full-text available
This paper has been accepted at the IAL@ECML Workshop 2021 (https://www.activeml.net/ial2021/index.html) - "In this paper we show that the combination of a Contrastive representation with a label noise-robust classification head requires fine-tuning the representation in order to achieve state-of-the-art performances. Since fine-tuned repres...
Conference Paper
Full-text available
In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists in forecasting a probability distribution function for each future position. The proposed framework is...
Preprint
Full-text available
https://arxiv.org/abs/2012.09632 (this paper has been accepted at IJCNN 2021) The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies". In WSL use cases, a variety of situations exists where the collected "information" is imperfect. The parad...
Conference Paper
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-k...
Chapter
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. However, a preprint recently posted on arXiv claims that all research done for almost 20 years now on the Early Classification of Time Series is useless, or, at the v...
Presentation
Full-text available
This talk gives a 'brief overview' of weakly supervised learning. For simplicity, the choice was made to present things hierarchically, because it is more 'didactic'. However, the view via the cube on the last slide is more appropriate and more general. For more details, see: https://www.researchgate.net/publication/354719650_From_Weakly_Supervised_L...
Presentation
Full-text available
This is a talk giving some insights into technical aspects of Khiops Interpretation, for the Inria team 'Lacodam' (March 2021). Other details about this tool can be found at http://vincentlemaire-labs.fr/iki.html
Preprint
Full-text available
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Book
Full-text available
It goes without saying that we have entered an era in which textual data in all its forms is overwhelming each of us, whether in our personal or professional environment: the growing number of documents required by companies and administrations, the profusion of textual data available via the Internet...
Book
This book constitutes the refereed proceedings of the 6th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2021, held during September 13-17, 2021. The workshop was planned to take place in Bilbao, Spain, but was held virtually due to the COVID-19 pandemic. The 12 full papers presented in this book were carefully review...
Preprint
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no guarantee that they are useful for labels prediction. Predictive clustering seeks to obtain the best of the two worlds...
Preprint
Full-text available
This paper has been published at the Workshop IAL@ECML (http://ceur-ws.org/Vol-2660/ialatecml_paper3.pdf) -- Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label. Although this field is quite old, several important challenges to using active learning in real-world settings still remain un...
Preprint
Full-text available
Paper accepted at the workshop "Data Quality Assessment for Machine Learning (DQAML)", SIGKDD 2021 --- In some industrial applications such as fraud detection, common supervision techniques may not be efficient because they rely on the quality of labels. In concrete cases, these labels may be weak in quantity, quality or trustworthiness. We propose a ben...
Preprint
Full-text available
https://arxiv.org/abs/2010.09621 (this paper has been accepted at IJCNN 2021). The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies", namely: poor quality, non adaptability, and insufficient quantity of labels. Regarding quality, label n...
Chapter
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no guarantee that they are useful for labels prediction. Predictive clustering seeks to obtain the best of the two worlds...
Conference Paper
Paper available here: https://link.springer.com/chapter/10.1007/978-3-030-59065-9_25 --- Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle wi...
Conference Paper
Full-text available
Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label. Although this field is quite old, several important challenges to using active learning in real-world settings still remain unsolved. In particular, most selection strategies are hand-designed, and it has become clear that there is no...
Chapter
Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle MTSC: a relational way. The multiple dimensions of MTS are represented in a relational d...
Preprint
Full-text available
In some application areas, the ability to understand (describe) the results given by a classifier is as important a condition as its predictive performance. In this case, the classifier is considered as important if it can produce comprehensible results with a good predictive performance. This is referred to as a trade-off “interpretation vs. perf...
Conference Paper
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no guarantee that they are useful for labels prediction. Predictive clustering seeks to obtain the best of the two worlds...
Conference Paper
Full-text available
This article presents a time series classification method that selects alternative representations (such as derivatives, cumulative integrals, and the power spectrum) and extracts informative descriptors from them. The proposed approach is decomposed into three steps: i) the original time series are transformed...
Chapter
Full-text available
We address the problem of event classification for proactive fiber break detection in high-speed optical communication systems. The proposed approach is based on monitoring the State of Polarization (SOP) via digital signal processing in a coherent receiver. We describe in detail the design of a classifier providing interpretable decision rules an...
Book
This book constitutes the refereed proceedings of the 4th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2019, held in Würzburg, Germany, in September 2019. The 7 full papers presented together with 9 poster papers were carefully reviewed and selected from 31 submissions. The papers cover topics such as temporal data...
Book
This book constitutes the refereed proceedings of the 5th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2020, held in Ghent, Belgium, in September 2020. The 15 full papers presented in this book were carefully reviewed and selected from 29 submissions. The selected papers are devoted to topics such as Temporal Data C...
Research Proposal
Full-text available
Internship title: Comparison of weakly supervised learning methods in the case of fraud - Mission: The general context of the internship is weakly supervised classification [1] in the case of fraud (highly imbalanced classes and label noise). Many services distributed by Orange may be subject to attempts...