Vincent Lemaire

Orange Labs · Orange Labs Research

PhD - HDR

About

195
Publications
85,437
Reads
1,155
Citations
Citations since 2017
81 Research Items
683 Citations
Introduction
I work on "Machine Learning" and "Data Mining" methods. Website: http://www.vincentlemaire-labs.fr/ Main topics: Machine Learning, Data Mining, Stream Mining, Supervised Clustering, Statistics, Neural Networks
Additional affiliations
January 2004 - December 2018
Orange Labs
Position
  • Orange Labs
Education
December 2008 - December 2008
Université Paris-Sud 11
Field of study
  • Computer Science
October 1996 - September 1999
Sorbonne Université
Field of study
  • Computer Science


Publications (195)
Article
Full-text available
Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade,...
Preprint
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Preprint
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Preprint
Full-text available
Novel Class Discovery (NCD) is a growing field where we are given during training a labeled set of known classes and an unlabeled set of different classes that must be discovered. In recent years, many methods have been proposed to address this problem, and the field has begun to mature. In this paper, we provide a comprehensive survey of the state...
Book
Full-text available
Proceedings of the TextMine 2023 workshop. The aim of this workshop is to bring together researchers working on the broad theme of text mining. The workshop offers a meeting opportunity for academics and industry practitioners from the various communities of artificial intelligence, machine learning, the processing...
Conference Paper
Full-text available
In the field of Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set when a labeled set of known but different classes is available. Although NCD has recently attracted the attention of the scientific community, no solution has yet been proposed for tabular data,...
Conference Paper
Hospitals face high occupancy rates, resulting in longer boarding times and more complex bed management. This task could be facilitated by anticipating unscheduled admissions. We study the capability of information from French electronic health records of an emergency department (ED) to predict patient disposition decisions. We compare the per...
Article
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Preprint
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Presentation
Full-text available
Presentation: a brief introduction to "weakly supervised learning", followed by a focus on "active learning". For more details, see: https://www.researchgate.net/publication/354719650_From_Weakly_Supervised_Learning_to_Biquality_Learning_an_Introduction
Conference Paper
Full-text available
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNC...
Preprint
Full-text available
Learning to predict events ahead of time in open time series is challenging. Early Classification of Time Series (ECTS) tackles the problem of balancing, online, the accuracy of the prediction against the cost of delaying the decision when the individuals are time series of finite length with a single label for the whole time series. Surprisingly,...
Conference Paper
Full-text available
This article proposes an original and global view of Weakly Supervised Learning, leading to the design of generic approaches capable of handling any type of supervision weakness. A new framework called "Biquality Data" is introduced, which assumes that a small trusted dataset of correctly labeled examples is available...
Conference Paper
Full-text available
This article proposes a method for the automatic creation of variables (for regression) that complement the information contained in the initial vector of explanatory variables. Our method works as a preprocessing step in which the continuous values of the variable to be regressed are discretized into a set of in...
Conference Paper
Due to an ever-increasing demand for analyzing the large volumes of information issuing from high-speed data streams, multi-label stream classification is replacing the traditional offline multi-label classification system and has thus become a focal point in recent years. In this paper, we propose a new algorithm for multi-label stream classificat...
Preprint
Full-text available
This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value t...
Conference Paper
In machine learning, the performance of a supervised model often depends on the volume of labeled data. Training a model on a large amount of data therefore requires labeling many observations and often demands expertise that is costly in both time and money. One solution is then to outsource the work of lab...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. Until now, the early classification problem has been dealt with by considering only irrevocable decisions. This paper introduces a new problem called early and revoca...
Conference Paper
Full-text available
Active learning is a subfield of machine learning that makes it possible to reduce the amount of data necessary to train a classifier. The training set is built iteratively so that only the most significant and informative data are used and labeled by an external person called the oracle. It is furthermore possible to use active learning with the theor...
Book
Full-text available
Science, technology, and commerce increasingly recognise the importance of machine learning approaches for data-intensive, evidence-based decision making. This is accompanied by increasing numbers of machine learning applications and volumes of data. Nevertheless, the capacities of processing systems or human supervisors or domain experts remai...
Preprint
Full-text available
This paper has been accepted at the IAL@ECML Workshop 2021 (https://www.activeml.net/ial2021/index.html) -------- "In this paper we show that the combination of a Contrastive representation with a label noise-robust classification head requires fine-tuning the representation in order to achieve state-of-the-art performances. Since fine-tuned repres...
Conference Paper
Full-text available
In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists in forecasting a probability distribution function for each future position. The proposed framework is...
Preprint
Full-text available
https://arxiv.org/abs/2012.09632 (this paper has been accepted at IJCNN 2021) The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies". In WSL use cases, a variety of situations exists where the collected "information" is imperfect. The parad...
Conference Paper
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-k...
Chapter
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. However, a preprint recently posted on arXiv claims that all research done for almost 20 years now on the Early Classification of Time Series is useless, or, at the v...
Presentation
Full-text available
This talk gives a 'brief overview' of Weakly supervised learning. The choice was made to present things in a hierarchical way for simplicity because it is more 'didactic'. But the view via the cube on the last slide is more appropriate, more general. For more details see : https://www.researchgate.net/publication/354719650_From_Weakly_Supervised_L...
Presentation
Full-text available
This is a talk offering some insights into technical aspects of Khiops Interpretation, given to the Inria team 'Lacodam' in March 2021. Further details about this tool are available at http://vincentlemaire-labs.fr/iki.html
Preprint
Full-text available
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Book
Full-text available
It goes without saying that we have entered an era in which textual data in all its forms overwhelms each of us, whether in our personal or professional environment: the growing number of documents required by companies and administrations, the profusion of textual data available via the Internet...
Book
This book constitutes the refereed proceedings of the 6th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2021, held during September 13-17, 2021. The workshop was planned to take place in Bilbao, Spain, but was held virtually due to the COVID-19 pandemic. The 12 full papers presented in this book were carefully review...
Preprint
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful, but there is no guarantee that they are useful for label prediction. Predictive clustering seeks to obtain the best of both worlds...
Preprint
Full-text available
This paper has been published at the Workshop IAL@ECML (http://ceur-ws.org/Vol-2660/ialatecml_paper3.pdf) -- Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label. Although this field is quite old, several important challenges to using active learning in real-world settings still remain un...
Preprint
Full-text available
Paper accepted at the Workshop "Data Quality Assessment for Machine Learning (DQAML)", SIGKDD 2021 --- In some industrial applications, such as fraud detection, common supervision techniques may not be efficient because they rely on the quality of labels. In concrete cases, these labels may be weak in quantity, quality or trustworthiness. We propose a ben...
Preprint
Full-text available
https://arxiv.org/abs/2010.09621 (this paper has been accepted at IJCNN 2021). The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies", namely: poor quality, non-adaptability, and insufficient quantity of labels. Regarding quality, label n...
Chapter
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful, but there is no guarantee that they are useful for label prediction. Predictive clustering seeks to obtain the best of both worlds...
Conference Paper
Paper available here: https://link.springer.com/chapter/10.1007/978-3-030-59065-9_25 --- Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle wi...
Conference Paper
Full-text available
Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label. Although this field is quite old, several important challenges to using active learning in real-world settings still remain unsolved. In particular, most selection strategies are hand-designed, and it has become clear that there is no...
Chapter
Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle MTSC: a relational way. The multiple dimensions of MTS are represented in a relational d...
Preprint
Full-text available
In some application areas, the ability to understand (describe) the results given by a classifier is as important a condition as its predictive performance. In this case, the classifier is considered important if it can produce comprehensible results with a good predictive performance. This is referred to as a trade-off “interpretation vs. perf...
Conference Paper
Full-text available
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful, but there is no guarantee that they are useful for label prediction. Predictive clustering seeks to obtain the best of both worlds...
Conference Paper
Full-text available
This article presents a time series classification method that selects alternative representations (such as derivatives, cumulative integrals, and the power spectrum) and extracts informative features from them. The proposed approach is broken down into three steps: (i) the original time series are transformed...
Chapter
Full-text available
We address the problem of event classification for proactive fiber break detection in high-speed optical communication systems. The proposed approach is based on monitoring the State of Polarization (SOP) via digital signal processing in a coherent receiver. We describe in detail the design of a classifier providing interpretable decision rules an...
Book
This book constitutes the refereed proceedings of the 4th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2019, held in Würzburg, Germany, in September 2019. The 7 full papers presented together with 9 poster papers were carefully reviewed and selected from 31 submissions. The papers cover topics such as temporal data...
Book
This book constitutes the refereed proceedings of the 5th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2020, held in Ghent, Belgium, in September 2020. The 15 full papers presented in this book were carefully reviewed and selected from 29 submissions. The selected papers are devoted to topics such as Temporal Data C...
Research Proposal
Full-text available
Internship title: Comparison of weakly supervised learning methods in the case of fraud - Mission: The general context of the internship is weakly supervised classification [1] in the case of fraud (highly imbalanced classes and label noise). Many services distributed by Orange can be the target of attempts...
Research Proposal
Internship title: Active Learning on Transactional Data - Mission: The general context of the internship is the use of active learning on transactional data. Active learning [2] is a semi-supervised learning model in which an oracle intervenes during the process. More precisely, unlike the class...
Conference Paper
Full-text available
http://proceedings.mlr.press/v101/bondu19a.html This paper presents a method which extracts informative features while selecting simultaneously adequate representations for Time Series Classification. This method simultaneously (i) selects alternative representations, such as derivatives, cumulative integrals, power spectrum ... (ii) and extracts...
Conference Paper
Full-text available
https://rd.springer.com/chapter/10.1007%2F978-3-030-33607-3_36 Seasonal behaviours are widely encountered in various applications. For instance, requests on web servers are highly influenced by our daily activities. Seasonal forecasting consists in forecasting the whole next season for a given seasonal time series. It may help a service provider t...
Chapter
Full-text available
Seasonal behaviours are widely encountered in various applications. For instance, requests on web servers are highly influenced by our daily activities. Seasonal forecasting consists in forecasting the whole next season for a given seasonal time series. It may help a service provider to provision correctly the potentially required resources, avoidi...
Conference Paper
Full-text available
We address the problem of event classification for proactive fiber break detection in high-speed optical communication systems. The proposed approach is based on monitoring the State of Polarization (SOP) via digital signal processing in a coherent receiver. We describe in detail the design of a classifier providing interpretable decision rules...
Conference Paper
Full-text available
Science, technology, and commerce increasingly recognise the importance of machine learning approaches for data-intensive, evidence-based decision making. This is accompanied by increasing numbers of machine learning applications and volumes of data. Nevertheless, the capacities of processing systems or human supervisors or domain experts remain li...
Conference Paper
Full-text available
https://www.aclweb.org/anthology/W19-5915.pdf We present Graph2Bots, a tool for assisting conversational agent designers. It extracts a graph representation from human-human conversations by using unsupervised learning. The generated graph contains the main stages of the dialogue and their inner transitions. The graphical user interface (GUI) then...
Preprint
Full-text available
Since the introduction and the public availability of the UCR time series benchmark data sets, numerous Time Series Classification (TSC) methods have been designed, evaluated and compared to each other. We suggest a critical view of TSC performance evaluation protocols put in place in recent TSC literature. The main goal of this 'position'...
Conference Paper
Full-text available
Smartphones are ubiquitous in our daily lives. They are a computing resource within arm's reach, with direct access to a considerable amount of personal information. They represent a very valuable source of data for telecommunication operators, but the highly decentralized nature of these data and the...
Preprint
Full-text available
In the context of capacity planning, forecasting the evolution of computer server usage enables companies to better manage their computational resources. We address this problem by collecting key indicator time series and propose to forecast their evolution a day ahead. Our method assumes that data is structured by a daily seasonality, but also...
Conference Paper
Full-text available
We present an experimental proof-of-concept on just-in-time resource allocation in elastic optical networks to provide seamless path restoration. Our method relies on state of polarization monitoring via standard coherent receiver paired with machine learning for proactive fiber cut detection.
Conference Paper
Full-text available
Since the introduction and the public availability of the UCR time series benchmark data sets, numerous Time Series Classification (TSC) methods have been designed, evaluated and compared to each other. We suggest a critical view of TSC performance evaluation protocols put in place in recent TSC literature. The main goal of this "position" paper is...
Conference Paper
Full-text available
In the context of capacity planning, forecasting the evolution of computer server usage enables companies to better manage their computational resources. We address this problem by collecting key indicator time series and propose to forecast their evolution a day ahead. Our method assumes that data is structured by a daily seasonality, but also...
Conference Paper
Full-text available
This paper presents a triclustering-based outlier-shape score for time series in the context of a fraud detection platform for wholesale traffic for a telecommunications carrier. We propose to use triclustering as an exploration module for outlier shape detection using whole time series. Three main steps compose this approach: (1) projection of dat...
Conference Paper
Full-text available
Science, technology, and commerce increasingly recognize the importance of machine learning approaches for data-intensive, evidence-based decision making. This is accompanied by increasing numbers of machine learning applications and volumes of data. Nevertheless, the capacities of processing systems or human supervisors or domain experts remain l...