Alexis Bondu

  • PhD
  • Researcher at Orange Labs, France, Châtillon

About

71
Publications
24,313
Reads
511
Citations
Current institution
Orange Labs, France, Châtillon
Current position
  • Researcher

Publications (71)
Preprint
Full-text available
Habilitation to Direct Research manuscript (HDR)
Preprint
Full-text available
Early Classification of Time Series (ECTS) has been recognized as an important problem in many areas where decisions have to be taken as soon as possible, before the full data availability, while time pressure increases. Numerous ECTS approaches have been proposed, based on different triggering functions, each taking into account various pieces of...
Preprint
Full-text available
(This paper is now published in TMLR 2024). Mislabeled examples are ubiquitous in real-world machine learning datasets, advocating the development of techniques for automatic detection. We show that most mislabeled detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framewor...
Preprint
Full-text available
ml_edm is a Python 3 library designed for early decision making in any learning task involving temporal/sequential data. The package is also modular, providing researchers an easy way to implement their own triggering strategy for classification, regression or any machine learning task. As of now, many Early Classification of Time Serie...
Preprint
Full-text available
In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not so early as to risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known...
Article
Full-text available
Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms when these two situations arise has not been explored much, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such a...
Preprint
Full-text available
Training machine learning models from data with weak supervision and dataset shifts is still challenging. Designing algorithms when these two situations arise has not been explored much, and existing algorithms cannot always handle the most complex distributional shifts. We think the biquality data setup is a suitable framework for designing such a...
Preprint
Full-text available
The democratization of Data Mining has been widely successful thanks in part to powerful and easy-to-use Machine Learning libraries. These libraries have been particularly tailored to tackle Supervised Learning. However, strong supervision signals are scarce in practice, and practitioners must resort to weak supervision. In addition to weaknesses o...
Conference Paper
Full-text available
In this paper we show that the combination of a Contrastive representation with a label noise-robust classification head requires fine-tuning the representation in order to achieve state-of-the-art performance. Since fine-tuned representations are shown to outperform frozen ones, one can conclude that noise-robust classification heads are indeed a...
Preprint
Full-text available
This paper has been accepted at IJCNN 2023 - Time Series Classification (TSC) has received much attention in the past two decades and is still a crucial and challenging problem in data science and knowledge engineering. Indeed, along with the increasing availability of time series data, many TSC algorithms have been suggested by the research commun...
Article
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Chapter
In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists in forecasting a probability distribution function for each future position. The proposed framework is...
Preprint
Full-text available
This paper has been published in SIGKDD Explorations Newsletter (December 2022). More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Suc...
Preprint
Full-text available
Learning to predict events ahead of time in open time series is challenging. Early Classification of Time Series (ECTS) tackles the problem of balancing, online, the accuracy of the prediction against the cost of delaying the decision when the individuals are time series of finite length with a unique label for the whole time series. Surprisingly,...
Conference Paper
Full-text available
This article proposes an original and global view of Weakly Supervised Learning, leading to the design of generic approaches capable of handling any type of weakness in supervision. A new framework called "Biquality Data" is introduced, which assumes that a small trusted dataset of correctly labeled examples is available...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. Until now, the early classification problem has been dealt with by considering only irrevocable decisions. This paper introduces a new problem called early and revoca...
Preprint
Full-text available
This paper has been accepted at the IAL@ECML Workshop 2021 (https://www.activeml.net/ial2021/index.html) - In this paper we show that the combination of a Contrastive representation with a label noise-robust classification head requires fine-tuning the representation in order to achieve state-of-the-art performance. Since fine-tuned repres...
Conference Paper
Full-text available
In this article, we propose a framework for seasonal time series probabilistic forecasting. It aims at forecasting (in a probabilistic way) the whole next season of a time series, rather than only the next value. Probabilistic forecasting consists in forecasting a probability distribution function for each future position. The proposed framework is...
Conference Paper
Full-text available
The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of “supervision deficiencies”. In WSL use cases, a variety of situations exists where the collected “information” is imperfect. The paradigm of WSL attempts to list and cover these problems with associated solutions...
Preprint
Full-text available
https://arxiv.org/abs/2012.09632 (this paper has been accepted at IJCNN 2021) The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies". In WSL use cases, a variety of situations exists where the collected "information" is imperfect. The parad...
Article
Full-text available
An increasing number of applications require recognizing the class of an incoming time series as quickly as possible without unduly compromising the accuracy of the prediction. In this paper, we put forward a new optimization criterion which takes into account both the cost of misclassification and the cost of delaying the decision. Based on this...
Conference Paper
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-k...
Chapter
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Preprint
Full-text available
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. However, a preprint recently posted on arXiv claims that all research done for almost 20 years now on the Early Classification of Time Series is useless, or, at the v...
Presentation
Full-text available
This talk gives a 'brief overview' of weakly supervised learning. The choice was made to present things in a hierarchical way for simplicity because it is more 'didactic'. But the view via the cube on the last slide is more appropriate and more general. For more details see: https://www.researchgate.net/publication/354719650_From_Weakly_Supervised_L...
Preprint
Full-text available
Supervised learning of time series data has been extensively studied for the case of a categorical target variable. In some application domains, e.g., energy, environment and health monitoring, it occurs that the target variable is numerical and the problem is known as time series extrinsic regression (TSER). In the literature, some well-known time...
Preprint
Full-text available
https://arxiv.org/abs/2010.09621 (this paper has been accepted at IJCNN 2021). The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies", namely: poor quality, non adaptability, and insufficient quantity of labels. Regarding quality, label n...
Conference Paper
Paper available here: https://link.springer.com/chapter/10.1007/978-3-030-59065-9_25 - Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle wi...
Chapter
Multivariate Time Series Classification (MTSC) has attracted increasing research attention in the past years due to its wide range of applications, e.g., action/activity recognition, EEG/ECG classification, etc. In this paper, we open a novel path to tackle MTSC: a relational way. The multiple dimensions of MTS are represented in a relational d...
Preprint
Full-text available
An increasing number of applications require recognizing the class of an incoming time series as quickly as possible without unduly compromising the accuracy of the prediction. In this paper, we put forward a new optimization criterion which takes into account both the cost of misclassification and the cost of delaying the decision. Based on this...
Conference Paper
Full-text available
This article presents a time series classification method that selects alternative representations (such as derivatives, cumulative integrals, and the power spectrum) and extracts informative features from them. The proposed approach is decomposed into three steps: i) the original time series are transformed...
Chapter
Full-text available
We address the problem of event classification for proactive fiber break detection in high-speed optical communication systems. The proposed approach is based on monitoring the State of Polarization (SOP) via digital signal processing in a coherent receiver. We describe in detail the design of a classifier providing interpretable decision rules an...
Book
This book constitutes the refereed proceedings of the 4th ECML PKDD Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2019, held in Würzburg, Germany, in September 2019. The 7 full papers presented together with 9 poster papers were carefully reviewed and selected from 31 submissions. The papers cover topics such as temporal data...
Conference Paper
Full-text available
http://proceedings.mlr.press/v101/bondu19a.html This paper presents a method which extracts informative features while simultaneously selecting adequate representations for Time Series Classification. This method simultaneously (i) selects alternative representations, such as derivatives, cumulative integrals, power spectrum ... (ii) and extracts...
Conference Paper
Full-text available
https://rd.springer.com/chapter/10.1007%2F978-3-030-33607-3_36 Seasonal behaviours are widely encountered in various applications. For instance, requests on web servers are highly influenced by our daily activities. Seasonal forecasting consists in forecasting the whole next season for a given seasonal time series. It may help a service provider t...
Chapter
Full-text available
Seasonal behaviours are widely encountered in various applications. For instance, requests on web servers are highly influenced by our daily activities. Seasonal forecasting consists in forecasting the whole next season for a given seasonal time series. It may help a service provider to provision correctly the potentially required resources, avoidi...
Conference Paper
Full-text available
We address the problem of event classification for proactive fiber break detection in high-speed optical communication systems. The proposed approach is based on monitoring the State of Polarization (SOP) via digital signal processing in a coherent receiver. We describe in detail the design of a classifier providing interpretable decision rules...
Conference Paper
Full-text available
The choice of an appropriate representation remains crucial for mining time series, particularly to reach a good trade-off between the dimensionality reduction and the stored information. Symbolic representations constitute a simple way of reducing the dimensionality by turning time series into sequences of symbols. SAXO is a data-driven symbolic r...
Article
Full-text available
Helping to make decisions as early as possible by learning from past experiences is becoming increasingly important in many application domains. In these settings, information can be gained by waiting for more evidence to arrive, thus helping to make better decisions that incur lower misclassification costs, but, meanwhile, the cost associated wit...
Conference Paper
Full-text available
Classification of time series as early as possible is a valuable goal. Indeed, in many application domains, the earlier the decision, the more rewarding it can be. Yet, often, gathering more information allows one to get a better decision. The optimization of this time vs. accuracy tradeoff must generally be solved online and is a complex problem....
Conference Paper
Full-text available
The incoming smart grid represents a significant break for the European utilities in terms of data volume to be processed. In France, one year of individual consumptions represents more than 600 billion data points. Since real data is not yet available, our objective consists in simulating realistic individual consumptions. A new generative model o...
Article
Full-text available
The last ten years were prolific in the statistical learning and data mining field and it is now easy to find learning algorithms which are fast and automatic. Historically, a strong hypothesis was that all examples were available or could be loaded into memory so that learning algorithms could use them straight away. But recently new use cases generati...
Conference Paper
Full-text available
Early classification approaches deal with the problem of reliably labeling incomplete time series as soon as possible given a level of confidence. While developing new approaches for this problem has been getting increasing attention recently, their evaluation is still not thoroughly considered. In this article, we propose a new evaluation proto...
Conference Paper
Full-text available
Early classification approaches deal with the problem of reliably labeling incomplete time series as soon as possible given a level of confidence. While developing new approaches for this problem has been getting increasing attention recently, their evaluation is still not thoroughly considered. In this article, we propose a new evaluation protoco...
Article
Full-text available
The emergence of Smart Grids is posing a wide range of challenges for electric utility companies and network operators: Integration of non-dispatchable power from renewable energy sources (e.g., photovoltaics, hydro and wind), fundamental changes in the way energy is consumed (e.g., due to dynamic pricing, demand response and novel electric applian...
Article
Full-text available
EDF signs special contracts with customers to flatten consumption peaks. Smart meters are able to record consumption and will be installed across 35 million households. In this paper, we highlight the interest of early classification for detecting the households which probably contribute to the evening peak. The proposed approach is based on a col...
Conference Paper
Full-text available
In France, the currently emerging “smart grid”, and more particularly the 35 million “smart meters”, will produce a large amount of daily updated metering data. The main French provider of electricity (EDF) is interested in compact and generic representations of time series which accelerate the processing of data. This article proposes a...
Conference Paper
In recent years, the amount of data to process has increased in many application areas such as network monitoring, web click and sensor data analysis. Data stream mining answers the challenge of massive data processing: this paradigm allows pieces of data to be processed on the fly and avoids exhaustive data storage. The detection of changes in...
Conference Paper
Full-text available
The labeling of training examples could be a costly task in numerous cases of supervised learning. Active learning strategies address this problem and select unlabeled examples which are considered as the most useful for the training of a predictive model. The choice of examples to be labeled can be considered as a dilemma between the exploration a...
Conference Paper
Full-text available
Labeling training examples can prove to be a costly task in supervised classification. Active learning strategies address this problem by selecting the unlabeled examples that are most useful for training a predictive model. The choice of examples to label can be seen as a dilemma...
Conference Paper
Full-text available
Semi-supervised classification methods aim to exploit labeled and unlabeled examples to train a predictive model. Most of these approaches make assumptions on the distribution of classes. This article first proposes a new semi-supervised discretization method, which adopts a very weakly informative prior on the data. This method discretizes the numerical do...
Article
Full-text available
Supervised classification problems require labelled examples, and the labelling step can be costly in practice. Active learning strategies reduce the cost of preparing learning data. These strategies aim to label only the most useful examples for the learning of the predictive model. This thesis proposes a new active learning strategy which carries ou...
Article
Full-text available
Statistical learning refers to a broad set of methods and algorithms that allow a model to learn a behavior from examples. Active learning comprises a set of example-selection methods used to build the model's training set iteratively, in interaction with an exp...
Conference Paper
Full-text available
Exploratory activities seem to be crucial for our cognitive development. According to psychologists, exploration is an intrinsically rewarding behaviour. Developmental robotics aims to design computational systems that are endowed with such an intrinsic motivation mechanism. There are possible links between developmental robotics and machine le...
Conference Paper
Full-text available
Statistical learning refers to a broad set of methods and algorithms that allow a model to learn a behavior from examples. The Parzen window is one possible model for active learning. In this article, only the Parzen window with a Gaussian kernel and L2 norm is considered. The variance of the Gaussian kernel...
Conference Paper
Full-text available
Exploratory activities seem to be crucial for our cognitive development. According to psychologists, exploration is an intrinsically rewarding behaviour. That explains the autonomous and active development of children. Developmental robotics aims to design computational systems that are endowed with such an intrinsic motivation mechanism. There...
Conference Paper
Full-text available
Machine learning refers to methods and algorithms which allow a model to learn a behavior from examples. Active learning gathers methods which select examples used to build a training set for the predictive model. All the strategies aim to use as few examples as possible and to select the most informative examples. After having formalized t...
Conference Paper
Full-text available
Active machine learning algorithms are used when large numbers of unlabeled examples are available and getting labels for them is costly (e.g. requiring consulting a human expert). Active learning gathers methods which select examples to build a training set for a predictive model, aiming for the most informative examples. The number of examples to be...
Conference Paper
Full-text available
Taking emotions into account in human-machine interactions makes it possible to design intelligent systems capable of adapting to users. Call-routing techniques in automated call centers rely on detecting emotions in speech. The main difficulties in implementing such...
Conference Paper
Full-text available
For high-fidelity Virtual Auditory Space (VAS), binaural synthesis requires individualized Head-Related Transfer Functions (HRTF). An alternative to exhaustive measurement of HRTF consists in measuring a set of representative HRTF in a few directions. These selected HRTF are considered as representative because they summarize all the necessary spat...
Article
Full-text available
Abstract: Exploratory activities seem to be crucial for our cognitive development. According to psychologists, exploration is an intrinsically rewarding behaviour. Developmental robotics aims to design computational systems that are endowed with such an intrinsic motivation mechanism. There are possible links between developmental robotics and...
Conference Paper
Full-text available
This article proposes a new use of the MODL bivariate value-grouping approach on geographical data. The data used describe the location and activity of companies within the city limits of Paris. The jointly defined geographical groupings and activity groupings are visualized on maps...
Article
Full-text available
Statistical learning methods use examples to teach a behavior to a predictive model. Supervised classification requires labeled examples. In practice, labeling examples can prove costly. In some cases, labeling involves a human expert, a measuring instrument, a high computation time...
