About
Publications: 80
Reads: 44,284
Citations: 1,276
Introduction
My current research interests are machine learning, reinforcement learning, and online learning. I have a background in image processing, especially face detection, and I still have a soft spot for neural networks.
Additional affiliations
January 2004 - present
Publications (80)
To address the contextual bandit problem, we propose an online random forest algorithm. The analysis of the proposed algorithm is based on the sample complexity needed to find the optimal decision stump. Then, the decision stumps are recursively stacked in a random collection of decision trees, BANDIT FOREST. We show that the proposed algorithm is...
Dialogue systems rely on a careful reinforcement learning design: the learning algorithm and its state space representation. Lacking more rigorous knowledge, the designer resorts to practical experience to choose the best option. In order to automate this process and improve its performance, this article formalises the pr...
We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting when there exists one unique best arm. Second, we study the general switching best arm setting when a best arm sw...
We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to ensure privacy in the best arm identification problem between asynchronous, collaborative, and thrifty players. In the context of a digital service, we advo...
In this paper, we consider the problem of sequential change-point detection where both the change-points and the distributions before and after the change are assumed to be unknown. For this problem of primary importance in statistical and sequential learning theory, we derive a variant of the Bayesian Online Change Point Detector proposed by (Fear...
In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting imposes a maximum number of allowed policy updates, and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy up...
One characteristic of the Cloud is elasticity: the ability to adapt the resources allocated to applications as needed at run time. This capacity relies on scaling and scheduling. This article studies online horizontal scaling. The aim is to dynamically determine application deployment parameters and to adjust them in order to respect...
In various recommender system applications, from medical diagnosis to dialog systems, observation costs mean that only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent is free to choose which variables to observe. In this paper, we analyze and extend an online learning framew...
In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has a f...
In reinforcement learning, an agent chooses actions in order to maximize the rewards given by a dynamic environment. As the environment is initially unknown, the agent has to interact with it to gather information. Moreover, only the reward of the chosen actions is revealed. That is why the agent faces the exploration/exploitation dilemma: she has...
Presentation of Decentralized Exploration in Multi-Armed Bandits
The Industrial Internet of Things (IIoT) faces multiple challenges to achieve high reliability, low-latency and low power consumption. The IEEE 802.15.4 Time-Slotted Channel Hopping (TSCH) protocol aims to address these issues by using frequency hopping to improve the transmission quality when coping with low-quality channels. However, an optimized...
Reinforcement Learning, Multi-Armed Bandits, and AB testing
The use of Low Power Wide Area Networks (LPWANs) is growing due to their advantages in terms of low cost, energy efficiency and range. Although LPWANs attract the interest of industry and network operators, they face certain constraints related to energy consumption, network coverage and quality of service. In this paper we demonstrate the possibili...
The optimization of LoRa transmission is cast as a reinforcement learning problem: several multi-armed bandit (MAB) algorithms are compared with Adaptive Data Rate (ADR), which is the algorithm defined in LoRa Network. In experiments on a realistic LoRa Network simulator, the MAB algorithms outperform ADR in terms of both energy consumption and packet losses.
Presentation on AI and disability for the Femmes En Entreprise Adaptée awards
Thompson Sampling exhibits excellent results in practice and has been shown to be asymptotically optimal. The extension of the Thompson Sampling algorithm to the Switching Multi-Armed Bandit problem, proposed in [13], is a Thompson Sampling equipped with a Bayesian online changepoint detector [1]. In this paper, we propose another extension of th...
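As a minimal illustration of the basic Thompson Sampling idea referred to above (not the switching extension, whose description is truncated here), the following sketch assumes Bernoulli rewards and a Beta(1, 1) prior on each arm. The function name `thompson_sampling`, the arm means and the seed are hypothetical choices made for the example.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors (illustrative sketch).

    arm_means: assumed true success probabilities, used only to simulate rewards.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [1] * k  # alpha parameter of each Beta posterior
    failures = [1] * k   # beta parameter of each Beta posterior
    pulls = [0] * k
    total_reward = 0

    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the largest.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < arm_means[arm] else 0

        # Conjugate update of the Beta posterior.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], horizon=2000, seed=1)
    print("pulls per arm:", pulls, "total reward:", total)
```

In a stationary problem the posterior concentrates on the best arm; a switching environment would additionally need some mechanism, such as a changepoint detector, to discard outdated observations.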
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeli...
Contextual bandits can be viewed as a generalization of online classification models, where only the chosen class is observed. The selection of learning experts makes it possible to find the best parametrization of an expert during its learning, within a set of predefined parameters, and reduces the bias of the hypothesis space, and hence improves the per...
The purpose of this code is to test the BanditForest algorithm and to reproduce the experiments of the AISTATS paper "Random forest for the contextual bandit problem."
Long version available at: https://www.researchgate.net/publication/315522010_The_Non-stationary_Stochastic_Multi-armed_Bandit_Problem
We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems like active learning, music and interface recommendation applications, where when...
Bandit Forest and use cases
The multi-armed bandit is a model of exploration and exploitation, where one must select, within a finite set of arms, the one which maximizes the cumulative reward up to the time horizon T. For the adversarial multi-armed bandit problem, where the sequence of rewards is chosen by an oblivious adversary, the notion of best arm during the time horiz...
In the multi-armed bandit problem, a player chooses among several arms with different expected rewards. The goal is to maximize the reward obtained after T trials. The player must therefore explore to estimate the rewards of each machine while exploiting the arm estimated to be the best. This is the exploration/ex...
Labelling training examples is a costly task in supervised classification. Active learning strategies address this problem by selecting the most useful unlabelled examples to train a predictive model. The choice of examples to label can be seen as a dilemma between the exploration and the exploitation over the data space representation. In...
To address the contextual bandit problem, we propose an online decision tree algorithm. We show that the proposed algorithm, KMD-Tree, incurs an expected cumulative regret of order O(log T) against the greedy decision tree built knowing the joint distribution of contexts and rewards. We show that this problem-dependent regret bound is optimal...
This paper presents a new contextual bandit algorithm, NeuralBandit, which requires no assumption of stationarity of contexts and rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose online the parameters of multi-layer perceptrons....
In this paper, we propose a new contextual bandit algorithm, NeuralBandit, which makes no stationarity assumption on contexts and rewards. The proposed algorithm uses several multilayer perceptrons, each learning the probability that an action, given the context, leads to a reward. In order to tune...
Optimization of online decisions and its applications.
Presentation done at the Laboratoire de Recherche en Informatique of Orsay, Paris XI.
We consider a variant of the multi-armed bandit model, which we call scratch games, where the sequences of rewards are finite and drawn in advance with unknown starting dates. This new problem is motivated by online advertising applications where the number of ad displays is fixed according to a contract between the advertiser and the publisher, an...
We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this s...
Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, are easy to implement, and are efficient in practice. We consider a new bandit setting, called "scratch-games", where arm budge...
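To make the notion of an upper-confidence-bound algorithm concrete, here is a minimal sketch of the standard UCB1 index policy on simulated Bernoulli arms. It illustrates the general family mentioned above, not the scratch-games algorithm of the paper (whose description is truncated here); the function name `ucb1`, the arm means and the seed are assumptions made for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Standard UCB1 on simulated Bernoulli arms (illustrative sketch).

    arm_means: assumed true success probabilities, used only to simulate rewards.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of plays of each arm
    sums = [0.0] * k   # cumulative reward of each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play each arm once.
            arm = t - 1
        else:
            # Play the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000))
```

The exploration bonus shrinks as an arm is played more often, so suboptimal arms are sampled only a logarithmic number of times in this stationary, unlimited-budget setting.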
One of the most critical operators for a Data Stream Management System is the join operator. Unfortunately, the join operator
between the streams A and B is a blocking operator: for each current tuple of the stream A, the entire stream B has to be
scanned. The usual technique used for unblocking stream operators consists in restricting the processing...
This paper addresses a task of variable selection which consists in choosing a subset of variables that is sufficient to predict
the target label well. Here instead of trying to directly determine which variables are better, we make use of prior knowledge
to learn the properties of good variables and guide the selection towards the most relevant d...
In itself, the continuous exponential increase in the size of data warehouses does not necessarily lead to richer and finer-grained
information since the processing capabilities do not increase at the same rate. Current state-of-the-art technologies require
the user to strike a delicate balance between the processing cost and the information quality...
Abstract: A major trend since the end of the last century has been the exponential increase in the volume of stored data. This increase does not necessarily translate into richer information, since the capacity to process these data does not grow as quickly. With current technologies, a difficult trade-off must be...
This paper presents a method to interpret the output of a classification (or regression) model. The interpretation is based on two concepts: the variable importance and the value importance of the variable. Unlike most state-of-the-art interpretation methods, our approach allows the interpretation of the model output for every instance. Understa...
Abstract. This paper presents a method for interpreting the output of a classification or regression model. The interpretation of the model output is based on two quantities: the importance of the variable and the importance of the value of the variable. Unlike most state-of-the-art interpretation methods, our...
The influx of data on the usage of products and services requires heavy processing to turn it into information. Yet the capacity to process data cannot keep up with the exponential growth of stored volumes. With current technologies, a difficult trade-off must be found between the cost of implementation and the qua...
In the field of neural networks, feature selection has been studied for the last ten years and classical as well as original methods have been employed. This paper reviews the efficiency of four approaches to feature selection applied on neural networks. We assess the efficiency of these methods when the number of examples is significantly lower th...
In the field of neural networks, feature selection has been studied for the last ten years and classical as well as original methods have been employed. This paper reviews the efficiency of four approaches to driven forward feature selection on neural networks. We assess the efficiency of these methods compared to the simple Pearson criterion...
The invention relates to a method for extracting a cross table (22) from a database (24). The cross table comprises instances in rows (respectively columns) and indicators characterizing the instances in columns (respectively rows). The method comprises the following steps:
a) obtaining (51) an initial specification of a...
Neural networks are still frustrating tools in the data mining arsenal. They exhibit excellent modelling performance, but do not give a clue about the structure of their models. We propose a methodology to explain the classification obtained by a multilayer perceptron. We introduce the concept of 'causal importance' and define a saliency measuremen...
Detecting faces in images with complex backgrounds is a difficult
task. Our approach, which obtains state of the art results, is based on
a neural network model: the constrained generative model (CGM).
Generative, since the goal of the learning process is to evaluate the
probability that the model has generated the input data, and constrained
since...
The invention concerns an automatic system for sound and image recording in particular for videoconference, comprising means controlling (20) image and sound recording sensors (10) and sequence analysing means (40) monitoring said control means (20) to obtain automatic framing of the sequence being filmed. The invention is characterised in that an...
Detecting faces in images with complex backgrounds is a difficult task. Our approach, which obtains state-of-the-art results, is based on a generative neural network model: the constrained generative model (CGM). To detect side-view faces and to decrease the number of false alarms, a conditional mixture of networks is used. To decrease the computat...
Allocating resources to data traffic in telecommunication networks is a difficult problem because of the complex dynamics exhibited by this kind of traffic and because of the difficult trade-off between the delivered quality of service and the wasted bandwidth.
We describe and compare the performance of two controllers of different designs (a Kalma...
Realtime face detection
In a face-to-face meeting, participants are free to choose who
they look at and who they listen to. In a videoconference, participants
can only see or hear the audio-visual signal that the distant
participants have decided or have been able to transmit. We present
Panorama, a contactless visual man-machine interface which makes it possible to
explore a distan...
A real-time system is described for automatic detection and
tracking of multiple persons, in the context of video-conferencing
systems. This system, called MULTRAK (multiperson locating and tracking
automatic kernel), is able to continuously detect and track the position
of faces in its field of view. The heart of the system is a modular
neural netw...
Neural networks are statistical models that enable numerical learning from examples. A model for training a neural network is developed and applied to face detection. This approach is based on autoassociative neural networks. We will see that, to use this type of network as an es...
We present a neural network approach to human face detection. Using a modular system, a conditional mixture of networks, we are able to detect front-view faces as well as turned faces (up to 50 degrees) with excellent performance. This modular network is integrated into LISTEN, our face tracking system. It enables this system to detect and track i...
Visual and acoustic information are at the heart of (tele)communication between people. The face is the main source of information. Techniques for detecting motion and skin colour delimit regions of interest where faces may be found. A neural network detects the face and provides the positio...
A generative neural network model, constrained by non-face examples chosen by an iterative algorithm, is applied to face detection. To extend the detection ability in orientation and to decrease the number of false alarms, different combinations of networks are tested: ensemble, conditional ensemble and conditional mixture of networks. The use of a...
A generative neural network model, constrained by non-face examples chosen by an iterative algorithm, is applied to face detection. To improve the generalization ability of the model, another constraint based on the minimum description length is added. This model is tested and compared with another state-of-the-art face detection system on a large...
A new learning model based on autoassociative neural networks is developed and applied to face detection. To extend the detection ability in orientation and to decrease the number of false alarms, different combinations of networks are tested: ensemble, conditional ensemble and conditional mixture of networks. The use of a conditional mixture of n...
Both visual and acoustic information provide effective means of telecommunication between persons. In this context, the face is the most important part of the person both visually and acoustically. We describe how the cooperation of image and audio processing makes it possible to track a person's face and to collect the audio information it produces. We pr...