Olivier Teytaud

National University of Tainan, Tainan, Taiwan


Publications (162) · 34.01 Total Impact

  •
    ABSTRACT: Direct Policy Search (DPS) is a widely used tool for reinforcement learning; however, it is usually not suitable for handling high-dimensional constrained action spaces such as those arising in power system control (unit commitment problems). We propose Direct Value Search, a hybridization of DPS with Bellman decomposition techniques. We prove runtime properties and apply the results to an energy management problem.
    9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium); 05/2014
  •
    ABSTRACT: Due to its simplicity and convenience, Model Predictive Control, which consists in optimizing future decisions based on a pessimistic deterministic forecast of the random processes, is one of the main tools for stochastic control. Yet it suffers from a large computation time, unless the tactical horizon (i.e., the number of future time steps included in the optimization) is strongly reduced, and from a lack of genuine stochasticity handling. We here propose a combination of Model Predictive Control and Direct Policy Search.
    European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium; 04/2014
  • Source
    Nicolas Galichet, Michèle Sebag, Olivier Teytaud
    ABSTRACT: Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MARAB tends toward the MIN multi-armed bandit algorithm, which aims at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
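The conditional-value-at-risk arm quality described in this abstract can be illustrated with a minimal sketch (not the authors' implementation; `empirical_cvar` and the toy arms below are hypothetical). The empirical CVaR at level alpha is the mean of the worst alpha-fraction of observed rewards, and as alpha goes to 0 it tends toward the empirical minimum, i.e., the arm quality used by the MIN algorithm.

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.1):
    """Empirical conditional value at risk: the mean of the worst
    alpha-fraction of observed rewards (lower tail)."""
    r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))
    return r[:k].mean()

# A risk-aware selection based on CVaR prefers the safe arm below,
# even though the risky arm has the higher mean reward.
safe_arm  = [0.5, 0.52, 0.48, 0.51]   # low mean, low risk
risky_arm = [0.9, 0.95, 0.0, 0.92]    # high mean, occasional disaster
print(empirical_cvar(safe_arm, 0.25))
print(empirical_cvar(risky_arm, 0.25))
```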
  •
    ABSTRACT: The game of Go is a board game with a long history that is much more complex than chess. The uncertainty of the game grows with the board size. To evaluate human performance in Go, a player can be advanced to a higher rank based on the number of games won in formal human-versus-human competition. However, a human Go player's performance can be influenced by factors such as the on-the-spot environment as well as the physical and mental condition of the day, which makes certifying the player's rank difficult and uncertain. Given a sample of a player's games, his/her strength can be evaluated with classical models such as the Bradley-Terry model. However, due to inhomogeneous game conditions and limited access to archives of games, such estimates can be imprecise. In addition, classical ranks (1 Dan, 2 Dan, ...) are integers, which leads to a rather imprecise estimate of the opponent's strength. Therefore, we propose to use a sample of games played against a computer to estimate the human's strength. To increase the precision, the strength of the computer is adapted from one move to the next by increasing or decreasing the computational power based on the current situation and the results of games. The human can set some specific conditions, such as komi and board size. In this paper, we use type-2 fuzzy sets (T2FSs), with parameters optimized by a genetic algorithm, to estimate the rank in a stable manner, independently of board size. More precisely, an adaptive Monte Carlo tree search (MCTS) estimates the number of simulations corresponding to the strength of its opponents. Next, the T2FS-based adaptive linguistic assessment system infers the human's performance and presents the results as a linguistic description. The experimental results show that the proposed approach is feasible for adaptive linguistic assessment of a human Go player's performance.
    IEEE Transactions on Fuzzy Systems 01/2014; 23(2):1-1. DOI:10.1109/TFUZZ.2014.2312989 · 6.31 Impact Factor
  •
    ABSTRACT: The certified rank of a human Go player carries high uncertainty, so the player's performance does not always match the level of the certified rank. Moreover, the performance of a human Go player, especially a child, may be affected by the on-the-spot environment as well as the physical and mental condition of the day. Combining the technologies of particle swarm optimization, fuzzy markup language (FML)-based fuzzy inference, and a genetic learning algorithm, this paper presents an adaptive assessment system to evaluate the performance of a human Go player. The experimental results show that the proposed approach is feasible for adaptive assessment of a human Go player's performance.
    2013 International Conference on Fuzzy Theory and Its Applications (iFUZZY); 12/2013
  • Jérémie Decock, Olivier Teytaud
    ABSTRACT: We study the linear convergence of a simple evolutionary algorithm on non-quasi-convex functions over continuous domains. Assumptions include one on the sampling performed by the evolutionary algorithm (assumed to efficiently cover the neighborhood of the current search point), one on the conditioning of the objective function (so that, given a correct step size, the probability of improvement is not too low at each time step), and the uniqueness of the optimum.
    EA - 11th Biennal International Conference on Artificial Evolution - 2013, Bordeaux, France; 10/2013
  • David Auger, Adrien Couëtoux, Olivier Teytaud
  • David Auger, Olivier Teytaud
    ABSTRACT: The classical decision problem associated with a game is whether a given player has a winning strategy, i.e., a strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even for deterministic games (no random part in the game), (ii) it is in the jump 0', and (iii) even in the stochastic case, it becomes decidable if we add the requirement that the game halts almost surely, whatever the players' strategies may be.
    International Journal of Foundations of Computer Science 01/2013; 23(07). DOI:10.1142/S0129054112400576 · 0.33 Impact Factor
  • Jérémie Decock, Olivier Teytaud
    ABSTRACT: In spite of various recent publications on the subject, there are still gaps between upper and lower bounds in evolutionary optimization for noisy objective functions. In this paper we reduce the gap, and get tight bounds within logarithmic factors in the case of small noise and no long-distance influence on the objective function.
    Proceedings of the twelfth workshop on Foundations of genetic algorithms XII; 01/2013
  •
    ABSTRACT: Go is a game with a long history. Two players alternately place black or white stones on vacant intersections of the board; usually, the weaker player holds Black. At the end, the player with the larger territory wins the game. In learning Go, a human can be advanced to a higher rank, for example, by winning 4 out of 5 games. However, a Go player's performance can be influenced by factors such as the on-the-spot environment as well as the physical and mental condition of the day, which makes certifying the player's rank difficult and uncertain. In this paper, a type-2 fuzzy markup language (T2FML)-based system is proposed to infer the human's rank from the simulation number, komi, and board size. Based on the adaptive Upper Confidence Bounds for Trees (UCT)-based Go-ranking mechanism, the number of simulations for each move of the game is collected while the invited Go players play against the computer Go program MoGoTW. At the same time, the strength of the human is also estimated using the Bradley-Terry and Particle Swarm Optimization (PSO) models. The experimental results show that the proposed approach is feasible for estimating the strength of a human.
    Fuzzy Systems (FUZZ), 2013 IEEE International Conference on; 01/2013
  • IEEE Computational Intelligence Magazine 11/2012; 7(4):10-12. DOI:10.1109/MCI.2012.2215493 · 2.71 Impact Factor
  • Communications of the ACM 03/2012; 55(3):106. DOI:10.1145/2093548.2093574 · 2.86 Impact Factor
  • Source
    ABSTRACT: Low-discrepancy sequences provide a way to generate quasi-random numbers of high dimensionality with a very high level of uniformity. The nearly orthogonal Latin hypercube and the generalized Halton sequence are two popular methods for generating low-discrepancy sequences. In this article, we propose to use evolutionary algorithms to find optimized solutions to the combinatorial problem of configuring generators of these sequences. Experimental results show that the optimized sequence generators behave at least as well as generators from the literature for the Halton sequence, and significantly better for the nearly orthogonal Latin hypercube.
    ACM Transactions on Modeling and Computer Simulation 03/2012; 22(2):1-25. DOI:10.1145/2133390.2133393 · 0.83 Impact Factor
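The plain Halton sequence underlying this work can be sketched in a few lines (a minimal illustration only; the paper's contribution is the evolutionary optimization of generator configurations, such as scramblings, which this sketch omits). Each coordinate is the radical-inverse of the point index in a distinct prime base.

```python
def halton(index, base):
    """Van der Corput radical inverse of `index` in the given base."""
    result, f = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result

def halton_point(index, bases=(2, 3)):
    """One point of the multidimensional Halton sequence."""
    return tuple(halton(index, b) for b in bases)

# First few 2-D Halton points (bases 2 and 3) fill the unit square
# far more evenly than pseudo-random points would:
print([halton_point(i) for i in range(1, 5)])
```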
  • Michèle Sebag, Olivier Teytaud
    ABSTRACT: Many reactive planning tasks are tackled through myopic optimization-based approaches. Specifically, the problem is simplified by only considering the observations available at the current time step and an estimate of the future system behavior; the optimal decision on the basis of this information is computed, and the simplified problem description is updated on the basis of the new observations available at each time step. While this approach does not yield optimal strategies stricto sensu, it gives good results at a reasonable computational cost for highly intractable problems, whenever fast off-the-shelf solvers are available for the simplified problem. The increase in available computational power (even though the search for optimal strategies remains intractable with brute-force approaches) makes it possible to go beyond the intrinsic limitations of myopic reactive planning approaches. A consistent reactive planning approach is proposed in this paper, embedding a solver within an Upper Confidence Tree algorithm. While the solver is used to yield a consistent estimate of the belief state, the UCT exploits this estimate (both in the tree nodes and through the Monte-Carlo simulator) to achieve an asymptotically optimal policy. The paper shows the consistency of the proposed Upper Confidence Tree-based Consistent Reactive Planning algorithm and presents a proof of principle of its performance on a classical success of the myopic approach, the MineSweeper game.
    Proceedings of the 6th international conference on Learning and Intelligent Optimization; 01/2012
  • Adrien Couëtoux, Hassen Doghmen, Olivier Teytaud
    ABSTRACT: In the standard version of the UCT algorithm, when the set of decisions is continuous, new decisions are explored through blind search. This can lead to very inefficient exploration, particularly for high-dimensional problems, which often arise in energy management, for instance. To use the information gathered through past simulations to better explore new decisions, we propose a method named Blind Value (BV). It only requires access to a function that randomly draws feasible decisions. We implement it and compare it to the original version of continuous UCT. Our results show a significant increase in convergence speed, in dimensions 12 and 80.
    Proceedings of the 6th international conference on Learning and Intelligent Optimization; 01/2012
  • Source
    ABSTRACT: In many decision problems there are two levels of choice: the first is strategic and the second is tactical. We formalize the difference between the two, discuss the relevance of the bandit literature for strategic decisions, and test the quality of different bandit algorithms on real-world examples such as board games and card games. For exploration-exploitation algorithms, we evaluate Upper Confidence Bounds and Exponential Weights, as well as algorithms designed for simple regret, such as Successive Reject. For exploitation, we also evaluate Bernstein Races and Uniform Sampling. As for the recommendation part, we test Empirically Best Arm, Most Played, Lower Confidence Bounds, and Empirical Distribution. In the one-player case, we recommend Upper Confidence Bound as an exploration algorithm (and in particular its variant adaptUCBE for parameter-free simple regret), and Lower Confidence Bound or Most Played Arm as recommendation algorithms. In the two-player case, we point out the convenience and efficiency of the EXP3 algorithm, and the very clear improvement provided by the truncation algorithm TEXP3. Incidentally, our algorithm won some games against professional players in kill-all Go (to the best of our knowledge, for the first time in computer games).
    Technologies and Applications of Artificial Intelligence (TAAI), 2012 Conference on; 01/2012
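The EXP3 algorithm praised in this abstract can be sketched in textbook form (a generic sketch, not the paper's implementation; the reward function and constants below are illustrative). TEXP3, the truncation variant mentioned above, additionally truncates small probabilities when extracting a final mixed strategy.

```python
import math
import random

def exp3(n_arms, reward_fn, T, gamma=0.1):
    """Basic EXP3: exponential weights with uniform exploration,
    using importance-weighted reward estimates."""
    w = [1.0] * n_arms
    for _ in range(T):
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / n_arms for wi in w]
        arm = random.choices(range(n_arms), weights=p)[0]
        x = reward_fn(arm)                      # reward in [0, 1]
        w[arm] *= math.exp(gamma * x / (n_arms * p[arm]))
    total = sum(w)
    return [wi / total for wi in w]             # mixed strategy estimate

# Toy usage: arm 2 is clearly best, so its probability dominates.
random.seed(1)
probs = exp3(3, lambda a: 1.0 if a == 2 else 0.2, T=2000)
```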
  • Source
    ABSTRACT: The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
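The selection step at the heart of Monte-Carlo tree search can be sketched with the classic UCB1 rule (a generic textbook sketch, not the specific algorithms surveyed in the paper; the dictionary-based node representation is an assumption). Each child is scored by its mean value plus an exploration bonus that shrinks as the child is visited more often.

```python
import math

def ucb1_select(children, exploration=math.sqrt(2)):
    """UCB1 selection step used in Monte-Carlo tree search: pick the
    child maximizing mean value plus an exploration bonus."""
    total_visits = sum(c["visits"] for c in children)
    def score(c):
        if c["visits"] == 0:
            return float("inf")      # always try unvisited moves first
        mean = c["wins"] / c["visits"]
        bonus = exploration * math.sqrt(math.log(total_visits) / c["visits"])
        return mean + bonus
    return max(children, key=score)

# Toy usage: the unvisited child is selected before any revisits.
children = [{"wins": 6, "visits": 10},
            {"wins": 3, "visits": 4},
            {"wins": 0, "visits": 0}]
best = ucb1_select(children)
```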
  • Source
    Nataliya Sokolovska, Olivier Teytaud, Mario Milone
    ABSTRACT: Discretization of state and action spaces is a critical issue in Q-Learning. In this contribution, we propose a real-time adaptation of the discretization via the progressive widening technique, which has already been used in bandit-based methods. Results consistently converge to the optimum of the problem, without changing the parametrization for each new problem.
    Neural Information Processing - 18th International Conference, ICONIP 2011, Shanghai, China, November 13-17, 2011, Proceedings, Part III; 11/2011
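Progressive widening, as used in the paper above, can be sketched minimally (an illustrative sketch under assumed conventions; the paper's exact widening schedule and constants may differ). The idea is to let the number of distinct discretized actions at a state grow slowly, as a small power of the visit count, instead of fixing the discretization in advance.

```python
import math
import random

def progressive_widening_action(state_children, visits, sample_action, alpha=0.5):
    """Progressive widening sketch: add a new (discretized) action only
    while the number of children is below ceil(visits ** alpha);
    otherwise reuse an existing one. This adapts the discretization
    online as the state is visited more often."""
    if len(state_children) < max(1, math.ceil(visits ** alpha)):
        a = sample_action()                   # widen: add a fresh action
        state_children.append(a)
        return a
    return random.choice(state_children)      # reuse existing discretization

# Toy usage with a continuous action space [-1, 1]: after 100 visits,
# only about sqrt(100) = 10 distinct actions have been created.
random.seed(0)
children = []
for t in range(1, 101):
    progressive_widening_action(children, t, lambda: random.uniform(-1, 1))
```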

Publication Stats

937 Citations
34.01 Total Impact Points


  • 2011–2013
    • National University of Tainan
      Tainan, Taiwan
  • 2005–2013
    • Université Paris-Sud 11
      • Laboratoire de Mathématiques d'Orsay
      Orsay, Île-de-France, France
  • 2010
    • National Institute for Research in Computer Science and Control
      Le Chesnay, Île-de-France, France
  • 2001–2010
    • French National Centre for Scientific Research
      Paris, Île-de-France, France
    • Université Lumière Lyon 2
      Rhône-Alpes, France
  • 2009
    • Laval University
      Québec, Quebec, Canada