Olivier Teytaud

Université Paris-Sud 11, Orsay, Île-de-France, France

Publications (153) · 23.26 total impact

  • ABSTRACT: Direct Policy Search (DPS) is a widely used tool for reinforcement learning; however, it is usually not suitable for handling high-dimensional constrained action spaces such as those arising in power system control (unit commitment problems). We propose Direct Value Search, a hybridization of DPS with Bellman decomposition techniques. We prove runtime properties and apply the results to an energy management problem.
    9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium); 05/2014
  • ABSTRACT: Due to its simplicity and convenience, Model Predictive Control, which optimizes future decisions against a pessimistic deterministic forecast of the random processes, is one of the main tools for stochastic control. Yet it suffers from a large computation time unless the tactical horizon (i.e., the number of future time steps included in the optimization) is strongly reduced, and it lacks genuine handling of stochasticity. We propose a combination of Model Predictive Control and Direct Policy Search.
    European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgique; 04/2014
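
The receding-horizon loop that the abstract above describes can be sketched on a toy problem. Everything here (the 1-D storage dynamics, the quadratic cost, the brute-force inner solver) is an illustrative assumption, not the paper's actual unit-commitment model:

```python
# Hedged sketch of receding-horizon Model Predictive Control (MPC) on a toy
# 1-D storage problem: at each step, optimize the next `horizon` decisions
# against a deterministic forecast, then apply only the first decision.
# All names and dynamics are illustrative assumptions.

def mpc_control(x0, forecast, horizon, n_steps):
    x = x0
    trajectory = []
    for t in range(n_steps):
        window = forecast[t:t + horizon]          # tactical horizon
        best_u, best_cost = None, float("inf")
        # brute-force search over a small decision grid (toy inner solver)
        for u in (-1.0, 0.0, 1.0):
            cost, xs = 0.0, x
            for w in window:
                xs = xs + u - w                   # toy storage dynamics
                cost += xs * xs                   # quadratic deviation cost
            if cost < best_cost:
                best_u, best_cost = u, cost
        x = x + best_u - forecast[t]              # apply first decision only
        trajectory.append(x)
    return trajectory

traj = mpc_control(x0=0.0, forecast=[0.5] * 12, horizon=3, n_steps=8)
print(traj)
```

Shrinking `horizon` makes each step cheaper but more myopic, which is exactly the trade-off the abstract points at.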
  • Nicolas Galichet, Michèle Sebag, Olivier Teytaud
    ABSTRACT: Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution, and MARAB tends toward the MIN multi-armed bandit algorithm, which aims at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB against UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
    01/2014;
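
The "conditional value at risk as arm quality" idea above can be illustrated with a plain empirical estimator; the discretization and the toy reward samples are assumptions for illustration, not the paper's exact formulation:

```python
# Hedged sketch: empirical conditional value at risk (CVaR) as an
# arm-quality score, in the spirit of risk-aware bandits like MARAB.

def cvar(rewards, alpha):
    """Empirical CVaR at level alpha: the mean of the worst alpha-fraction
    of observed rewards (lower means riskier)."""
    worst = sorted(rewards)[:max(1, int(alpha * len(rewards)))]
    return sum(worst) / len(worst)

safe  = [0.5, 0.52, 0.48, 0.51, 0.49]   # low-variance arm
risky = [0.0, 1.0, 0.1, 0.95, 0.05]     # high-variance arm, similar mean

# At a small risk level the safe arm dominates even though means are close.
print(cvar(safe, 0.2), cvar(risky, 0.2))
```

Note that as `alpha` shrinks, `cvar` tends toward the sample minimum, matching the abstract's remark that MARAB tends toward the MIN algorithm at risk level 0.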
  • ABSTRACT: The certified rank of a human Go player carries high uncertainty, so the player's performance does not always match the certified rank. Moreover, the performance of a human Go player, especially a child, may be affected by the on-the-spot environment as well as the physical and mental condition of the day. Combining particle swarm optimization, fuzzy markup language (FML)-based fuzzy inference, and a genetic learning algorithm, this paper presents an adaptive assessment system to evaluate the performance of human Go players. The experimental results show that the proposed approach is feasible for adaptive assessment of a human Go player's performance.
    2013 International Conference on Fuzzy Theory and Its Applications (iFUZZY); 12/2013
  • Jérémie Decock, Olivier Teytaud
    ABSTRACT: We study the linear convergence of a simple evolutionary algorithm on non-quasi-convex functions on continuous domains. Assumptions include one on the sampling performed by the evolutionary algorithm (assumed to cover the neighborhood of the current search point efficiently), one on the conditioning of the objective function (so that the probability of improvement is not too low at each time step, given a correct step size), and the uniqueness of the optimum.
    EA - 11th Biennial International Conference on Artificial Evolution - 2013, Bordeaux, France; 10/2013
  • Jérémie Decock, Olivier Teytaud
    ABSTRACT: In spite of various recent publications on the subject, there are still gaps between upper and lower bounds in evolutionary optimization for noisy objective functions. In this paper we reduce the gap and obtain tight bounds, within logarithmic factors, in the case of small noise and no long-distance influence on the objective function.
    Proceedings of the twelfth workshop on Foundations of genetic algorithms XII; 01/2013
  • ABSTRACT: Go is a game with a long history. Two players alternately place black or white stones on vacant intersections of the board; usually the weaker player holds Black, and at the end the player with the larger territory wins. In learning Go, humans advance to a higher rank by, for example, winning 4 out of 5 games. However, a Go player's performance can be influenced by factors such as the on-the-spot environment and the physical and mental condition of the day, which makes certifying a human's rank difficult and uncertain. In this paper, a type-2 fuzzy markup language (T2FML)-based system is proposed to infer a human's rank from the simulation number, komi, and board size. Based on the adaptive Upper Confidence Bounds for Trees (UCT)-based Go-ranking mechanism, the number of simulations for each move is collected while the invited Go players play against the computer Go program MoGoTW. At the same time, the strength of the human is estimated using the Bradley-Terry and Particle Swarm Optimization (PSO) models. The experimental results show that the proposed approach is feasible for estimating the strength of a human player.
    Fuzzy Systems (FUZZ), 2013 IEEE International Conference on; 01/2013
  • Michèle Sebag, Olivier Teytaud
    ABSTRACT: Many reactive planning tasks are tackled through myopic optimization-based approaches. Specifically, the problem is simplified by considering only the observations available at the current time step and an estimate of the future system behavior; the optimal decision on the basis of this information is computed, and the simplified problem description is updated with the new observations available at each time step. While this approach does not yield optimal strategies stricto sensu, it gives good results at a reasonable computational cost for highly intractable problems, whenever fast off-the-shelf solvers are available for the simplified problem. The increase in available computational power, even though the search for optimal strategies remains intractable with brute-force approaches, nonetheless makes it possible to go beyond the intrinsic limitations of myopic reactive planning. A consistent reactive planning approach is proposed in this paper, embedding a solver within an Upper Confidence Tree algorithm. While the solver is used to yield a consistent estimate of the belief state, the UCT exploits this estimate (both in the tree nodes and through the Monte-Carlo simulator) to achieve an asymptotically optimal policy. The paper shows the consistency of the proposed Upper Confidence Tree-based Consistent Reactive Planning algorithm and presents a proof of principle of its performance on a classical success of the myopic approach, the MineSweeper game.
    Proceedings of the 6th international conference on Learning and Intelligent Optimization; 01/2012
  • Adrien Couëtoux, Hassen Doghmen, Olivier Teytaud
    ABSTRACT: In the standard version of the UCT algorithm, in the case of a continuous set of decisions, the exploration of new decisions is done through blind search. This can lead to very inefficient exploration, particularly for high-dimensional problems, which often arise in energy management, for instance. In an attempt to use the information gathered through past simulations to explore new decisions better, we propose a method named Blind Value (BV). It only requires access to a function that randomly draws feasible decisions. We implement it and compare it to the original version of continuous UCT. Our results show that it gives a significant increase in convergence speed, in dimensions 12 and 80.
    Proceedings of the 6th international conference on Learning and Intelligent Optimization; 01/2012
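
A heavily simplified variant of the Blind Value idea described above can be sketched as follows: draw several feasible candidates, and keep the one maximizing a score built from rescaled distances to already-explored decisions. The rescaling ratio `rho`, the 1-D decisions, and the scoring rule are all illustrative assumptions; the paper's exact criterion differs:

```python
# Hedged, simplified sketch of Blind Value (BV)-style exploration: instead
# of adding one blindly drawn decision to the tree, draw k feasible
# candidates and keep the one "farthest" from the explored decisions,
# with distances rescaled to be commensurate with score differences.
import random, statistics

def blind_value_pick(explored, ucb_scores, sample_feasible, k=20):
    """explored: already-tried decisions (floats here); ucb_scores: their
    scores; sample_feasible: draws one random feasible decision."""
    candidates = [sample_feasible() for _ in range(k)]
    # ratio of spreads, so distance and score live on comparable scales
    rho = statistics.pstdev(ucb_scores) / (statistics.pstdev(explored) + 1e-12)
    def bv(c):
        return min(rho * abs(c - e) + s for e, s in zip(explored, ucb_scores))
    return max(candidates, key=bv)

random.seed(0)
explored = [0.1, 0.15, 0.9]
scores   = [0.4, 0.45, 0.6]
pick = blind_value_pick(explored, scores, lambda: random.uniform(0.0, 1.0))
print(pick)
```

The only requirement, as in the abstract, is a sampler of feasible decisions; no gradient or model of the decision space is needed.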
  • ABSTRACT: Low-discrepancy sequences provide a way to generate quasi-random numbers of high dimensionality with a very high level of uniformity. The nearly orthogonal Latin hypercube and the generalized Halton sequence are two popular methods for generating low-discrepancy sequences. In this article, we propose to use evolutionary algorithms to find optimized solutions to the combinatorial problem of configuring generators of these sequences. Experimental results show that the optimized sequence generators behave at least as well as generators from the literature for the Halton sequence, and significantly better for the nearly orthogonal Latin hypercube.
    ACM Transactions on Modeling and Computer Simulation 01/2012; 22(2):1-25. · 0.53 Impact Factor
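
For background on the object being configured above: a generalized Halton sequence is built from per-base radical inverses with scrambled digits. The sketch below uses identity permutations (the paper's contribution is precisely evolving good permutations), so it reduces to the classical Halton sequence:

```python
# Hedged sketch of a (generalized) Halton sequence: the van der Corput
# radical inverse in a prime base, with an optional digit permutation.
# Identity permutations here; the paper evolves the permutations.

def radical_inverse(n, base, perm=None):
    """Reflect the base-`base` digits of n about the radix point,
    optionally scrambling each digit through `perm`."""
    perm = perm or list(range(base))
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += perm[digit] / denom
    return x

def halton_point(n, bases=(2, 3)):
    """n-th point of the Halton sequence, one prime base per dimension."""
    return tuple(radical_inverse(n, b) for b in bases)

print([halton_point(i) for i in range(1, 4)])
# first 2-D points: (1/2, 1/3), (1/4, 2/3), (3/4, 1/9)
```

Passing a non-identity `perm` per base is what turns this into a generalized Halton sequence, and choosing those permutations well is the combinatorial problem the abstract's evolutionary algorithm attacks.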
  • ABSTRACT: In many decision problems, there are two levels of choice: the first is strategic and the second is tactical. We formalize the difference between the two, discuss the relevance of the bandit literature for strategic decisions, and test the quality of different bandit algorithms on real-world examples such as board games and card games. For exploration-exploitation algorithms, we evaluate Upper Confidence Bounds and Exponential Weights, as well as algorithms designed for simple regret, such as Successive Reject. For exploitation, we also evaluate Bernstein Races and Uniform Sampling. For the recommendation part, we test Empirically Best Arm, Most Played, Lower Confidence Bounds and Empirical Distribution. In the one-player case, we recommend Upper Confidence Bound as an exploration algorithm (in particular its variant adaptUCBE for parameter-free simple regret) and Lower Confidence Bound or Most Played Arm as recommendation algorithms. In the two-player case, we point out the convenience and efficiency of the EXP3 algorithm, and the very clear improvement provided by the truncation algorithm TEXP3. Incidentally, our algorithm won some games against professional players in kill-all Go (to the best of our knowledge, a first in computer games).
    Technologies and Applications of Artificial Intelligence (TAAI), 2012 Conference on; 01/2012
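
The EXP3 update and the TEXP3-style truncation mentioned above can be sketched as follows. The constants, the 0.05 threshold, and the toy three-arm reward setup are illustrative assumptions, not the paper's tuning:

```python
# Hedged sketch of the EXP3 update and a TEXP3-style truncation of the
# resulting mixed strategy.
import math, random

def exp3_step(weights, gamma, arm, reward):
    """One EXP3 update after pulling `arm`, observing `reward` in [0, 1]."""
    k = len(weights)
    total = sum(weights)
    p = (1 - gamma) * weights[arm] / total + gamma / k
    weights[arm] *= math.exp(gamma * reward / (k * p))  # importance-weighted
    return weights

def truncate(strategy, threshold):
    """TEXP3-style post-processing: zero out tiny probabilities, renormalize."""
    kept = [p if p >= threshold else 0.0 for p in strategy]
    z = sum(kept)
    return [p / z for p in kept]

random.seed(1)
w = [1.0, 1.0, 1.0]
for _ in range(200):
    arm = random.randrange(3)
    reward = [0.9, 0.5, 0.1][arm]          # arm 0 is best in this toy setup
    exp3_step(w, gamma=0.1, arm=arm, reward=reward)
strategy = [wi / sum(w) for wi in w]
print(truncate(strategy, 0.05))
```

The truncation step is what the abstract credits with the "very clear improvement": noise mass on clearly bad arms is removed from the recommended mixed strategy.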
  • ABSTRACT: The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical game-tree search methods that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
    Communications of the ACM 01/2012; 55(3):106-113.
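
At the core of the Monte-Carlo tree search paradigm surveyed above is a bandit-style child-selection rule, typically UCB1. A minimal sketch (the constant sqrt(2) and the flat node layout are illustrative assumptions):

```python
# Hedged sketch of the UCB1 selection rule used inside Monte-Carlo tree
# search: pick the child maximizing empirical mean plus exploration bonus.
import math

def ucb1_select(wins, visits, c=math.sqrt(2)):
    """wins[i]/visits[i] is the empirical value of child i; unvisited
    children are tried first."""
    total = sum(visits)
    best, best_score = None, -1.0
    for i, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            return i                      # force one visit of every child
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Child 1 has the best mean, but the rarely tried child 2 still wins
# the selection thanks to its large exploration bonus.
print(ucb1_select([10, 30, 1], [40, 50, 2]))  # → 2
```

In a full MCTS loop, this rule is applied at every node on the way down the tree, a random playout is run from the leaf, and the playout result is backed up along the visited path.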
  • Source
    Marc Schoenauer, Fabien Teytaud, Olivier Teytaud
    ABSTRACT: Multi-Modal Optimization (MMO) is ubiquitous in engineering, machine learning and artificial intelligence applications. Many algorithms have been proposed for multimodal optimization, and many of them are based on restart strategies. However, only a few works address the issue of initialization in restarts, and very few comparisons have been made between different MMO algorithms and against simple baseline methods. This paper proposes an analysis of restart strategies, and provides a restart strategy, applicable to any local search algorithm, for which theoretical guarantees are derived. The strategy decreases a 'step-size' rather than increasing the population size, and uses quasi-random initialization, leading to a rigorous proof of improvement over random restarts or restarts with a constant initial step-size. Furthermore, when this strategy encapsulates a (1+1)-ES with the 1/5th adaptation rule, the resulting algorithm outperforms state-of-the-art MMO algorithms while being computationally faster.
    Proceedings of the 10th international conference on Artificial Evolution; 10/2011
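
The restart scheme described above can be sketched concretely: quasi-random (van der Corput) initial points, a decreasing initial step-size per restart, and a (1+1)-ES with a 1/5th-style success rule as the encapsulated local search. The 1-D search space, the success/failure factors 1.5 and 0.9, and the budget are illustrative assumptions:

```python
# Hedged sketch: quasi-random restarts of a (1+1)-ES with a 1/5th-style
# success rule, with the initial step-size decreasing across restarts.
import random

def van_der_corput(n, base=2):
    x, denom = 0.0, 1.0
    while n > 0:
        n, d = divmod(n, base)
        denom *= base
        x += d / denom
    return x

def one_plus_one_es(f, x0, sigma, budget):
    """(1+1)-ES in 1-D: enlarge the step on success, shrink on failure."""
    x, fx = x0, f(x0)
    for _ in range(budget):
        y = x + sigma * random.gauss(0, 1)
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
            sigma *= 1.5          # success
        else:
            sigma *= 0.9          # failure (1.5 * 0.9**4 ≈ 1, a 1/5th rule)
    return x, fx

def restart_search(f, n_restarts=5, budget=200):
    random.seed(42)
    best = (None, float("inf"))
    for r in range(1, n_restarts + 1):
        x0 = 10 * van_der_corput(r) - 5   # quasi-random init in [-5, 5]
        sigma0 = 2.0 / r                  # decreasing initial step-size
        x, fx = one_plus_one_es(f, x0, sigma0, budget)
        if fx < best[1]:
            best = (x, fx)
    return best

x, fx = restart_search(lambda x: (x - 1.0) ** 2)
print(x, fx)  # should land close to the optimum x* = 1
```

The quasi-random initial points spread restarts evenly over the domain, which is the ingredient the paper's proof of improvement over purely random restarts rests on.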
  • Conference Paper: Random positions in Go
    ABSTRACT: It is known that random chess positions are harder for humans to memorize. We reproduce these experiments in the Asian game of Go, in which computers are much weaker than humans. We survey families of positions, discuss the relative strength of humans and computers, and then experiment with random positions. The result is that computers reach the level of the best amateurs on random positions. We also provide a protocol for generating interesting random positions (avoiding unfair situations).
    Computational Intelligence and Games (CIG), 2011 IEEE Conference on; 10/2011
  • F. Teytaud, O. Teytaud
    ABSTRACT: Solving games is standard in the fully observable case. The partially observable case is much more difficult; whenever the number of strategies is finite (which is not necessarily the case, even when the state space is finite), the main tool for exact solving is the construction of the full matrix game and its solution by linear programming. We propose tools for approximating the value of partially observable games. The lemmas are relatively general, and we apply them to derive rigorous bounds on the Nash equilibrium of phantom tic-tac-toe and phantom Go.
    Computational Intelligence and Games (CIG), 2011 IEEE Conference on; 10/2011
  • Chang-Shing Lee, O. Teytaud
    ABSTRACT: In order to stimulate development and research in computer Go, several Taiwanese Go players were invited to play against some famous computer Go programs. Those competitions revealed that an ontology model for the game of Go might resolve problems that arose during the competitions. This tutorial therefore presents a Go game record ontology and Go board ontology schemes. An ontology-based fuzzy inference system is also developed to provide a regional alarm level for a Go beginner or a computer Go program, so that stones can be placed at more appropriate positions. Experimental results indicate that the proposed approach is feasible for computer Go applications. Hopefully, advances in intelligent agents and the fuzzy ontology model can provide a significant amount of knowledge, allowing computer Go programs to progress and eventually achieve as much as computer chess or Chinese chess.
    Intelligent Agent (IA), 2011 IEEE Symposium on; 05/2011
  • Hervé Fournier, Olivier Teytaud
    ABSTRACT: We derive lower bounds on the convergence rate of comparison-based or selection-based algorithms, improving existing results in the continuous setting and extending them to non-trivial results in the discrete case. This is achieved by considering the VC-dimension of the level sets of the fitness functions; results are then obtained through the shatter function lemma. In the special case of optimization of the sphere function, improved lower bounds are obtained by an argument based on the number of sign patterns.
    Algorithmica 01/2011; · 0.49 Impact Factor
  • ABSTRACT: This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of the number of fitness evaluations. The fitness functions considered are monotonic transformations of the sphere function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum; it is nonetheless also valid for any monotonic polynomial of degree p > 2. Upper bounds are derived via a bandit-based estimation of distribution algorithm relying on Bernstein races, called R-EDA. The algorithm is known to be consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, R-EDA can perform optimally for quadratic transformations of the norm to the optimum; (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression (QLR), based on surrogate models, although QLR requires a probabilistic prior on the fitness class.
    01/2011;
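
The Bernstein races mentioned above hinge on a variance-sensitive confidence interval; a standard empirical Bernstein bound can be sketched as follows (the `3/delta` form and the toy samples are illustrative assumptions, and the paper's race may use a different variant):

```python
# Hedged sketch of an empirical Bernstein deviation bound, the kind of
# confidence radius that drives Bernstein races: the width depends on the
# empirical variance, so low-noise candidates are separated quickly.
import math

def bernstein_radius(samples, delta, value_range):
    """Empirical Bernstein-style bound on |empirical mean - true mean|,
    holding with probability at least 1 - delta."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    log_term = math.log(3.0 / delta)
    return (math.sqrt(2 * var * log_term / n)
            + 3 * value_range * log_term / n)

# With equal ranges, the lower-variance sample set gets the tighter interval.
low_noise  = [0.50, 0.51, 0.49, 0.50] * 25
high_noise = [0.0, 1.0, 0.1, 0.9] * 25
print(bernstein_radius(low_noise, 0.05, 1.0),
      bernstein_radius(high_noise, 0.05, 1.0))
```

This variance sensitivity is why, per point (i) of the abstract, noise vanishing near the optimum lets the race resolve comparisons there with few evaluations.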
  • Nataliya Sokolovska, Olivier Teytaud, Mario Milone
    Neural Information Processing - 18th International Conference, ICONIP 2011, Shanghai, China, November 13-17, 2011, Proceedings, Part III; 01/2011
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In this paper, we will consider questions related to blindfolded play: (i) the impact (in various conditions) of playing blindfolded in the level of Go players in 9x9 Go (ii) the influence of a visual support (the visual support is a board with no stone) (iii) which modifications are required for making a program strong in the blind variant of the game (and, somehow surprisingly, implementing a program for playing blind go is not equivalent to implementing a program for playing go) (iv) some conclusions on the rules of blind Go for making it interesting and pedagogically efficient. Computational intelligence design question: should a program play differently against a human opponent than against a computer? Our hypothesis here is that the same-strength assumption, which is the basis for many computer programs (alpha-beta or Monte-Carlo Tree Search) does not hold in blind games. Two counter- examples to this "same strength" assumption are already known in some non-blindfolded games:
    01/2011;