ABSTRACT: In large-scale systems biology applications, features are structured in hidden functional categories with identical predictive power. Feature selection can therefore not only reduce the dimensionality of the problem but also reveal knowledge about the functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game that performs stable functional feature selection. In particular, the approach ranks feature subsets with a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm, and we illustrate through experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints.
PLoS ONE 09/2015; 10(9):e0134683. DOI:10.1371/journal.pone.0134683 · 3.23 Impact Factor
ABSTRACT: In this paper we study the use of a portfolio of policies for adversarial problems. We build two different portfolios of policies and apply them to the game of Go. The first portfolio is composed of different versions of the GnuGo agent; the second is composed of fixed random seeds. First, we demonstrate that learning an offline combination of these policies using the notion of Nash equilibrium generates a stronger opponent. Second, we show that such distributions can be learned online through a bandit approach. The advantages of our approach are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity (no computational overhead), and (iv) increased performance. Given the importance of games on mobile devices, designing artificial intelligence for small computational power is crucial; our approach is particularly suited for mobile devices since it creates a stronger opponent simply by biasing the distribution over the policies, and moreover it generalizes quite well.
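The offline Nash-Portfolio idea described above can be sketched with fictitious play on a zero-sum matrix game: each player repeatedly best-responds to the opponent's empirical mixture, and the empirical frequencies converge to a Nash equilibrium. The payoff matrix below is a hypothetical win-rate table between three policies, not data from the paper, and the abstract does not specify which equilibrium solver was used.

```python
import numpy as np

def nash_portfolio(payoff, iters=10000):
    """Approximate a Nash mixed strategy of a zero-sum matrix game by
    fictitious play: each player best-responds to the opponent's
    empirical mixture, and play frequencies converge to equilibrium."""
    n, m = payoff.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    row_counts[0] = 1.0  # arbitrary initial pure strategies
    col_counts[0] = 1.0
    for _ in range(iters):
        # Row player best-responds to the column player's empirical mix.
        i = np.argmax(payoff @ col_counts)
        # Column player best-responds (minimizes the row player's payoff).
        j = np.argmin(row_counts @ payoff)
        row_counts[i] += 1.0
        col_counts[j] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Hypothetical payoff matrix: entry [i, j] is the win rate of policy i
# against opposing policy j (cyclic dominance, so the equilibrium mixes).
payoff = np.array([[0.5, 0.9, 0.2],
                   [0.1, 0.5, 0.8],
                   [0.8, 0.2, 0.5]])
p, q = nash_portfolio(payoff)
```

Sampling a policy from the mixture `p` at the start of each game gives the diversity property mentioned in the abstract: the portfolio is harder to exploit than any single component.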
[Show abstract][Hide abstract] ABSTRACT: Direct Policy Search (DPS) is a widely used tool for reinforcement learning; however, it is usually not suitable for handling high-dimensional constrained action spaces such as those arising in power system control (unit commitment problems). We propose Direct Value Search, a hybridization of DPS with Bellman decomposition techniques. We prove runtime properties, and apply the results to an energy management problem.
9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium); 05/2014
ABSTRACT: Owing to its simplicity and convenience, Model Predictive Control, which consists in optimizing future decisions based on a pessimistic deterministic forecast of the random processes, is one of the main tools for stochastic control. Yet it suffers from a large computation time, unless the tactical horizon (i.e. the number of future time steps included in the optimization) is strongly reduced, and it handles stochasticity only crudely. We here propose a combination of Model Predictive Control and Direct Policy Search.
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium; 04/2014
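The receding-horizon loop that the abstract above summarizes can be sketched as follows: at each time step, enumerate action sequences over a short tactical horizon against a deterministic forecast, apply only the first action of the best sequence, then re-plan. The toy energy-storage model (state, dynamics, cost, and demand figures) is entirely hypothetical, chosen only to make the loop concrete.

```python
import itertools

def mpc_step(state, forecast, actions, horizon, dynamics, cost):
    """One receding-horizon step: enumerate discrete action sequences over a
    short tactical horizon against a deterministic forecast and return the
    first action of the cheapest sequence."""
    best, best_cost = actions[0], float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, total = state, 0.0
        for a, w in zip(seq, forecast[:horizon]):
            total += cost(s, a, w)
            s = dynamics(s, a, w)
        if total < best_cost:
            best, best_cost = seq[0], total
    return best

# Hypothetical storage model: state = stored energy (capped at 10),
# action = energy bought, w = forecast demand; unmet demand is expensive.
dynamics = lambda s, a, w: min(max(s + a - w, 0.0), 10.0)
cost = lambda s, a, w: 5.0 * max(w - s - a, 0.0) + max(a, 0.0)
demand_forecast = [2.0, 3.0, 1.0, 4.0]

state, plan = 5.0, []
for t in range(len(demand_forecast)):
    a = mpc_step(state, demand_forecast[t:], [-1.0, 0.0, 1.0, 2.0],
                 min(3, len(demand_forecast) - t), dynamics, cost)
    plan.append(a)
    state = dynamics(state, a, demand_forecast[t])
```

The exhaustive enumeration makes the cost of a long tactical horizon visible: the work per step grows as |actions|^horizon, which is exactly why the abstract notes that the horizon must be strongly reduced, and why hybridizing with Direct Policy Search is attractive.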
ABSTRACT: Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB against UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
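The notion of arm quality used above can be made concrete: the empirical conditional value at risk at level alpha is the mean of the worst alpha-fraction of observed rewards. The selection rule below pairs that estimate with a UCB-style exploration bonus; it is a sketch in the spirit of MARAB, not the paper's exact index, and the two-arm simulation (a safe arm and a risky arm with heavy downside) is a made-up illustration.

```python
import numpy as np

def empirical_cvar(rewards, alpha):
    """Empirical conditional value at risk at level alpha:
    the mean of the worst ceil(alpha * n) observed rewards."""
    r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))
    return r[:k].mean()

def select_arm(history, t, alpha=0.1, c=1.0):
    """Pick an arm by empirical CVaR plus a UCB-style exploration bonus
    (a MARAB-flavored sketch, not the paper's exact index)."""
    scores = []
    for rewards in history:
        if not rewards:                 # play each arm once first
            return len(scores)
        bonus = c * np.sqrt(np.log(t + 1) / len(rewards))
        scores.append(empirical_cvar(rewards, alpha) + bonus)
    return int(np.argmax(scores))

# Hypothetical two-arm simulation: arm 0 is safe (low variance), arm 1 has
# a higher mean but a heavy lower tail, so its CVaR is poor.
rng = np.random.default_rng(0)
history = [[], []]
for t in range(500):
    arm = select_arm(history, t)
    reward = rng.normal(0.5, 0.01) if arm == 0 else rng.normal(0.6, 0.5)
    history[arm].append(reward)
```

Under a risk-neutral criterion (the mean) the risky arm would win; under the CVaR criterion the safe arm dominates, which is the behavior the abstract describes for risk-averse exploration.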
ABSTRACT: The game of Go is a board game with a long history that is much more complex than chess, and the uncertainty of the game grows as the board size gets bigger. To evaluate human performance at Go, a player can be promoted to a higher rank based on the number of games won in formal human-versus-human competition. However, a human Go player's performance can be influenced by factors such as the on-the-spot environment, as well as the physical and mental situation of the day, which makes certifying the player's rank difficult and uncertain. Given a sample of a player's games, his or her strength can be evaluated with classical models such as the Bradley-Terry model. However, due to inhomogeneous game conditions and limited access to archives of games, such estimates can be imprecise. In addition, classical rankings (1 Dan, 2 Dan, ...) are integers, which leads to a rather imprecise estimate of the opponent's strength. Therefore, we propose to use a sample of games played against a computer to estimate the human's strength. To increase the precision, the strength of the computer is adapted from one move to the next by increasing or decreasing the computational power based on the current situation and the result of games. The human can decide some specific conditions, such as komi and board size. In this paper, we use type-2 fuzzy sets (T2FSs), with parameters optimized by a genetic algorithm, to estimate the rank in a stable manner, independently of board size. More precisely, an adaptive Monte Carlo tree search (MCTS) estimates the number of simulations corresponding to the strength of its opponent. Next, the T2FS-based adaptive linguistic assessment system infers the human performance and presents the results as a linguistic description. The experimental results show that the proposed approach is feasible for adaptive linguistic assessment of a human Go player's performance.
IEEE Transactions on Fuzzy Systems 01/2014; 23(2):1-1. DOI:10.1109/TFUZZ.2014.2312989 · 8.75 Impact Factor
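The Bradley-Terry model mentioned in the abstract above posits P(i beats j) = s_i / (s_i + s_j) for latent strengths s. The sketch below fits the strengths with the classical minorization-maximization (Zermelo) updates; the win matrix is a hypothetical three-player record, not data from the paper.

```python
import numpy as np

def fit_bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths s, where P(i beats j) = s_i / (s_i + s_j),
    via the classical minorization-maximization (Zermelo) updates.
    wins[i][j] = number of games player i won against player j."""
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T                 # total games between each pair
    s = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            denom = sum(games[i, j] / (s[i] + s[j])
                        for j in range(n) if j != i)
            s[i] = wins[i].sum() / denom  # MM update for player i
        s /= s.sum()                      # fix the arbitrary scale
    return s

# Hypothetical record: player 0 beat player 1 eight times out of ten,
# split evenly with player 2; player 2 beat player 1 seven times out of ten.
wins = [[0, 8, 5],
        [2, 0, 3],
        [5, 7, 0]]
strengths = fit_bradley_terry(wins)
```

The fitted strengths give real-valued ratings rather than integer Dan ranks, which illustrates the precision argument the abstract makes; the imprecision the authors address comes from the limited and inhomogeneous game samples that feed such a fit.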