December 2023
·
30 Reads
·
1 Citation
June 2023
·
1,913 Reads
·
117 Citations
Nature
Fundamental algorithms such as sorting or hashing are used trillions of times on any given day¹. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been achieved in the past², making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks. These algorithms have been integrated into the LLVM standard C++ sort library³. This change to this part of the sort library represents the replacement of a component with an algorithm that has been automatically discovered using reinforcement learning. We also present results in extra domains, showcasing the generality of the approach.
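A minimal sketch of how discovering a sorting routine can be framed as a single-player game, in the spirit described above: the state is the partial program, each action appends a compare-and-swap instruction, and the reward combines correctness on all test inputs with a small per-instruction cost as a latency proxy. The environment, action set and reward constants below are illustrative assumptions, not the actual AlphaDev setup.

```python
from itertools import permutations

class SortGameEnv:
    """Toy single-player 'assembly game' for discovering a 3-element sorting
    network. Illustrative only -- not the actual AlphaDev environment."""

    ACTIONS = [(0, 1), (0, 2), (1, 2)]        # each action appends a compare-and-swap

    def __init__(self, max_len=5):
        self.max_len = max_len
        self.program = []                     # state: the instructions emitted so far

    def step(self, action):
        self.program.append(self.ACTIONS[action])
        correct = self._sorts_everything()
        done = correct or len(self.program) >= self.max_len
        # reward correctness, and charge a small cost per instruction as a latency proxy
        reward = (1.0 if correct else 0.0) - 0.01 * len(self.program)
        return self.program, reward, done

    def _sorts_everything(self):
        for perm in permutations(range(3)):
            items = list(perm)
            for i, j in self.program:
                if items[i] > items[j]:
                    items[i], items[j] = items[j], items[i]
            if items != sorted(items):
                return False
        return True

env = SortGameEnv()
for a in (0, 2, 0):                           # (0,1), (1,2), (0,1): a known 3-element sorting network
    _, reward, done = env.step(a)
print(done, round(reward, 2))                 # True 0.97
```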
February 2023
·
401 Reads
Model-based reinforcement learning has proven highly successful. However, learning a model in isolation from its use during planning is problematic in complex environments. To date, the most effective techniques have instead combined value-equivalent model learning with powerful tree-search methods. This approach is exemplified by MuZero, which has achieved state-of-the-art performance in a wide range of domains, from board games to visually rich environments, with discrete and continuous action spaces, in online and offline settings. However, previous instantiations of this approach were limited to the use of deterministic models. This limits their performance in environments that are inherently stochastic, partially observed, or so large and complex that they appear stochastic to a finite agent. In this paper we extend this approach to learn and plan with stochastic models. Specifically, we introduce a new algorithm, Stochastic MuZero, that learns a stochastic model incorporating afterstates, and uses this model to perform a stochastic tree search. Stochastic MuZero matched or exceeded the state of the art in a set of canonical single and multi-agent environments, including 2048 and backgammon, while maintaining the superhuman performance of standard MuZero in the game of Go.
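A minimal sketch of the afterstate idea mentioned above: the model maps a state and action to a deterministic afterstate, a chance head predicts a distribution over discrete chance outcomes, and the dynamics map the afterstate plus an outcome to the next latent state, so a search can branch on chance nodes. The random network stand-ins, the latent sizes and the expectation-only "search" below are illustrative assumptions, not the Stochastic MuZero architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative learned components (random stand-ins for trained networks):
#   afterstate_fn : (state, action)    -> afterstate
#   chance_fn     : afterstate         -> distribution over K chance codes
#   dynamics_fn   : (afterstate, code) -> next latent state
#   value_fn      : state              -> scalar value
K, D = 4, 8                                    # chance codes, latent size
W_after = rng.normal(size=(D + 1, D))          # +1 for the action feature
W_chance = rng.normal(size=(D, K))
W_dyn = rng.normal(size=(D + K, D))
w_value = rng.normal(size=D)

def afterstate_fn(state, action):
    return np.tanh(np.concatenate([state, [action]]) @ W_after)

def chance_fn(afterstate):
    logits = afterstate @ W_chance
    p = np.exp(logits - logits.max())
    return p / p.sum()

def dynamics_fn(afterstate, code):
    onehot = np.eye(K)[code]
    return np.tanh(np.concatenate([afterstate, onehot]) @ W_dyn)

def value_fn(state):
    return float(state @ w_value)

def q_estimate(state, action):
    """Expected value of an action: average over modelled chance outcomes,
    the way a chance node in a stochastic tree search averages its children."""
    a_state = afterstate_fn(state, action)
    probs = chance_fn(a_state)
    return sum(p * value_fn(dynamics_fn(a_state, c)) for c, p in enumerate(probs))

s0 = rng.normal(size=D)
print([round(q_estimate(s0, a), 3) for a in range(3)])
```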
December 2022
·
285 Reads
·
133 Citations
Science
We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
October 2022
·
5,451 Reads
·
461 Citations
Nature
Improving the efficiency of algorithms for fundamental computations can have a widespread impact, as it can affect the overall speed of a large amount of computations. Matrix multiplication is one such primitive task, occurring in many systems—from neural networks to scientific computing routines. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. However, automating the algorithm discovery procedure is intricate, as the space of possible algorithms is enormous. Here we report a deep reinforcement learning approach based on AlphaZero¹ for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. Our agent, AlphaTensor, is trained to play a single-player game where the objective is finding tensor decompositions within a finite factor space. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor’s algorithm improves on Strassen’s two-level algorithm for the first time, to our knowledge, since its discovery 50 years ago². We further showcase the flexibility of AlphaTensor through different use-cases: algorithms with state-of-the-art complexity for structured matrix multiplication and improved practical efficiency by optimizing matrix multiplication for runtime on specific hardware. Our results highlight AlphaTensor’s ability to accelerate the process of algorithmic discovery on a range of problems, and to optimize for different criteria. A reinforcement learning approach based on AlphaZero is used to discover efficient and provably correct algorithms for matrix multiplication, finding faster algorithms for a variety of matrix sizes.
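To make the tensor-decomposition framing concrete, the sketch below numerically verifies that Strassen's classical rank-7 decomposition of the 2 × 2 matrix-multiplication tensor reproduces A·B with seven scalar multiplications instead of eight; each factor triple played as a move in AlphaTensor's game corresponds to one such multiplication. The code illustrates the framing only, not AlphaTensor itself.

```python
import numpy as np

# Strassen's rank-7 decomposition of the 2x2 matrix-multiplication tensor,
# written as factor triples (u_r, v_r, w_r): each triple is one scalar
# multiplication m_r = (u_r . a)(v_r . b), and c = sum_r m_r * w_r.
# Vectors index the row-major flattenings a=[A11,A12,A21,A22], etc.
U = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
V = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
W = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

def multiply_via_decomposition(A, B):
    a, b = A.reshape(4), B.reshape(4)
    m = (U @ a) * (V @ b)              # 7 scalar multiplications instead of 8
    return (m @ W).reshape(2, 2)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose(multiply_via_decomposition(A, B), A @ B)
print("rank-7 decomposition reproduces A @ B")
```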
June 2022
·
148 Reads
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10⁵³⁵ nodes, i.e., 10¹⁷⁵ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10¹⁶⁴ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
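A toy illustration of the regularisation idea behind R-NaD: in self-play on rock–paper–scissors, an unregularised multiplicative-weights learner cycles around the Nash equilibrium, whereas adding a penalty toward a periodically reset reference policy makes the same dynamics settle near it. The game, the update rule and all hyperparameters are illustrative stand-ins, not the DeepNash implementation.

```python
import numpy as np

# Regularising self-play learning dynamics toward a reference policy, and
# periodically resetting that reference, illustrated on rock-paper-scissors.
P = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # row player's payoff matrix

def self_play(eta, steps=1500, lr=0.1, ref_period=500):
    pi = np.array([0.8, 0.1, 0.1])                   # start far from the Nash equilibrium
    ref = np.ones(3) / 3
    for t in range(steps):
        q = P @ pi                                   # payoff of each action vs. a copy of itself
        if eta > 0:
            q = q - eta * np.log(pi / ref)           # reward regularised toward the reference
            if (t + 1) % ref_period == 0:
                ref = pi.copy()                      # periodically reset the reference policy
        pi = pi * np.exp(lr * q)                     # multiplicative-weights update
        pi /= pi.sum()
    return pi.round(3)

print("unregularised:", self_play(eta=0.0))          # cycles, ends far from (1/3, 1/3, 1/3)
print("regularised:  ", self_play(eta=0.2))          # settles near the Nash equilibrium
```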
April 2022
·
300 Reads
·
335 Citations
Neural Networks
Deep learning (DL) and reinforcement learning (RL) methods appear to be indispensable for achieving human-level or super-human AI systems. At the same time, both DL and RL have strong connections with brain function and with neuroscientific findings. In this review, we summarize the talks and discussions in the “Deep Learning and Reinforcement Learning” session of the International Symposium on Artificial Intelligence and Brain Science. In this session, we discussed whether a comprehensive understanding of human intelligence can be achieved based on recent advances in deep learning and reinforcement learning algorithms. Speakers presented their recent studies on technologies that could be key to achieving human-level intelligence.
October 2021
·
99 Reads
Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a learned model and value function to be jointly self-consistent. Our approach differs from classic planning methods such as Dyna, which only update values to be consistent with the model. We propose multiple self-consistency updates, evaluate these in both tabular and function approximation settings, and find that, with appropriate choices, self-consistency helps both policy evaluation and control.
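A tabular sketch of a joint self-consistency update of the kind described above: both the learned model and the value function receive gradients that reduce their mutual disagreement, rather than only the values being updated toward the model as in Dyna. The three-state setup, the step size and the exact loss are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Joint self-consistency update: the learned model (reward and transition
# tables) and the value function are both nudged to agree with each other.
n_states, gamma, lr = 3, 0.9, 0.1
V = np.zeros(n_states)                                    # value estimates
r_hat = np.array([0.0, 0.0, 1.0])                         # learned reward model (pretend it was fit to data)
p_hat = np.full((n_states, n_states), 1.0 / n_states)     # learned transition model

def self_consistency_update(s):
    """One gradient step on L = 0.5 * (V[s] - (r_hat[s] + gamma * p_hat[s] @ V))**2,
    applied jointly to V, r_hat and p_hat."""
    global V
    target = r_hat[s] + gamma * p_hat[s] @ V
    err = V[s] - target
    grad_V = np.zeros(n_states)
    grad_V[s] += err                                      # direct term, dL/dV[s]
    grad_V    += err * (-gamma * p_hat[s])                # terms through the target, dL/dV[s']
    grad_p     = err * (-gamma * V)                       # dL/dp_hat[s, :]
    V        -= lr * grad_V
    r_hat[s] -= lr * err * (-1.0)                         # the model is also pulled toward the values
    p_hat[s] -= lr * grad_p
    p_hat[s]  = np.clip(p_hat[s], 1e-6, None)
    p_hat[s] /= p_hat[s].sum()                            # project back onto the simplex

# In practice such updates are interleaved with ordinary model learning and TD
# updates from real transitions; here we only show the mechanics.
for _ in range(100):
    for s in range(n_states):
        self_consistency_update(s)
print(V.round(3), r_hat.round(3))
```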
October 2021
·
228 Reads
·
344 Citations
Proteins: Structure, Function, and Bioinformatics
We describe the operation and improvement of AlphaFold, the system entered by the AlphaFold2 team in the “human” category of the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different to the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors’ ranking by summed z-scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modelling targets, and represents a significant improvement in the state of the art in protein structure prediction. We report how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.
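For readers unfamiliar with the GDT_TS score quoted above, a simplified sketch: it averages, over the 1, 2, 4 and 8 Å cutoffs, the percentage of Cα atoms whose predicted position falls within that cutoff of the experimental one. The real score maximises each term over many trial superpositions; the sketch below assumes the two structures are already superposed, and the coordinates are synthetic.

```python
import numpy as np

# Simplified GDT_TS: average over the 1, 2, 4 and 8 angstrom cutoffs of the
# percentage of C-alpha atoms within that cutoff of the reference structure.
def gdt_ts(pred_ca, true_ca):
    dists = np.linalg.norm(pred_ca - true_ca, axis=1)
    return 100.0 * np.mean([(dists <= c).mean() for c in (1.0, 2.0, 4.0, 8.0)])

rng = np.random.default_rng(0)
true_ca = rng.normal(size=(100, 3)) * 10               # hypothetical C-alpha coordinates
pred_ca = true_ca + rng.normal(scale=1.0, size=true_ca.shape)
print(round(gdt_ts(pred_ca, true_ca), 1))              # roughly 70-80 for ~1.6 A average error
```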
September 2021
·
140 Reads
·
1 Citation
Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimisation, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
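A toy sketch of the bootstrapping mechanism described above: take one inner update with the current meta-parameter, build a target by running a few further inner steps and treating the result as a constant, then move the meta-parameter so the one-step update lands closer to that target. The one-dimensional quadratic inner problem, the Euclidean distance and all constants are illustrative assumptions, not the paper's setup.

```python
# Bootstrapped meta-learning of a step size on a 1-D quadratic, with the
# bootstrapped target detached so no long unroll is differentiated through.
def inner_step(w, alpha):
    return w - alpha * w                       # SGD on f(w) = 0.5 * w**2

alpha, meta_lr, bootstrap_len = 0.05, 0.01, 5
for it in range(200):
    w = 2.0                                    # fresh inner problem each meta-iteration
    w1 = inner_step(w, alpha)                  # differentiable in alpha: dw1/dalpha = -w
    target = w1
    for _ in range(bootstrap_len):             # bootstrapped target, treated as a constant
        target = inner_step(target, alpha)
    meta_grad = (w1 - target) * (-w)           # d/dalpha of 0.5 * (w1(alpha) - target)**2
    alpha -= meta_lr * meta_grad
print(round(alpha, 3))                         # approaches 1, the step size that solves this quadratic in one step
```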
... One important limitation of the original DQN solution is that action selection and evaluation use the same values, which increases the chance of overestimation and, in turn, produces overoptimistic value estimates. Double DQN [33] addressed this issue by decoupling the selection and evaluation processes. For this, two value functions are trained by randomly dividing the experiences between them, and of the resulting pair of weights, one is used to determine the greedy policy while the other is used solely for value evaluation. ...
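A short sketch of the decoupling described in this excerpt, contrasting the standard DQN target with the Double DQN target; random arrays stand in for the online and target networks' Q-values on a batch of next states.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, batch = 0.99, 5
rewards = rng.normal(size=batch)
q_online_next = rng.normal(size=(batch, 4))    # Q_online(s', .)
q_target_next = rng.normal(size=(batch, 4))    # Q_target(s', .)

# DQN: the same network both selects and evaluates the next action,
# which biases the target upward under estimation noise.
dqn_target = rewards + gamma * q_target_next.max(axis=1)

# Double DQN: one network selects the action, the other evaluates it,
# decoupling selection from evaluation.
best_actions = q_online_next.argmax(axis=1)
double_dqn_target = rewards + gamma * q_target_next[np.arange(batch), best_actions]

print(dqn_target.round(2))
print(double_dqn_target.round(2))              # never larger than the DQN target
```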
September 2015
... Novel better-performing DQN networks can be created by merging different independent improvements proposed throughout the years. One such architecture is represented by Rainbow DQN [35], merging complementary improvements to the training and design process, or Light-Q-Network (LQN) and Binary-Q-Network (BQN) [36], which focus on limiting the required resources. Another recent solution, PQN [37], has completely eliminated the need for a replay buffer and produced a much faster DQN variant by optimizing the traditional temporal difference learning approach. ...
October 2017
... Sorting algorithms have long been a fundamental area of study in computer science [1], influencing both theoretical analysis [2] and practical applications across diverse domains [3]. The modeling, analysis, and comparative study of these techniques provide deep insights into algorithm efficiency [4], computational complexity, and resource utilization. ...
June 2023
Nature
... Multi-agent reinforcement learning (MARL) is a framework for sequential decision-making, where multiple agents make decisions in a non-stationary environment to maximize their cumulative rewards. MARL has a wide range of applications, e.g., robotics, distributed control, game AI, and so on (Shalev-Shwartz, Shammah, and Shashua 2016; Silver et al. 2016, 2017; Brown and Sandholm 2018; Perolat et al. 2022). Such an environment is often modeled as a two-player zero-sum Markov game (TZMG) (Littman 1994), and computing the equilibria is said to be empirically tractable. ...
December 2022
Science
... RL has achieved unprecedented success within the gaming industry thanks to its ability to automate the algorithmic discovery process, resulting in superhuman performance in complex strategic games such as Chess (Silver et al. 2017), Go (Silver et al. 2016), and Dota 2 (Berner et al. 2019). By formulating scientific problems within an RL framework ("gamification"), researchers have achieved substantial breakthroughs, including predicting protein structures with atomic-level accuracy (Jumper et al. 2021), developing novel turbulence modeling strategies via multi-agent RL (Novati et al. 2021), and discovering faster algorithms for fundamental computations such as matrix multiplication (Fawzi et al. 2022). These examples illustrate the potential of RL in the sciences, beyond traditional game environments. ...
October 2022
Nature
... Plasticity loss -where prolonged training diminishes a network's capacity to learn new tasks -has been studied in off-policy RL (Lyle et al., 2022) and model-based RL (Qiao et al., 2024), with successful applications in DMC (Nikishin et al., 2022;D'Oro et al., 2023). High replay ratios in off-policy RL are known to exacerbate plasticity issues, as both the predictive model and the learned Q-function face continually changing data distributions and using a high replay ratio forces them to learn to solve a sequence of similar, but distinct, tasks (Dabney et al., 2021). Qiao et al. (2024) demonstrate that periodic reinitialization of the learned model parameters can mitigate the loss of plasticity in model-based RL and enhance model accuracy. ...
May 2021
Proceedings of the AAAI Conference on Artificial Intelligence
... This agent would have an advantage over an agent who uses pure episodic memory in cases such as right after experiencing two intersecting trajectories, one leading to reward (e.g., A > B > C > reward) and the other to no reward (e.g., D > B > E > no-reward); while the ERLAM agent will be able to leverage the graph to plan an unexperienced route (e.g., D > B > C > reward), an agent that relies only on episodic reinforcement learning would associate D with reward only after the direct experience. Recently, expected eligibility traces have been introduced as a form of leveraging counterfactual trajectories to accelerate learning (van Hasselt et al., 2021). The eligibility trace is a mechanism in reinforcement learning that provides hindsight credit assignment with regard to the current state by keeping a trace of past experiences weighted by their recency (Singh & Sutton, 1996; Sutton & Barto, 2018). ...
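A minimal tabular TD(λ) sketch of the eligibility-trace mechanism mentioned above: each visited state keeps a recency-weighted trace, so a single temporal-difference error credits all recently visited states at once. The chain environment, reward and constants are illustrative.

```python
import numpy as np

# Tabular TD(lambda) with accumulating eligibility traces on a deterministic
# chain of 5 states with a reward of 1 on reaching the terminal state.
n_states, alpha, gamma, lam = 5, 0.1, 0.9, 0.8
V = np.zeros(n_states + 1)                    # last index is the terminal state

for episode in range(200):
    traces = np.zeros(n_states + 1)
    s = 0
    while s < n_states:
        s_next = s + 1                        # deterministic walk to the right
        reward = 1.0 if s_next == n_states else 0.0
        td_error = reward + gamma * V[s_next] - V[s]
        traces[s] += 1.0                      # accumulate the trace for the visited state
        V += alpha * td_error * traces        # credit all recently visited states at once
        traces *= gamma * lam                 # decay traces by recency
        s = s_next

print(V[:n_states].round(3))                  # approx gamma**(n_states - 1 - s) for each state s
```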
May 2021
Proceedings of the AAAI Conference on Artificial Intelligence
... Unsupervised learning involves models that can recognize similarities, recurrent patterns or differences in "unlabelled data" without prior training, allowing patterns and/or relationships to be identified, and data clustering and association analyses to be performed (e.g., image classification or the identification of patients with similar symptoms). Finally, "reinforcement learning" [16] involves techniques that allow the machine to make better decisions over time by following a trial-and-error method and positive or negative feedback approach to improve the final outcome (e.g., text summarization). ...
April 2022
Neural Networks
... Progress in this field is monitored as part of the CASP (Critical Assessment of protein Structure Prediction) project [2,3]. Recently, significant progress has been recorded with the introduction of methods based on Artificial Intelligence (AI) [4]. The deep learning technique applied in the AlphaFold model allows the prediction of the structure of any protein for a given sequence. ...
Reference:
Chameleon sequences—Structural effects
October 2021
Proteins: Structure, Function, and Bioinformatics
... Third, PBPs require flexible interpretive processes that are a hallmark of adaptive intelligence and are still challenging for modern machine learning systems. There have recently been striking advances in systems that can generate richly and compositionally structured images from text descriptions (Ramesh et al., 2022), learn how to improve their own learning across tasks (Flennerhag et al., 2022), and appropriately respond to a broad range of natural language queries (Bubeck et al., 2023). Despite these remarkable successes, PBPs provide a challenging testbed for cognition because (a) scenes that belong to one category look superficially similar to scenes that belong to the other category; (b) highly specific interpretations and simulations are needed to solve a categorization problem; (c) novel PBP problems can be created, even automatically, that are not in any preexisting training set; and (d) because of the difficulty in precomputing all of the possible interpretations of a scene that might be involved in a categorization rule, it is practically necessary for a successful system to flexibly generate new interpretations of a scene during problem solving. ...
September 2021