David Silver’s research while affiliated with KWS UK Ltd and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (123)


Simulation-based search
  • Chapter

December 2023 · 30 Reads · 1 Citation

Andre Barreto · David Silver

The relationship between C++ and assembly programs
a, A C++ implementation of a variable sort 2 function that sorts any input sequence of up to two elements. b, The C++ implementation in a is compiled to this equivalent low-level assembly representation.
The AssemblyGame and algorithm correctness computation
a, The AssemblyGame is played by AlphaDev, which receives as input the current assembly algorithm generated thus far Sₜ and plays the game by selecting an action to execute. In this example, the action is a mov<Register0,Memory1> assembly instruction, which is appended to the current algorithm. The agent receives a reward that is a function of the algorithm’s correctness, discussed in b, as well as the algorithm’s latency. The game is won by the player discovering a low-latency, correct algorithm. b, The program correctness and latency computations are used to compute the reward rₜ. In this example, test sequences are input to the algorithm; for example, in the case of sorting three elements, test inputs comprise all sequences of unsorted elements of length 3. For each sequence, the algorithm output is compared to the expected output (in the case of sorting, the expected output is the sorted elements). In this example, the output D′ does not match the expected output B′ and the algorithm is therefore incorrect.
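As a rough illustration of the correctness computation described in b, the sketch below scores a candidate sorting routine against every permutation of a fixed-length input; the function name and reward shaping are assumptions, and AlphaDev's actual reward additionally depends on measured latency.

```python
from itertools import permutations

def correctness_reward(candidate_sort, length=3):
    """Score a candidate sorting routine on every unsorted sequence of `length`
    distinct elements, as in the test-sequence comparison described above.

    Returns 1.0 only if every output matches the expected sorted output;
    otherwise the fraction of test sequences handled correctly.
    (Hypothetical shaping; AlphaDev's reward also folds in latency.)
    """
    tests = list(permutations(range(length)))
    correct = sum(
        1 for seq in tests
        if candidate_sort(list(seq)) == sorted(seq)
    )
    return correct / len(tests)

# Example: a correct routine gets reward 1.0, a broken one gets less.
print(correctness_reward(sorted))         # 1.0
print(correctness_reward(lambda s: s))    # identity: only the already-sorted input passes
```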
Sorting networks and algorithmic improvements discovered by AlphaDev
a, An optimal classic sorting network for three inputs. The circled comparators have been improved by AlphaDev. See the AlphaDev swap move for more details. b,c, The assembly pseudocode before applying the AlphaDev swap move (b) and after applying the AlphaDev swap move (c), resulting in the removal of a single instruction. d, An optimal classic sorting network comparator configuration that has been improved by AlphaDev. See the AlphaDev copy move for more details. e,f, The assembly pseudocode before applying the AlphaDev copy move (e) and after applying the AlphaDev copy move (f), resulting in the removal of a single instruction.
Fundamentally different algorithms discovered by AlphaDev
a, A flow diagram of the variable sort 4 (VarSort4) human benchmark algorithm. In this algorithm, a sequence of unsorted numbers are input into the algorithm. If the sequence length is four, three or two numbers, then the corresponding sort 4, sort 3 or sort 2 sorting network is called that sorts the resulting sequence. The result is then returned and output by the function. b, The VarSort4 algorithm discovered by AlphaDev. This algorithm also receives sequences of length four, three or two numbers as input. In this case, if the length is two, then it calls the sort 2 sorting network and returns. If the length is three then it calls sort 3 to sort the first three numbers and returns. If, however, the length is greater than three, then it calls sort 3, followed by a simplified sort 4 routine that sorts the remaining unsorted number. It is this part of the routine that results in significant latency savings.
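A minimal Python rendering of the two control flows described in this caption is sketched below; the sort_network helper stands in for the fixed sort 2/sort 3/sort 4 networks, and the insertion step is a stand-in for the simplified sort 4 routine, so the sketch captures the branching structure rather than the assembly-level latency savings.

```python
import bisect

def sort_network(seq):
    # Stand-in for a fixed-length sorting network (sort 2 / sort 3 / sort 4).
    return sorted(seq)

def varsort4_benchmark(seq):
    """Human-benchmark VarSort4 flow: dispatch on length, call the matching network."""
    assert 2 <= len(seq) <= 4
    return sort_network(seq)

def varsort4_alphadev(seq):
    """Flow discovered by AlphaDev: lengths 2 and 3 are handled directly;
    length 4 sorts the first three elements, then a simplified routine
    places the one remaining unsorted element."""
    assert 2 <= len(seq) <= 4
    if len(seq) <= 3:
        return sort_network(seq)
    head = sort_network(seq[:3])      # sort 3 on the first three numbers
    bisect.insort(head, seq[3])       # simplified "sort 4": insert the last element
    return head

print(varsort4_alphadev([4, 1, 3, 2]))  # [1, 2, 3, 4]
```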
Faster sorting algorithms discovered using deep reinforcement learning
  • Article
  • Full-text available

June 2023 · 1,913 Reads · 117 Citations · Nature

Andrea Michi · Anton Zhernov · [...] · David Silver

Fundamental algorithms such as sorting or hashing are used trillions of times on any given day¹. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been achieved in the past², making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks. These algorithms have been integrated into the LLVM standard C++ sort library³. This change to this part of the sort library represents the replacement of a component with an algorithm that has been automatically discovered using reinforcement learning. We also present results in extra domains, showcasing the generality of the approach.


Planning in Stochastic Environments with a Learned Model
  • Conference paper · Published at ICLR 2022

February 2023 · 401 Reads

Model-based reinforcement learning has proven highly successful. However, learning a model in isolation from its use during planning is problematic in complex environments. To date, the most effective techniques have instead combined value-equivalent model learning with powerful tree-search methods. This approach is exemplified by MuZero, which has achieved state-of-the-art performance in a wide range of domains, from board games to visually rich environments, with discrete and continuous action spaces, in online and offline settings. However, previous instantiations of this approach were limited to the use of deterministic models. This limits their performance in environments that are inherently stochastic, partially observed, or so large and complex that they appear stochastic to a finite agent. In this paper we extend this approach to learn and plan with stochastic models. Specifically, we introduce a new algorithm, Stochastic MuZero, that learns a stochastic model incorporating afterstates, and uses this model to perform a stochastic tree search. Stochastic MuZero matched or exceeded the state of the art in a set of canonical single- and multi-agent environments, including 2048 and backgammon, while maintaining the superhuman performance of standard MuZero in the game of Go.
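The abstract above attributes Stochastic MuZero's gains to afterstates: the model first maps a state-action pair to an afterstate, and the value of that afterstate is an expectation over the outcomes of a learned chance distribution. The sketch below shows only that expectation-style backup, with assumed function names (afterstate_fn, chance_fn, value_fn); the actual algorithm embeds this inside a stochastic tree search.

```python
import numpy as np

def afterstate_value(state, action, afterstate_fn, chance_fn, value_fn, discount=0.997):
    """Back up an afterstate value as an expectation over stochastic outcomes.

    afterstate_fn(state, action) -> afterstate embedding            (assumed name)
    chance_fn(afterstate)        -> (outcome probabilities,
                                     next-state embeddings, rewards) (assumed name)
    value_fn(state)              -> scalar value estimate            (assumed name)
    """
    afterstate = afterstate_fn(state, action)
    probs, next_states, rewards = chance_fn(afterstate)
    values = np.array([value_fn(s) for s in next_states])
    # E[r + gamma * V(s')] under the model's chance distribution.
    return float(np.sum(probs * (rewards + discount * values)))

# Toy usage with dummy stand-ins for the learned model components:
probs = np.array([0.5, 0.5])
v = afterstate_value(
    state=0, action=0,
    afterstate_fn=lambda s, a: (s, a),
    chance_fn=lambda af: (probs, [1, 2], np.array([0.0, 1.0])),
    value_fn=lambda s: float(s),
)
print(v)  # 0.5*(0 + 0.997*1) + 0.5*(1 + 0.997*2) = 1.9955
```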


Mastering the game of Stratego with model-free multiagent reinforcement learning

December 2022 · 285 Reads · 133 Citations · Science

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.


Matrix multiplication tensor and algorithms
a, Tensor 𝒯₂ representing the multiplication of two 2 × 2 matrices. Tensor entries equal to 1 are depicted in purple, and 0 entries are semi-transparent. The tensor specifies which entries from the input matrices to read, and where to write the result. For example, as c1 = a1b1 + a2b3, tensor entries located at (a1, b1, c1) and (a2, b3, c1) are set to 1. b, Strassen's algorithm² for multiplying 2 × 2 matrices using 7 multiplications. c, Strassen's algorithm in tensor factor representation. The stacked factors U, V and W (green, purple and yellow, respectively) provide a rank-7 decomposition of 𝒯₂ (equation (1)). The correspondence between arithmetic operations (b) and factors (c) is shown by using the aforementioned colours.
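For reference, one standard presentation of Strassen's seven-multiplication scheme for 2 × 2 matrices is sketched below, using the caption's a1…a4, b1…b4, c1…c4 labelling; the exact factor ordering in the paper's figure may differ.

```python
def strassen_2x2(a1, a2, a3, a4, b1, b2, b3, b4):
    """Multiply [[a1, a2], [a3, a4]] @ [[b1, b2], [b3, b4]] with 7 multiplications.

    One standard form of Strassen's algorithm; it works for scalars or, applied
    recursively, for matrix blocks.
    """
    m1 = (a1 + a4) * (b1 + b4)
    m2 = (a3 + a4) * b1
    m3 = a1 * (b2 - b4)
    m4 = a4 * (b3 - b1)
    m5 = (a1 + a2) * b4
    m6 = (a3 - a1) * (b1 + b2)
    m7 = (a2 - a4) * (b3 + b4)
    c1 = m1 + m4 - m5 + m7
    c2 = m3 + m5
    c3 = m2 + m4
    c4 = m1 - m2 + m3 + m6
    return c1, c2, c3, c4

# Sanity check against the naive 8-multiplication result.
a = (1, 2, 3, 4); b = (5, 6, 7, 8)
assert strassen_2x2(*a, *b) == (1*5 + 2*7, 1*6 + 2*8, 3*5 + 4*7, 3*6 + 4*8)
```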
Overview of AlphaTensor
The neural network (bottom box) takes as input a tensor 𝒮ₜ, and outputs samples (u, v, w) from a distribution over potential next actions to play, and an estimate of the future returns (for example, of −Rank(𝒮ₜ)). The network is trained on two data sources: previously played games and synthetic demonstrations. The updated network is sent to the actors (top box), where it is used by the MCTS planner to generate new games.
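The overview above describes actions as factor triples (u, v, w) whose rank-one product is subtracted from the current tensor until the residual reaches zero. A minimal numpy sketch of that state update is given below; the function name and toy example are illustrative only.

```python
import numpy as np

def apply_action(S_t, u, v, w):
    """One step of the tensor-decomposition game: subtract the rank-one
    tensor u ⊗ v ⊗ w from the current residual tensor.

    The factors played over an episode form the columns of U, V, W; once the
    residual hits zero, their count is an upper bound on the tensor's rank.
    """
    return S_t - np.einsum('i,j,k->ijk', u, v, w)

# Toy example: a rank-one "target tensor" is solved in a single move.
u = np.array([1, 0]); v = np.array([1, 1]); w = np.array([0, 1])
S_0 = np.einsum('i,j,k->ijk', u, v, w)
S_1 = apply_action(S_0, u, v, w)
print(np.count_nonzero(S_1))  # 0: the decomposition is complete
```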
Comparison between the complexity of previously known matrix multiplication algorithms and the ones discovered by AlphaTensor
Left: column (n, m, p) refers to the problem of multiplying n × m with m × p matrices. The complexity is measured by the number of scalar multiplications (or equivalently, the number of terms in the decomposition of the tensor). ‘Best rank known’ refers to the best known upper bound on the tensor rank (before this paper), whereas ‘AlphaTensor rank’ reports the rank upper bounds obtained with our method, in modular arithmetic (ℤ₂) and standard arithmetic. In all cases, AlphaTensor discovers algorithms that match or improve over known state of the art (improvements are shown in red). See Extended Data Figs. 1 and 2 for examples of algorithms found with AlphaTensor. Right: results (for arithmetic in ℝ) of applying AlphaTensor-discovered algorithms on larger tensors. Each red dot represents a tensor size, with a subset of them labelled. See Extended Data Table 1 for the results in table form. State-of-the-art results are obtained from the list in ref. ⁶⁴.
Algorithm discovery beyond standard matrix multiplication
a, Decompositions found by AlphaTensor for the tensors of size n(n − 1)/2 × n × n (with n = 3, 4, 5, 6) representing the skew-symmetric matrix-vector multiplication. The red pixels denote 1, the blue pixels denote −1 and the white pixels denote 0. Extrapolation to n = 10 is shown in the rightmost figure. b, Skew-symmetric matrix-by-vector multiplication algorithm, obtained from the examples solved by AlphaTensor. The wij and qi terms in steps 3 and 5 correspond to the mr terms in Algorithm 1. It is noted that steps 6–9 do not involve any multiplications.
Speed-ups of the AlphaTensor-discovered algorithm
a,b, Speed-ups (%) of the AlphaTensor-discovered algorithms tailored for a GPU (a) and a TPU (b), optimized for a matrix multiplication of size 8,192 × 8,192. Speed-ups are measured relative to standard (for example, cuBLAS for the GPU) matrix multiplication on the same hardware. Speed-ups are reported for various matrix sizes (despite optimizing the algorithm only on one matrix size). We also report the speed-up of the Strassen-square algorithm. The median speed-up is reported over 200 runs. The standard deviation over runs is <0.4 percentage points (see Supplementary Information for more details). c, Speed-up of both algorithms (tailored to a GPU and a TPU) benchmarked on both devices.
Discovering faster matrix multiplication algorithms with reinforcement learning

October 2022 · 5,451 Reads · 461 Citations · Nature

Improving the efficiency of algorithms for fundamental computations can have a widespread impact, as it can affect the overall speed of a large amount of computations. Matrix multiplication is one such primitive task, occurring in many systems—from neural networks to scientific computing routines. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. However, automating the algorithm discovery procedure is intricate, as the space of possible algorithms is enormous. Here we report a deep reinforcement learning approach based on AlphaZero¹ for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. Our agent, AlphaTensor, is trained to play a single-player game where the objective is finding tensor decompositions within a finite factor space. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor’s algorithm improves on Strassen’s two-level algorithm for the first time, to our knowledge, since its discovery 50 years ago². We further showcase the flexibility of AlphaTensor through different use-cases: algorithms with state-of-the-art complexity for structured matrix multiplication and improved practical efficiency by optimizing matrix multiplication for runtime on specific hardware. Our results highlight AlphaTensor’s ability to accelerate the process of algorithmic discovery on a range of problems, and to optimize for different criteria. A reinforcement learning approach based on AlphaZero is used to discover efficient and provably correct algorithms for matrix multiplication, finding faster algorithms for a variety of matrix sizes.


Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

June 2022 · 148 Reads

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of 10⁵³⁵ nodes, i.e., 10¹⁷⁵ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of 10¹⁶⁴ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
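The abstract credits DeepNash's convergence to Regularised Nash Dynamics (R-NaD). Very roughly, R-NaD regularises rewards toward a periodically reset reference policy so that learning converges to an approximate Nash equilibrium instead of cycling; the snippet below sketches only that reward-transformation idea, with assumed names and an assumed coefficient, and omits the two-player dynamics and the actual learning update used in the paper.

```python
import numpy as np

def regularised_reward(reward, log_pi, log_pi_ref, eta=0.2):
    """Sketch of a KL-style reward regularisation toward a reference policy.

    reward     : raw reward for the action taken
    log_pi     : log-probability of that action under the current policy
    log_pi_ref : log-probability under the (periodically reset) reference policy
    eta        : regularisation strength (hypothetical value)
    """
    return reward - eta * (log_pi - log_pi_ref)

# When the current policy drifts far above the reference on an action,
# the transformed reward is pushed down, which discourages cycling.
print(regularised_reward(1.0, np.log(0.9), np.log(0.3)))
```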


Deep learning, reinforcement learning, and world models

April 2022 · 300 Reads · 335 Citations · Neural Networks

Deep learning (DL) and reinforcement learning (RL) methods appear to be indispensable for achieving human-level or super-human AI systems. Both DL and RL also have strong connections with brain function and with neuroscientific findings. In this review, we summarize the talks and discussions in the “Deep Learning and Reinforcement Learning” session of the International Symposium on Artificial Intelligence and Brain Science. In this session, we discussed whether a comprehensive understanding of human intelligence can be achieved on the basis of recent advances in deep learning and reinforcement learning algorithms. Speakers presented recent studies on technologies that may be key to achieving human-level intelligence.


Tables: Atari parameters (generally following the recommendations of Machado et al. [34]); default hyperparameters for the deep RL experiments.
Self-Consistent Models and Values

October 2021 · 99 Reads

Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a learned model and value function to be jointly \emph{self-consistent}. Our approach differs from classic planning methods such as Dyna, which only update values to be consistent with the model. We propose multiple self-consistency updates, evaluate these in both tabular and function approximation settings, and find that, with appropriate choices, self-consistency helps both policy evaluation and control.
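As a rough illustration of the self-consistency idea (not the specific updates proposed in the paper), one can penalise disagreement between the value of a state and the value implied by unrolling the learned model one step from it; classic Dyna would only move the value toward this target, whereas a self-consistency update may also adjust the model so that the two sides agree.

```python
def self_consistency_loss(V, model, state, gamma=0.99):
    """Squared disagreement between V(s) and the model-implied one-step target.

    V     : state-value function, V(state) -> float
    model : learned model, model(state) -> (predicted_reward, predicted_next_state)
    """
    r_hat, s_hat = model(state)
    target = r_hat + gamma * V(s_hat)
    return (V(state) - target) ** 2

# Toy tabular example: V stored in a dict, a one-transition "model".
V_table = {"s": 1.0, "s'": 2.0}
loss = self_consistency_loss(V_table.get, lambda s: (0.5, "s'"), "s")
print(loss)  # (1.0 - (0.5 + 0.99*2.0))**2 ≈ 2.19
```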


Applying and improving AlphaFold at CASP14

October 2021 · 228 Reads · 344 Citations · Proteins: Structure, Function, and Bioinformatics

We describe the operation and improvement of AlphaFold*, the system that was entered by the team AlphaFold2 in the “human” category of the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different from the one entered in CASP13. It used a novel end-to-end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors’ ranking by summed z-scores (>2.0), AlphaFold scored 244.0 compared to 90.8 for the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modelling targets, and it represents a significant improvement in the state of the art in protein structure prediction. We report how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large-scale structure prediction.


Bootstrapped Meta-Learning

September 2021 · 140 Reads · 1 Citation

Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimisation, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
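A toy illustration of the bootstrapping step described above, under strong simplifying assumptions: the meta-parameter is a scalar step size on a 1-D quadratic, so the meta-gradient has a closed form. This is not the paper's algorithm, which matches policies and values under a chosen (pseudo-)metric, but it shows the idea of building a target by running the learner a few steps further and treating the result as fixed.

```python
# Toy bootstrapped meta-learning: meta-learn a step size alpha for
# gradient descent on f(theta) = 0.5 * theta**2, where one inner step is
# theta <- theta * (1 - alpha).
theta0, alpha, meta_lr = 5.0, 0.1, 0.01
K, L = 5, 5  # inner steps / extra bootstrap steps

for meta_iter in range(200):
    theta_K = theta0 * (1 - alpha) ** K   # result of K inner updates
    target = theta_K * (1 - alpha) ** L   # bootstrap target, treated as constant
    # d(theta_K)/d(alpha) for this toy inner loop, in closed form:
    dtheta_dalpha = -K * theta0 * (1 - alpha) ** (K - 1)
    # Minimise the squared distance to the (detached) bootstrapped target.
    meta_grad = 2 * (theta_K - target) * dtheta_dalpha
    alpha -= meta_lr * meta_grad

print(alpha)  # alpha grows, i.e. the meta-learner moves toward faster convergence
```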


Citations (76)


... One important limitation of the original DQN solution is that the operations performed for action selection and evaluation make use of the same values, which increases the chance of overestimation and hence of overoptimistic value estimates. The Double DQN [33] addressed this issue by decoupling the selection and evaluation processes. For this, two value functions are trained by dividing the total number of experiences randomly, and from the resulting pair of weights, one is used to determine the policy in a greedy manner, while the second is used solely for value evaluation. ...

Reference:

Trade-Offs in Navigation Problems Using Value-Based Methods
Deep Reinforcement Learning with Double Q-learning
  • Citing Preprint
  • September 2015
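The excerpt above describes decoupling action selection from evaluation. In the deep variant cited here, this is commonly realised with an online network that selects the greedy action and a target network that evaluates it; the sketch below contrasts the two targets, using plain arrays as stand-ins for the networks.

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the action.
    return reward + gamma * np.max(q_target_next)

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects, the target network evaluates.
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

q_online_next = np.array([1.0, 2.5, 2.0])   # online-network Q-values at s'
q_target_next = np.array([1.2, 1.8, 3.0])   # target-network Q-values at s'
print(dqn_target(0.0, q_target_next))                        # 0.99 * 3.0 = 2.97
print(double_dqn_target(0.0, q_online_next, q_target_next))  # 0.99 * 1.8 = 1.782
```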

... Novel better-performing DQN networks can be created by merging different independent improvements proposed throughout the years. One such architecture is represented by Rainbow DQN [35], merging complementary improvements to the training and design process, or Light-Q-Network (LQN) and Binary-Q-Network (BQN) [36], which focus on limiting the required resources. Another recent solution, PQN [37], has completely eliminated the need for a replay buffer and produced a much faster DQN variant by optimizing the traditional temporal difference learning approach. ...

Rainbow: Combining Improvements in Deep Reinforcement Learning
  • Citing Preprint
  • October 2017

... Sorting algorithms have long been a fundamental area of study in computer science [1], influencing both theoretical analysis [2] and practical applications across diverse domains [3]. The modeling, analysis, and comparative study of these techniques provide deep insights into algorithm efficiency [4], computational complexity, and resource utilization. ...

Faster sorting algorithms discovered using deep reinforcement learning

Nature

... Multi-agent reinforcement learning (MARL) is a framework for sequential decision-making, where multiple agents make decisions in a non-stationary environment to maximize their cumulative rewards. MARL has a wide range of applications, e.g., robotics, distributed control, game AI, and so on (Shalev-Shwartz, Shammah, and Shashua 2016; Silver et al. 2016, 2017; Brown and Sandholm 2018; Perolat et al. 2022). Such an environment is often modeled as two-player zero-sum Markov games (TZMGs) (Littman 1994) and computing the equilibria is said to be empirically tractable. ...

Mastering the game of Stratego with model-free multiagent reinforcement learning
  • Citing Article
  • December 2022

Science

... RL has achieved unprecedented success within the gaming industry thanks to its ability to automate the algorithmic discovery process, resulting in superhuman performance in complex strategic games such as Chess (Silver et al. 2017), Go (Silver et al. 2016), and Dota 2 (Berner et al. 2019). By formulating scientific problems within an RL framework ("gamification"), researchers have achieved substantial breakthroughs, including predicting protein structures with atomic-level accuracy (Jumper et al. 2021), developing novel turbulence modeling strategies via multi-agent RL (Novati et al. 2021), and discovering faster algorithms for fundamental computations such as matrix multiplication (Fawzi et al. 2022). These examples illustrate the potential of RL in the sciences, beyond traditional game environments. ...

Discovering faster matrix multiplication algorithms with reinforcement learning

Nature

... Plasticity loss -where prolonged training diminishes a network's capacity to learn new tasks -has been studied in off-policy RL (Lyle et al., 2022) and model-based RL (Qiao et al., 2024), with successful applications in DMC (Nikishin et al., 2022;D'Oro et al., 2023). High replay ratios in off-policy RL are known to exacerbate plasticity issues, as both the predictive model and the learned Q-function face continually changing data distributions and using a high replay ratio forces them to learn to solve a sequence of similar, but distinct, tasks (Dabney et al., 2021). Qiao et al. (2024) demonstrate that periodic reinitialization of the learned model parameters can mitigate the loss of plasticity in model-based RL and enhance model accuracy. ...

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning
  • Citing Article
  • May 2021

Proceedings of the AAAI Conference on Artificial Intelligence

... This agent would have an advantage over an agent who uses pure episodic memory in cases such as right after experiencing two intersecting trajectories, one leading to reward (e.g., A > B > C > reward) and the other to no reward (e.g., D > B > E > no-reward); while the ERLAM agent will be able to leverage the graph to plan an unexperienced route (e.g., D > B > C > reward), an agent that only relies on episodic reinforcement learning would associate D with reward only after the direct experience. Recently, expected eligibility traces have been introduced as a form of leveraging counterfactual trajectories to accelerate learning (van Hasselt et al., 2021). The eligibility trace is a mechanism in reinforcement learning that provides hindsight credit assignment with regard to the current state by keeping a trace of past experiences weighted by their recency (Singh & Sutton, 1996; Sutton & Barto, 2018). ...

Expected Eligibility Traces
  • Citing Article
  • May 2021

Proceedings of the AAAI Conference on Artificial Intelligence
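For readers unfamiliar with the mechanism summarised in the excerpt above, a bare-bones tabular TD(λ) update with an accumulating eligibility trace is sketched below; expected eligibility traces (the cited work) replace this single realised trace with its expectation, which the sketch does not attempt to show.

```python
import numpy as np

def td_lambda_episode(transitions, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    transitions: list of (state, reward, next_state, done) tuples.
    The trace z decays by gamma*lambda each step, so recently visited states
    receive most of the credit for each TD error.
    """
    v = np.zeros(n_states)
    z = np.zeros(n_states)
    for s, r, s_next, done in transitions:
        z *= gamma * lam
        z[s] += 1.0                                # accumulating trace
        delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
        v += alpha * delta * z                     # hindsight credit to traced states
    return v

print(td_lambda_episode([(0, 0.0, 1, False), (1, 1.0, 2, True)], n_states=3))
```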

... Unsupervised learning involves models that can recognize similarities, recurrent patterns or differences in "unlabelled data" without prior training, allowing patterns and/or relationships to be identified, and data clustering and association analyses to be performed (e.g., image classification or the identification of patients with similar symptoms). Finally, "reinforcement learning" [16] involves techniques that allow the machine to make better decisions over time by following a trial-and-error method and positive or negative feedback approach to improve the final outcome (e.g., text summarization). ...

Deep learning, reinforcement learning, and world models
  • Citing Article
  • April 2022

Neural Networks

... Progress in this field is monitored as part of the CASP (Critical Assessment of protein Structure Prediction) project [2,3]. Recently, significant progress has been recorded with the introduction of methods based on Artificial Intelligence (AI) [4]. The deep learning technique applied in the AlphaFold model allows the prediction of the structure of any protein for a given sequence. ...

Applying and improving AlphaFold at CASP14

Proteins: Structure, Function, and Bioinformatics

... Third, PBPs require flexible interpretive processes that are a hallmark of adaptive intelligence and are still challenging for modern machine learning systems. There have recently been striking advances in systems that can generate richly and compositionally structured images from text descriptions (Ramesh et al., 2022), learn how to improve their own learning across tasks (Flennerhag et al., 2022), and appropriately respond to a broad range of natural language queries (Bubeck et al., 2023). Despite these remarkable successes, PBPs provide a challenging testbed for cognition because (a) scenes that belong to one category look superficially similar to scenes that belong to the other category; (b) highly specific interpretations and simulations are needed to solve a categorization problem; (c) novel PBP problems can be created, even automatically, that are not in any preexisting training set; and (d) because of the difficulty in precomputing all of the possible interpretations of a scene that might be involved in a categorization rule, it is practically necessary for a successful system to flexibly generate new interpretations of a scene during problem solving. ...

Bootstrapped Meta-Learning