Conference Paper

Training artificial neural networks to learn a nondeterministic game

Abstract and Figures

It is well known that artificial neural networks (ANNs) can learn deterministic automata. Learning nondeterministic automata is another matter. This is important because much of the world is nondeterministic, taking the form of unpredictable or probabilistic events that must be acted upon. If ANNs are to engage such phenomena, then they must be able to learn how to deal with nondeterminism. In this project the game of Pong poses a nondeterministic environment. The learner is given an incomplete view of the game state and underlying deterministic physics, resulting in a nondeterministic game. Three models were trained and tested on the game: Mona, Elman, and Numenta's NuPIC.
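The way an incomplete view turns deterministic physics into apparent nondeterminism can be sketched with a toy example. This is not the paper's Pong environment: the one-dimensional ball, the `step`, `observe`, and `successors` names, and the field height are all assumptions chosen for illustration. The ball's full state (position and velocity) always has exactly one successor, but an observer who sees only the position can see more than one next position follow the same observation.

```python
from collections import defaultdict

def step(state, height=5):
    """Deterministic toy physics: state is (y, dy) on a column of `height` cells."""
    y, dy = state
    if not 0 <= y + dy < height:  # bounce off the top or bottom wall
        dy = -dy
    return (y + dy, dy)

def observe(state):
    """Hypothetical partial view: the learner sees position but not velocity."""
    return state[0]

def successors(project, height=5):
    """Map each projected state to the set of projected next states."""
    table = defaultdict(set)
    for y in range(height):
        for dy in (-1, 1):
            s = (y, dy)
            table[project(s)].add(project(step(s, height)))
    return dict(table)

full = successors(lambda s: s)   # complete state: one successor each
partial = successors(observe)    # position only: sometimes two successors

print(all(len(nxt) == 1 for nxt in full.values()))
print(partial[2])
```

With the full state every transition is unique, whereas from position 2 alone the next position may be 1 or 3, which is the sense in which the observed game is nondeterministic.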
Article
We consider the problem of learning a finite automaton with recurrent neural networks, given a training set of sentences in a language. We train Elman recurrent neural networks on the prediction task and study experimentally what these networks learn. We found that the network tends to encode an approximation of the minimum automaton that accepts only the sentences in the training set.
1 Introduction
1.1 The problem of inducing a deterministic finite automaton (DFA)
The interest in DFA inference stems partly from the larger goal of explaining how humans learn the grammar rules of their native language. There have been debates on whether children learn in an unsupervised mode, just by listening to other language speakers, or whether they have innate knowledge of language. It is therefore interesting to see what can be learned just by "listening to others", that is, from a set of grammatically correct sentences. While the complex syntactic rules of natural language cannot b...
Conference Paper
In this paper, we propose some improvements for the problem of time series prediction with neural networks where a medium-term prediction horizon is needed. In particular, the ionospheric prediction service of the French Centre National d'Études des Télécommunications needs a six-month-ahead prediction of a sunspot-related time series that has a strong influence on wave propagation in the ionosphere.
Article
An application of time series prediction to traffic forecasting in ATM networks using neural nets is described. One key issue, the number of data points that must be included in the input representation to the net, is discussed from a theoretical point of view, and the results are applied to the model under discussion. Experimental results are discussed and analysed.
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
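The temporal difference learning referred to above can be illustrated with a tabular Q-learning update. This sketch is not the deep Q-network itself, which replaces the table with a deep neural network fed by pixels. The state and action names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one temporal difference (Q-learning) update and return the TD error."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: gap between the bootstrapped target and the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

q = defaultdict(float)        # unseen (state, action) pairs start at 0.0
actions = ("up", "down")      # hypothetical paddle actions
err = q_update(q, "s0", "up", reward=1.0, next_state="s1", actions=actions)
print(err, q[("s0", "up")])
```

The update moves the estimate for the chosen action a fraction `alpha` of the way toward the reward plus the discounted value of the best next action.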
Article
Natural world interaction (NWI), the pursuit of arbitrary goals in unstructured physical environments, is an excellent motivating problem for the reintegration of artificial intelligence. It is the problem set that humans struggle to solve. At a minimum it entails perception, learning, planning, and control, and can also involve language and social behavior. An agent's fitness in NWI is achieved by being able to perform a wide variety of tasks, rather than being able to excel at one. In an attempt to address NWI, a brain-emulating cognition and control architecture (BECCA) was developed. It uses a combination of feature creation and model-based reinforcement learning to capture structure in the environment in order to maximize reward. BECCA avoids making common assumptions about its world, such as stationarity, determinism, and the Markov assumption. BECCA has been demonstrated performing a set of tasks which is non-trivially broad, including a vision-based robotics task. Current development activity is focused on applying BECCA to the problem of general Search and Retrieve, a representative natural world interaction task.
Article
Introduction to Languages and the Theory of Computation is an introduction to the theory of computation that emphasizes formal languages, automata and abstract models of computation, and computability; it also includes an introduction to computational complexity and NP-completeness. Through the study of these topics, students encounter profound computational questions and are introduced to topics that will have an ongoing impact in computer science. Once students have seen some of the many diverse technologies contributing to computer science, they can also begin to appreciate the field as a coherent discipline. A distinctive feature of this text is its gentle and gradual introduction of the necessary mathematical tools in the context in which they are used. Martin takes advantage of the clarity and precision of mathematical language but also provides discussion and examples that make the language intelligible to those just learning to read and speak it. The material is designed to be accessible to students who do not have a strong background in discrete mathematics, but it is also appropriate for students who have had some exposure to discrete math but whose skills in this area need to be consolidated and sharpened.
Table of contents
I Mathematical Notation and Techniques
1 Basic Mathematical Objects
2 Mathematical Induction and Recursive Definitions
II Regular Languages and Finite Automata
3 Regular Expressions and Finite Automata
4 Nondeterminism and Kleene's Theorem
5 Regular and Nonregular Languages
III Context-Free Languages and Pushdown Automata
6 Context-Free Grammars
7 Pushdown Automata
8 Context-Free and Non-Context-Free Languages
IV Turing Machines and Their Languages
9 Turing Machines
10 Recursively Enumerable Languages
V Unsolvable Problems and Computable Functions
11 Unsolvable Problems
12 Computable Functions
VI Introduction to Computational Complexity
13 Measuring and Classifying Complexity
14 Tractable and Intractable Problems
Article
Time underlies many interesting human behaviors. Thus, the question of how to represent time in connectionist models is very important. One approach is to represent time implicitly by its effects on processing rather than explicitly (as in a spatial representation). The current report develops a proposal along these lines first described by Jordan (1986) which involves the use of recurrent links in order to provide networks with a dynamic memory. In this approach, hidden unit patterns are fed back to themselves; the internal representations which develop thus reflect task demands in the context of prior internal states. A set of simulations is reported which range from relatively simple problems (temporal version of XOR) to discovering syntactic/semantic features for words. The networks are able to learn interesting internal representations which incorporate task demands with memory demands; indeed, in this approach the notion of memory is inextricably bound up with task processing. These representations reveal a rich structure, which allows them to be highly context-dependent, while also expressing generalizations across classes of items. These representations suggest a method for representing lexical categories and the type/token distinction.
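The feedback of hidden unit patterns described above can be sketched as a forward pass in which the previous hidden activations are supplied as context alongside the current input. This is a minimal illustration only: the weights are fixed, untrained values chosen by hand, the function names are hypothetical, and no learning rule is shown.

```python
import math

def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: hidden units see the input plus their own previous values."""
    hidden = [
        math.tanh(
            sum(wi * xi for wi, xi in zip(w_in[j], x))
            + sum(wc * cj for wc, cj in zip(w_ctx[j], context))
        )
        for j in range(len(w_in))
    ]
    output = [sum(wo * h for wo, h in zip(row, hidden)) for row in w_out]
    return output, hidden  # hidden becomes the context for the next step

def run(sequence, w_in, w_ctx, w_out):
    """Feed a sequence through the network, starting from an all-zero context."""
    context = [0.0] * len(w_in)
    outputs = []
    for x in sequence:
        y, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(y)
    return outputs

# Illustrative weights (assumed, untrained): 1 input, 2 hidden units, 1 output.
w_in = [[1.0], [-1.0]]
w_ctx = [[0.5, 0.0], [0.0, 0.5]]
w_out = [[1.0, -0.5]]
outs = run([[1.0], [0.0], [1.0]], w_in, w_ctx, w_out)
print(outs[0][0], outs[2][0])
```

The first and third inputs are identical, yet their outputs differ because the context units carry a trace of the earlier steps. This is the dynamic memory that the recurrent links provide.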
Conference Paper
We review two previous simulations in which opponent modelling was performed within the computer game of Pong. These results suggested that sums of local models were better than a single global model on this data set. We compare two supervised methods on this data, the multilayered perceptron, which is global, and the radial basis function network, which is a sum of local models, and again find that the latter gives better performance. Finally, we introduce a new topology-preserving network which can give very local or more global estimates of results and show that, while the local estimates are more accurate, they result in game play which is less human-like in behaviour.
Article
We show that a recurrent, second-order neural network using a real-time, forward training algorithm readily learns to infer small regular grammars from positive and negative string training samples. We present simulations that show the effect of initial conditions, training set size and order, and neural network architecture. All simulations were performed with random initial weight strengths and usually converge after approximately a hundred epochs of training. We discuss a quantization algorithm for dynamically extracting finite state automata during and after training. For a well-trained neural net, the extracted automata constitute an equivalence class of state machines that are reducible to the minimal machine of the inferred grammar. We then show through simulations that many of the neural net state machines are dynamically stable, that is, they correctly classify many long unseen strings. In addition, some of these extracted automata actually outperform the trained neural network for classification of unseen strings.
Article
This study compares the maze learning performance of three artificial neural network architectures: an Elman recurrent neural network, a long short-term memory (LSTM) network, and Mona, a goal-seeking neural network. The mazes are networks of distinctly marked rooms randomly interconnected by doors that open probabilistically. The mazes are used to examine two important problems related to artificial neural networks: (1) the retention of long-term state information and (2) the modular use of learned information. For the former, mazes impose a context learning demand: at the beginning of the maze, an initial door choice forms a context that must be remembered until the end of the maze, where the same numbered door must be chosen again in order to reach the goal. For the latter, the effect of modular and non-modular training is examined. In modular training, the door associations are trained in separate trials from the intervening maze paths, and only presented together in testing trials. All networks performed well on mazes without the context learning requirement. The Mona and LSTM networks performed well on context learning with non-modular training; the Elman performance degraded as the task length increased. Mona also performed well for modular training; both the LSTM and Elman networks performed poorly with modular training.