September 2021 · 5 Reads · Proceedings of the AAAI Conference on Artificial Intelligence
We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.
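The abstract does not spell out the model-free approximation itself; as a point of reference, here is a minimal sketch of a count-based exploration bonus, one of the standard techniques papers in this area compare against. It is not the authors' method, and all names and constants are illustrative.

```python
# Illustrative count-based exploration (NOT the paper's method):
# rarely tried actions receive a novelty bonus and get sampled first.
import numpy as np

n_states, n_actions = 50, 4
q = np.zeros((n_states, n_actions))      # value estimates
counts = np.ones((n_states, n_actions))  # visit counts (start at 1 to avoid /0)
beta = 0.5                               # hypothetical bonus weight

def explore_action(state):
    bonus = beta / np.sqrt(counts[state])  # novelty bonus per action
    a = int(np.argmax(q[state] + bonus))
    counts[state, a] += 1                  # visiting decays the future bonus
    return a
```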
December 2020 · 217 Reads · 29 Citations · IEEE Transactions on Systems, Man, and Cybernetics: Systems
This retrospective describes the overall research project that gave rise to the authors' paper "Neuronlike adaptive elements that can solve difficult learning control problems," published in the 1983 Neural and Sensory Information Processing special issue of the IEEE Transactions on Systems, Man, and Cybernetics. This look back explains how the project came about, presents the ideas and previous publications that influenced it, and describes our most closely related subsequent research. It concludes by pointing out some noteworthy aspects of the article that have been eclipsed by its main contributions, and by commenting on some of the directions and cautions that should inform future research.
January 2020 · 238 Reads · 40 Citations · Frontiers in Neurorobotics
November 2019 · 425 Reads · 151 Citations · Science
Making well-behaved algorithms: Machine learning algorithms are being used in an ever-increasing number of applications, and many of these applications affect quality of life. Yet such algorithms often exhibit undesirable behavior, ranging from various types of bias to financial loss or delayed medical diagnoses. In standard machine learning approaches, the burden of avoiding this harmful behavior is placed on the user of the algorithm, who most often is not a computer scientist. Thomas et al. introduce a general framework for algorithm design in which this burden is shifted from the user to the designer of the algorithm. The researchers illustrate the benefits of their approach using examples in gender fairness and diabetes management. Science, this issue p. 999
March 2019 · 74 Reads · 17 Citations
The idea of implementing reinforcement learning in a computer was one of the earliest ideas about the possibility of AI, but reinforcement learning remained on the margins of AI until relatively recently. Today we see reinforcement learning playing essential roles in some of the most impressive AI applications. This article presents observations from the author's personal experience with reinforcement learning over the most recent 40 years of its history in AI, focusing on striking connections that emerged between largely separate disciplines and on some of the findings that surprised him along the way. These connections and surprises place reinforcement learning in a historical context, and they help explain the success it is finding in modern AI. The article concludes by discussing some of the challenges that need to be faced as reinforcement learning moves out into the real world.
March 2019 · 1,278 Reads · 322 Citations
Reinforcement learning algorithms that use deep neural networks are a promising approach for the development of machines that can acquire knowledge and solve problems without human input or supervision. At present, however, these algorithms are implemented in software running on relatively standard complementary metal–oxide–semiconductor digital platforms, where performance will be constrained by the limits of Moore's law and von Neumann architecture. Here, we report an experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for our hybrid analogue–digital platform. To illustrate the capabilities of our approach in robust in situ training without the need for a model, we solved two classic control problems: the cart–pole and mountain-car simulations. We also show that, compared with conventional digital systems in real-world reinforcement learning tasks, our hybrid analogue–digital computing system has the potential to achieve a significant boost in speed and energy efficiency. In short, a reinforcement learning algorithm can be implemented on a hybrid analogue–digital platform based on memristive arrays for parallel and energy-efficient in situ training.
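The in situ analogue training itself cannot be reproduced in software, but the benchmark is standard. Below is a minimal tabular Q-learning sketch on the same cart-pole task, assuming the third-party gymnasium package; it shows the kind of training loop the memristor array accelerates, not the authors' modified algorithm.

```python
# Plain software Q-learning on cart-pole (the authors' benchmark), with
# discretized observations. Hyperparameters are illustrative defaults.
import numpy as np
import gymnasium as gym  # assumed available: pip install gymnasium

env = gym.make("CartPole-v1")
bins = [np.linspace(-2.4, 2.4, 9), np.linspace(-3.0, 3.0, 9),
        np.linspace(-0.21, 0.21, 9), np.linspace(-3.0, 3.0, 9)]
q = np.zeros((10, 10, 10, 10, env.action_space.n))

def disc(obs):
    return tuple(np.digitize(o, b) for o, b in zip(obs, bins))

alpha, gamma, eps = 0.1, 0.99, 0.1
for episode in range(500):
    obs, _ = env.reset()
    s, done = disc(obs), False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps \
            else int(np.argmax(q[s]))
        obs, r, term, trunc, _ = env.step(a)
        s2, done = disc(obs), term or trunc
        # One-step Q-learning backup; on the 1T1R platform the analogous
        # weight change would be applied as a conductance update.
        q[s + (a,)] += alpha * (r + gamma * np.max(q[s2]) * (not term)
                                - q[s + (a,)])
        s = s2
```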
February 2018 · 161 Reads · 52 Citations · Behavioural Processes
This article focuses on the division of labor between evolution and development in solving sequential, state-dependent decision problems. Currently, behavioral ecologists tend to use dynamic programming methods to study such problems. These methods are successful at predicting animal behavior in a variety of contexts. However, they depend on a distinct set of assumptions. Here, we argue that behavioral ecology will benefit from drawing more than it currently does on a complementary collection of tools, called reinforcement learning methods. These methods allow for the study of behavior in highly complex environments, which conventional dynamic programming methods cannot feasibly address. In addition, reinforcement learning methods are well suited to studying how biological mechanisms solve developmental and learning problems. For instance, we can use them to study simple rules that perform well in complex environments, or to investigate under what conditions natural selection favors fixed, non-plastic traits (which do not vary across individuals), cue-driven-switch plasticity (innate instructions for adaptive behavioral development based on experience), or developmental selection (the incremental acquisition of adaptive behavior based on experience). If natural selection favors developmental selection, which includes learning from environmental feedback, we can also make predictions about the design of reward systems. Our paper is written in an accessible manner for a broad audience, though we believe some novel insights can be drawn from our discussion. We hope it will help advance the emerging bridge connecting the fields of behavioral ecology and reinforcement learning.
August 2017 · 61 Reads · 5 Citations
Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved (that they do not, for example, cause harm to humans or act in a racist or sexist way) is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we address here. We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors. To show the viability of this new framework, we use it to create new machine learning algorithms that preclude the sexist and harmful behaviors exhibited by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.
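The heart of such a framework is a safety test: a candidate model is returned only if a high-confidence bound on its rate of undesirable behavior clears a user-chosen threshold. The sketch below illustrates that idea with a Hoeffding bound; the bound choice, names, and default confidence level are illustrative rather than the paper's exact construction.

```python
# Illustrative safety test: deploy only if the upper confidence bound on
# the undesirable-behavior rate is below the user's threshold.
import numpy as np

def safety_test(violations, threshold, delta=0.05):
    """violations: 0/1 array over held-out safety data (1 = bad behavior).
    Returns True only when, with confidence 1 - delta, the true rate is
    at most `threshold`; otherwise the algorithm reports no solution."""
    n = len(violations)
    ucb = np.mean(violations) + np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return ucb <= threshold
```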
January 2017 · 42 Reads · 1 Citation
... This approach allows them to use deep neural networks to model the uncertainties of the environment, which leads to a more robust controller than traditional ones. Later, Konidaris et al. [22] proposed using RL to automate skill acquisition on a mobile manipulator. Unlike DL, RL makes it possible to automatically obtain the experience needed to learn robotic skills through trial and error, and to learn complex decision-making policies. ...
August 2011 · Proceedings of the AAAI Conference on Artificial Intelligence
... Examples include REINFORCE [24], Advantage Actor-Critic (A2C) [25], and Proximal Policy Optimization (PPO) [26]. • Actor-Critic Architectures [27]: These architectures consist of two neural networks: one for the policy (actor) and one for the value function (critic). The actor chooses actions, while the critic evaluates them to guide the actor's learning. ...
December 2020 · IEEE Transactions on Systems, Man, and Cybernetics: Systems
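A minimal sketch of the actor-critic scheme the excerpt above describes, using linear function approximation in place of the cited works' neural networks; the division of labor (actor selects actions, critic evaluates them) is the same.

```python
# One-step actor-critic with linear features (illustrative dimensions).
import numpy as np

n_features, n_actions = 8, 3
theta = np.zeros((n_actions, n_features))  # actor: policy parameters
w = np.zeros(n_features)                   # critic: value-function weights

def policy(x):
    prefs = theta @ x
    p = np.exp(prefs - prefs.max())        # softmax over action preferences
    return p / p.sum()

def actor_critic_step(x, a, r, x_next, alpha_w=0.1, alpha_t=0.01, gamma=0.99):
    # The critic evaluates the transition via the TD error ...
    td_error = r + gamma * (w @ x_next) - (w @ x)
    w[:] += alpha_w * td_error * x
    # ... and the actor shifts its policy toward actions the critic favors.
    grad_log = -np.outer(policy(x), x)     # gradient of log pi(a|x) wrt theta
    grad_log[a] += x
    theta[:] += alpha_t * td_error * grad_log
```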
... Indeed, studies on animals [13][14][15] and humans [16][17][18] have explored the inherent inclination towards novelty, which is further supported by neuroscience experiments [19][20][21]. The field of intrinsically motivated open-ended learning (IMOL [22]) tackles the problem of developing agents that aim to improve their capabilities to interact with the environment without any specific assigned task. More precisely, Intrinsic Motivations (IMs [23,24]) are a class of self-generated signals that have been used to provide robots with autonomous guidance for several different processes, from state- and action-space exploration [25,26] to the autonomous discovery, selection, and learning of multiple goals [27][28][29]. ...
January 2020 · Frontiers in Neurorobotics
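One common concrete form of the intrinsic motivation signals described in the excerpt above is the prediction error of a learned forward model: poorly predicted states yield large bonuses and so attract exploration. A purely illustrative linear sketch:

```python
# Illustrative prediction-error intrinsic reward (one IM variant among many).
import numpy as np

class ForwardModel:
    def __init__(self, state_dim, lr=0.05):
        self.W = np.zeros((state_dim, state_dim + 1))  # linear predictor
        self.lr = lr

    def intrinsic_reward(self, s, a, s_next):
        x = np.append(s, a)                   # state-action input
        err = s_next - self.W @ x             # surprise = prediction error
        self.W += self.lr * np.outer(err, x)  # model improves with experience
        return float(np.linalg.norm(err))     # bigger surprise, bigger bonus
```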
... To assess and achieve anti-discrimination fairness, discrimination statistics that measure the average similarity of decisions across groups are used [30]. In addition to consistency and anti-discrimination, a third concept is counterfactual fairness, which ensures an algorithm's decisions remain consistent across hypothetical scenarios where individuals' protected attributes are altered [19]. Typically, causal models that describe how changes in protected attributes affect decisions and other attributes of individuals are used to assess and achieve counterfactual fairness. ...
November 2019 · Science
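A minimal version of the kind of discrimination statistic the excerpt above mentions: compare average decisions across protected groups. The function name and inputs are illustrative.

```python
# Gap in positive-decision rates across groups (0 = perfectly even).
import numpy as np

def demographic_parity_gap(decisions, groups):
    """decisions: 0/1 NumPy array of model outputs;
    groups: protected-group label per individual."""
    rates = [decisions[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)
```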
... Over the course of that year, Arthur Samuel wrote a checkers program, the first software of its kind in the United States to incorporate Artificial Intelligence [22]; and, in 1955, he extended the capabilities of the game developed by Strachey, allowing it to learn on its own from experience [22]. In 1954, Belmont Farley and Wesley Clark simulated reinforcement learning for the first time in a 128-neuron neural network on a digital computer, with the goal of recognizing simple patterns in a data set [25]. ...
March 2019
... One solution here is memristor-based analogue computing [9][10][11]. This approach improves energy efficiency by eliminating the emulation layers required to operate an artificial neural network on von Neumann architecture-based computers [12][13][14]. Memristors can achieve a higher density in crossbar arrays compared to conventional digital memory devices due to their simple two-terminal structure and ability to achieve multi-level conductance states [15]. ...
March 2019
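The computational point behind the excerpt above is that a memristor crossbar computes a matrix-vector product in a single analogue step: row voltages drive currents through programmable conductances, and each column current sums the products. A digital stand-in for that operation:

```python
# Emulating one analogue crossbar read: I = V @ G (Ohm + Kirchhoff).
import numpy as np

G = np.random.uniform(1e-6, 1e-4, size=(64, 32))  # conductances (siemens)
v = np.random.uniform(0.0, 0.2, size=64)          # row read voltages (volts)
i_out = v @ G  # column currents (amps): the whole MVM in one "step"
```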
... Residual RL: Methods in this family [35], [36], [37] decompose the policy into a sum of two parts: one representing prior knowledge, typically trained from demonstration data, and one residual policy that is learned through RL. ...
January 2004
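Written out, the decomposition the residual-RL excerpt above describes is a single addition at action-selection time; both policies below are illustrative stubs.

```python
# Residual RL action selection: prior knowledge plus a learned correction.
import numpy as np

def residual_action(state, prior_policy, residual_policy):
    # prior_policy: e.g. a hand-designed controller or one cloned from
    # demonstrations; residual_policy: the part trained with RL.
    return prior_policy(state) + residual_policy(state)

prior = lambda s: -1.5 * s       # hypothetical hand-tuned P-controller
residual = lambda s: 0.1 * s**2  # stand-in for the RL-trained residual
u = residual_action(np.array([0.3]), prior, residual)
```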
... Reinforcement learning is a widely used model of learning mechanisms, characterized by an algorithm in which an agent learns to choose the optimal behavior in an environment by acquiring rewards through interactions with it [14,15]. Standard models include the temporal difference (TD) learning model [16], the Rescorla-Wagner (RW) model [17,18], and the Q-learning model [18]. ...
February 2018 · Behavioural Processes
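The two update rules named in the excerpt above, in their standard tabular forms (step sizes are typical defaults):

```python
# TD(0) state-value update and Q-learning action-value update.
import numpy as np

def td_update(v, s, r, s_next, alpha=0.1, gamma=0.95):
    # Move V(s) toward the bootstrapped target r + gamma * V(s').
    v[s] += alpha * (r + gamma * v[s_next] - v[s])

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Same idea, but the target bootstraps off the best next action.
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
```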
... This D4PG variant learns the learning rate of the Lagrange multiplier in a soft-constrained optimization procedure. Thomas et al. (2017) propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesired behaviours. There have also been approaches to learn a policy that satisfies constraints in the presence of perturbations to the dynamics of an environment. ...
August 2017
... Columns present predicted classes; rows present actual classes. This orientation of the matrix is used in many sources [19][20][21][22][23][24][25][26]; however, other sources [27][28][29] use the opposite one. This means that adding the right headings in this kind of presentation is very important to avoid misunderstanding. ...
January 2017
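The orientation the excerpt above recommends, made explicit in code: rows indexed by actual class, columns by predicted class, so that headings are unambiguous.

```python
# Confusion matrix with rows = actual classes, columns = predicted classes.
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    m = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        m[a, p] += 1  # row = actual label, column = predicted label
    return m
```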