James E. Kostas’s research while affiliated with University of Massachusetts Amherst and other places


Publications (6)


Coagent Networks: Generalized and Scaled
  • Preprint

May 2023 · 13 Reads

James E. Kostas · Yash Chandak · [...] · Philip S. Thomas

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different parts of the network asynchronously (at different rates or at different times), can incorporate non-differentiable components that cannot be used with backpropagation, and can explore at levels higher than their action spaces (that is, they can be designed as hierarchical networks for exploration and/or temporal abstraction). However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches. This work generalizes the coagent theory and learning rules provided by previous works; this generalization provides more flexibility for network architecture design within the coagent framework. This work also studies one of the chief disadvantages of coagent networks: high-variance updates for networks that have many coagents and do not use backpropagation. We show that a coagent algorithm with a policy network that does not use backpropagation can scale to a challenging RL domain with a high-dimensional state and action space (the MuJoCo Ant environment), learning reasonable (although not state-of-the-art) policies. These contributions motivate and provide a more general theoretical foundation for future work that studies coagent networks.
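A minimal sketch of the local-update idea behind coagent learning rules appears below. It is not the paper's algorithm or architecture: the task, the two-coagent network, and all hyperparameters are invented for illustration. Each coagent is a linear-softmax policy that treats the other's output only as an input and applies its own REINFORCE-style update from the episode return, so no gradient is backpropagated between coagents.

```python
# A toy two-coagent network trained with local, backpropagation-free updates.
# Everything here (task, architecture, hyperparameters) is illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class Coagent:
    """A linear-softmax stochastic policy over a small discrete output space."""
    def __init__(self, n_inputs, n_outputs, lr=0.1):
        self.W = np.zeros((n_outputs, n_inputs))
        self.lr = lr

    def act(self, x):
        p = softmax(self.W @ x)
        a = rng.choice(len(p), p=p)
        return a, p

    def update(self, x, a, p, G):
        # Local REINFORCE update: grad log pi(a|x), scaled by the return G.
        grad = -np.outer(p, x)
        grad[a] += x
        self.W += self.lr * G * grad

def run_episode(c1, c2):
    # Toy episodic task: the state is a random bit; the return is 1 if the
    # final action matches it, and 0 otherwise.
    s = rng.integers(2)
    x1 = np.array([1.0, float(s)])            # coagent 1 sees the state
    u, p1 = c1.act(x1)                        # coagent 1 emits a "message"
    x2 = np.array([1.0, float(s), float(u)])  # coagent 2 sees state + message
    a, p2 = c2.act(x2)
    G = 1.0 if a == s else 0.0
    c1.update(x1, u, p1, G)                   # each coagent updates on its own,
    c2.update(x2, a, p2, G)                   # using only the shared return G
    return G

c1, c2 = Coagent(2, 2), Coagent(3, 2)
returns = [run_episode(c1, c2) for _ in range(2000)]
print("mean return over the last 200 episodes:", np.mean(returns[-200:]))
```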



Edge-Compatible Reinforcement Learning for Recommendations

December 2021 · 8 Reads

Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or depend on an unprincipled patchwork collection of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms [Kostas et al., 2020] to propose a principled solution to this problem. The class of algorithms that we propose can be distributed over the internet and run asynchronously and in real time. When a given edge fails to respond to a request for data with sufficient speed, this is not a problem; the algorithm is designed to function and learn in the edge setting, and network issues are part of this setting. The result is a principled, theoretically grounded RL algorithm designed to be distributed in and learn in this asynchronous environment. In this work, we describe this algorithm and a proposed class of architectures in detail, and demonstrate that they work well in practice in the asynchronous setting, even as the network quality degrades.
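As a rough illustration of this asynchronous edge setting (and not the paper's algorithm; the sizes, response model, and reward below are all made up), the sketch simulates several edge-resident scorers that may miss a response deadline. A recommendation is made from whichever responses arrive in time, and only those edges apply a local policy-gradient-style update from the simulated click feedback.

```python
# Toy simulation of recommendation with unreliable edge responses.
# All quantities (edge count, items, response probability, rewards) are invented.
import numpy as np

rng = np.random.default_rng(1)
N_EDGES, N_ITEMS, N_FEATURES = 4, 5, 3
RESPONSE_PROB = 0.7  # assumed chance that an edge responds before the deadline

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Each edge holds its own linear item-scoring parameters.
edges = [np.zeros((N_ITEMS, N_FEATURES)) for _ in range(N_EDGES)]
true_pref = rng.normal(size=(N_ITEMS, N_FEATURES))  # hidden simulated user taste

for step in range(5000):
    user = rng.normal(size=N_FEATURES)
    # Asynchronous request: some edges miss the deadline and contribute nothing.
    responders = [i for i in range(N_EDGES) if rng.random() < RESPONSE_PROB]
    if not responders:
        continue  # nothing arrived in time: skip this request rather than block
    scores = np.mean([edges[i] @ user for i in responders], axis=0)
    probs = softmax(scores)
    item = rng.choice(N_ITEMS, p=probs)
    # Simulated click feedback in {0, 1}.
    reward = float(rng.random() < softmax(true_pref @ user)[item])
    # Only the edges that responded apply a local REINFORCE-style update.
    grad = -np.outer(probs, user)
    grad[item] += user
    for i in responders:
        edges[i] += 0.05 * reward * grad
```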



Reinforcement Learning Without Backpropagation or a Clock

February 2019 · 80 Reads

In this paper we introduce a reinforcement learning (RL) approach for training policies, including artificial neural network policies, that is both backpropagation-free and clock-free. It is backpropagation-free in that it does not propagate any information backwards through the network. It is clock-free in that no signal is given to each node in the network to specify when it should compute its output and when it should update its weights. We contend that these two properties increase the biological plausibility of our algorithms and facilitate distributed implementations. Additionally, our approach eliminates the need for customized learning rules for hierarchical RL algorithms like the option-critic.
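To make the backpropagation-free, clock-free, and hierarchical claims concrete, here is a small sketch under assumptions of my own (a toy chain task, linear-softmax coagents, and an arbitrary recomputation rate); it is not the paper's algorithm. A high-level coagent re-chooses an option only on some time steps, a low-level coagent acts on every step conditioned on that option, and both use the same local REINFORCE-style rule, with no gradient passing between them and no shared schedule for when the high-level coagent recomputes.

```python
# Toy hierarchical coagent sketch: no backpropagation between coagents and no
# fixed clock for the high-level coagent. All details here are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

class LinearCoagent:
    def __init__(self, n_in, n_out, lr=0.05):
        self.W = np.zeros((n_out, n_in))
        self.lr = lr
        self.trace = []  # (input, output, probs) tuples seen during the episode

    def act(self, x):
        p = softmax(self.W @ x)
        a = rng.choice(len(p), p=p)
        self.trace.append((x, a, p))
        return a

    def update(self, G):
        # The same local REINFORCE-style rule for every coagent in the network.
        for x, a, p in self.trace:
            grad = -np.outer(p, x)
            grad[a] += x
            self.W += self.lr * G * grad
        self.trace = []

# Toy chain task: start in state 0, reach state 4 within 20 steps for return 1.
high = LinearCoagent(n_in=6, n_out=2)  # emits an option bit, consumed only by `low`
low = LinearCoagent(n_in=8, n_out=2)   # picks the primitive action (left/right)

def episode():
    s, option = 0, 0
    for t in range(20):
        # Clock-free flavor: the high-level coagent recomputes only sometimes.
        if t == 0 or rng.random() < 0.25:
            option = high.act(np.concatenate([one_hot(s, 5), [1.0]]))
        a = low.act(np.concatenate([one_hot(s, 5), one_hot(option, 2), [1.0]]))
        s = min(max(s + (1 if a == 1 else -1), 0), 4)
        if s == 4:
            return 1.0
    return 0.0

for _ in range(3000):
    G = episode()
    high.update(G)  # each coagent updates independently from the shared return
    low.update(G)
```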


Figure 1. The structure of the proposed overall policy, πo, consisting of f and πi, that learns action representations to generalize over large action sets.
Figure 3. (a) Given a state transition tuple, functions g and f are used to estimate the action taken. The red arrow denotes the gradients of the supervised loss (5) for learning the parameters of these functions. (b) During execution, an internal policy, πi, can be used to first select an action representation, e. The function f, obtained from the previous learning procedure, then transforms this representation to an action. The blue arrow represents the internal policy gradients (7) obtained using Lemma 2 to update πi.
Figure 4. (a) The maze environment. The star denotes the goal state, the red dot corresponds to the agent, and the arrows around it are the 12 actuators. Each action corresponds to a unique combination of these actuators, so 2^12 actions are possible in total. (b) 2-D representations of the displacements in Cartesian coordinates caused by each action, and (c) learned action embeddings. In both (b) and (c), each action is colored based on the displacement (∆x, ∆y) it produces, that is, with the color [R=∆x, G=∆y, B=0.5], where ∆x and ∆y are normalized to [0, 1] before coloring. Cartesian actions are plotted at coordinates (∆x, ∆y), and learned ones at their coordinates in the embedding space. A smoother color transition in the learned representation is better, as it corresponds to preservation of the relative underlying structure. The 'squashing' of the learned embeddings is an artifact of a non-linearity applied to bound their range.
Figure 5. (Top) Results on the Maze domain with 2^4, 2^8, and 2^12 actions, respectively. (Bottom) Results on (a) the Tutorial MDP and (b) the Software MDP. AC-RA and DPG-RA are the variants of the PG-RA algorithm that use actor-critic (AC) and DPG, respectively. The shaded regions correspond to one standard deviation and were obtained using 10 trials.
Learning Action Representations for Reinforcement Learning
  • Preprint
  • File available

January 2019 · 481 Reads

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
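The decomposition can be sketched with a toy example. The embedding table, task, and reward below are invented, and the action representations are fixed rather than learned from transitions, so this illustrates only the structure described above (an internal policy over a low-dimensional representation space, plus a mapping f from representations to discrete actions via nearest-neighbor lookup), not the paper's full algorithm or its convergence conditions.

```python
# Toy policy decomposition: a Gaussian internal policy picks a 2-D action
# representation, and f maps it to one of many discrete actions. The embedding
# table, task, and reward are made up for illustration.
import numpy as np

rng = np.random.default_rng(3)
N_ACTIONS, D = 64, 2

# Assumed fixed embedding table: each discrete action is a point in R^2.
embeddings = rng.uniform(-1, 1, size=(N_ACTIONS, D))

# Internal policy: a state-conditioned Gaussian over the representation space.
W = np.zeros((D, 3))  # maps state features to the mean of the Gaussian
SIGMA, LR = 0.3, 0.05

def f(e):
    """Map a representation to a concrete action (nearest embedding)."""
    return int(np.argmin(np.linalg.norm(embeddings - e, axis=1)))

def features(s):
    return np.array([1.0, np.cos(s), np.sin(s)])

rewards = []
for step in range(5000):
    s = rng.uniform(0, 2 * np.pi)               # toy state
    x = features(s)
    mu = W @ x
    e = mu + SIGMA * rng.normal(size=D)          # sample a representation
    a = f(e)                                     # turn it into a discrete action
    target = np.array([np.cos(s), np.sin(s)])    # "good" actions lie near target
    r = np.exp(-np.linalg.norm(embeddings[a] - target))
    rewards.append(r)
    # REINFORCE on the internal Gaussian policy; f itself needs no gradient.
    W += LR * r * np.outer((e - mu) / SIGMA**2, x)

print("mean reward, first 500 vs. last 500 steps:",
      np.mean(rewards[:500]), np.mean(rewards[-500:]))
```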


Citations (1)


... Formal languages can be extended to be more expressive, to capture privacy properties [83], data-based properties [59], [60], fairness properties [12], [27], among others. Some of these kinds of properties can be automatically verified probabilistically [4], [29], [33], [53], [81]. ...

Reference:

Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
  • Citing Conference Paper
  • May 2023