Sridhar Mahadevan's research while affiliated with University of Massachusetts Amherst and other places

Publications (136)

Article
Full-text available
Knowledge transfer is computationally challenging, due in part to the curse of dimensionality, compounded by source and target domains expressed using different features (e.g., documents written in different languages). Recent work on manifold learning has shown that data collected in real-world settings often have high-dimensional representation...
Preprint
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as pre...
Preprint
We present a novel $l_1$ regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-poi...
Preprint
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has not been much work on finite-sample a...
Chapter
We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be reconciled using a unified mathematical framework that exploits the curved Riemannian geometry of statistical manifolds. We exploit a simple but important observation that as the s...
Article
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as pre...
Preprint
Full-text available
In optimization, the negative gradient of a function denotes the direction of steepest descent. Furthermore, traveling in any direction orthogonal to the gradient maintains the value of the function. In this work, we show that these orthogonal directions that are ignored by gradient descent can be critical in equilibrium problems. Equilibrium probl...
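The failure mode this abstract alludes to can be seen on the bilinear game min_x max_y xy: simultaneous gradient descent-ascent ignores the rotational component of the dynamics and spirals away from the equilibrium at the origin, while a method that accounts for it converges. A minimal numpy sketch (using extragradient as a standard stand-in for illustration, not the paper's own algorithm):

```python
import numpy as np

def gda(x, y, eta, steps):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x  # radius grows by sqrt(1 + eta^2) each step
    return x, y

def extragradient(x, y, eta, steps):
    """Extragradient: evaluate the gradient at a half-step lookahead point."""
    for _ in range(steps):
        xh, yh = x - eta * y, y + eta * x  # lookahead step
        x, y = x - eta * yh, y + eta * xh  # update with the lookahead gradient
    return x, y

print(np.hypot(*gda(1.0, 1.0, 0.1, 200)))            # spirals outward
print(np.hypot(*extragradient(1.0, 1.0, 0.1, 200)))  # contracts toward (0, 0)
```

The lookahead makes each extragradient step contract the radius by sqrt(1 - eta^2 + eta^4) < 1, whereas plain GDA expands it.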
Preprint
Full-text available
We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds. Our approach is based on formulating transfer from source to target as a problem of geometric mean...
Article
Full-text available
Algorithmic game theory (AGT) focuses on the design and analysis of algorithms for interacting agents, with interactions rigorously formalized within the framework of games. Results from AGT find applications in domains such as online bidding auctions for web advertisements and network routing protocols. Monotone games are games where agent strateg...
Article
Calibration transfer (CT) is the process of transferring a calibration curve from one instrument to another or from one set of conditions to another. Direct standardization (DS) of the spectra from a source to a target representation is a popular method of CT, but the multivariate objective function is often significantly underdetermined. Piecewise...
Article
The task of proper baseline or continuum removal is common to nearly all types of spectroscopy. Its goal is to remove any portion of a signal that is irrelevant to features of interest while preserving any predictive information. Despite the importance of baseline removal, median or guessed default parameters are commonly employed, often using comm...
Article
Although many machine learning algorithms involve learning subspaces with particular characteristics, optimizing a parameter matrix that is constrained to represent a subspace can be challenging. One solution is to use Riemannian optimization methods that enforce such constraints implicitly, leveraging the fact that the feasible parameter values fo...
Article
Full-text available
Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the \emph{Generative Multi-Adversarial Network} (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax obje...
Article
Full-text available
This paper presents a new framework for analyzing and designing no-regret algorithms for dynamic (possibly adversarial) systems. The proposed framework generalizes the popular online convex optimization framework and extends it to its natural limit allowing it to capture a notion of regret that is intuitive for more general problems such as those e...
Article
Full-text available
Hyperspectral instruments (HSIs) measure the electromagnetic energy emitted by materials at high resolution (hundreds to thousands of channels) enabling material identification through spectroscopic analysis. Laser-induced breakdown spectroscopy (LIBS) is used by the ChemCam instrument on the Curiosity rover to measure the emission spectra of surfa...
Article
This study uses 1356 spectra from 452 geologically-diverse samples, the largest suite of LIBS rock spectra ever assembled, to compare the accuracy of elemental predictions in models that use only spectral regions thought to contain peaks arising from the element of interest versus those that use information in the entire spectrum. Results show that...
Article
Full-text available
Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement lear...
Article
Tools for mineral identification based on Raman spectroscopy fall into two groups: those that are largely based on fits to diagnostic peaks associated with specific phases, and those that use the entire spectral range for multivariate analyses. In this project, we apply machine learning techniques to improve mineral identification using the latter...
Article
Full-text available
Recent work has explored methods for learning continuous vector space word representations reflecting the underlying semantics of words. Simple vector space arithmetic using cosine distances has been shown to capture certain types of analogies, such as reasoning about plurals from singulars, past tense from present tense, etc. In this paper, we int...
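The vector-arithmetic analogy mechanism the abstract starts from can be sketched with hand-made toy vectors (real systems use learned embeddings such as word2vec; the vocabulary and coordinates below are invented purely for illustration):

```python
import numpy as np

# Toy 2-D "embeddings"; axis 0 ~ royalty, axis 1 ~ gender (illustrative only).
vocab = {
    "man":    np.array([1.0, 0.0]),
    "woman":  np.array([1.0, 1.0]),
    "king":   np.array([2.0, 0.0]),
    "queen":  np.array([2.0, 1.0]),
    "prince": np.array([1.8, 0.1]),
    "apple":  np.array([0.1, 0.3]),
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by cosine similarity to the vector b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "woman", "king"))  # -> queen
```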
Article
Laser-induced breakdown spectroscopy (LIBS) is currently being used onboard the Mars Science Laboratory rover Curiosity to predict elemental abundances in dust, rocks, and soils using a partial least squares regression model developed by the ChemCam team. Accuracy of that model is constrained by the number of samples needed in the calibration, whic...
Article
Development of standards for analysis of Cr, Ni, Mn, Co, Zn, and S for use in LIBS applications is described.
Article
We present a new calibration and model for determining Fe3+ contents of glasses using XAS. The resulting errors are ±2.9% Fe3+, similar to that of Mössbauer.
Article
Current manifold alignment methods can effectively align data sets that are drawn from a non-intersecting set of manifolds. However, as data sets become increasingly high-dimensional and complex, this assumption may not hold. This paper proposes a novel manifold alignment algorithm, low rank alignment (LRA), that uses a low rank representation (ins...
Article
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to the public attention. Key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition....
Article
In this paper, we show for the first time how gradient TD (GTD) reinforcement learning methods can be formally derived as true stochastic gradient algorithms, not with respect to their original objective functions as previously attempted, but rather using derived primal-dual saddle-point objective functions. We then conduct a saddle-point error ana...
Article
Full-text available
In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms (ii) how to guarantee that reinforcement...
Article
Important aspects of human cognition, like creativity and play, involve dealing with multiple divergent views of objects, goals, and plans. We argue in this paper that the current model of optimization that drives much of modern machine learning research is far too restrictive a paradigm to mathematically model the richness of human cognition. Inst...
Article
Graph construction is the essential first step for nearly all manifold learning algorithms. While many applications assume that a simple κ-nearest or ε-close neighbors graph will accurately model the topology of the underlying manifold, these methods often require expert tuning and may not produce high quality graphs. In this paper, the hyperparame...
Article
This paper presents a new approach to representation discovery in reinforcement learning (RL) using basis adaptation. We introduce a general framework for basis adaptation as nonlinear separable least-squares value function approximation based on finding Fréchet gradients of an error function using variable projection functionals. We then present a...
Article
This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimization of convex functions in highdimensional spaces. Unlike traditional gradient methods, mirror descent u...
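Mirror descent with the negative-entropy mirror map (exponentiated gradient) keeps iterates on the probability simplex without an explicit projection, which is part of why it suits high-dimensional convex problems. A minimal sketch on a linear cost (the step size and cost vector are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def mirror_descent_simplex(grad, x0, eta, steps):
    """Entropic mirror descent: multiplicative update, then renormalize."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))  # exponentiated-gradient step
        x = x / x.sum()                 # Bregman projection onto the simplex
    return x

costs = np.array([0.5, 0.2, 0.9])
w = mirror_descent_simplex(lambda x: costs, np.ones(3) / 3, eta=0.5, steps=100)
print(w.round(3))  # mass concentrates on the cheapest coordinate (index 1)
```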
Article
This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis...
Article
We present a novel l1 regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point...
Conference Paper
Approximation of matrices using the Singular Value Decomposition (SVD) plays a central role in many science and engineering applications. However, the computation cost of an exact SVD is prohibitively high for very large matrices. In this paper, we describe a GPU-based approximate SVD algorithm for large matrices. Our method is based on the QUIC-SV...
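QUIC-SVD itself is a sampling-based method; as a hedged stand-in, the standard randomized SVD conveys the same idea of trading exactness for speed on large matrices (this is not the QUIC-SVD algorithm, just the generic randomized variant):

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Approximate top-`rank` SVD via random range finding (not QUIC-SVD)."""
    rng = np.random.default_rng(rng)
    Y = A @ rng.standard_normal((A.shape[1], rank + oversample))  # sample the range
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the sampled range
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # small dense SVD
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 20)) @ rng.standard_normal((20, 200))  # rank-20 matrix
U, s, Vt = randomized_svd(A, rank=20, rng=0)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # near machine precision
```

Because A is exactly rank 20 and we draw 30 random samples, the range is captured essentially exactly, so the reconstruction error is negligible; for full-rank matrices the error instead tracks the discarded singular values.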
Conference Paper
Full-text available
This paper describes a novel framework to jointly learn data-dependent label and locality-preserving projections. Given a set of data instances from multiple classes, the proposed approach can automatically learn which classes are more similar to each other, and construct discriminative features using both labeled and unlabeled data to map similar...
Conference Paper
Full-text available
We propose a manifold alignment based approach for heterogeneous domain adaptation. A key aspect of this approach is to construct mappings to link different feature spaces in order to transfer knowledge across domains. The new approach can reuse labeled data from multiple source domains in a target domain even in the case when the input domains do...
Article
Full-text available
This paper describes a novel framework for learning discriminative features, where both labeled and unlabeled data are used to map the data instances to a lower dimensional space, preserving both class separability and data manifold topology. In contrast to linear discriminant analysis (LDA) and its variants (like semi-supervised discriminant ana...
Article
Full-text available
We introduce a novel approach to multiscale manifold alignment. Our approach goes beyond the previously studied approaches in that it yields a hierarchical alignment that preserves the local geometry of each manifold and matches the corresponding instances across manifolds at different temporal and spatial scales. The proposed approach is non-para...
Conference Paper
This paper introduces an approach to automatic basis function construction for Hierarchical Reinforcement Learning (HRL) tasks. We describe some considerations that arise when constructing basis functions for multi-level task hierarchies. We extend previous work on using Laplacian bases for value function approximation to situations where th...
Conference Paper
Partially Observable Markov Decision Processes (POMDPs) are a well-established and rigorous framework for sequential decision-making under uncertainty. POMDPs are well-known to be intractable to solve exactly, and there has been significant work on finding tractable approximation methods. One well-studied approach is to find a compression of th...
Conference Paper
Automatically constructing novel representations of tasks from analysis of state spaces is a longstanding fundamental challenge in AI. I review recent progress on this problem for sequential decision making tasks modeled as Markov decision processes. Specifically, I discuss three classes of representation discovery problems: finding functional, sta...
Article
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Different algorithms offer different choices of the optimization criterion. Two popular least-squares algorithms for performing this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed poi...
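For linear value-function approximation V ≈ Φw, the two criteria mentioned here lead to different linear systems. A sketch on an assumed tiny MDP (the transition matrix, rewards, and features are invented for illustration); with a full tabular basis both criteria recover the exact value function, and they differ only once the basis is restricted:

```python
import numpy as np

def bellman_residual(Phi, P, R, gamma):
    """Minimize ||Phi w - (R + gamma P Phi w)||^2 (Bellman residual criterion)."""
    A = Phi - gamma * P @ Phi
    return np.linalg.lstsq(A, R, rcond=None)[0]

def fixed_point(Phi, P, R, gamma):
    """LSTD-style fixed-point criterion: Phi^T (Phi - gamma P Phi) w = Phi^T R."""
    A = Phi.T @ (Phi - gamma * P @ Phi)
    return np.linalg.solve(A, Phi.T @ R)

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy 2-state transition matrix
R = np.array([0.0, 1.0])
gamma, Phi = 0.9, np.eye(2)             # tabular features: both methods are exact
V_exact = np.linalg.solve(np.eye(2) - gamma * P, R)
print(bellman_residual(Phi, P, R, gamma), fixed_point(Phi, P, R, gamma), V_exact)
```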
Article
Full-text available
Many machine learning data sets are embedded in high-dimensional spaces, and require some type of dimensionality reduction to visualize or analyze the data. In this paper, we propose a novel framework for multiscale dimensionality reduction based on diffusion wavelets. Our approach is completely data driven, computationally efficient, and able to directly process no...
Conference Paper
Full-text available
We describe a novel framework developed for transfer learning within reinforcement learning (RL) problems. Then we exhibit how this framework can be extended to intelligent tutoring systems (ITS). We compose an algorithm that automatically constructs a graphical representation based on the transfer framework. We evaluate this on a real-world ITS...
Article
This paper describes a novel machine learning framework for solving sequential decision problems called Markov decision processes (MDPs) by iteratively computing low-dimensional representations and approximately optimal policies. A unified mathematical framework for learning representation and optimal control in MDPs is presented based on a class o...
Conference Paper
Full-text available
We address the problem of finding a multiscale embedding of documents from a given corpus. Our approach is based on a recently introduced multiscale matrix analysis framework called diffusion wavelets. Diffusion wavelets construct the basis functions at each level of the hierarchy from a set of orthogonal basis functions, typically the unit-vecto...
Article
Full-text available
Manifold alignment has been found to be useful in many fields of machine learning and data mining. In this paper we summarize our work in this area and introduce a general framework for manifold alignment. This framework generates a family of approaches to align manifolds by simultaneously matching the corresponding instances and preserving the...
Conference Paper
Full-text available
Manifold alignment has been found to be useful in many areas of machine learning and data mining. In this paper we introduce a novel manifold alignment approach, which differs from "semi-supervised alignment" and "Procrustes alignment" in that it does not require predetermining correspondences. Our approach learns a projection that maps data...
Article
Full-text available
In this paper we introduce proto-transfer learning, a new framework for transfer learning. We explore solutions to transfer learning within reinforcement learning through the use of spectral methods. Proto-value functions (PVFs) are basis functions computed from a spectral analysis of random walks on the state space graph. They naturally lead to t...
Conference Paper
Full-text available
In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from "semi-supervised alignment" in that it results in a mapping that is defined everywhere (when used with a suitable dimensionality reduction method) rather than just on the training data points. We describe and evaluate our ap...
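The Procrustes step referred to here has a closed form: after embedding both data sets, the orthogonal map best aligning paired points is obtained from one SVD. A minimal sketch on synthetic data (not the paper's full alignment pipeline):

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal Q minimizing ||X Q - Y||_F, via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))          # "source" embedding
theta = 0.7                               # a known ground-truth rotation
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = procrustes_rotation(X, X @ R_true)    # recover the rotation from paired points
print(np.allclose(Q, R_true))  # True
```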
Book
Representations are at the heart of artificial intelligence (AI). This book is devoted to the problem of representation discovery: how can an intelligent system construct representations from its experience? Representation discovery re-parameterizes the state space - prior to the application of information retrieval, machine learning, or optimizati...
Conference Paper
The core computational step in spectral learning - finding the projection of a function onto the eigenspace of a symmetric operator, such as a graph Laplacian - generally incurs a cubic computational complexity O(N^3). This paper describes the use of Lanczos eigenspace projections for accelerating spectral projections, which reduces the comp...
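The Lanczos idea behind this abstract: reduce a symmetric operator to a small tridiagonal matrix whose eigenvalues (Ritz values) approximate the operator's extreme eigenvalues, avoiding the dense O(N^3) decomposition. A bare-bones sketch with full reorthogonalization (written for clarity, not efficiency; run to completion here so the Ritz values match exactly):

```python
import numpy as np

def lanczos(A, k, v0):
    """k-step Lanczos: tridiagonal T whose eigenvalues approximate those of A."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(max(k - 1, 0))
    q = v0 / np.linalg.norm(v0)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# Path-graph Laplacian: the kind of symmetric operator used in spectral learning.
n = 12
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1
T = lanczos(L, n, np.random.default_rng(0).standard_normal(n))
print(np.allclose(np.linalg.eigvalsh(T), np.linalg.eigvalsh(L)))
```

In practice one stops after k << N steps, since the extreme eigenvalues converge long before the full decomposition would.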
Article
Full-text available
Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal p...
Article
Full-text available
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators (ii)...
Conference Paper
This paper introduces a new approach to action-value function approximation by learning basis functions from a spectral decomposition of the state-action manifold. This paper extends previous work on using Laplacian bases for value function approximation by using the actions of the agent as part of the representation when creating basis function...
Conference Paper
Full-text available
A new spectral approach to value function approximation has recently been proposed to automatically construct basis functions from samples. Global basis functions called proto-value functions are generated by diagonalizing a diffusion operator, such as a reversible random walk or the Laplacian, on a graph formed from connecting nearby sampl...
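Concretely, the proto-value functions in this line of work are the low-order eigenvectors of a graph Laplacian built on sampled states, and smooth value functions are fit by least squares in that basis. A toy sketch on a chain of states (the target function is invented for illustration):

```python
import numpy as np

# Graph Laplacian of a 20-state chain (combinatorial Laplacian L = D - W).
n = 20
W = np.eye(n, k=1) + np.eye(n, k=-1)
L = np.diag(W.sum(axis=1)) - W

# Proto-value functions: eigenvectors with the smallest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(L)

def fit_error(target, k):
    """Least-squares error of projecting `target` onto the first k PVFs."""
    basis = eigvecs[:, :k]
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return np.linalg.norm(basis @ coeffs - target)

V = np.cos(np.linspace(0, np.pi, n))  # a smooth stand-in for a value function
print(fit_error(V, 2), fit_error(V, 8))  # error shrinks as the basis grows
```

The smallest eigenvalue is 0 (the constant eigenvector), and because the remaining eigenvectors are ordered by smoothness, a few of them suffice for smooth targets.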
Conference Paper
Full-text available
We evaluated the impact of a set of interventions to repair students' disengagement while solving geometry problems in a tutoring system. We present a deep analysis of how a tutor can remediate a student's disengagement and motivation with self-monitoring feedback. The analysis consists of a between-subjects analysis on students' learning and on stu...
Conference Paper
Full-text available
Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success of this technique is attributed to the smoothness of the basis functions with respect to the state space geometry. This paper explores the properties of bases created fro...
Conference Paper
This paper investigates compression of 3D objects in computer graphics using manifold learning. Spectral compression uses the eigenvectors of the graph Laplacian of an object's topology to adaptively compress 3D objects. 3D compression is a challenging application domain: object models can have > 10^5 vertices, and reliably computing the bas...
Conference Paper
Full-text available
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan have traditionally been studied: the indirect model-based approach infers the state transition matrix and reward function from samples, and then solves the Bellman eq...
Article
Full-text available
In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how...
Article
Full-text available
We present a novel hierarchical framework for solving Markov decision processes (MDPs) using a multiscale method called diffusion wavelets. Diffusion wavelet bases significantly differ from the Laplacian eigenfunctions studied in the companion paper (Mahadevan and Maggioni, 2006): the basis functions have compact support, and are inherently multi-s...
Conference Paper
Full-text available
Item Response Theory (IRT) models were investigated as a tool for student modeling in an intelligent tutoring system (ITS). The models were tested using real data of high school students using the Wayang Outpost, a computer-based tutor for the mathematics portion of the Scholastic Aptitude Test (SAT). A cross-validation framework was developed an...
Conference Paper
Full-text available
This paper describes research to analyze students' initial skill level and to predict their hidden characteristics while working with an intelligent tutor. Based only on pre-test problems, a learned network was able to evaluate a student's mastery of twelve geometry skills. This model will be used online by an Intelligent Tutoring System to dynamica...
Conference Paper
Full-text available
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(|S|^3) to directly solve the Bellman system of |S| linear equations (where |S| is the state space size in the discrete case, and the sample size in the continuous case). In this paper we apply a recently introduced mu...
Conference Paper
Full-text available
This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto...
Article
Partially observable Markov decision processes (POMDPs) are a well studied paradigm for programming autonomous robots, where the robot sequentially chooses actions to achieve long term goals efficiently. Unfortunately, for real world robots and other similar domains, the uncertain outcomes of the actions and the fact that the true world state may n...
Conference Paper
Most work on value function approximation adheres to Samuel's original design: agents learn a task-specific value function using parameter estimation, where the approximation architecture (e.g., polynomials) is specified by a human designer. This paper proposes a novel framework generalizing Samuel's paradigm using a coordinate-free approach to va...
Conference Paper
This paper presents a novel framework called proto-reinforcement learning (PRL), based on a mathematical model of a proto-value function: these are task-independent basis functions that form the building blocks of all value functions on a given state space manifold. Proto-value functions are learned not from rewards, but instead from analyzing th...
Conference Paper
Full-text available
We present a fast algorithm for learning the parameters of the abstract hidden Markov model, a type of hierarchical activity recognition model. Learning using exact inference scales poorly as the number of levels in the hierarchy increases; therefore, an approximation is required for large models. We demonstrate that variational inference is well...
Conference Paper
Full-text available
We investigate the problem of automatically constructing efficient representations or basis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, two novel approaches to value function approximation are explored based on automatically constructing basis functions on state...
Conference Paper
We study an approach for performing concurrent activities in Markov decision processes (MDPs) based on the coarticulation framework. We assume that the agent has multiple degrees of freedom (DOF) in the action space which enables it to perform activities simultaneously. We demonstrate that one natural way for generating concurrency in the sys...
Article
This paper investigates learning hierarchical statistical activity models in indoor environments. The Abstract Hidden Markov Model (AHMM) is used to represent behaviors in stochastic environments. We train the model using both labeled and unlabeled data and estimate the parameters using Expectation Maximization (EM). Results are shown on three data...
Article
Full-text available
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent RL framework, and present a hierarchical multi-agent RL algorithm called Cooperative HRL. The fundamental property of our approach is that the use of hierarchy al...