# Stephane Ross's research while affiliated with Carnegie Mellon University and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (29)

Recent work has demonstrated that problems -- particularly imitation learning and structured prediction -- where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require n...

We improve "learning to search" approaches to structured prediction in two
ways. First, we show that the search space can be defined by an arbitrary
imperative program, reducing the number of lines of code required to develop
new structured prediction tasks by orders of magnitude. Second, we make
structured prediction orders of magnitude faster thr...

We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maxim...
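The greedy maximization such models imitate can be sketched concretely. Under a knapsack constraint, a standard cost-benefit greedy repeatedly adds the affordable item with the best marginal gain per unit cost; the sketch below is illustrative (the toy reward and items are not from the paper):

```python
def budgeted_greedy(items, reward, cost, budget):
    """Cost-benefit greedy for submodular maximization under a knapsack
    constraint: add the affordable item with the best marginal gain per
    unit cost (a standard heuristic; the paper imitates greedy
    maximization of this kind, details differ)."""
    chosen, spent = [], 0.0
    while True:
        feasible = [i for i in items
                    if i not in chosen and spent + cost(i) <= budget]
        if not feasible:
            return chosen
        best = max(feasible,
                   key=lambda i: (reward(chosen + [i]) - reward(chosen)) / cost(i))
        if reward(chosen + [best]) - reward(chosen) <= 0:
            return chosen  # no remaining item improves the list
        chosen.append(best)
        spent += cost(best)

# Toy coverage reward: each distinct topic counts once (quality + diversity).
def coverage(lst):
    return len({t for item in lst for t in item})

docs = [("a",), ("a", "b"), ("c",), ("b", "c", "d")]
print(budgeted_greedy(docs, coverage, cost=len, budget=3))  # [('a',), ('c',)]
```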

Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such...
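In the cardinality-constrained setting, the classic building block behind such near-optimal approaches is greedy selection by marginal gain, which achieves a (1 - 1/e) approximation for monotone submodular rewards (Nemhauser et al., 1978). A toy sketch (the coverage reward and documents are illustrative, not the paper's):

```python
def greedy_list(items, reward, k):
    """Greedily build a k-item list, adding the item with the largest
    marginal reward at each step. Near-optimal for monotone submodular
    `reward`; a sketch of the standard algorithm, not the paper's exact method."""
    chosen = []
    for _ in range(k):
        best = max((i for i in items if i not in chosen),
                   key=lambda i: reward(chosen + [i]) - reward(chosen))
        chosen.append(best)
    return chosen

# Toy coverage reward: each distinct topic is credited once,
# so the list is rewarded for both quality and diversity.
def coverage(lst):
    return len({t for item in lst for t in item})

docs = [("sports",), ("sports", "politics"), ("weather",), ("politics",)]
print(greedy_list(docs, coverage, 2))  # [('sports', 'politics'), ('weather',)]
```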

We introduce online learning algorithms that are independent of feature scales, proving regret bounds that depend on the ratio of scales present in the data rather than on the absolute scale. This has several useful effects: there is no need to pre-normalize data, test-time and test-space complexity are reduced, and the algorithms are more robust.
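One way to see why scale invariance removes the need to pre-normalize: if each gradient step is divided by the square of the largest magnitude seen for that feature, rescaling a feature leaves every prediction unchanged. The sketch below demonstrates this invariance property; it is a simplification of the idea, not the paper's exact algorithm:

```python
def train(points, eta=0.5):
    """Online least squares with per-feature scale normalization.

    s[i] tracks the largest |x[i]| seen so far; dividing each gradient
    step by s[i]**2 makes predictions invariant to rescaling feature i.
    A simplified sketch of the idea, not the paper's exact algorithm.
    """
    dim = len(points[0][0])
    w = [0.0] * dim   # weights
    s = [0.0] * dim   # per-feature scale estimates
    preds = []
    for x, y in points:
        for i in range(dim):
            s[i] = max(s[i], abs(x[i]))
        pred = sum(wi * xi for wi, xi in zip(w, x))
        preds.append(pred)
        g = pred - y  # gradient of 0.5*(pred - y)**2 w.r.t. pred
        for i in range(dim):
            if s[i] > 0:
                w[i] -= eta * g * x[i] / (s[i] ** 2)
    return preds

data = [([1.0, 2.0], 1.0), ([2.0, 1.0], 0.0), ([1.5, 1.5], 0.5)]
# Rescale feature 0 by 1024 (a power of two, so float ops stay exact):
scaled = [([x0 * 1024.0, x1], y) for (x0, x1), y in data]
print(train(data) == train(scaled))  # True: predictions are scale-invariant
```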

Autonomous navigation for large Unmanned Aerial Vehicles (UAVs) is fairly straightforward, as expensive sensors and monitoring devices can be employed. In contrast, obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAVs), which operate at low altitude in cluttered environments. Unlike large vehicles, MAVs can only carry very...

Model-based Bayesian reinforcement learning has generated significant
interest in the AI community as it provides an elegant solution to the optimal
exploration-exploitation tradeoff in classical reinforcement learning.
Unfortunately, the applicability of this type of approach has been limited to
small domains due to the high complexity of reasonin...
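The bookkeeping behind model-based Bayesian RL can be sketched with Dirichlet posteriors over each state-action's transition distribution, updated conjugately from observed transitions; the Thompson-style posterior sampling shown is one common way to use the posterior for exploration, not necessarily the paper's method:

```python
import random

class DirichletModel:
    """Posterior over transition probabilities of a discrete MDP.

    counts[s][a][s2] holds Dirichlet parameters; each observed
    transition is a conjugate update (increment one count).
    Illustrative sketch, not the paper's algorithm.
    """
    def __init__(self, n_states, n_actions, prior=1.0):
        self.counts = [[[prior] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def observe(self, s, a, s2):
        self.counts[s][a][s2] += 1

    def mean(self, s, a):
        """Posterior mean transition distribution for (s, a)."""
        c = self.counts[s][a]
        z = sum(c)
        return [x / z for x in c]

    def sample(self, s, a, rng):
        """One Thompson-style draw from the Dirichlet posterior."""
        draws = [rng.gammavariate(x, 1.0) for x in self.counts[s][a]]
        z = sum(draws)
        return [d / z for d in draws]

m = DirichletModel(n_states=2, n_actions=1)
for _ in range(8):
    m.observe(0, 0, 1)   # action 0 in state 0 always reached state 1
print(m.mean(0, 0))      # [0.1, 0.9]: prior counts [1, 1] plus 8 observations
```

Planning against samples (rather than the mean) is what drives directed exploration: uncertain state-actions occasionally look optimistic and get tried.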

A fundamental problem in control is to learn a model of a system from
observations that is useful for controller synthesis. To provide good
performance guarantees, existing methods must assume that the real system is in
the class of models considered during learning. We present an iterative method
with strong guarantees even in the agnostic case wh...

Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g. deletion or replacement of a single training sample). Such conditions have recently been shown to be more powerful for characterizing learnability in the general learning setting under i.i.d. samples, where unifor...

Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learne...

Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of...

Sequential prediction problems such as imitation learning, where future
observations depend on previous predictions (actions), violate the common
i.i.d. assumptions made in statistical learning. This leads to poor performance
in theory and often in practice. Some recent approaches provide stronger
guarantees in this setting, but remain somewhat uns...

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance both in theory and often in practice. Some recent approaches provide stronger performance guarantees in this setting, but re...
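The interactive approach these abstracts refer to (DAgger) alternates between rolling out the current learned policy and querying the expert on the states that policy actually visits, then retraining on the aggregated dataset. A toy sketch with an illustrative 1-D task and a nearest-neighbor learner (both stand-ins, not the paper's setup):

```python
import random

def expert(state):
    """Expert policy: step toward position 0."""
    return -1 if state > 0 else (1 if state < 0 else 0)

def learned_policy(dataset, state):
    """1-nearest-neighbor policy over aggregated (state, action) pairs."""
    if not dataset:
        return 0
    _, a = min(dataset, key=lambda p: abs(p[0] - state))
    return a

def dagger(n_iters=5, horizon=10, seed=0):
    """DAgger loop: the learner drives, the expert labels visited states."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_iters):
        state = rng.randint(-5, 5)
        for _ in range(horizon):
            action = learned_policy(dataset, state)  # roll out current learner
            dataset.append((state, expert(state)))   # record the expert's label
            state += action
    return dataset

data = dagger()
print(len(data))  # 5 iterations x 10 steps = 50 aggregated examples
```

Because labels come from states the learner itself reaches, the training distribution matches the test distribution, which is what breaks the i.i.d. violation described above.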

Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle real-world sequential decision processes but require a known model to be solved by most approaches. However, mainstream POMDP research focuses on the discrete case and this complicates its application to most realistic problems that are naturally mod...

Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good lo...
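Online approaches of this kind plan over belief states, i.e. probability distributions over hidden states updated by Bayes' rule after each action-observation pair: b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A minimal belief-update sketch on the classic two-state tiger problem (the transition and observation numbers are the standard textbook values):

```python
def belief_update(b, T, O, a, o):
    """Bayes filter for a discrete POMDP:
    b'(s2) ∝ O[a][s2][o] * sum_s T[a][s][s2] * b[s]."""
    n = len(b)
    new_b = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
             for s2 in range(n)]
    z = sum(new_b)  # normalizer = probability of observing o
    return [p / z for p in new_b]

# Tiger problem: states {0: tiger-left, 1: tiger-right}, action 0 = listen.
T = {0: [[1.0, 0.0], [0.0, 1.0]]}      # listening does not move the tiger
O = {0: [[0.85, 0.15], [0.15, 0.85]]}  # hear the correct side 85% of the time
b = [0.5, 0.5]
b = belief_update(b, T, O, a=0, o=0)   # heard the tiger on the left
print([round(p, 2) for p in b])        # [0.85, 0.15]
```

Online planners expand a search tree of such belief updates from the current belief, rather than solving for a policy over all beliefs offline.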

We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially observable Markov decision processes (POMDPs) provide a rich mathematical model to handle such environments but require a known model to be solved by most approaches. This is a limitation i...

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify...

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to...

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the m...

Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP sea...

So far, most equilibrium concepts in game theory require that the rewards and actions of the other agents are known and/or observed by all agents. However, in real-life problems, agents are generally faced with situations where they only have partial or no knowledge about their environment and the other agents evolving in it. In this context, all...

When an agent evolves in a partially observable environment, it has to deal with uncertainties when choosing its actions. An efficient model for such environments is to use partially observable Markov decision processes (POMDPs). Many algorithms have been developed for POMDPs. Some use an offline approach, learning a complete policy before the...

Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To mitigate this difficulty, we propose combining existing offline approaches with an online search pr...

Most of the POMDP literature has focused on developing new approximate algorithms to solve ever larger POMDPs, under the general assumption that the POMDP model is known a priori. In practice, however, this is rarely the case. For instance, robot navigation problems generally require that the parameters of the POMDP be well tuned to the robot's s...

We describe a step towards model-based Bayesian reinforcement learning in continuous state spaces. Model-based Bayesian RL provides an elegant way of incorporating model uncertainty for trading off between exploration and exploitation. Yet, significant work remains to be done for extending model-based Bayesian RL to continuous state spaces; in this...

In this paper, we describe DAMAS-Rescue, a team of agents participating in the RoboCupRescue simulation competition. In the following, we explain the strategies of all our agents that will be used at the 2006 world competition in Bremen, Germany. In short, FireBrigade agents choose the best fire to extinguish based on the knowledge they...

## Citations

... For more complex fuels such as gasoline, diesel, or JP10, we might have too many potential products to keep track of each individually, making it important to group by chemical functionality. This will be a fruitful area to apply machine-learning algorithms to group species appropriately and to refine the analyses automatically to ensure good statistics (36)(37)(38)(39). As computers get faster and cheaper, these in silico methods should be applicable to wide ranges of reactive systems. ...

... Inference machines reduce the problem of learning graphical models to solving a set of classification or regression problems, where the learned classifiers mimic message passing procedures that output marginal distributions for the nodes in the model. In other words, instead of parameterizing graphical models using, for example, potential functions, Inference Machines directly learn the operators (e.g., a classifier such as logistic regression) that map the incoming messages and the local features of a node to the outgoing message (Langford et al., 2009; Ross et al., 2011b; Bagnell et al., 2010). However, Inference Machines cannot be applied to learning latent state space models since we do not have access to hidden states' information. ...

... Imitation Learning for Navigation. Imitation Learning (IL) as a learning paradigm has been researched extensively in recent years [15,16,17]. Traditional approaches for navigation in unknown environments usually utilize greedy search with a Euclidean or Manhattan heuristic, which is inefficient in detecting and escaping local minima [18]. ...

... as well (Daumé III et al., 2014). ...

... This problem leads us to a knapsack-problem-like formulation. This type of submodular optimization has been studied by many researchers (Streeter and Golovin, 2009; Zhou et al., 2013). Our method can be extended in a similar way. ...

... For a large problem, many point-based approximate algorithms have limited quality [29]. An alternative approach is to determine only the best action instead of the optimal policy for the current belief in each time stage. ...

... Parameter free versions of static regret minimizing algorithms have been proposed in the works of Orabona and Pál (2016); Cutkosky and Orabona (2018). Ross et al. (2013); Luo et al. (2016) propose algorithms to control the static regret for competing against bounded linear predictors. ...

... The proposed end-to-end learning method belongs to the first kind, i.e., behavioral cloning. Imitation learning has also seen applications in many other fields, such as speech animation [28], improving over the teacher [29], structured prediction [30], safe learning for autonomous driving [31], learning with multiple objectives [32], one shot learning and meta learning [33,34], multi-agent learning [35,36], multi-modal learning [37], and hierarchical learning [38]. ...

... In 2013, Ross et al., in their paper [12], presented learning monocular reactive UAV control in cluttered natural environments, in which the task is collision-free flight in the forest. The network input was imagery from a forward-facing camera, the network output was the desired lateral speed, and the training methodology was supervised learning with recorded data from a human pilot. ...

... On the other hand, factored reinforcement learning can exploit independence present in the environment and generalize past experiences to new states, which allows the agent to reduce the number of non-optimal actions it takes (Ross and Pineau 2008;Degris, Sigaud, and Wuillemin 2006;Strehl, Diuk, and Littman 2007). However, such algorithms have been proposed for RL settings that ignore safety and in which an agent can explore freely. ...