Stephane Ross's research while affiliated with Carnegie Mellon University and other places

Publications (29)

Article
Recent work has demonstrated that problems -- particularly imitation learning and structured prediction -- where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require n...
Article
We improve "learning to search" approaches to structured prediction in two ways. First, we show that the search space can be defined by an arbitrary imperative program, reducing the number of lines of code required to develop new structured prediction tasks by orders of magnitude. Second, we make structured prediction orders of magnitude faster thr...
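To make the first point concrete, here is a minimal sketch of what "search space as an imperative program" can look like: the task author writes an ordinary function that makes a sequence of predictions through a predict hook supplied by the learning system. The function and names below are purely illustrative, not the authors' actual library interface.

```python
# Minimal sketch of the "search space as an imperative program" idea
# (hypothetical interface; not the authors' actual API).

def run(words, predict):
    # Sequence tagger written as ordinary imperative code. `predict` is
    # supplied by the learning system: during training it may consult the
    # oracle label or explore; at test time it returns the model's choice.
    tags = []
    for word in words:
        features = {"word": word, "prev_tag": tags[-1] if tags else "<s>"}
        tags.append(predict(features, oracle=None))  # oracle filled in during training
    return tags

def dummy_predict(features, oracle=None):
    # Trivial stand-in predictor, just to show the calling convention.
    return "NOUN"

print(run(["the", "cat", "sat"], dummy_predict))
```

The same imperative function defines the search space for both training and decoding, which is what removes the need for task-specific search code.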
Article
Full-text available
We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maxim...
Conference Paper
Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such...
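The kind of objective involved is easy to illustrate: below is a minimal sketch of greedy list construction under a monotone submodular reward, with a toy coverage function standing in for the real quality-plus-diversity reward (all names are illustrative). Greedy selection of this sort carries the classical (1 - 1/e) near-optimality guarantee that such approaches build on.

```python
# Greedy list construction under a monotone submodular reward.
# A toy coverage function stands in for the quality-plus-diversity reward;
# all names are illustrative.

def coverage(selected, interests_by_item):
    covered = set()
    for item in selected:
        covered |= interests_by_item[item]
    return len(covered)

def greedy_list(items, interests_by_item, budget):
    selected = []
    while len(selected) < budget:
        best_item, best_gain = None, 0
        for item in items:
            if item in selected:
                continue
            gain = (coverage(selected + [item], interests_by_item)
                    - coverage(selected, interests_by_item))
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:        # no remaining item adds value
            break
        selected.append(best_item)
    return selected

interests = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4}}
print(greedy_list(list(interests), interests, budget=2))   # -> ['a', 'b']
```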
Article
We introduce online learning algorithms that are independent of feature scales, proving regret bounds that depend on the ratio of scales present in the data rather than on the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.
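A rough sketch of the flavor of such updates, assuming a simple per-feature scale tracker (this is only illustrative and omits the corrections used by the actual algorithms): dividing each coordinate's update by the square of the largest magnitude seen for that feature makes the predictions invariant to rescaling individual features, so no pre-normalization pass is needed.

```python
# Illustrative per-coordinate scale tracking for online linear regression.
# This only conveys the flavor (normalize each coordinate's update by the
# largest magnitude seen for that feature); it is not the papers' exact update.

def scale_free_sgd(stream, dim, eta=0.1):
    w = [0.0] * dim
    max_abs = [0.0] * dim                    # largest |x_i| observed so far
    for x, y in stream:
        for i, xi in enumerate(x):
            max_abs[i] = max(max_abs[i], abs(xi))
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        for i, xi in enumerate(x):
            if max_abs[i] > 0:
                # Dividing by max_abs[i]**2 keeps predictions invariant to
                # rescaling feature i, so no pre-normalization is needed.
                w[i] -= eta * err * xi / (max_abs[i] ** 2)
    return w

data = [([1000.0, 0.001], 3.0), ([2000.0, 0.002], 6.0)]   # wildly different feature scales
print(scale_free_sgd(data, dim=2))
```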
Article
Full-text available
Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such...
Conference Paper
We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maxim...
Article
Full-text available
Autonomous navigation for large Unmanned Aerial Vehicles (UAVs) is fairly straightforward, as expensive sensors and monitoring devices can be employed. In contrast, obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAVs), which operate at low altitude in cluttered environments. Unlike large vehicles, MAVs can only carry very...
Article
Model-based Bayesian reinforcement learning has generated significant interest in the AI community as it provides an elegant solution to the optimal exploration-exploitation tradeoff in classical reinforcement learning. Unfortunately, the applicability of this type of approach has been limited to small domains due to the high complexity of reasonin...
Article
A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case wh...
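A hedged sketch of the iterative flavor described here, under the assumption that the loop alternates between fitting a model on all data collected so far, deriving a controller from that model, and executing that controller on the real system to gather more data. The toy 1-D linear system and all names below are illustrative, not the paper's actual algorithm.

```python
# Hedged sketch of iterative model learning for control: fit a model,
# derive a controller, collect real-system data under that controller,
# aggregate, and refit. Toy 1-D system; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def real_step(x, u):
    return 0.8 * x + 0.5 * u + rng.normal(0.0, 0.01)    # unknown true dynamics

def fit_model(data):
    # Least squares for x_next ~ a*x + b*u on all aggregated data.
    A = np.array([[x, u] for x, u, _ in data])
    y = np.array([xn for _, _, xn in data])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def controller(a, b):
    # Drive the model's predicted next state toward 0.
    return lambda x: -a * x / b if abs(b) > 1e-6 else 0.0

data = [(x, u, real_step(x, u)) for x, u in rng.normal(size=(20, 2))]  # initial random data
for _ in range(5):                       # iterate: model -> controller -> more data
    a, b = fit_model(data)
    act = controller(a, b)
    x = 1.0
    for _ in range(20):
        u = act(x) + rng.normal(0.0, 0.1)    # run current controller (+ exploration)
        xn = real_step(x, u)
        data.append((x, u, xn))
        x = xn

print("learned (a, b):", fit_model(data))
```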
Article
Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g. deletion or replacement of a single training sample). Such conditions have recently been shown to be more powerful for characterizing learnability in the general learning setting under i.i.d. samples where unifor...
Conference Paper
Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learne...
Article
Full-text available
Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of...
Article
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat uns...
Article
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance both in theory and often in practice. Some recent approaches provide stronger performance guarantees in this setting, but re...
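The interactive, dataset-aggregation style of training associated with this line of work (DAgger, cited above) can be sketched as a simple loop: roll out the current learned policy, label the states it actually visits with the expert's actions, add them to a growing dataset, and retrain. The toy environment, expert, and stand-in learner below are illustrative; only the loop structure is the point.

```python
# Minimal sketch of a dataset-aggregation style interactive imitation
# learning loop. The environment, expert, and learner are toy stand-ins.
import random

def expert_action(state):
    return 1 if state < 0 else 0            # expert: push right when left of 0

def env_step(state, action):
    return state + (0.5 if action == 1 else -0.5) + random.gauss(0.0, 0.1)

def train_classifier(dataset):
    # Trivial stand-in learner: majority expert action per sign of the state.
    pos = [a for s, a in dataset if s >= 0]
    neg = [a for s, a in dataset if s < 0]
    maj = lambda acts: max(set(acts), key=acts.count) if acts else 0
    pos_a, neg_a = maj(pos), maj(neg)
    return lambda s: pos_a if s >= 0 else neg_a

dataset, policy = [], expert_action          # first iteration can follow the expert
for _ in range(5):                           # interactive iterations
    state = random.uniform(-1, 1)
    for _ in range(20):                      # roll out the CURRENT policy...
        dataset.append((state, expert_action(state)))   # ...but label with the expert
        state = env_step(state, policy(state))
    policy = train_classifier(dataset)       # retrain on all aggregated data
```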
Conference Paper
Full-text available
Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle real-world sequential decision processes but require a known model to be solved by most approaches. However, mainstream POMDP research focuses on the discrete case and this complicates its application to most realistic problems that are naturally mod...
Article
Full-text available
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good lo...
Conference Paper
Full-text available
We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially observable Markov decision processes (POMDPs) provide a rich mathematical model to handle such environments but require a known model to be solved by most approaches. This is a limitation i...
Article
Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify...
Conference Paper
Full-text available
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to...
Conference Paper
Full-text available
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the m...
Conference Paper
Full-text available
Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP sea...
Conference Paper
Full-text available
So far, most equilibrium concepts in game theory require that the rewards and actions of the other agents are known and/or observed by all agents. However, in real-life problems, agents are generally faced with situations where they only have partial or no knowledge about their environment and the other agents evolving in it. In this context, all...
Article
Full-text available
When an agent evolves in a partially observable environment, it has to deal with uncertainties when choosing its actions. An efficient model for such environments is to use partially observable Markov decision processes (POMDPs). Many algorithms have been developed for POMDPs. Some use an offline approach, learning a complete policy before the...
Article
Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To alleviate this difficulty, we propose combining existing offline approaches with an online search pr...
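One common shape of such a combination can be sketched as a depth-limited lookahead from the current belief that falls back on an offline-computed value estimate at the leaves. The tiny two-state model and the myopic stand-in for the offline value function below are purely illustrative.

```python
# Hedged sketch of depth-limited online lookahead over beliefs, with an
# offline value estimate at the leaves. Tiny illustrative 2-state model.

S, A, O = [0, 1], ["stay", "switch"], [0, 1]
GAMMA = 0.9
T = {("stay", 0): {0: 0.9, 1: 0.1}, ("stay", 1): {0: 0.1, 1: 0.9},
     ("switch", 0): {0: 0.1, 1: 0.9}, ("switch", 1): {0: 0.9, 1: 0.1}}
Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}       # P(obs | next state)
R = {("stay", 0): 1.0, ("stay", 1): -1.0, ("switch", 0): -0.2, ("switch", 1): -0.2}

def offline_value(belief):
    # Stand-in for a precomputed offline approximation of the value function.
    return max(sum(belief[s] * R[(a, s)] for s in S) for a in A) / (1 - GAMMA)

def update(belief, a, o):
    # Bayes filter: returns the updated belief and P(o | belief, a).
    new = {s2: sum(belief[s] * T[(a, s)][s2] for s in S) * Z[s2][o] for s2 in S}
    norm = sum(new.values())
    return ({s: p / norm for s, p in new.items()}, norm) if norm > 0 else (belief, 0.0)

def lookahead(belief, depth):
    if depth == 0:
        return offline_value(belief), None     # offline estimate at the leaves
    best_v, best_a = float("-inf"), None
    for a in A:
        v = sum(belief[s] * R[(a, s)] for s in S)
        for o in O:
            next_b, p_o = update(belief, a, o)
            if p_o > 0:
                v += GAMMA * p_o * lookahead(next_b, depth - 1)[0]
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

print(lookahead({0: 0.5, 1: 0.5}, depth=3))
```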
Article
Most of the POMDP literature has focused on developing new approximate algorithms to solve ever larger POMDPs, under the general assumption that the POMDP model is known a priori. In practice, however, this is rarely the case. For instance, robot navigation problems generally require that the parameters of the POMDP be well tuned to the robot's s...
Article
Full-text available
We describe a step towards model-based Bayesian reinforcement learning in continuous state spaces. Model-based Bayesian RL provides an elegant way of incorporating model uncertainty for trading off between exploration and exploitation. Yet, significant work remains to be done for extending model-based Bayesian RL to continuous state spaces; in this...
Article
Full-text available
In this paper, we describe DAMAS-Rescue, a team of agents participating in the RoboCupRescue simulation competition. In the following, we explain the strategies of all our agents that will be used at the world competition in 2006 in Bremen, Germany. In short, FireBrigade agents choose the best fire to extinguish based on the knowledge they...

Citations

... For more complex fuels such as gasoline, diesel, or JP10, we might have too many potential products to keep track of each individually, making it important to group by chemical functionality. This will be a fruitful area to apply machine-learning algorithms to group species appropriately and to refine the analyses automatically to ensure good statistics (36)(37)(38)(39). As computers get faster and cheaper, these in silico methods should be applicable to wide ranges of reactive systems. ...
... Inference machines reduce the problem of learning graphical models to solving a set of classification or regression problems, where the learned classifiers mimic message passing procedures that output marginal distributions for the nodes in the model. In other words, instead of parameterizing graphical models using, for example, potential functions, Inference Machines directly learn the operators (e.g., a classifier such as logistic regression) that map the incoming messages and the local features of a node to the outgoing message (Langford et al., 2009; Ross et al., 2011b; Bagnell et al., 2010). However, Inference Machines cannot be applied to learning latent state space models since we do not have access to hidden states' information. ...
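A hedged sketch of that operator-learning view on a simple chain, with a fixed logistic function standing in for the trained classifier (all names and weights below are illustrative):

```python
# Hedged sketch of the inference-machine idea: a learned predictor maps a
# node's local features plus its incoming message directly to its outgoing
# message / marginal, instead of going through potential functions.
import math

def predictor(local_feature, incoming):
    # Stand-in for a trained classifier (e.g., logistic regression).
    score = 1.5 * local_feature + 2.0 * (incoming - 0.5)
    return 1.0 / (1.0 + math.exp(-score))       # P(node label = 1)

def chain_inference(features, n_sweeps=2):
    marginals = [0.5] * len(features)            # uninformative initial messages
    for _ in range(n_sweeps):
        for i in range(len(features)):           # left-to-right sweep
            incoming = marginals[i - 1] if i > 0 else 0.5
            marginals[i] = predictor(features[i], incoming)
        for i in reversed(range(len(features))): # right-to-left sweep
            incoming = marginals[i + 1] if i + 1 < len(features) else 0.5
            marginals[i] = predictor(features[i], incoming)
    return marginals

print(chain_inference([0.3, -0.8, 1.2]))
```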
... Imitation Learning for Navigation. Imitation Learning (IL) as a learning paradigm has been researched extensively in recent years [15,16,17]. Traditional approaches for navigation in unknown environments usually utilize greedy search with a Euclidean or Manhattan heuristic, which is inefficient at detecting and escaping local minima [18]. ...
... This problem leads us to a knapsack-like formulation. This type of submodular optimization has been studied by many researchers (Streeter and Golovin, 2009; Zhou et al., 2013). Our method can be extended in a similar way. ...
... For a large problem, many point-based approximate algorithms have limited quality [29]. An alternative approach is to determine only the best action, rather than the optimal policy, for the current belief at each time step. ...
... Parameter-free versions of static regret minimizing algorithms have been proposed in the works of Orabona and Pál (2016) and Cutkosky and Orabona (2018). Ross et al. (2013) and Luo et al. (2016) propose algorithms to control the static regret for competing against bounded linear predictors. ...
... The proposed end-to-end learning method belongs to the first kind, i.e., behavioral cloning. Imitation learning has also seen applications in many other fields, such as speech animation [28], improving over the teacher [29], structured prediction [30], safe learning for autonomous driving [31], learning with multiple objectives [32], one shot learning and meta learning [33,34], multi-agent learning [35,36], multi-modal learning [37], and hierarchical learning [38]. ...
... In 2013, Ross et al. presented in their paper [12] Learning Monocular Reactive UAV Control in Cluttered Natural Environments, in which the task is collision-free flight in a forest. The network input was imagery from a forward-facing camera, the network output was the desired lateral speed, and the training methodology was supervised learning with recorded data from a human pilot. ...
... On the other hand, factored reinforcement learning can exploit independence present in the environment and generalize past experiences to new states, which allows the agent to reduce the number of non-optimal actions it takes (Ross and Pineau 2008; Degris, Sigaud, and Wuillemin 2006; Strehl, Diuk, and Littman 2007). However, such algorithms have been proposed for RL settings that ignore safety and in which an agent can explore freely. ...