Stuart Russell

University of California, Berkeley, Berkeley, California, United States

Publications (136) · 28.56 Total impact

  • N. S. Arora, S. Russell, E. Sudderth
    ABSTRACT: The automated processing of multiple seismic signals to detect and localize seismic events is a central tool in both geophysics and nuclear treaty verification. This paper reports on a project, begun in 2009, to reformulate this problem in a Bayesian framework. A Bayesian seismic monitoring system, NET-VISA, has been built comprising a spatial event prior and generative models of event transmission and detection, as well as an inference algorithm. The probabilistic model allows for seamless integration of various disparate sources of information. Applied in the context of the International Monitoring System (IMS), a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT), NET-VISA achieves a reduction of around 60% in the number of missed events compared to the currently deployed system. It also finds events that are missed by the human analysts who post-process the IMS output.
    Bulletin of the Seismological Society of America 04/2013; 103(2A):709-729. · 1.94 Impact Factor
  • Keiji Kanazawa, Daphne Koller, Stuart Russell
    ABSTRACT: Stochastic simulation algorithms such as likelihood weighting often give fast, accurate approximations to posterior probabilities in probabilistic networks, and are the methods of choice for very large networks. Unfortunately, the special characteristics of dynamic probabilistic networks (DPNs), which are used to represent stochastic temporal processes, mean that standard simulation algorithms perform very poorly. In essence, the simulation trials diverge further and further from reality as the process is observed over time. In this paper, we present simulation algorithms that use the evidence observed at each time step to push the set of trials back towards reality. The first algorithm, "evidence reversal" (ER), restructures each time slice of the DPN so that the evidence nodes for the slice become ancestors of the state variables. The second algorithm, called "survival of the fittest" sampling (SOF), "repopulates" the set of trials at each time step using a stochastic reproduction rate weighted by the likelihood of the evidence according to each trial. We compare the performance of each algorithm with likelihood weighting on the original network, and also investigate the benefits of combining the ER and SOF methods. The ER/SOF combination appears to maintain bounded error independent of the number of time steps in the simulation.
    02/2013;
  • Nir Friedman, Stuart Russell
    ABSTRACT: "Background subtraction" is an old technique for finding moving objects in a video sequence, for example, cars driving on a freeway. The idea is that subtracting the current image from a time-averaged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image; it fails with slow-moving objects and does not distinguish shadows from moving objects. The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. We learn a mixture-of-Gaussians classification model for each pixel using an unsupervised technique: an efficient, incremental version of EM. Unlike the standard image-averaging approach, this automatically updates the mixture component for each class according to likelihood of membership; hence slow-moving objects are handled perfectly. Our approach also identifies and eliminates shadows much more effectively than other techniques such as thresholding. Application of this method as part of the Roadwatch traffic surveillance project is expected to result in significant improvements in vehicle identification and tracking.
    02/2013;
  • Nir Friedman, Kevin Murphy, Stuart Russell
    ABSTRACT: Dynamic probabilistic networks are a compact representation of complex stochastic processes. In this paper we examine how to learn the structure of a DPN from data. We extend structure scoring rules for standard probabilistic networks to the dynamic case, and show how to search for structure when some of the variables are hidden. Finally, we examine two applications where such a technology might be useful: predicting and classifying dynamic behaviors, and learning causal orderings in biological processes. We provide empirical results that demonstrate the applicability of our methods in both domains.
    01/2013;
  • ABSTRACT: Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival of the fittest". In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite dimensional optimal filters.
    01/2013;
  • ABSTRACT: We propose a new class of learning algorithms that combines variational approximation and Markov chain Monte Carlo (MCMC) simulation. Naive algorithms that use the variational approximation as proposal distribution can perform poorly because this approximation tends to underestimate the true variance and other features of the data. We solve this problem by introducing more sophisticated MCMC algorithms. One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a block Metropolis-Hastings (MH) kernel with a variational approximation as proposal distribution. The MH kernel allows one to locate regions of high probability efficiently. The Metropolis kernel allows us to explore the vicinity of these regions. This algorithm outperforms variational approximations because it yields slightly better estimates of the mean and considerably better estimates of higher moments, such as covariances. It also outperforms standard MCMC algorithms because it locates the regions of high probability quickly, thus speeding up convergence. We demonstrate this algorithm on the problem of Bayesian parameter estimation for logistic (sigmoid) belief networks.
    01/2013;
  • ABSTRACT: Filtering (estimating the state of a partially observable Markov process from a sequence of observations) is one of the most widely studied problems in control theory, AI, and computational statistics. Exact computation of the posterior distribution is generally intractable for large discrete systems and for nonlinear continuous systems, so a good deal of effort has gone into developing robust approximation algorithms. This paper describes a simple stochastic approximation algorithm for filtering called decayed MCMC. The algorithm applies Markov chain Monte Carlo sampling to the space of state trajectories using a proposal distribution that favours flips of more recent state variables. The formal analysis of the algorithm involves a generalization of standard coupling arguments for MCMC convergence. We prove that for any ergodic underlying Markov process, the convergence time of decayed MCMC with inverse-polynomial decay remains bounded as the length of the observation sequence grows. We show experimentally that decayed MCMC is at least competitive with other approximation algorithms such as particle filtering.
    12/2012;
  • Eric P. Xing, Michael I. Jordan, Stuart Russell
    ABSTRACT: Mean field methods, which entail approximating intractable probability distributions variationally with distributions from a tractable family, enjoy high efficiency and guaranteed convergence, and provide lower bounds on the true likelihood. But due to the requirement for model-specific derivation of the optimization equations and unclear inference quality in various models, they are not widely used as generic approximate inference algorithms. In this paper, we discuss a generalized mean field theory on variational approximation to a broad class of intractable distributions using a rich set of tractable distributions via constrained optimization over distribution spaces. We present a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which entails limiting the optimization over the class of cluster-factorizable distributions. GMF is a generic method requiring no model-specific derivations. It factors a complex model into a set of disjoint variable clusters, uses a set of canonical fix-point equations to iteratively update the cluster distributions, and converges to locally optimal cluster marginals that preserve the original dependency structure within each cluster, hence fully decomposing the overall inference problem. We empirically analyzed the effect of different tractable families (clusters of different granularity) on inference quality, and compared GMF with BP on several canonical models. Possible extension to higher-order MF approximation is also discussed.
    10/2012;
  • ABSTRACT: Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go.
    07/2012;
  • Eric P. Xing, Michael I. Jordan, Stuart Russell
    ABSTRACT: An autonomous variational inference algorithm for arbitrary graphical models requires the ability to optimize variational approximations over the space of model parameters as well as over the choice of tractable families used for the variational approximation. In this paper, we present a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm. This combination optimizes over disjoint clustering of variables and performs inference using those clusters. We provide a formal analysis of the relationship between the graph cut and the GMF approximation, and explore several graph partition strategies empirically. Our empirical results provide rather clear support for a weighted version of MinCut as a useful clustering algorithm for GMF inference, which is consistent with the implications from the formal analysis.
    07/2012;
  • Bhaskara Marthi, Stuart Russell, David Andre
    ABSTRACT: Previous work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of different possible exit states from a subroutine, thereby risking suboptimal behavior, or represent those values explicitly thereby incurring a possibly large representation cost because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much more concise representations of exit state distributions, leading to further state abstraction. In essence, the only variables whose exit values need be considered are those that the parent cares about and the child affects. We demonstrate the utility of our algorithms on a series of increasingly complex environments.
    06/2012;
  • Brian Milch, Stuart Russell
    ABSTRACT: Tasks such as record linkage and multi-target tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system.
    06/2012;
  • ABSTRACT: Languages for open-universe probabilistic models (OUPMs) can represent situations with an unknown number of objects and identity uncertainty. While such cases arise in a wide range of important real-world applications, existing general purpose inference methods for OUPMs are far less efficient than those available for more restricted languages and model classes. This paper goes some way to remedying this deficit by introducing, and proving correct, a generalization of Gibbs sampling to partial worlds with possibly varying model structure. Our approach draws on and extends previous generic OUPM inference methods, as well as auxiliary variable samplers for nonparametric mixture models. It has been implemented for BLOG, a well-known OUPM language. Combined with compile-time optimizations, the resulting algorithm yields very substantial speedups over existing methods on several test cases, and substantially improves the practicality of OUPM languages generally.
    03/2012;
  • Shaunak Chatterjee, Stuart Russell
    ABSTRACT: Hierarchical problem abstraction, when applicable, may offer exponential reductions in computational complexity. Previous work on coarse-to-fine dynamic programming (CFDP) has demonstrated this possibility using state abstraction to speed up the Viterbi algorithm. In this paper, we show how to apply temporal abstraction to the Viterbi problem. Our algorithm uses bounds derived from analysis of coarse timescales to prune large parts of the state trellis at finer timescales. We demonstrate improvements of several orders of magnitude over the standard Viterbi algorithm, as well as significant speedups over CFDP, for problems whose state variables evolve at widely differing rates.
    CoRR. 01/2012; abs/1202.3707.
  • N. S. Arora, T. Dear, S. Russell
    ABSTRACT: We describe a probabilistic generative model for seismic events, their transmission through the earth, and their detection (or mis-detection) at seismic stations. We also describe an inference algorithm that constructs the most probable event bulletin explaining the observed set of detections. Together, the model and inference algorithm are called NET-VISA (network processing vertically integrated seismic analysis) and are designed to replace the current automated network processing at the IDC, the SEL3 bulletin. Our results (attached table) demonstrate that NET-VISA significantly outperforms SEL3 by reducing the missed events from 30.3% down to 12.5%. The difference is even more dramatic for smaller magnitude events. NET-VISA has no difficulty in locating nuclear explosions as well. The attached figure demonstrates the location predicted by NET-VISA versus other bulletins for the second DPRK event. Further evaluation on dense regional networks demonstrates that NET-VISA finds many events missed in the LEB bulletin, which is produced by the human analysts. Large aftershock sequences, as produced by the 2004 December Sumatra earthquake and the 2011 March Tohoku earthquake, can pose a significant load for automated processing, often delaying the IDC bulletins by weeks or months. Indeed these sequences can overload the serial NET-VISA inference as well. We describe an enhancement to NET-VISA to make it multi-threaded, and hence take full advantage of the processing power of multi-core and multi-CPU machines. Our experiments show that the new inference algorithm is able to achieve 80% efficiency in parallel speedup.
    AGU Fall Meeting Abstracts. 12/2011;
  • Jason Wolfe, Stuart J. Russell
    ABSTRACT: We propose a novel approach for solving unary SAS+ planning problems. This approach extends an SAS+ instance with new state variables representing intentions about how each original state variable will be used or changed next, and splits the original actions into several stages of intention followed by eventual execution. The result is a new SAS+ instance with the same basic solutions as the original. While the transformed problem is larger, it has additional structure that can be exploited to reduce the branching factor, leading to reachable state spaces that are many orders of magnitude smaller (and hence much faster planning) in several test domains with acyclic causal graphs.
    IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011; 01/2011
  • Shaunak Chatterjee, Stuart Russell
    UAI 2011, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, July 14-17, 2011; 01/2011
  • Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011; 01/2011
  • Evaluation. 01/2011;
  • BIOSIGNALS 2010 - Proceedings of the Third International Conference on Bio-inspired Systems and Signal Processing, Valencia, Spain, January 20-23, 2010; 01/2010

Publication Stats

7k Citations
28.56 Total Impact Points

Institutions

  • 1970–2013
    • University of California, Berkeley
      • Computer Science Division
      Berkeley, California, United States
  • 2009
    • Brown University
      Providence, Rhode Island, United States
  • 1998
    • Middlebury College
      Middlebury, Vermont, United States