Sebastian Thrun

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Publications (344) · 87.47 Total Impact Points

  • ABSTRACT: We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a Time-of-Flight (ToF) camera. ToF cameras can measure depth scans at video rate and, because the underlying technology is comparatively simple, they lend themselves to economical production in large volumes. Our easy-to-use, cost-effective scanning solution, which is based on such a sensor, could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a nontrivial systematic bias. In this paper, we show the surprising result that 3D scans of reasonable quality can be obtained even with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method with a probabilistic scan alignment approach that explicitly takes the sensor's noise characteristics into account.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2013; 35(5):1039-50.
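The entry above reports that established filtering fails on such noisy ToF data. As a toy illustration of why fusing many aligned frames helps at all, here is a per-pixel averaging sketch; the `frames` array, validity threshold, and variance-based confidence are my assumptions, not the paper's superresolution algorithm, and this does not remove the systematic bias the paper models explicitly.

```python
import numpy as np

def fuse_depth_frames(frames, valid_min=0.1):
    """Fuse K aligned, noisy depth frames (K x H x W) into one map.

    A crude stand-in for superresolution: per-pixel averaging with a
    sample-variance confidence suppresses random ToF noise only.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mask = frames > valid_min                      # drop invalid returns
    count = mask.sum(axis=0)
    mean = np.where(count > 0,
                    (frames * mask).sum(axis=0) / np.maximum(count, 1), 0.0)
    var = np.where(count > 1,
                   ((frames - mean) ** 2 * mask).sum(axis=0) / np.maximum(count - 1, 1),
                   np.inf)                         # no confidence without repeats
    return mean, var

# toy usage: 30 noisy observations of a flat wall 2 m away
rng = np.random.default_rng(0)
frames = 2.0 + 0.05 * rng.standard_normal((30, 4, 4))
depth, conf = fuse_depth_frames(frames)
```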
  • Source
    Dimitris Margaritis, Sebastian Thrun
    ABSTRACT: In this paper we present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. Our approach is motivated by the observation that the appearance of continuous data varies widely at various resolutions, producing very different independence estimates between the variables involved. Therefore, it is difficult to ascertain independence without examining data at several carefully selected resolutions. In our paper, we accomplish this using the exact computation of the posterior probability of independence, calculated analytically given a resolution. At each examined resolution, we assume a multinomial distribution with Dirichlet priors for the discretized table parameters, and compute the posterior using Bayesian integration. Across resolutions, we use a search procedure to approximate the Bayesian integral of probability over an exponential number of possible histograms. Our method generalizes to an arbitrary number of variables in a straightforward manner. The test is suitable for Bayesian network learning algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables.
    01/2013;
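The single-resolution core of the test described above can be written compactly: compare Dirichlet-multinomial marginal likelihoods of the discretized table under a dependent (joint) versus an independent (product-of-marginals) model. A minimal sketch, assuming symmetric Dirichlet priors and equal model priors; the paper's multi-resolution search over histograms is omitted.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of a count vector/table under a symmetric
    Dirichlet(alpha) prior (sequence probability, coefficients cancel)."""
    counts = np.asarray(counts, dtype=np.float64).ravel()
    k, n = counts.size, counts.sum()
    return (gammaln(k * alpha) - gammaln(k * alpha + n)
            + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))

def posterior_independence(table, alpha=1.0):
    """P(independence | discretized 2-D contingency table), equal model priors."""
    table = np.asarray(table, dtype=np.float64)
    log_dep = log_dirichlet_multinomial(table, alpha)           # joint model
    log_ind = (log_dirichlet_multinomial(table.sum(axis=1), alpha)
               + log_dirichlet_multinomial(table.sum(axis=0), alpha))
    m = max(log_dep, log_ind)                                   # stable softmax
    return np.exp(log_ind - m) / (np.exp(log_ind - m) + np.exp(log_dep - m))

# strongly dependent table -> posterior probability of independence near zero
print(posterior_independence([[40, 2], [3, 55]]))
```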
  • Source
    ABSTRACT: Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
    12/2012;
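To make the class-template idea above concrete, here is a toy EM that soft-assigns local occupancy grids of object snapshots to a fixed number of shape templates. The Bernoulli cell model and random initialization are my assumptions; the paper's two-level hierarchy and Bayesian model selection over the number of templates are not reproduced.

```python
import numpy as np

def em_templates(grids, n_templates, iters=30, eps=1e-6, seed=0):
    """EM for Bernoulli occupancy-grid shape templates.

    grids: (N, H, W) arrays of cell occupancy probabilities in [0, 1].
    Returns (templates, responsibilities).
    """
    rng = np.random.default_rng(seed)
    n = grids.shape[0]
    flat = grids.reshape(n, -1)
    templates = rng.uniform(0.3, 0.7, size=(n_templates, flat.shape[1]))
    for _ in range(iters):
        # E-step: log-likelihood of each grid under each Bernoulli template
        t = np.clip(templates, eps, 1 - eps)
        loglik = flat @ np.log(t).T + (1 - flat) @ np.log(1 - t).T
        loglik -= loglik.max(axis=1, keepdims=True)
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each template is a responsibility-weighted average
        templates = (resp.T @ flat) / resp.sum(axis=0)[:, None]
    return templates.reshape(n_templates, *grids.shape[1:]), resp
```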
  • Source
    ABSTRACT: This paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented scaling capability to large teams of vehicles.
    10/2012;
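Pairwise information exchange of the kind described above ultimately bottoms out in fusing two Gaussian beliefs. A minimal information-form fusion sketch, assuming the two estimates are independent; guarding against the double counting that this assumption ignores is precisely what a careful decentralized protocol must handle.

```python
import numpy as np

def fuse_gaussians(mu_a, cov_a, mu_b, cov_b):
    """Fuse two Gaussian state estimates in information (inverse-covariance) form.

    Valid only if the estimates are conditionally independent; decentralized
    filters must track or bound shared information to avoid double counting.
    """
    info_a, info_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)
    cov = np.linalg.inv(info_a + info_b)
    mu = cov @ (info_a @ mu_a + info_b @ mu_b)
    return mu, cov

# two nearby platforms with complementary confidence about a 2-D target
mu, cov = fuse_gaussians(np.array([1.0, 2.0]), np.diag([0.5, 2.0]),
                         np.array([1.4, 1.8]), np.diag([2.0, 0.5]))
```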
  • Source
    Joelle Pineau, Geoffrey Gordon, Sebastian Thrun
    ABSTRACT: This paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.
    10/2012;
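Any POMDP controller like the one above rests on maintaining a belief over states. A minimal discrete Bayes-filter update, with toy transition and observation tables that are illustrative rather than taken from the deployed system.

```python
import numpy as np

def belief_update(belief, T, Z, obs):
    """One POMDP belief update for a fixed action.

    belief: prior over states, shape (S,)
    T:      transition model, T[s, s'] = P(s' | s, action)
    Z:      observation model, Z[s', o] = P(o | s')
    """
    predicted = belief @ T              # prediction step
    posterior = predicted * Z[:, obs]   # measurement step
    return posterior / posterior.sum()

# 2-state toy: is the person in their room (0) or the hallway (1)?
T = np.array([[0.9, 0.1], [0.3, 0.7]])
Z = np.array([[0.8, 0.2], [0.25, 0.75]])
b = belief_update(np.array([0.5, 0.5]), T, Z, obs=0)
```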
  • ABSTRACT: Tracking human pose in real time is a difficult problem with many interesting applications. Existing solutions suffer from a variety of problems, especially when confronted with unusual human poses. In this paper, we derive an algorithm for tracking human pose in real time from depth sequences based on MAP inference in a probabilistic temporal model. The key idea is to extend the iterative closest point (ICP) objective by modeling the constraint that the observed subject cannot enter free space, the region of space in front of the true range measurements. Our primary contribution is an extension to the articulated ICP algorithm that can efficiently enforce this constraint. The resulting filter runs at 125 frames per second using a single desktop CPU core. We provide extensive experimental results on challenging real-world data, which show that the algorithm outperforms the previous state-of-the-art trackers in both computational efficiency and accuracy.
    Proceedings of the 12th European conference on Computer Vision - Volume Part VI; 10/2012
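The free-space constraint above can be sketched independently of the full articulated tracker: project hypothesized model points into the depth image and penalize any that land closer to the camera than the measured range, i.e. inside observed free space. The intrinsics and the quadratic penalty are my assumptions.

```python
import numpy as np

def free_space_penalty(points_cam, depth, fx, fy, cx, cy, weight=1.0):
    """Penalty for model points lying in observed free space.

    points_cam: (N, 3) model points in camera coordinates (z forward, meters)
    depth:      (H, W) measured depth image
    fx, fy, cx, cy: assumed-known pinhole camera intrinsics
    """
    pts = points_cam[points_cam[:, 2] > 1e-6]   # keep points in front of camera
    x, y, z = pts.T
    u = np.round(fx * x / z + cx).astype(int)   # project to pixel coordinates
    v = np.round(fy * y / z + cy).astype(int)
    h, w = depth.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    measured = depth[v[ok], u[ok]]
    # violation: model surface hypothesized in front of the measured range
    violation = np.maximum(measured - z[ok], 0.0)
    return weight * np.sum(violation ** 2)
```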
  • Source
    ABSTRACT: We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers all the meshes using an unsupervised non-rigid technique described in a companion paper. It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts, and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real-world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a 4-part model of a flexing arm where significant non-rigid deformation was present.
    07/2012;
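A sketch of the alternating structure described above, for one pair of registered meshes: reassign each vertex to the rigid part whose transform best explains its motion, then refit each part's transform with the Kabsch/Procrustes solution. The paper's dense contiguity prior and optimal decomposition step are omitted, and every part is assumed to keep enough vertices to stay well-posed.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rigid transform (Kabsch): dst ~ src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - cs).T @ (dst - cd))
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    R = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return R, cd - R @ cs

def em_parts(src, dst, labels, n_parts, iters=10):
    """Hard-EM segmentation of vertex correspondences into rigid parts."""
    for _ in range(iters):
        transforms = [fit_rigid(src[labels == k], dst[labels == k])
                      for k in range(n_parts)]
        residuals = np.stack([np.linalg.norm(dst - (src @ R.T + t), axis=1)
                              for R, t in transforms], axis=1)
        labels = residuals.argmin(axis=1)       # reassign vertices to parts
    return labels, transforms
```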
  • Source
    David Stavens, Sebastian Thrun
    ABSTRACT: We present a machine learning approach for estimating the second derivative of a drivable surface, its roughness. Robot perception generally focuses on the first derivative, obstacle detection. However, the second derivative is also important due to its direct relation (with speed) to the shock the vehicle experiences. Knowing the second derivative allows a vehicle to slow down in advance of rough terrain. Estimating the second derivative is challenging due to uncertainty. For example, at range, laser readings may be so sparse that significant information about the surface is missing. Also, a high degree of precision is required in projecting laser readings. This precision may be unavailable due to latency or error in the pose estimation. We model these sources of error as a multivariate polynomial. Its coefficients are learned using the shock data as ground truth: the accelerometers are used to train the lasers. The resulting classifier operates on individual laser readings from a road surface described by a 3D point cloud. The classifier identifies sections of road where the second derivative is likely to be large. Thus, the vehicle can slow down in advance, reducing the shock it experiences. The algorithm is an evolution of one we used in the 2005 DARPA Grand Challenge. We analyze it using data from that route.
    06/2012;
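The "accelerometers train the lasers" idea above, in miniature: fit a regressor from laser-derived surface features to measured shock, then predict shock (roughness) ahead of the vehicle. The feature choice and the plain least-squares model are stand-ins for the paper's learned classifier.

```python
import numpy as np

def train_roughness_model(laser_features, shock):
    """Self-supervised fit: shock readings act as the training labels.

    laser_features: (N, D) per-patch features from the point cloud
                    (e.g., height variance, plane-fit residual, sparsity)
    shock:          (N,) accelerometer magnitude when driving over the patch
    """
    X = np.hstack([laser_features, np.ones((laser_features.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, shock, rcond=None)
    return w

def predict_roughness(w, laser_features):
    X = np.hstack([laser_features, np.ones((laser_features.shape[0], 1))])
    return X @ w    # slow down in advance where predicted shock is high
```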
  • Source
    ABSTRACT: We present a hand-held system for real-time, interactive acquisition of residential floor plans. The system integrates a commodity range camera, a micro-projector, and a button interface for user input and allows the user to freely move through a building to capture its important architectural elements. The system uses the Manhattan world assumption, which posits that wall layouts are rectilinear. This assumption allows generating floor plans in real time, enabling the operator to interactively guide the reconstruction process and to resolve structural ambiguities and errors during the acquisition. The interactive component aids users with no architectural training in acquiring wall layouts for their residences. We show a number of residential floor plans reconstructed with the system.
    International Conference on Robotics and Automation, Saint Paul, MN; 05/2012
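The Manhattan-world assumption used above reduces, in its simplest form, to snapping wall orientations onto two perpendicular axes. A sketch in which the dominant axis is estimated with a 90-degree-periodic circular mean, an assumption rather than the paper's estimation procedure.

```python
import numpy as np

def snap_walls_to_manhattan(angles):
    """Snap wall orientations (radians) to a rectilinear layout.

    Estimates a dominant axis from all walls, then rounds each wall
    to that axis plus a multiple of 90 degrees.
    """
    a = np.asarray(angles, dtype=np.float64)
    # circular mean over a 90-degree period gives the dominant axis
    axis = np.angle(np.exp(4j * a).mean()) / 4.0
    k = np.round((a - axis) / (np.pi / 2))
    return axis + k * (np.pi / 2)

# walls measured at roughly 0 and 90 degrees snap onto a common axis pair
print(np.degrees(snap_walls_to_manhattan(np.radians([2.0, 88.0, -1.5, 91.0]))))
```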
  • David Held, Jesse Levinson, Sebastian Thrun
    ABSTRACT: Detecting cars in real-world images is an important task for autonomous driving, yet it remains unsolved. The system described in this paper takes advantage of context and scale to build a monocular single-frame image-based car detector that significantly outperforms the baseline. The system uses a probabilistic model to combine multiple forms of evidence for both context and scale to locate cars in a real-world image. We also use scale filtering to speed up our algorithm by a factor of 3.3 compared to the baseline. By using a calibrated camera and localization on a road map, we are able to obtain context and scale information from a single image without the use of a 3D laser. The system outperforms the baseline by an absolute 9.4% in overall average precision and 11.7% in average precision for cars smaller than 50 pixels in height, for which context and scale cues are especially important.
    Proceedings - IEEE International Conference on Robotics and Automation 01/2012;
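A sketch of the evidence combination described above for a single candidate box: add the appearance detector's log odds to a scale term (does the pixel height match a car at the mapped distance?) and a context term (is the box on the road?). The Gaussian scale model and all numbers are assumptions, not the paper's fitted model.

```python
import numpy as np

def rescore_detection(det_log_odds, box_height_px, distance_m,
                      focal_px=1000.0, car_height_m=1.5, sigma_px=8.0,
                      on_road=True, p_road=0.05, p_offroad=0.001):
    """Combine appearance, scale, and context evidence in log space."""
    expected_px = focal_px * car_height_m / distance_m   # pinhole projection
    log_scale = -0.5 * ((box_height_px - expected_px) / sigma_px) ** 2
    p = p_road if on_road else p_offroad                 # context prior
    log_context = np.log(p / (1 - p))
    return det_log_odds + log_scale + log_context

# a weak detection whose size matches a car 50 m away, sitting on the road
print(rescore_detection(det_log_odds=0.2, box_height_px=31, distance_m=50.0))
```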
  • Wolfram Burgard, Dieter Fox, Sebastian Thrun
    ABSTRACT: One of the ultimate goals of artificial intelligence and robotics is to develop systems that assist us in our everyday lives by autonomously carrying out a variety of tasks. To achieve this and to generate appropriate actions, such systems need to accurately interpret their sensory input and estimate their own state or the state of the environment. In recent years, probabilistic approaches have emerged as a key technology for these problems. In this article, we describe state-of-the-art solutions to challenging tasks from the areas of mobile robotics, autonomous cars, and activity recognition, all based on the paradigm of probabilistic state estimation.
    Informatik Spektrum 01/2011; 34:455-461.
  • Source
    Alex Teichman, Jesse Levinson, Sebastian Thrun
    ABSTRACT: Object recognition is a critical next step for autonomous robots, but a solution to the problem has remained elusive. Prior 3D-sensor-based work largely classifies individual point cloud segments or uses class-specific trackers. In this paper, we take the approach of classifying the tracks of all visible objects. Our new track classification method, based on a mathematically principled method of combining log odds estimators, is fast enough for real-time use, is non-specific to object class, and performs well (98.5% accuracy) on the task of classifying correctly-tracked, well-segmented objects into car, pedestrian, bicyclist, and background classes. We evaluate the classifier's performance using the Stanford Track Collection, a new dataset of about 1.3 million labeled point clouds in about 14,000 tracks recorded from an autonomous vehicle research platform. This dataset, which we make publicly available, contains tracks extracted from about one hour of 360-degree, 10 Hz depth information recorded both while driving on busy campus streets and parked at busy intersections.
    IEEE International Conference on Robotics and Automation, ICRA 2011, Shanghai, China, 9-13 May 2011; 01/2011
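The combination rule above can be shown in a few lines: per-frame classifier outputs expressed as log odds sum over a track, and the track's label is the class with the highest total. Treating frames as independent given the class is the simplifying assumption that makes the sum valid; a class prior could enter as a constant offset.

```python
import numpy as np

CLASSES = ["car", "pedestrian", "bicyclist", "background"]

def classify_track(frame_log_odds):
    """frame_log_odds: (T, C) per-frame log odds, one column per class.

    Summing across a track's frames combines the per-frame estimators;
    the winning column labels the whole track.
    """
    total = np.asarray(frame_log_odds).sum(axis=0)
    return CLASSES[int(total.argmax())], total

label, total = classify_track(np.array([[1.2, -0.5, -1.0, -0.2],
                                        [0.8, -0.1, -0.7, -0.4],
                                        [1.5, -0.9, -1.2, 0.1]]))
print(label)   # "car": consistent weak evidence accumulates over the track
```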
  • Alex Teichman, Sebastian Thrun
    ABSTRACT: We consider a semi-supervised approach to the problem of track classification in dense three-dimensional range data. This problem involves the classification of objects that have been segmented and tracked without the use of a class-specific tracker. This paper is an extended version of our previous work.
    Robotics: Science and Systems VII, University of Southern California, Los Angeles, CA, USA, June 27-30, 2011; 01/2011
  • Source
    ABSTRACT: We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller that schedules gesture animations in real time based on acoustic features in the user’s speech. The controller consists of an inference layer, which infers a distribution over a set of hidden states from the speech signal, and a control layer, which selects the optimal motion based on the inferred state distribution. The inference layer, consisting of a specialized conditional random field, learns the hidden structure in body language style and associates it with acoustic features in speech. The control layer uses reinforcement learning to construct an optimal policy for selecting motion clips from a distribution over the learned hidden states. The modularity of the proposed method allows customization of a character’s gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
    ACM Transactions on Graphics 07/2010; 29(4). · 3.36 Impact Factor
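A sketch of the control layer's decision rule described above: given the inference layer's distribution over hidden body-language states, choose the motion clip with the highest expected value. The value table `Q` and the clip names are placeholders, not the learned policy from the paper.

```python
import numpy as np

def select_clip(state_dist, Q, clip_names):
    """Pick the gesture clip maximizing expected value under the
    inferred hidden-state distribution (the inference layer's output)."""
    expected = state_dist @ Q            # (S,) @ (S, C) -> (C,)
    return clip_names[int(expected.argmax())]

Q = np.array([[0.9, 0.1, 0.3],           # hypothetical (state, clip) values
              [0.2, 0.8, 0.4],
              [0.1, 0.3, 0.7]])
print(select_clip(np.array([0.1, 0.7, 0.2]), Q, ["rest", "beat", "point"]))
```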
  • Source
    ABSTRACT: We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a time-of-flight (ToF) camera. ToF cameras can measure depth scans at video rate and, because the underlying technology is comparatively simple, they lend themselves to low-cost production in large volumes. Our easy-to-use, cost-effective scanning solution based on such a sensor could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a non-trivial systematic bias. In this paper we show the surprising result that 3D scans of reasonable quality can also be obtained with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method with a probabilistic scan alignment approach that explicitly takes into account the sensor's noise characteristics.
    The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010; 01/2010
  • Source
    ABSTRACT: We present a new performance capture approach that incorporates a physically-based cloth model to reconstruct a rigged, fully animatable virtual double of a real person in loose apparel from multi-view video recordings. Our algorithm requires only a minimum of manual interaction. Without the use of optical markers in the scene, our algorithm first reconstructs skeleton motion and detailed time-varying surface geometry of a real person from a reference video sequence. These captured reference performance data are then analyzed to automatically identify non-rigidly deforming pieces of apparel on the animated geometry. For each piece of apparel, parameters of a physically-based real-time cloth simulation model are estimated, and surface geometry of occluded body regions is approximated. The reconstructed character model comprises a skeleton-based representation for the actual body parts and a physically-based simulation model for the apparel. In contrast to previous performance capture methods, we can now also create new real-time animations of actors captured in general apparel.
    ACM Transactions on Graphics 01/2010; 29:139. · 3.36 Impact Factor
  • Source
    ABSTRACT: We deal with the problem of detecting and identifying body parts in depth images at video frame rates. Our solution involves a novel interest point detector for mesh and range data that is particularly well suited for analyzing human shape. The interest points, which are based on identifying geodesic extrema on the surface mesh, coincide with salient points of the body, which can be classified as, e.g., hand, foot or head using local shape descriptors. Our approach also provides a natural way of estimating a 3D orientation vector for a given interest point. This can be used to normalize the local shape descriptors to simplify the classification problem as well as to directly estimate the orientation of body parts in space. Experiments involving ground truth labels acquired via an active motion capture system show that our interest points in conjunction with a boosted patch classifier are significantly better in detecting body parts in depth images than state-of-the-art sliding-window based detectors.
    IEEE International Conference on Robotics and Automation, ICRA 2010, Anchorage, Alaska, USA, 3-7 May 2010; 01/2010
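The geodesic-extrema idea above, reduced to a graph computation: run Dijkstra over the mesh's edge graph and take the farthest vertex as an extremum (hands, feet, and the head on a human mesh). A plain single-source sketch assuming a connected mesh; the paper's repeated extraction and descriptor-based classification are omitted.

```python
import heapq
import numpy as np

def geodesic_extremum(n_vertices, edges, source):
    """Farthest vertex from `source` by geodesic (graph) distance.

    edges: iterable of (i, j, length) mesh edges, treated as undirected.
    """
    adj = [[] for _ in range(n_vertices)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = np.full(n_vertices, np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:                          # standard Dijkstra with a binary heap
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return int(np.argmax(dist)), dist    # farthest vertex is a geodesic extremum
```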
  • Source
    ABSTRACT: We present a flexible method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. In contrast with existing approaches, we do not assume that the depth and color data streams have the same data rates or that the observed scene is fully static. Our method produces a dense, high-resolution depth map of the scene, automatically generating confidence values for every interpolated depth point. We describe how to integrate priors on object motion and appearance and how to achieve an efficient implementation using parallel processing hardware such as GPUs.
    The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010; 01/2010
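Not the accelerated high-dimensional filter of the paper above, but the fusion idea it speeds up: interpolate sparse depth with weights that decay in both image distance and color difference, so depth does not bleed across object boundaries, and report the accumulated weight as a per-pixel confidence. Grayscale guidance and the Gaussian kernels are my assumptions.

```python
import numpy as np

def guided_depth_fill(image, sparse_depth, radius=5, sigma_xy=3.0, sigma_c=12.0):
    """Joint-bilateral interpolation of sparse depth using a guidance image.

    image:        (H, W) float grayscale guidance image
    sparse_depth: (H, W) depth, 0 where no measurement exists
    Returns a dense depth map and a per-pixel confidence (sum of weights).
    """
    h, w = image.shape
    acc = np.zeros((h, w))
    wsum = np.zeros((h, w))
    ys, xs = np.nonzero(sparse_depth > 0)
    for y, x in zip(ys, xs):             # splat each measurement into a window
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        w_xy = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_xy ** 2))
        w_c = np.exp(-((image[y0:y1, x0:x1] - image[y, x]) ** 2) / (2 * sigma_c ** 2))
        acc[y0:y1, x0:x1] += w_xy * w_c * sparse_depth[y, x]
        wsum[y0:y1, x0:x1] += w_xy * w_c
    dense = np.where(wsum > 0, acc / np.maximum(wsum, 1e-12), 0.0)
    return dense, wsum
```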

Publication Stats

20k Citations
87.47 Total Impact Points

Institutions

  • 1995–2013
    • Carnegie Mellon University
      • Computer Science Department
      • Robotics Institute
      Pittsburgh, Pennsylvania, United States
  • 2003–2012
    • Stanford University
      • Department of Computer Science
      • Artificial Intelligence Laboratory
      Palo Alto, California, United States
    • University of Sydney
      • Australian Centre for Field Robotics
      Sydney, New South Wales, Australia
  • 2009
    • Carnegie Institution for Science
      Washington, District of Columbia, United States
  • 2004
    • Rutgers, The State University of New Jersey
      New Brunswick, New Jersey, United States
  • 2002–2004
    • University of Freiburg
      • Department of Computer Science
      Freiburg, Baden-Württemberg, Germany
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 1993–2001
    • University of Bonn
      • Institute of Computer Science
      Bonn, North Rhine-Westphalia, Germany