Sebastian Thrun

Deutsches Forschungszentrum für Künstliche Intelligenz, Kaiserslautern, Rheinland-Pfalz, Germany

Publications (455) · 120.95 total impact

  • Source
    Alex Teichman, J.T. Lussier, Sebastian Thrun
    ABSTRACT: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions: no prior object model, no stationary sensor, and no prior 3-D map. This makes a solution potentially useful for a large number of applications, including semi-supervised learning, 3-D model capture, and object recognition. Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art. This paper is an extended version of our previous work; building on it, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
    IEEE Transactions on Automation Science and Engineering 10/2013; 10(4):841-852. · 2.16 Impact Factor
  •
    ABSTRACT: We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a Time-of-Flight (ToF) camera. These ToF cameras can measure depth scans at video rate. Because the underlying technology is comparatively simple, these cameras can potentially be produced economically in large volumes. Our easy-to-use, cost-effective scanning solution, which is based on such a sensor, could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a nontrivial systematic bias. In this paper, we show the surprising result that 3D scans of reasonable quality can also be obtained with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method with a probabilistic scan alignment approach that explicitly takes into account the sensor's noise characteristics.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2013; 35(5):1039-50.
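As a rough illustration of the superresolution idea in this entry (not the paper's actual algorithm), the sketch below simply averages many registered depth frames, which suppresses zero-mean sensor noise, and optionally subtracts a known per-pixel bias map. All names and parameters are invented for the example.

```python
import numpy as np

def denoise_depth_stack(frames, bias=None):
    """Toy depth denoising by temporal averaging.

    frames: (K, H, W) depth maps taken from a (nearly) static pose.
    bias:   optional (H, W) systematic-bias map from calibration.
    Zero/negative readings are treated as missing and ignored.
    """
    frames = np.asarray(frames, dtype=np.float64)
    valid = frames > 0
    counts = valid.sum(axis=0)
    summed = np.where(valid, frames, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        avg = summed / counts
    avg[counts == 0] = np.nan          # no valid measurement at this pixel
    if bias is not None:
        avg = avg - bias               # remove known systematic offset
    return avg

# Example: 30 simulated noisy frames of a flat wall at 1.5 m
rng = np.random.default_rng(0)
stack = 1.5 + 0.03 * rng.standard_normal((30, 120, 160))
print(np.nanstd(denoise_depth_stack(stack)))   # noticeably below 0.03
```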
  • Source
    Dimitris Margaritis, Sebastian Thrun
    ABSTRACT: In this paper we present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. Our approach is motivated by the observation that the appearance of continuous data varies widely at various resolutions, producing very different independence estimates between the variables involved. Therefore, it is difficult to ascertain independence without examining data at several carefully selected resolutions. In our paper, we accomplish this using the exact computation of the posterior probability of independence, calculated analytically given a resolution. At each examined resolution, we assume a multinomial distribution with Dirichlet priors for the discretized table parameters, and compute the posterior using Bayesian integration. Across resolutions, we use a search procedure to approximate the Bayesian integral of probability over an exponential number of possible histograms. Our method generalizes to an arbitrary number of variables in a straightforward manner. The test is suitable for Bayesian network learning algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables.
    01/2013;
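The per-resolution computation described above, the exact posterior probability of independence under a multinomial likelihood with Dirichlet priors, can be sketched as follows for two discretized variables. This is a minimal illustration under assumed uniform priors, not the authors' code; names are invented.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of counts under a symmetric Dirichlet prior."""
    counts = np.asarray(counts, dtype=np.float64).ravel()
    a = np.full_like(counts, alpha)
    return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

def posterior_independence(table, prior_indep=0.5, alpha=1.0):
    """P(X independent of Y | discretized contingency table)."""
    table = np.asarray(table, dtype=np.float64)
    # Dependent model: one Dirichlet over all cells of the joint table.
    log_dep = log_dirichlet_multinomial(table, alpha)
    # Independent model: separate Dirichlets over the row and column margins.
    log_ind = (log_dirichlet_multinomial(table.sum(axis=1), alpha)
               + log_dirichlet_multinomial(table.sum(axis=0), alpha))
    log_odds = np.log(prior_indep) + log_ind - np.log(1 - prior_indep) - log_dep
    return 1.0 / (1.0 + np.exp(-log_odds))

print(posterior_independence([[20, 5], [5, 20]]))    # strongly dependent -> near 0
print(posterior_independence([[12, 13], [13, 12]]))  # near-uniform -> moderately above 0.5
```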
  • Conference Paper: Group induction
    A. Teichman, S. Thrun
    ABSTRACT: Machine perception often requires a large amount of user-annotated data which is time-consuming, difficult, or expensive to collect. Perception systems should be easy to train by regular users, and this is currently far from the case. Our previous work, tracking-based semi-supervised learning [14], helped reduce the labeling burden by using tracking information to harvest new and useful training examples. However, [14] was designed for offline use; it assumed a fixed amount of unlabeled data and did not allow for corrections from users. In many practical robot perception scenarios we A) desire continuous learning over a long period of time, B) have a stream of unlabeled sensor data available rather than a fixed dataset, and C) are willing to periodically provide a small number of new training examples. In light of this, we present group induction, a new mathematical framework that rigorously encodes the intuition of [14] in an alternating optimization problem similar to expectation maximization (EM), but with the assumption that the unlabeled data comes in groups of instances that share the same hidden label. The mathematics suggest several improvements to the original heuristic algorithm, and make clear how to handle user interaction and streams of unlabeled data. We evaluate group induction on a track classification task from natural street scenes, demonstrating its ability to learn continuously, adapt to user feedback, and accurately recognize objects of interest.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
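A minimal sketch of the alternating optimization described above, under the assumption that frames in a group share one hidden label: classify each group by summing per-frame log-odds, induct only confident groups as pseudo-labeled data, and retrain. It uses scikit-learn's LogisticRegression as a stand-in classifier; everything here is illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def group_induction(X_lab, y_lab, groups, rounds=5, margin=2.0):
    """Toy alternating optimization in the spirit of group induction.

    X_lab, y_lab : small seed set of labeled frame descriptors (binary labels).
    groups       : list of (n_i, d) arrays; each group shares one hidden label.
    """
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(rounds):
        X_new, y_new = [], []
        for g in groups:
            # Frames in a group share a label, so their log-odds add up.
            group_log_odds = clf.decision_function(g).sum()
            if abs(group_log_odds) > margin:        # induct only confident groups
                label = int(group_log_odds > 0)
                X_new.append(g)
                y_new.append(np.full(len(g), label))
        if not X_new:
            break
        X_train = np.vstack([X_lab] + X_new)
        y_train = np.concatenate([y_lab] + y_new)
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf
```

In this toy form, user corrections and streaming data would simply append to the labeled seed set or to the pool of groups between rounds.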
  • D. Held, J. Levinson, S. Thrun
    ABSTRACT: Precision tracking is important for predicting the behavior of other cars in autonomous driving. We present a novel method to combine laser and camera data to achieve accurate velocity estimates of moving vehicles. We combine sparse laser points with a high-resolution camera image to obtain a dense colored point cloud. We use a color-augmented search algorithm to align the dense color point clouds from successive time frames for a moving vehicle, thereby obtaining a precise estimate of the tracked vehicle's velocity. Using this alignment method, we obtain velocity estimates at a much higher accuracy than previous methods. Through pre-filtering, we are able to achieve near real time results. We also present an online method for real-time use with accuracies close to that of the full method. We present a novel approach to quantitatively evaluate our velocity estimates by tracking a parked car in a local reference frame in which it appears to be moving relative to the ego vehicle. We use this evaluation method to automatically quantitatively evaluate our tracking performance on 466 separate tracked vehicles. Our method obtains a mean absolute velocity error of 0.27 m/s and an RMS error of 0.47 m/s on this test set. We can also qualitatively evaluate our method by building color 3D car models from moving vehicles. We have thus demonstrated that our method can be used for precision car tracking with applications to autonomous driving and behavior modeling.
    Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
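As a toy stand-in for the color-augmented alignment step (not the authors' method), one can grid-search planar offsets between successive colored point clouds, scoring each candidate by nearest-neighbor distance plus a color penalty, and read the velocity off the best offset. Names and parameters below are invented.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_velocity(prev_xyz, prev_rgb, cur_xyz, cur_rgb, dt,
                      search=1.0, step=0.1, color_weight=0.5):
    """Toy color-augmented alignment via exhaustive search over x/y shifts.

    Scores each candidate shift by mean 3-D distance plus a color penalty
    against the previous frame; returns the implied (vx, vy).
    """
    tree = cKDTree(prev_xyz)
    offsets = np.arange(-search, search + 1e-9, step)
    best, best_v = np.inf, (0.0, 0.0)
    for dx in offsets:
        for dy in offsets:
            shifted = cur_xyz + np.array([dx, dy, 0.0])
            dist, idx = tree.query(shifted)
            color_err = np.linalg.norm(cur_rgb - prev_rgb[idx], axis=1)
            score = dist.mean() + color_weight * color_err.mean()
            if score < best:
                # shifting the current frame by (dx, dy) aligns it to the
                # previous one, so the object moved by roughly -(dx, dy)
                best, best_v = score, (-dx / dt, -dy / dt)
    return best_v
```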
  • S. Miller, A. Teichman, S. Thrun
    ABSTRACT: While inexpensive depth sensors are becoming increasingly ubiquitous, field of view and self-occlusion constraints limit the information a single sensor can provide. For many applications one may instead require a network of depth sensors, registered to a common world frame and synchronized in time. Historically such a setup has required a tedious manual calibration procedure, making it infeasible to deploy these networks in the wild, where spatial and temporal drift are common. In this work, we propose an entirely unsupervised procedure for calibrating the relative pose and time offsets of a pair of depth sensors. So doing, we make no use of an explicit calibration target, or any intentional activity on the part of a user. Rather, we use the unstructured motion of objects in the scene to find potential correspondences between the sensor pair. This yields a rough transform which is then refined with an occlusion-aware energy minimization. We compare our results against the standard checkerboard technique, and provide qualitative examples for scenes in which such a technique would be impossible.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
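Once putative correspondences have been gathered from object motion (for example, matched object centroids seen by both sensors at the same timestamps), the rough relative-pose step can be computed in closed form with the standard SVD-based rigid alignment shown below; the occlusion-aware refinement is not sketched. This is a generic illustration, not the paper's code.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.

    P, Q: (N, 3) corresponding points, e.g. matched object centroids
    observed by two depth sensors at the same timestamps.
    """
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```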
  • Source
    ABSTRACT: Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
    12/2012;
  • Source
    ABSTRACT: This paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented scaling capability to large teams of vehicles.
    10/2012;
  • Source
    Joelle Pineau, Geoffrey Gordon, Sebastian Thrun
    ABSTRACT: This paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.
    10/2012;
  •
    ABSTRACT: Tracking human pose in real-time is a difficult problem with many interesting applications. Existing solutions suffer from a variety of problems, especially when confronted with unusual human poses. In this paper, we derive an algorithm for tracking human pose in real-time from depth sequences based on MAP inference in a probabilistic temporal model. The key idea is to extend the iterative closest points (ICP) objective by modeling the constraint that the observed subject cannot enter free space, the area of space in front of the true range measurements. Our primary contribution is an extension to the articulated ICP algorithm that can efficiently enforce this constraint. The resulting filter runs at 125 frames per second using a single desktop CPU core. We provide extensive experimental results on challenging real-world data, which show that the algorithm outperforms the previous state-of-the-art trackers both in computational efficiency and accuracy.
    Proceedings of the 12th European conference on Computer Vision - Volume Part VI; 10/2012
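The free-space constraint described above can be illustrated with a simple per-point penalty: a hypothesized body point, projected into the depth image, should not lie clearly in front of the measured surface at that pixel. The sketch below shows only such a penalty term, with invented names and a hypothetical pinhole model, not the articulated ICP filter itself.

```python
import numpy as np

def free_space_penalty(points_cam, depth, fx, fy, cx, cy, margin=0.02):
    """Penalty for model points that sit in observed free space.

    points_cam: (N, 3) hypothesized body points in camera coordinates (z forward).
    depth:      (H, W) measured range image in meters (0 = missing).
    A model point well in front of the measured surface at its pixel lies in
    free space, which the tracked subject cannot occupy.
    """
    x, y, z = points_cam.T
    u = np.round(fx * x / z + cx).astype(int)
    v = np.round(fy * y / z + cy).astype(int)
    h, w = depth.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    meas = depth[v[ok], u[ok]]
    valid = meas > 0
    # positive where the model point is in front of the measured surface
    violation = np.clip(meas[valid] - z[ok][valid] - margin, 0.0, None)
    return float(np.sum(violation ** 2))
```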
  • Source
    ABSTRACT: We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers all the meshes using an unsupervised non-rigid technique described in a companion paper. It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts, and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real-world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a 4-part model of a flexing arm where significant non-rigid deformation was present.
    07/2012;
  • Source
    David Stavens, Sebastian Thrun
    ABSTRACT: We present a machine learning approach for estimating the second derivative of a drivable surface, its roughness. Robot perception generally focuses on the first derivative, obstacle detection. However, the second derivative is also important due to its direct relation (with speed) to the shock the vehicle experiences. Knowing the second derivative allows a vehicle to slow down in advance of rough terrain. Estimating the second derivative is challenging due to uncertainty. For example, at range, laser readings may be so sparse that significant information about the surface is missing. Also, a high degree of precision is required in projecting laser readings. This precision may be unavailable due to latency or error in the pose estimation. We model these sources of error as a multivariate polynomial. Its coefficients are learned using the shock data as ground truth -- the accelerometers are used to train the lasers. The resulting classifier operates on individual laser readings from a road surface described by a 3D point cloud. The classifier identifies sections of road where the second derivative is likely to be large. Thus, the vehicle can slow down in advance, reducing the shock it experiences. The algorithm is an evolution of one we used in the 2005 DARPA Grand Challenge. We analyze it using data from that route.
    06/2012;
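The "accelerometers train the lasers" idea can be sketched as an ordinary regression problem: fit a polynomial model from laser-derived features to the vertical shock measured when the vehicle later drives over the same spot. The feature names and the plain least-squares fit below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def fit_shock_model(laser_features, measured_shock, degree=2):
    """Fit a polynomial model predicting shock from laser features.

    laser_features : (N, d) per-reading features (e.g. height residual, range,
                     incidence angle) -- placeholder names, not the paper's set.
    measured_shock : (N,) vertical acceleration recorded over the same spot,
                     used as ground truth.
    Returns a predict(features) -> shock function.
    """
    X = np.asarray(laser_features, float)
    # simple polynomial expansion: [x, x^2, ..., x^degree] per feature, plus bias
    Phi = np.hstack([X ** p for p in range(1, degree + 1)] + [np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Phi, np.asarray(measured_shock, float), rcond=None)

    def predict(features):
        F = np.asarray(features, float)
        P = np.hstack([F ** p for p in range(1, degree + 1)] + [np.ones((len(F), 1))])
        return P @ w
    return predict
```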
  • Source
    ABSTRACT: We present a hand-held system for real-time, interactive acquisition of residential floor plans. The system integrates a commodity range camera, a micro-projector, and a button interface for user input and allows the user to freely move through a building to capture its important architectural elements. The system uses the Manhattan world assumption, which posits that wall layouts are rectilinear. This assumption allows generating floor plans in real time, enabling the operator to interactively guide the reconstruction process and to resolve structural ambiguities and errors during the acquisition. The interactive component aids users with no architectural training in acquiring wall layouts for their residences. We show a number of residential floor plans reconstructed with the system.
    International Conference on Robotics and Automation, Saint Paul, MN; 05/2012
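One small but concrete piece of the Manhattan-world assumption is snapping estimated wall directions to the nearest multiple of 90 degrees before assembling the floor plan; a minimal sketch with invented names follows.

```python
import numpy as np

def snap_to_manhattan(wall_angles_rad, ref=0.0):
    """Snap wall orientations to the rectilinear grid defined by `ref`.

    Each angle is replaced by the nearest multiple of 90 degrees relative
    to the reference axis, per the Manhattan-world assumption.
    """
    a = np.asarray(wall_angles_rad, float) - ref
    snapped = np.round(a / (np.pi / 2)) * (np.pi / 2)
    return snapped + ref

print(np.degrees(snap_to_manhattan(np.radians([2.0, 91.5, -88.0, 47.0]))))
# -> [  0.  90. -90.  90.]
```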
  • David Held, Jesse Levinson, Sebastian Thrun
    ABSTRACT: Detecting cars in real-world images is an important task for autonomous driving, yet it remains unsolved. The system described in this paper takes advantage of context and scale to build a monocular single-frame image-based car detector that significantly outperforms the baseline. The system uses a probabilistic model to combine multiple forms of evidence for both context and scale to locate cars in a real-world image. We also use scale filtering to speed up our algorithm by a factor of 3.3 compared to the baseline. By using a calibrated camera and localization on a road map, we are able to obtain context and scale information from a single image without the use of a 3D laser. The system outperforms the baseline by an absolute 9.4% in overall average precision and 11.7% in average precision for cars smaller than 50 pixels in height, for which context and scale cues are especially important.
    Proceedings - IEEE International Conference on Robotics and Automation 01/2012;
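One concrete form of the scale cue: with a calibrated camera and an approximately flat road, the expected pixel height of a car follows from the image row of its ground contact via similar triangles, and detections far from that expectation can be pruned or down-weighted. The sketch below is an illustrative assumption about how such a cue could be computed, not the paper's probabilistic model.

```python
def expected_car_height_px(focal_px, camera_height_m, bottom_row, horizon_row,
                           car_height_m=1.5):
    """Expected pixel height of a car whose bottom edge sits at `bottom_row`.

    Assumes a flat road: the distance to the car follows from the row offset
    below the horizon, and the pixel height from similar triangles.
    All names and values here are illustrative.
    """
    rows_below_horizon = bottom_row - horizon_row
    if rows_below_horizon <= 0:
        return 0.0                     # at or above the horizon: no ground contact
    distance_m = focal_px * camera_height_m / rows_below_horizon
    return focal_px * car_height_m / distance_m

# A detection whose box height differs wildly from this expectation is unlikely
# to be a car at that image location.
print(expected_car_height_px(focal_px=800, camera_height_m=1.6,
                             bottom_row=600, horizon_row=400))   # about 187 px
```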
  • Wolfram Burgard, Dieter Fox, Sebastian Thrun
    ABSTRACT: One of the ultimate goals of the field of artificial intelligence and robotics is to develop systems that assist us in our everyday lives by autonomously carrying out a variety of different tasks. To succeed at this and to generate appropriate actions, such systems need to be able to accurately interpret their sensory input and estimate their own state or the state of the environment. In recent years, probabilistic approaches have emerged as a key technology for these problems. In this article, we will describe state-of-the-art solutions to challenging tasks from the area of mobile robotics, autonomous cars, and activity recognition, which are all based on the paradigm of probabilistic state estimation.
    Informatik Spektrum 10/2011; 34:455-461.
  • Source
    ABSTRACT: In order to achieve autonomous operation of a vehicle in urban situations with unpredictable traffic, several realtime systems must interoperate, including environment perception, localization, planning, and control. In addition, a robust vehicle platform with appropriate sensors, computational hardware, networking, and software infrastructure is essential. We previously published an overview of Junior, Stanford's entry in the 2007 DARPA Urban Challenge. This race was a closed-course competition which, while historic and inciting much progress in the field, was not fully representative of the situations that exist in the real world. In this paper, we present a summary of our recent research towards the goal of enabling safe and robust autonomous operation in more realistic situations. First, a trio of unsupervised algorithms automatically calibrates our 64-beam rotating LIDAR with accuracy superior to tedious hand measurements. We then generate high-resolution maps of the environment which are subsequently used for online localization with centimeter accuracy. Improved perception and recognition algorithms now enable Junior to track and classify obstacles as cyclists, pedestrians, and vehicles; traffic lights are detected as well. A new planning system uses this incoming data to generate thousands of candidate trajectories per second, choosing the optimal path dynamically. The improved controller continuously selects throttle, brake, and steering actuations that maximize comfort and minimize trajectory error. All of these algorithms work in sun or rain and during the day or night. With these systems operating together, Junior has successfully logged hundreds of miles of autonomous operation in a variety of real-life conditions.
    Intelligent Vehicles Symposium (IV), 2011 IEEE; 07/2011
  •
    ABSTRACT: Detection of traffic light state is essential for autonomous driving in cities. Currently, the only reliable systems for determining traffic light state information are non-passive proofs of concept, requiring explicit communication between a traffic signal and vehicle. Here, we present a passive camera based pipeline for traffic light state detection, using (imperfect) vehicle localization and assuming prior knowledge of traffic light location. First, we introduce a convenient technique for mapping traffic light locations from recorded video data using tracking, back-projection, and triangulation. In order to achieve robust real-time detection results in a variety of lighting conditions, we combine several probabilistic stages that explicitly account for the corresponding sources of sensor and data uncertainty. In addition, our approach is the first to account for multiple lights per intersection, which yields superior results by probabilistically combining evidence from all available lights. To evaluate the performance of our method, we present several results across a variety of lighting conditions in a real-world environment. The techniques described here have for the first time enabled our autonomous research vehicle to successfully navigate through traffic-light-controlled intersections in real traffic.
    Robotics and Automation (ICRA), 2011 IEEE International Conference on; 06/2011
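The step of probabilistically combining evidence from all available lights can be as simple as treating the lights at an intersection as conditionally independent observations of one shared state and multiplying their per-light distributions; the sketch below shows only this fusion step, with invented names, not the full mapping and detection pipeline.

```python
import numpy as np

STATES = ["red", "yellow", "green"]

def fuse_light_evidence(per_light_probs, prior=None):
    """Fuse state distributions from several lights at one intersection.

    per_light_probs: (L, 3) rows of P(state | image evidence) for each light.
    Treats the lights as conditionally independent given the shared state.
    """
    probs = np.asarray(per_light_probs, float)
    log_post = np.log(prior if prior is not None else np.ones(3) / 3)
    log_post = log_post + np.sum(np.log(probs + 1e-12), axis=0)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Two lights: one clearly green, one washed out and ambiguous.
print(dict(zip(STATES, fuse_light_evidence([[0.05, 0.05, 0.90],
                                            [0.40, 0.30, 0.30]]))))
```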
  •
    ABSTRACT: The widespread deployment of wireless networks presents an opportunity for localization and mapping using only signal-strength measurements. The current state of the art is to use Gaussian process latent variable models (GP-LVM). This method works well, but relies on a signature uniqueness assumption which limits its applicability to only signal-rich environments. Moreover, it does not scale computationally to large sets of data, requiring O(N³) operations per iteration. We present a GraphSLAM-like algorithm for signal strength SLAM. Our algorithm shares many of the benefits of Gaussian processes, yet is viable for a broader range of environments since it makes no signature uniqueness assumptions. It also scales more tractably to larger maps, requiring O(N²) operations per iteration. We compare our algorithm to a laser-SLAM ground truth, showing it produces excellent results in practice.
    Robotics and Automation (ICRA), 2011 IEEE International Conference on; 06/2011
  • Source
    Alex Teichman, Jesse Levinson, Sebastian Thrun
    ABSTRACT: Object recognition is a critical next step for autonomous robots, but a solution to the problem has remained elusive. Prior 3D-sensor-based work largely classifies individual point cloud segments or uses class-specific trackers. In this paper, we take the approach of classifying the tracks of all visible objects. Our new track classification method, based on a mathematically principled method of combining log odds estimators, is fast enough for real-time use, is non-specific to object class, and performs well (98.5% accuracy) on the task of classifying correctly-tracked, well-segmented objects into car, pedestrian, bicyclist, and background classes. We evaluate the classifier's performance using the Stanford Track Collection, a new dataset of about 1.3 million labeled point clouds in about 14,000 tracks recorded from an autonomous vehicle research platform. This dataset, which we make publicly available, contains tracks extracted from about one hour of 360-degree, 10Hz depth information recorded both while driving on busy campus streets and parked at busy intersections.
    IEEE International Conference on Robotics and Automation, ICRA 2011, Shanghai, China, 9-13 May 2011; 01/2011
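The track-level decision described above, combining per-frame log-odds, reduces in its simplest form to summing the frame scores of a track and squashing the total; the sketch below (invented names, one-vs-all form) illustrates that combination, not the paper's exact normalized estimator.

```python
import numpy as np

def classify_track(frame_log_odds, prior_log_odds=0.0):
    """Combine per-frame log-odds into one track-level posterior.

    frame_log_odds: (T, C) array of per-frame, per-class log-odds from any
                    frame classifier; independent frame evidence adds up
                    in log-odds space.
    Returns per-class (one-vs-all) probabilities for the whole track.
    """
    total = prior_log_odds + np.asarray(frame_log_odds, float).sum(axis=0)
    return 1.0 / (1.0 + np.exp(-total))

track = np.array([[ 2.1, -1.0, -3.0],    # frame 1: looks like class 0 (e.g. car)
                  [ 1.4, -0.2, -2.5],
                  [ 0.8, -0.9, -1.7]])
print(classify_track(track))             # class 0 probability near 1
```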

Publication Stats

30k Citations
120.95 Total Impact Points

Institutions

  • 2013
    • Deutsches Forschungszentrum für Künstliche Intelligenz
      • Augmented Vision
      Kaiserslautern, Rheinland-Pfalz, Germany
  • 1970–2013
    • Carnegie Mellon University
      • Computer Science Department
      Pittsburgh, Pennsylvania, United States
  • 2002–2012
    • Stanford University
      • Department of Computer Science
      • Artificial Intelligence Laboratory
      Palo Alto, California, United States
    • Universidad de Valladolid
      Valladolid, Castile and León, Spain
  • 2002–2010
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2009
    • Carnegie Institution for Science
      Washington, D.C., United States
  • 2000–2009
    • University of Freiburg
      • Department of Computer Science
      Freiburg, Baden-Württemberg, Germany
  • 2004
    • Rutgers, The State University of New Jersey
      New Brunswick, New Jersey, United States
  • 2003
    • University of Sydney
      • Australian Centre for Field Robotics
      Sydney, New South Wales, Australia
  • 1993–2001
    • University of Bonn
      • Institute for Computer Sciences
      Bonn, North Rhine-Westphalia, Germany