Sebastian Thrun

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States


Publications (474) · 187.5 Total Impact Points

  • David Held · Jesse Levinson · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real-world data. We present a tracker that combines 3D shape, color (when available), and motion cues to accurately track moving objects in real-time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.37 ms per object, our method outperforms all of the baseline methods of similar speed by at least 25% in root-mean-square (RMS) tracking error. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application. By combining 3D shape, color (when available), and motion cues in a probabilistic framework, our tracker is able to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances.
    Article · Aug 2015 · The International Journal of Robotics Research
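A minimal sketch of the anytime, coarse-to-fine idea described above: keep a set of candidate alignments, repeatedly refine the most promising one, and stop whenever the time budget expires. The 2D translation-only search and the Gaussian nearest-neighbor score are illustrative assumptions, not the paper's implementation.

```python
import time
import numpy as np

def score(prev_pts, cur_pts, shift, sigma=0.1):
    """Crude alignment likelihood: Gaussian nearest-neighbor score."""
    shifted = prev_pts + shift
    d = np.linalg.norm(cur_pts[:, None, :] - shifted[None, :, :], axis=2)
    return np.exp(-d.min(axis=1) ** 2 / (2 * sigma ** 2)).sum()

def anytime_track(prev_pts, cur_pts, budget_s=0.01, half_width=1.0):
    # Each candidate cell is (score, center, cell half-width).
    candidates = [(score(prev_pts, cur_pts, np.zeros(2)), np.zeros(2), half_width)]
    deadline = time.time() + budget_s
    while time.time() < deadline:
        candidates.sort(key=lambda c: -c[0])
        s, center, hw = candidates.pop(0)          # refine the best cell
        for dx in (-hw / 2, hw / 2):
            for dy in (-hw / 2, hw / 2):
                c = center + np.array([dx, dy])
                candidates.append((score(prev_pts, cur_pts, c), c, hw / 2))
    candidates.sort(key=lambda c: -c[0])
    return candidates[0][1]                        # current best alignment

prev_pts = np.random.rand(50, 2)
cur_pts = prev_pts + np.array([0.3, -0.2])
print(anytime_track(prev_pts, cur_pts))            # approx [0.3, -0.2]
```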
  • David Held · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Deep learning methods have typically been trained on large datasets in which many training examples are available. However, many real-world product datasets have only a small number of images available for each product. We explore the use of deep learning methods for recognizing object instances when we have only a single training example per class. We show that feedforward neural networks outperform state-of-the-art methods for recognizing objects from novel viewpoints even when trained from just a single image per object. To further improve our performance on this task, we propose to take advantage of a supplementary dataset in which we observe a separate set of objects from multiple viewpoints. We introduce a new approach for training deep learning methods for instance recognition with limited training data, in which we use an auxiliary multi-view dataset to train our network to be robust to viewpoint changes. We find that this approach leads to a more robust classifier for recognizing objects from novel viewpoints, outperforming previous state-of-the-art approaches including keypoint-matching, template-based techniques, and sparse coding.
    Article · Jul 2015
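A hedged sketch of the auxiliary multi-view training idea above: pull embeddings of two views of the same auxiliary object together and push different objects apart, so the embedding becomes robust to viewpoint change. The tiny network and the margin loss are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(                              # tiny embedding network
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 5 * 5, 64),        # sized for 32x32 inputs
)
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def multiview_step(view_a, view_b, other):
    """view_a, view_b: two viewpoints of the same objects; other: mismatched."""
    za, zb, zo = embed(view_a), embed(view_b), embed(other)
    pos = F.pairwise_distance(za, zb)               # same object, new viewpoint
    neg = F.pairwise_distance(za, zo)               # different object
    loss = F.relu(pos - neg + 1.0).mean()           # margin of 1.0 (assumed)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# One toy step on random 32x32 "images".
a, b, o = (torch.randn(8, 3, 32, 32) for _ in range(3))
print(multiview_step(a, b, o))
```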
  • Jake T. Lussier · Sebastian Thrun
    ABSTRACT: Reasoning about a scene's thermal signature, in addition to its visual appearance and spatial configuration, would facilitate significant advances in perceptual systems. Applications involving the segmentation and tracking of persons, vehicles, and other heat-emitting objects, for example, could benefit tremendously from even coarsely accurate relative temperatures. With the increasing affordability of commercially available thermal cameras, as well as the imminent introduction of new, mobile form factors, such data will be readily and widely accessible. However, in order for thermal processing to complement existing methods in RGBD, there must be an effective procedure for calibrating RGBD and thermal cameras to create RGBDT (red, green, blue, depth, and thermal) data. In this paper, we present an automatic method for the synchronization and calibration of RGBD and thermal cameras in arbitrary environments. While traditional calibration methods fail in our multimodal setting, we leverage invariant features visible by both camera types. We first synchronize the streams with a simple optimization procedure that aligns their motion statistic time series. We then find the relative poses of the cameras by minimizing an objective that measures the alignment between edge maps from the two streams. In contrast to existing methods that use special calibration targets with key points visible to both cameras, our method requires nothing more than some edges visible to both cameras, such as those arising from humans. We evaluate our method and demonstrate that it consistently converges to the correct transform and that it results in high-quality RGBDT data.
    Conference Paper · Sep 2014
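The synchronization step above lends itself to a short sketch: summarize each stream by a per-frame motion statistic and pick the time offset that best correlates the two series. The statistic used here (mean absolute frame difference) is an assumed stand-in for the paper's motion statistics.

```python
import numpy as np

def motion_series(frames):
    """frames: (T, H, W) array; returns a length T-1 motion statistic."""
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def best_offset(series_a, series_b, max_offset=30):
    a = (series_a - series_a.mean()) / (series_a.std() + 1e-9)
    b = (series_b - series_b.mean()) / (series_b.std() + 1e-9)
    def corr(k):                                   # correlation at offset k
        if k >= 0:
            x, y = a[k:], b[:len(b) - k]
        else:
            x, y = a[:len(a) + k], b[-k:]
        n = min(len(x), len(y))
        return float(np.dot(x[:n], y[:n])) / max(n, 1)
    return max(range(-max_offset, max_offset + 1), key=corr)

rgb = np.random.rand(100, 8, 8)
thermal = np.roll(rgb, 5, axis=0)                  # same scene, 5 frames late
print(best_offset(motion_series(rgb), motion_series(thermal)))  # -5
```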
  • Jesse Levinson · Sebastian Thrun
    ABSTRACT: Light Detection and Ranging (LIDAR) sensors have become increasingly common in both industrial and robotic applications. LIDAR sensors are particularly desirable for their direct distance measurements and high accuracy, but traditionally have been configured with only a single rotating beam. However, recent technological progress has spawned a new generation of LIDAR sensors equipped with many simultaneous rotating beams at varying angles, providing at least an order of magnitude more data than single-beam LIDARs and enabling new applications in mapping [6], object detection and recognition [15], scene understanding [16], and SLAM [9].
    Chapter · Jan 2014
  • Conference Paper: Group induction
    Alex Teichman · Sebastian Thrun
    ABSTRACT: Machine perception often requires a large amount of user-annotated data which is time-consuming, difficult, or expensive to collect. Perception systems should be easy to train by regular users, and this is currently far from the case. Our previous work, tracking-based semi-supervised learning [14], helped reduce the labeling burden by using tracking information to harvest new and useful training examples. However, [14] was designed for offline use; it assumed a fixed amount of unlabeled data and did not allow for corrections from users. In many practical robot perception scenarios we A) desire continuous learning over a long period of time, B) have a stream of unlabeled sensor data available rather than a fixed dataset, and C) are willing to periodically provide a small number of new training examples. In light of this, we present group induction, a new mathematical framework that rigorously encodes the intuition of [14] in an alternating optimization problem similar to expectation maximization (EM), but with the assumption that the unlabeled data comes in groups of instances that share the same hidden label. The mathematics suggest several improvements to the original heuristic algorithm, and make clear how to handle user interaction and streams of unlabeled data. We evaluate group induction on a track classification task from natural street scenes, demonstrating its ability to learn continuously, adapt to user feedback, and accurately recognize objects of interest.
    Conference Paper · Nov 2013
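A rough sketch of the alternating scheme above, with scikit-learn's logistic regression standing in for the paper's classifier: groups (e.g., tracks) of unlabeled instances share one hidden label, confidently classified groups are inducted into the training set, and the classifier is retrained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def group_induction(X_lab, y_lab, groups, rounds=5, threshold=0.1):
    X, y, inducted = X_lab.copy(), y_lab.copy(), set()
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X, y)
        for i, g in enumerate(groups):
            if i in inducted:
                continue
            # All instances in a group share one hidden label,
            # so their per-instance evidence (log-odds) is pooled.
            logit = clf.decision_function(g).sum()
            if abs(logit) > threshold * len(g):    # low threshold: toy data
                X = np.vstack([X, g])
                y = np.concatenate([y, [int(logit > 0)] * len(g)])
                inducted.add(i)
    return clf

X_lab = np.array([[0.0], [1.0]]); y_lab = np.array([0, 1])
groups = [np.array([[0.1], [0.2]]), np.array([[0.9], [1.1]])]
print(group_induction(X_lab, y_lab, groups).predict([[0.05], [0.95]]))
```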
  • Alex Teichman · Jake T. Lussier · Sebastian Thrun
    ABSTRACT: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions: no prior object model, no stationary sensor, and no prior 3-D map. This makes a solution potentially useful for a large number of applications, including semi-supervised learning, 3-D model capture, and object recognition. Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art. This paper is an extended version of our previous work; building on it, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
    Article · Oct 2013 · IEEE Transactions on Automation Science and Engineering
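A minimal sketch of the kind of energy such a conditional random field combines: a per-pixel unary term from appearance-like cues plus a contrast-sensitive pairwise term that discourages label changes where the depth image is smooth. In the paper the feature weights are learned from ground-truth sequences; here they are assumed constants, and no inference is shown.

```python
import numpy as np

def crf_energy(labels, unary_fg, depth, w_unary=1.0, w_pair=2.0):
    """labels: (H,W) in {0,1}; unary_fg: foreground cost; depth: (H,W)."""
    unary = np.where(labels == 1, unary_fg, 1.0 - unary_fg).sum()

    def pair(a_lab, b_lab, a_d, b_d):
        # Contrast-sensitive: label changes are cheap across depth edges.
        affinity = np.exp(-10.0 * np.abs(a_d - b_d))
        return (affinity * (a_lab != b_lab)).sum()

    pairwise = (pair(labels[:, :-1], labels[:, 1:], depth[:, :-1], depth[:, 1:])
                + pair(labels[:-1, :], labels[1:, :], depth[:-1, :], depth[1:, :]))
    return w_unary * unary + w_pair * pairwise

labels = np.zeros((8, 8), int); labels[:, 4:] = 1   # segmentation hypothesis
depth = np.zeros((8, 8)); depth[:, 4:] = 1.0        # depth edge at same place
smooth = np.zeros((8, 8))                           # no depth edge anywhere
# The same labeling is cheaper when it follows a depth discontinuity.
print(crf_energy(labels, 0.5, depth), "<", crf_energy(labels, 0.5, smooth))
```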
  • David Held · Jesse Levinson · Sebastian Thrun
    ABSTRACT: Precision tracking is important for predicting the behavior of other cars in autonomous driving. We present a novel method to combine laser and camera data to achieve accurate velocity estimates of moving vehicles. We combine sparse laser points with a high-resolution camera image to obtain a dense colored point cloud. We use a color-augmented search algorithm to align the dense color point clouds from successive time frames for a moving vehicle, thereby obtaining a precise estimate of the tracked vehicle's velocity. Using this alignment method, we obtain velocity estimates at a much higher accuracy than previous methods. Through pre-filtering, we are able to achieve near-real-time performance. We also present an online method for real-time use with accuracies close to that of the full method. We present a novel approach to quantitatively evaluate our velocity estimates by tracking a parked car in a local reference frame in which it appears to be moving relative to the ego vehicle. We use this evaluation method to automatically and quantitatively evaluate our tracking performance on 466 separate tracked vehicles. Our method obtains a mean absolute velocity error of 0.27 m/s and an RMS error of 0.47 m/s on this test set. We can also qualitatively evaluate our method by building color 3D car models from moving vehicles. We have thus demonstrated that our method can be used for precision car tracking with applications to autonomous driving and behavior modeling.
    Conference Paper · May 2013
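A toy version of the alignment idea above: search candidate translations of a colored point cloud, score each by both geometric and color agreement, and read the velocity off the best translation. The exhaustive grid search and the scoring constants are illustrative simplifications.

```python
import numpy as np

def velocity_estimate(pts0, col0, pts1, col1, dt, search=1.0, step=0.05):
    best, best_t = -np.inf, None
    for tx in np.arange(-search, search + step, step):
        for ty in np.arange(-search, search + step, step):
            shifted = pts0 + np.array([tx, ty])
            d = np.linalg.norm(pts1[:, None] - shifted[None, :], axis=2)
            nn = d.argmin(axis=1)                   # nearest neighbors
            geo = np.exp(-d[np.arange(len(pts1)), nn] ** 2 / 0.02)
            col = np.exp(-np.abs(col1 - col0[nn]) ** 2 / 0.1)
            s = (geo * col).sum()                   # color-augmented score
            if s > best:
                best, best_t = s, np.array([tx, ty])
    return best_t / dt

pts0 = np.random.rand(30, 2); col0 = np.random.rand(30)
true_v, dt = np.array([0.5, -0.5]), 0.1
pts1, col1 = pts0 + true_v * dt, col0
print(velocity_estimate(pts0, col0, pts1, col1, dt))   # approx [0.5, -0.5]
```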
  •
    ABSTRACT: We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a Time-of-Flight (ToF) camera. These ToF cameras can measure depth scans at video rate. Because the underlying technology is comparatively simple, they bear potential for economical production in large volumes. Our easy-to-use, cost-effective scanning solution, which is based on such a sensor, could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a nontrivial systematic bias. In this paper, we show the surprising result that 3D scans of reasonable quality can also be obtained with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method with a probabilistic scan alignment approach that explicitly takes into account the sensor's noise characteristics.
    Article · May 2013 · IEEE Transactions on Pattern Analysis and Machine Intelligence
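A small sketch of the superresolution flavor of this approach: fuse several registered, noisy depth maps by inverse-variance weighting so the noisiest measurements are downweighted. This stands in for, and greatly simplifies, the paper's combination of superresolution and probabilistic alignment.

```python
import numpy as np

def fuse_depth(scans, sigmas):
    """scans: list of (H,W) registered depth maps; sigmas: per-scan noise."""
    w = np.array([1.0 / s ** 2 for s in sigmas])    # inverse-variance weights
    stack = np.stack(scans)
    return (w[:, None, None] * stack).sum(axis=0) / w.sum()

truth = np.tile(np.linspace(1.0, 2.0, 16), (16, 1))
scans = [truth + np.random.normal(0, s, truth.shape) for s in (0.05, 0.05, 0.2)]
fused = fuse_depth(scans, [0.05, 0.05, 0.2])
# The fused map should have lower error than any single noisy scan.
print(np.abs(fused - truth).mean(), "<", np.abs(scans[0] - truth).mean())
```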
  • Dimitris Margaritis · Sebastian Thrun
    ABSTRACT: In this paper we present a method of computing the posterior probability of conditional independence of two or more continuous variables from data, examined at several resolutions. Our approach is motivated by the observation that the appearance of continuous data varies widely at various resolutions, producing very different independence estimates between the variables involved. Therefore, it is difficult to ascertain independence without examining data at several carefully selected resolutions. In our paper, we accomplish this using the exact computation of the posterior probability of independence, calculated analytically given a resolution. At each examined resolution, we assume a multinomial distribution with Dirichlet priors for the discretized table parameters, and compute the posterior using Bayesian integration. Across resolutions, we use a search procedure to approximate the Bayesian integral of probability over an exponential number of possible histograms. Our method generalizes to an arbitrary number of variables in a straightforward manner. The test is suitable for Bayesian network learning algorithms that use independence tests to infer the network structure, in domains that contain any mix of continuous, ordinal and categorical variables.
    Article · Jan 2013
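A worked sketch of the per-resolution computation: at one fixed discretization, compare the Dirichlet-multinomial marginal likelihood of an independent model (the table factorizes into its margins) against a full joint model, and convert the ratio into a posterior probability of independence. Uniform Dirichlet priors, a 0.5 prior on independence, and treating counts as an ordered sample (so multinomial coefficients cancel) are assumptions for illustration.

```python
import numpy as np
from scipy.special import gammaln

def log_ml(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under Dirichlet(alpha)."""
    counts = np.asarray(counts, float).ravel()
    a = np.full_like(counts, alpha)
    return (gammaln(a.sum()) - gammaln((a + counts).sum())
            + (gammaln(a + counts) - gammaln(a)).sum())

def p_independent(table, prior=0.5):
    table = np.asarray(table, float)
    ind = log_ml(table.sum(axis=1)) + log_ml(table.sum(axis=0))  # margins only
    dep = log_ml(table)                                          # full joint
    log_odds = np.log(prior) + ind - (np.log(1 - prior) + dep)
    return 1.0 / (1.0 + np.exp(-log_odds))

print(p_independent([[20, 20], [20, 20]]))   # near-uniform: favors independence
print(p_independent([[40, 2], [3, 35]]))     # strong association: near zero
```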
  • S. Miller · A. Teichman · S. Thrun
    ABSTRACT: While inexpensive depth sensors are becoming increasingly ubiquitous, field of view and self-occlusion constraints limit the information a single sensor can provide. For many applications one may instead require a network of depth sensors, registered to a common world frame and synchronized in time. Historically such a setup has required a tedious manual calibration procedure, making it infeasible to deploy these networks in the wild, where spatial and temporal drift are common. In this work, we propose an entirely unsupervised procedure for calibrating the relative pose and time offsets of a pair of depth sensors. In doing so, we make no use of an explicit calibration target or any intentional activity on the part of the user. Rather, we use the unstructured motion of objects in the scene to find potential correspondences between the sensor pair. This yields a rough transform which is then refined with an occlusion-aware energy minimization. We compare our results against the standard checkerboard technique, and provide qualitative examples for scenes in which such a technique would be impossible.
    Conference Paper · Jan 2013
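A sketch of the "rough transform" stage: given matched 3D points (e.g., positions of the same moving object seen by both sensors), recover the relative pose with the standard SVD-based (Kabsch) rigid alignment. The paper's occlusion-aware refinement is not reproduced here.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares R, t with Q ≈ R @ P + t; P, Q are (N, 3) correspondences."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

P = np.random.rand(20, 3)
angle = 0.4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = rigid_transform(P, Q)
print(np.allclose(R, R_true), np.round(t, 3))       # True [0.1 -0.2 0.3]
```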
  • Sebastian Thrun
    ABSTRACT: This presentation will introduce the audience to a new, emerging body of research on sequential Monte Carlo techniques in robotics. In recent years, particle filters have solved several hard perceptual robotic problems. Early successes were limited to low-dimensional problems, such as the problem of robot localization in environments with known maps. More recently, researchers have begun exploiting structural properties of robotic domains that have led to successful particle filter applications in spaces with as many as 100,000 dimensions. The presentation will discuss specific tricks necessary to make these techniques work in real-world domains, and also discuss open challenges for researchers in the UAI community.
    Article · Dec 2012
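A compact example of the particle-filter recipe the talk surveys, on a toy 1D localization problem: predict particles through a motion model, weight them by measurement likelihood, and resample. All noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, control, measurement,
                         motion_std=0.1, meas_std=0.2):
    # Predict: propagate each particle through a noisy motion model.
    particles = particles + control + rng.normal(0, motion_std, len(particles))
    # Weight: measurement likelihood under a Gaussian sensor model.
    w = np.exp(-(measurement - particles) ** 2 / (2 * meas_std ** 2))
    w /= w.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

true_x, particles = 0.0, rng.uniform(-5, 5, 1000)
for _ in range(20):
    true_x += 0.5                                   # robot moves right
    z = true_x + rng.normal(0, 0.2)                 # noisy position reading
    particles = particle_filter_step(particles, 0.5, z)
print(true_x, particles.mean())                     # estimate tracks the truth
```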
  •
    ABSTRACT: Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
    Article · Dec 2012
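A stripped-down sketch of the two-level idea above: individual object maps (tiny occupancy grids here) are softly assigned to class templates in an E-step, and templates are re-estimated as weighted means in the M-step. The full method's shape registration and Bayesian model selection are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def em_templates(objects, k=2, iters=20, sigma=0.2):
    T = objects[rng.choice(len(objects), k, replace=False)]  # init templates
    for _ in range(iters):
        # E-step: responsibility of each template for each object grid.
        d2 = ((objects[:, None] - T[None]) ** 2).sum(axis=(2, 3))
        R = np.exp(-d2 / (2 * sigma ** 2))
        R /= R.sum(axis=1, keepdims=True)
        # M-step: templates are responsibility-weighted mean shapes.
        T = ((R.T[:, :, None, None] * objects[None]).sum(axis=1)
             / (R.sum(axis=0)[:, None, None] + 1e-9))
    return T, R

chair = np.zeros((4, 4)); chair[2:, :] = 1          # crude "chair" silhouette
bin_ = np.zeros((4, 4)); bin_[:, 1:3] = 1           # crude "bin" silhouette
objects = np.stack([chair + rng.normal(0, .05, (4, 4)) for _ in range(5)] +
                   [bin_ + rng.normal(0, .05, (4, 4)) for _ in range(5)])
T, R = em_templates(objects)
print(np.round(R, 2))   # soft assignments of objects to the two templates
```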
  • Joelle Pineau · Geoffrey Gordon · Sebastian Thrun
    ABSTRACT: This paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.
    Article · Oct 2012
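A toy illustration of deciding under a belief rather than a point estimate, in the spirit of a POMDP controller: each action is scored by its expected reward under the current belief over hidden states. The states, actions, and rewards are invented for illustration.

```python
import numpy as np

belief = np.array([0.8, 0.2])        # P(user heard me), P(user did not)
# Rows: actions (proceed, repeat question); columns: hidden states.
reward = np.array([[ 5.0, -10.0],    # proceeding is bad if they didn't hear
                   [-1.0,   4.0]])   # repeating costs little, helps if needed

expected = reward @ belief           # expected reward of each action
actions = ["proceed", "repeat question"]
print(actions[int(np.argmax(expected))], np.round(expected, 2))
```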
  • Matthew Rosencrantz · Geoffrey Gordon · Sebastian Thrun
    ABSTRACT: This paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented scaling capability to large teams of vehicles.
    Article · Oct 2012
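The abstract concerns decentralized fusion without a central node; a standard building block for that setting (not necessarily the paper's method) is covariance intersection, which fuses two Gaussian estimates with unknown cross-correlation without double-counting shared information.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, steps=50):
    """Fuse (x1, P1) and (x2, P2); the mixing weight is chosen by trace."""
    best = None
    for w in np.linspace(0.01, 0.99, steps):        # search the mixing weight
        Pinv = w * np.linalg.inv(P1) + (1 - w) * np.linalg.inv(P2)
        P = np.linalg.inv(Pinv)
        x = P @ (w * np.linalg.inv(P1) @ x1 + (1 - w) * np.linalg.inv(P2) @ x2)
        if best is None or np.trace(P) < best[2]:
            best = (x, P, np.trace(P))
    return best[0], best[1]

x1, P1 = np.array([1.0, 2.0]), np.diag([0.5, 2.0])  # platform 1's estimate
x2, P2 = np.array([1.2, 1.8]), np.diag([2.0, 0.5])  # platform 2's estimate
x, P = covariance_intersection(x1, P1, x2, P2)
print(np.round(x, 2), np.round(np.diag(P), 2))
```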
  •
    ABSTRACT: Tracking human pose in real-time is a difficult problem with many interesting applications. Existing solutions suffer from a variety of problems, especially when confronted with unusual human poses. In this paper, we derive an algorithm for tracking human pose in real-time from depth sequences based on MAP inference in a probabilistic temporal model. The key idea is to extend the iterative closest points (ICP) objective by modeling the constraint that the observed subject cannot enter free space, the area of space in front of the true range measurements. Our primary contribution is an extension to the articulated ICP algorithm that can efficiently enforce this constraint. The resulting filter runs at 125 frames per second using a single desktop CPU core. We provide extensive experimental results on challenging real-world data, which show that the algorithm outperforms the previous state-of-the-art trackers both in computational efficiency and accuracy.
    Conference Paper · Oct 2012
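A sketch of the free-space constraint above: a hypothesized model point that projects into the depth image must not lie closer to the camera than the measured range at that pixel, because the space in front of each measurement is observed to be empty. The penalty below could be added to an ICP-style objective; the pinhole intrinsics are assumed toy values.

```python
import numpy as np

def free_space_penalty(model_pts, depth_img, fx=100.0, fy=100.0, cx=32.0, cy=32.0):
    """model_pts: (N,3) in camera frame (z forward); depth_img: (H,W) ranges."""
    z = model_pts[:, 2]
    u = np.round(fx * model_pts[:, 0] / z + cx).astype(int)
    v = np.round(fy * model_pts[:, 1] / z + cy).astype(int)
    h, w = depth_img.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    measured = depth_img[v[ok], u[ok]]
    # Penalize only points *in front of* the measured surface (in free space).
    return np.maximum(0.0, measured - z[ok]).sum()

depth = np.full((64, 64), 2.0)                      # a wall 2 m away
behind = np.array([[0.0, 0.0, 2.05]])               # just behind the wall: fine
in_free = np.array([[0.0, 0.0, 1.0]])               # a meter in front: penalized
print(free_space_penalty(behind, depth), free_space_penalty(in_free, depth))
```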
  •
    ABSTRACT: We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers all the meshes using an unsupervised non-rigid technique described in a companion paper. It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real-world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a four-part model of a flexing arm where significant non-rigid deformation was present.
    Article · Jul 2012
  • David Stavens · Sebastian Thrun
    ABSTRACT: We present a machine learning approach for estimating the second derivative of a drivable surface, its roughness. Robot perception generally focuses on the first derivative, obstacle detection. However, the second derivative is also important due to its direct relation (with speed) to the shock the vehicle experiences. Knowing the second derivative allows a vehicle to slow down in advance of rough terrain. Estimating the second derivative is challenging due to uncertainty. For example, at range, laser readings may be so sparse that significant information about the surface is missing. Also, a high degree of precision is required in projecting laser readings. This precision may be unavailable due to latency or error in the pose estimation. We model these sources of error as a multivariate polynomial. Its coefficients are learned using the shock data as ground truth: the accelerometers are used to train the lasers. The resulting classifier operates on individual laser readings from a road surface described by a 3D point cloud. The classifier identifies sections of road where the second derivative is likely to be large. Thus, the vehicle can slow down in advance, reducing the shock it experiences. The algorithm is an evolution of one we used in the 2005 DARPA Grand Challenge. We analyze it using data from that route.
    Article · Jun 2012
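A simplified version of the training idea above: use the accelerometer-measured shock as ground truth to fit a polynomial model from laser-derived features, so the lasers can predict rough road ahead. The two toy features and the quadratic expansion are illustrative assumptions.

```python
import numpy as np

def poly_features(X):
    """Degree-2 multivariate polynomial expansion of feature rows."""
    f1, f2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), f1, f2, f1 * f2, f1 ** 2, f2 ** 2])

rng = np.random.default_rng(2)
# Toy laser features per reading: local height deviation, measurement range.
X = rng.uniform(0, 1, (500, 2))
# Accelerometer shock serves as ground truth for the laser-based model.
shock = 3.0 * X[:, 0] ** 2 + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.05, 500)

coef, *_ = np.linalg.lstsq(poly_features(X), shock, rcond=None)
pred = poly_features(np.array([[0.9, 0.5]])) @ coef
print(pred)                 # high predicted shock -> slow down in advance
```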
  •
    ABSTRACT: We present a hand-held system for real-time, interactive acquisition of residential floor plans. The system integrates a commodity range camera, a micro-projector, and a button interface for user input and allows the user to freely move through a building to capture its important architectural elements. The system uses the Manhattan world assumption, which posits that wall layouts are rectilinear. This assumption allows generating floor plans in real time, enabling the operator to interactively guide the reconstruction process and to resolve structural ambiguities and errors during the acquisition. The interactive component aids users with no architectural training in acquiring wall layouts for their residences. We show a number of residential floor plans reconstructed with the system.
    Conference Paper · May 2012
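A small illustration of how the Manhattan-world assumption simplifies floor-plan building: estimate the dominant wall direction, then snap every wall segment to be parallel or perpendicular to it. The circular-mean estimate of the dominant direction is an assumed simplification of the system's pipeline.

```python
import numpy as np

def snap_to_manhattan(segment_angles):
    """segment_angles: wall directions in radians; returns rectilinear angles."""
    # Dominant axis: circular mean of angles folded modulo 90 degrees.
    folded = np.mod(segment_angles, np.pi / 2)
    dominant = np.arctan2(np.sin(4 * folded).mean(),
                          np.cos(4 * folded).mean()) / 4
    # Snap each wall to the nearest of the four rectilinear directions.
    k = np.round((segment_angles - dominant) / (np.pi / 2))
    return dominant + k * np.pi / 2

walls = np.radians([1.0, 91.5, 178.0, -88.0, 2.5])  # noisy rectilinear layout
print(np.degrees(snap_to_manhattan(walls)).round(1))
```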
  • David Held · Jesse Levinson · Sebastian Thrun
    ABSTRACT: Detecting cars in real-world images is an important task for autonomous driving, yet it remains unsolved. The system described in this paper takes advantage of context and scale to build a monocular single-frame image-based car detector that significantly outperforms the baseline. The system uses a probabilistic model to combine multiple forms of evidence for both context and scale to locate cars in a real-world image. We also use scale filtering to speed up our algorithm by a factor of 3.3 compared to the baseline. By using a calibrated camera and localization on a road map, we are able to obtain context and scale information from a single image without the use of a 3D laser. The system outperforms the baseline by an absolute 9.4% in overall average precision and 11.7% in average precision for cars smaller than 50 pixels in height, for which context and scale cues are especially important.
    Article · May 2012 · Proceedings - IEEE International Conference on Robotics and Automation
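A sketch of the evidence combination above: with a calibrated camera and road-map localization, the expected pixel height of a car at a given image row acts as a scale prior that rescores raw detector outputs. The Gaussian scale prior and the toy ground-plane geometry are assumptions for illustration.

```python
import numpy as np

def rescore(det_score, box_height_px, expected_height_px, sigma=0.25):
    """Combine detector confidence with a scale prior (log-space product)."""
    scale_ll = -0.5 * (np.log(box_height_px / expected_height_px) / sigma) ** 2
    return np.log(det_score + 1e-9) + scale_ll

def expected_height(row, horizon=200.0, k=0.4):
    # Assumed ground-plane geometry: rows farther below the horizon are
    # closer to the camera, so cars there should appear taller.
    return max(k * (row - horizon), 1.0)

print(rescore(0.6, 48, expected_height(320)))   # plausible size: no penalty
print(rescore(0.6, 150, expected_height(320)))  # implausibly large: big penalty
```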

Publication Stats

38k Citations
187.50 Total Impact Points

Institutions

  • 1970-2013
    • Carnegie Mellon University
      • Computer Science Department
      Pittsburgh, Pennsylvania, United States
    • Stanford University
      • Department of Computer Science
      • Artificial Intelligence Laboratory
      Palo Alto, California, United States
  • 2002-2010
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2009
    • Carnegie Institution for Science
      Washington, D.C., United States
  • 2000-2009
    • University of Freiburg
      • Department of Computer Science
      Freiburg, Baden-Württemberg, Germany
  • 2004
    • NASA
      Washington, D.C., United States
    • Rutgers, The State University of New Jersey
      New Brunswick, New Jersey, United States
  • 2003
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, Washington, United States
  • 1991-2000
    • University of Bonn
      • Institute for Computer Science
      Bonn, North Rhine-Westphalia, Germany