Sebastian Thrun

Google Inc.

Publications (485)

  • David Held · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker’s state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker (available at http://davheld.github.io/GOTURN/GOTURN.html) is the first neural-network tracker that learns to track generic objects at 100 fps.
    Chapter · Oct 2016
  • Conference Paper · Jun 2016
  • David Held · Sebastian Thrun · Silvio Savarese
    Conference Paper · May 2016
  • David Held · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for using neural networks to track generic objects in a way that allows them to improve performance by training on labeled videos. Previous attempts to use neural networks for tracking are very slow to run and not practical for real-time applications. In contrast, our tracker uses a simple feed-forward network with no online training required, allowing our tracker to run at 100 fps during test time. Our tracker trains from both labeled video and a large collection of images, which helps prevent overfitting. The tracker learns generic object motion and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Our network learns to track generic objects in real time as they move throughout the world.
    Full-text Article · Apr 2016
  • David Held · Jesse Levinson · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real-world data. We present a tracker that combines 3D shape, color (when available), and motion cues to accurately track moving objects in real time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.37 ms per object, our method outperforms all of the baseline methods of similar speed by at least 25% in root-mean-square (RMS) tracking error. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application. By combining 3D shape, color (when available), and motion cues in a probabilistic framework, our tracker is able to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances.
    Article · Aug 2015 · The International Journal of Robotics Research
  • David Held · Sebastian Thrun · Silvio Savarese
    ABSTRACT: Deep learning methods have typically been trained on large datasets in which many training examples are available. However, many real-world product datasets have only a small number of images available for each product. We explore the use of deep learning methods for recognizing object instances when we have only a single training example per class. We show that feedforward neural networks outperform state-of-the-art methods for recognizing objects from novel viewpoints even when trained from just a single image per object. To further improve our performance on this task, we propose to take advantage of a supplementary dataset in which we observe a separate set of objects from multiple viewpoints. We introduce a new approach for training deep learning methods for instance recognition with limited training data, in which we use an auxiliary multi-view dataset to train our network to be robust to viewpoint changes. We find that this approach leads to a more robust classifier for recognizing objects from novel viewpoints, outperforming previous state-of-the-art approaches including keypoint-matching, template-based techniques, and sparse coding.
    Full-text Article · Jul 2015
  • Jake T. Lussier · Sebastian Thrun
    ABSTRACT: Reasoning about a scene's thermal signature, in addition to its visual appearance and spatial configuration, would facilitate significant advances in perceptual systems. Applications involving the segmentation and tracking of persons, vehicles, and other heat-emitting objects, for example, could benefit tremendously from even coarsely accurate relative temperatures. With the increasing affordability of commercially available thermal cameras, as well as the imminent introduction of new, mobile form factors, such data will be readily and widely accessible. However, in order for thermal processing to complement existing methods in RGBD, there must be an effective procedure for calibrating RGBD and thermal cameras to create RGBDT (red, green, blue, depth, and thermal) data. In this paper, we present an automatic method for the synchronization and calibration of RGBD and thermal cameras in arbitrary environments. While traditional calibration methods fail in our multimodal setting, we leverage invariant features visible to both camera types. We first synchronize the streams with a simple optimization procedure that aligns their motion statistic time series. We then find the relative poses of the cameras by minimizing an objective that measures the alignment between edge maps from the two streams. In contrast to existing methods that use special calibration targets with key points visible to both cameras, our method requires nothing more than some edges visible to both cameras, such as those arising from humans. We evaluate our method and demonstrate that it consistently converges to the correct transform and that it results in high-quality RGBDT data.
    Conference Paper · Sep 2014
  • David Held · Jesse Levinson · Sebastian Thrun · Silvio Savarese
    Full-text Conference Paper · Jul 2014
  • Jesse Levinson · Sebastian Thrun
    ABSTRACT: Light Detection and Ranging (LIDAR) sensors have become increasingly common in both industrial and robotic applications. LIDAR sensors are particularly desirable for their direct distance measurements and high accuracy, but traditionally have been configured with only a single rotating beam. However, recent technological progress has spawned a new generation of LIDAR sensors equipped with many simultaneous rotating beams at varying angles, providing at least an order of magnitude more data than single-beam LIDARs and enabling new applications in mapping [6], object detection and recognition [15], scene understanding [16], and SLAM [9].
    Chapter · Jan 2014
  • Michael Steven Montemerlo · Sebastian Thrun
    ABSTRACT: Aspects of the disclosure relate generally to an autonomous vehicle accessing portions of a map to localize itself within the map. More specifically, one or more convolution scores may be generated between a prior map and a current map. Convolution scores may be generated by applying a fast Fourier transform on both the prior and current maps, multiplying the results of the transforms, and taking the inverse fast Fourier transform of the product. Based on these convolution scores, an autonomous vehicle may determine the offset between the maps and localize itself relative to the prior map.
    Patent · Dec 2013
  • Conference Paper: Group induction
    Alex Teichman · Sebastian Thrun
    ABSTRACT: Machine perception often requires a large amount of user-annotated data which is time-consuming, difficult, or expensive to collect. Perception systems should be easy to train by regular users, and this is currently far from the case. Our previous work, tracking-based semi-supervised learning [14], helped reduce the labeling burden by using tracking information to harvest new and useful training examples. However, [14] was designed for offline use; it assumed a fixed amount of unlabeled data and did not allow for corrections from users. In many practical robot perception scenarios we A) desire continuous learning over a long period of time, B) have a stream of unlabeled sensor data available rather than a fixed dataset, and C) are willing to periodically provide a small number of new training examples. In light of this, we present group induction, a new mathematical framework that rigorously encodes the intuition of [14] in an alternating optimization problem similar to expectation maximization (EM), but with the assumption that the unlabeled data comes in groups of instances that share the same hidden label. The mathematics suggest several improvements to the original heuristic algorithm, and make clear how to handle user interaction and streams of unlabeled data. We evaluate group induction on a track classification task from natural street scenes, demonstrating its ability to learn continuously, adapt to user feedback, and accurately recognize objects of interest.
    Conference Paper · Nov 2013
  • Stephen Miller · Alex Teichman · Sebastian Thrun
    ABSTRACT: While inexpensive depth sensors are becoming increasingly ubiquitous, field of view and self-occlusion constraints limit the information a single sensor can provide. For many applications one may instead require a network of depth sensors, registered to a common world frame and synchronized in time. Historically such a setup has required a tedious manual calibration procedure, making it infeasible to deploy these networks in the wild, where spatial and temporal drift are common. In this work, we propose an entirely unsupervised procedure for calibrating the relative pose and time offsets of a pair of depth sensors. In doing so, we make no use of an explicit calibration target, or any intentional activity on the part of a user. Rather, we use the unstructured motion of objects in the scene to find potential correspondences between the sensor pair. This yields a rough transform which is then refined with an occlusion-aware energy minimization. We compare our results against the standard checkerboard technique, and provide qualitative examples for scenes in which such a technique would be impossible.
    Conference Paper · Nov 2013
  • Alex Teichman · Jake T. Lussier · Sebastian Thrun
    ABSTRACT: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions (no prior object model, no stationary sensor, and no prior 3-D map), thus making a solution potentially useful for a large number of applications, including semi-supervised learning, 3-D model capture, and object recognition. Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art. This paper is an extended version of our previous work. Building on our previous work, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
    Article · Oct 2013 · IEEE Transactions on Automation Science and Engineering
  • Jesse Levinson · Sebastian Thrun
    Conference Paper · Jun 2013
  • Alex Teichman · Stephen Miller · Sebastian Thrun
    Conference Paper · Jun 2013
  • David Held · Jesse Levinson · Sebastian Thrun
    ABSTRACT: Precision tracking is important for predicting the behavior of other cars in autonomous driving. We present a novel method to combine laser and camera data to achieve accurate velocity estimates of moving vehicles. We combine sparse laser points with a high-resolution camera image to obtain a dense colored point cloud. We use a color-augmented search algorithm to align the dense color point clouds from successive time frames for a moving vehicle, thereby obtaining a precise estimate of the tracked vehicle's velocity. Using this alignment method, we obtain velocity estimates at a much higher accuracy than previous methods. Through pre-filtering, we are able to achieve near-real-time results. We also present an online method for real-time use with accuracies close to that of the full method. We present a novel approach to quantitatively evaluate our velocity estimates by tracking a parked car in a local reference frame in which it appears to be moving relative to the ego vehicle. We use this evaluation method to automatically and quantitatively evaluate our tracking performance on 466 separate tracked vehicles. Our method obtains a mean absolute velocity error of 0.27 m/s and an RMS error of 0.47 m/s on this test set. We can also qualitatively evaluate our method by building color 3D car models from moving vehicles. We have thus demonstrated that our method can be used for precision car tracking with applications to autonomous driving and behavior modeling.
    Conference Paper · May 2013
  • Yan Cui · Sebastian Schuon · Sebastian Thrun · [...] · Christian Theobalt
    ABSTRACT: We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a Time-of-Flight (ToF) camera. These ToF cameras can measure depth scans at video rate. Due to comparably simple technology, they bear potential for economical production in big volumes. Our easy-to-use, cost-effective scanning solution, which is based on such a sensor, could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of random noise is substantial and there is a nontrivial systematic bias. In this paper, we show the surprising result that 3D scans of reasonable quality can also be obtained with a sensor of such low data quality. Established filtering and scan alignment techniques from the literature fail to achieve this goal. In contrast, our algorithm is based on a new combination of a 3D superresolution method with a probabilistic scan alignment approach that explicitly takes into account the sensor's noise characteristics.
    Article · May 2013 · IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Edilson De Aguiar · Carsten Stoll · Christian Theobalt · [...] · Sebastian Thrun
    ABSTRACT: A variety of methods, devices and storage mediums are implemented for creating digital representations of figures. According to one such computer implemented method, a volumetric representation of a figure is correlated with an image of the figure. Reference points are found that are common to each of two temporally distinct images of the figure, the reference points representing movement of the figure between the two images. A volumetric deformation is applied to the digital representation of the figure as a function of the reference points and the correlation of the volumetric representation of the figure. A fine deformation is applied as a function of the coarse/volumetric deformation. Responsive to the applied deformations, an updated digital representation of the figure is generated.
    Full-text Patent · Feb 2013
  • Sebastian Thrun
    ABSTRACT: This presentation will introduce the audience to a new, emerging body of research on sequential Monte Carlo techniques in robotics. In recent years, particle filters have solved several hard perceptual robotic problems. Early successes were limited to low-dimensional problems, such as the problem of robot localization in environments with known maps. More recently, researchers have begun exploiting structural properties of robotic domains that have led to successful particle filter applications in spaces with as many as 100,000 dimensions. The presentation will discuss specific tricks necessary to make these techniques work in real-world domains, and also discuss open challenges for researchers in the UAI community.
    Article · Dec 2012
  • Dragomir Anguelov · Rahul Biswas · Daphne Koller · [...] · Sebastian Thrun
    ABSTRACT: Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
    Full-text Article · Dec 2012

Publication Stats

43k Citations


  • 2013
    • Google Inc.
  • 1970-2012
    • Carnegie Mellon University
      • Computer Science Department
      Pittsburgh, Pennsylvania, United States
  • 1970-2011
    • Stanford University
      • Department of Computer Science
      • Artificial Intelligence Laboratory
      Palo Alto, California, United States
  • 2002-2010
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2009
    • Carnegie Institution for Science
      Washington, D.C., United States
  • 2000-2009
    • University of Freiburg
      • Department of Computer Science
      Freiburg, Baden-Württemberg, Germany
  • 2004
    • NASA
      Washington, D.C., United States
    • Rutgers, The State University of New Jersey
      New Brunswick, New Jersey, United States
  • 2003
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, Washington, United States
  • 1991-2000
    • University of Bonn
      • Institute for Computer Sciences
      Bonn, North Rhine-Westphalia, Germany