Richard Szeliski

Microsoft, Washington, United States

Publications (110) · 97.36 Total Impact Points

  • Sudipta Narayan Sinha · Daniel Scharstein · Richard Szeliski
    ABSTRACT: We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns each pixel to one of the local plane hypotheses. By only exploring a small fraction of the whole disparity space volume, our technique achieves significant speedups over previous algorithms and achieves state-of-the-art accuracy on high-resolution stereo pairs of up to 19 megapixels.
    No preview · Article · Sep 2014
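The essence of a plane-sweep stereo matcher can be sketched with a fronto-parallel simplification; the paper's slanted local planes and semi-global optimization are considerably more involved, and `plane_sweep_disparity` below is a hypothetical illustration, not code from the paper:

```python
import numpy as np

def plane_sweep_disparity(left, right, max_disp):
    """Brute-force fronto-parallel plane sweep: for each disparity
    hypothesis, shift the right image and accumulate an absolute-
    difference matching cost; each pixel keeps the cheapest plane."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        shifted = np.roll(right, d, axis=1)   # shift right image by d pixels
        cost[d] = np.abs(left - shifted)
        cost[d, :, :d] = np.inf               # wrapped-around columns are invalid
    return cost.argmin(axis=0)                # winner-take-all disparity map
```

Exploring only a subset of planes, as the paper does, replaces the full loop over `max_disp` with a handful of locally fitted hypotheses.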
  • Juliet Fiss · Brian Curless · Richard Szeliski
    ABSTRACT: In this paper, we propose a simple, novel plane sweep technique for refocusing plenoptic images. Rays are projected directly from the raw plenoptic image captured on the sensor into the output image plane, without computing intermediate representations such as subaperture views or epipolar images. Interpolation is performed in the output image plane using splatting. The splat kernel for each ray is adjusted adaptively, based on the refocus depth and an estimate of the depth at which that ray intersects the scene. This adaptive interpolation method antialiases out-of-focus regions, while keeping in-focus regions sharp. We test the proposed method on images from a Lytro camera and compare our results with those from the Lytro SDK. Additionally, we provide a thorough discussion of our calibration and preprocessing pipeline for this camera.
    No preview · Conference Paper · May 2014
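The idea of depth-adaptive splatting can be illustrated with a toy version; the function name, Gaussian kernel model, and parameters below are illustrative assumptions and do not reproduce the paper's calibrated Lytro pipeline:

```python
import numpy as np

def splat_rays(xs, ys, vals, depths, refocus_depth, shape,
               base_sigma=0.5, k=2.0):
    """Splat ray samples into an output plane. The kernel radius grows
    with the gap between a ray's scene depth and the refocus depth, so
    out-of-focus rays are spread (antialiased) while in-focus rays stay
    sharp."""
    acc = np.zeros(shape)
    wsum = np.zeros(shape)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    for x, y, v, z in zip(xs, ys, vals, depths):
        sigma = base_sigma + k * abs(z - refocus_depth)  # adaptive splat size
        w = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
        acc += w * v
        wsum += w
    return acc / np.maximum(wsum, 1e-12)  # normalized splatting
```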
  • Dilip Krishnan · Raanan Fattal · Richard Szeliski
    ABSTRACT: We present a new multi-level preconditioning scheme for discrete Poisson equations that arise in various computer graphics applications such as colorization, edge-preserving decomposition for two-dimensional images, and geodesic distances and diffusion on three-dimensional meshes. Our approach interleaves the selection of fine-and coarse-level variables with the removal of weak connections between potential fine-level variables (sparsification) and the compensation for these changes by strengthening nearby connections. By applying these operations before each elimination step and repeating the procedure recursively on the resulting smaller systems, we obtain a highly efficient multi-level preconditioning scheme with linear time and memory requirements. Our experiments demonstrate that our new scheme outperforms or is comparable with other state-of-the-art methods, both in terms of operation count and wall-clock time. This speedup is achieved by the new method's ability to reduce the condition number of irregular Laplacian matrices as well as homogeneous systems. It can therefore be used for a wide variety of computational photography problems, as well as several 3D mesh processing tasks, without the need to carefully match the algorithm to the problem characteristics.
    No preview · Article · Jul 2013 · ACM Transactions on Graphics
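The role a preconditioner plays is easiest to see in plain preconditioned conjugate gradients. The sketch below uses a simple Jacobi (diagonal) preconditioner on a 1D Poisson system; the paper's multi-level scheme would be substituted for `M_inv` to cut the iteration count on harder, irregular Laplacians:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradients: M_inv approximates A^-1.
    A better preconditioner lowers the effective condition number of
    the system and thus the number of iterations needed."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1D discrete Poisson (second-difference) system, Jacobi preconditioner.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, lambda r: r / np.diag(A))
```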
  • Richard Szeliski
    ABSTRACT: The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we publish selected posts or excerpts.
    No preview · Article · Nov 2012 · Communications of the ACM
  • Source
    Dilip Krishnan · Richard Szeliski
    ABSTRACT: This paper unifies multigrid and multilevel (hierarchical) preconditioners, two widely-used approaches for solving computational photography and other computer graphics simulation problems. It provides detailed experimental comparisons of these techniques and their variants, including an analysis of relative computational costs and how these impact practical algorithm performance. We derive both theoretical convergence rates based on the condition numbers of the systems and their preconditioners, and empirical convergence rates drawn from real-world problems. We also develop new techniques for sparsifying higher connectivity problems, and compare our techniques to existing and newly developed variants such as algebraic and combinatorial multigrid. Our experimental results demonstrate that, except for highly irregular problems, adaptive hierarchical basis function preconditioners generally outperform alternative multigrid techniques, especially when computational complexity is taken into account.
    Preview · Article · Dec 2011 · ACM Transactions on Graphics
  • Source
    Richard Szeliski · Matthew Uyttendaele · Drew Steedly
    ABSTRACT: We present a technique for fast Poisson blending and gradient domain compositing. Instead of using a single piecewise-smooth offset map to perform the blending, we associate a separate map with each input source image. Each individual offset map is itself smoothly varying and can therefore be represented using a low-dimensional spline. The resulting linear system is much smaller than either the original Poisson system or the quadtree spline approximation of a single (unified) offset map. We demonstrate the speed and memory improvements available with our system and apply it to large panoramas. We also show how robustly modeling the multiplicative gain rather than the offset between overlapping images leads to improved results, and how adding a small amount of Laplacian pyramid blending improves the results in areas of inconsistent texture.
    Preview · Conference Paper · May 2011
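A 1D toy version shows the gradient-domain idea behind this kind of blending; here the per-source "offset map" degenerates to a single constant per image, whereas the paper represents each map with a low-dimensional spline:

```python
import numpy as np

def blend_1d(left, right, seam):
    """1D gradient-domain composite: build a gradient field that takes
    left's gradients before the seam and right's after it, then
    integrate, so the seam carries no offset discontinuity."""
    g = np.empty(len(left) - 1)
    g[:seam] = np.diff(left)[:seam]
    g[seam:] = np.diff(right)[seam:]
    # integrate the composite gradients, anchoring at left[0]
    return np.concatenate([[left[0]], left[0] + np.cumsum(g)])
```

Two sources that differ only by a constant exposure offset blend into a seamless result, which is exactly the failure mode naive copy-paste compositing exhibits.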
  • Richard Szeliski
    ABSTRACT: Once we have extracted features from images, the next stage in many vision algorithms is to match these features across different images (Section 4.1.3). An important component of this matching is to verify whether the set of matching features is geometrically consistent, e.g., whether the feature displacements can be described by a simple 2D or 3D geometric transformation. The computed motions can then be used in other applications such as image stitching (Chapter 9) or augmented reality (Section 6.2.3).
    No preview · Article · Jan 2011
  • Richard Szeliski
    ABSTRACT: Now that we have seen how images are formed through the interaction of 3D scene elements, lighting, and camera optics and sensors, let us look at the first stage in most computer vision applications, namely the use of image processing to preprocess the image and convert it into a form suitable for further analysis. Examples of such operations include exposure correction and color balancing, the reduction of image noise, increasing sharpness, or straightening the image by rotating it (Figure 3.1). While some may consider image processing to be outside the purview of computer vision, most computer vision applications, such as computational photography and even recognition, require care in designing the image processing stages in order to achieve acceptable results.
    No preview · Chapter · Jan 2011
  • Richard Szeliski
    ABSTRACT: Feature detection and matching are an essential component of many computer vision applications. Consider the two pairs of images shown in Figure 4.2. For the first pair, we may wish to align the two images so that they can be seamlessly stitched into a composite mosaic (Chapter 9). For the second pair, we may wish to establish a dense set of correspondences so that a 3D model can be constructed or an in-between view can be generated (Chapter 11). In either case, what kinds of features should you detect and then match in order to establish such an alignment or set of correspondences? Think about this for a few moments before reading on.
    No preview · Article · Jan 2011
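A standard way to establish such correspondences is nearest-neighbour descriptor matching with Lowe's ratio test; this minimal sketch assumes descriptors are already extracted as fixed-length vectors:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with the ratio test: accept a match
    only when the best candidate is clearly better than the second
    best, which rejects ambiguous repeated-texture matches."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]   # two nearest candidates
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches
```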
  • Source
    Xiangyang Lan · Larry Zitnick · Richard Szeliski
    ABSTRACT: In this paper, we describe a model-based approach to object recognition. Spatial relationships between matching primitives are modeled using a purely local bi-gram representation consisting of transition probabilities between neighboring primitives. For matching primitives, sets of one, two or three features are used. The addition of doublets and triplets provides a highly discriminative matching primitive and a reference frame that is invariant to similarity or affine transformations. The recognition of new objects is accomplished by finding trees of matching primitives in an image that obey the model learned for a specific object class. We propose a greedy approach based on best-first search expansion for creating trees. Experimental results are presented to demonstrate the ability of our method to recognize objects undergoing nonrigid transformations for both object instance and category recognition. Furthermore, we show results for both unsupervised and semi-supervised learning.
    Preview · Article · Jan 2011
  • Richard Szeliski
    ABSTRACT: Algorithms for aligning images and estimating motion in video sequences are among the most widely used in computer vision. For example, frame-rate image alignment is widely used in camcorders and digital cameras to implement their image stabilization (IS) feature.
    No preview · Article · Jan 2011
  • Richard Szeliski
    ABSTRACT: Image segmentation is the task of finding groups of pixels that “go together”. In statistics, this problem is known as cluster analysis and is a widely studied area with hundreds of different algorithms (Jain and Dubes 1988; Kaufman and Rousseeuw 1990; Jain, Duin, and Mao 2000; Jain, Topchy, Law et al. 2004).
    No preview · Chapter · Jan 2011
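The cluster-analysis view of segmentation referenced here is easiest to see in plain k-means; this is a minimal sketch (with a simple deterministic seeding that assumes no cluster goes empty), not any specific segmentation algorithm from the chapter:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means: alternate between assigning each point to its
    nearest center and re-estimating each center as its cluster mean."""
    # simple deterministic seeding; k-means++ is the usual improvement
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```

For image segmentation, `X` would hold per-pixel feature vectors (e.g. color plus position), and the labels define the pixel groups.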
  • Richard Szeliski
    ABSTRACT: Of all the visual tasks we might ask a computer to perform, analyzing a scene and recognizing all of the constituent objects remains the most challenging. While computers excel at accurately reconstructing the 3D shape of a scene from images taken from different views, they cannot name all the objects and animals present in a picture, even at the level of a two-year-old child. There is not even any consensus among researchers on when this level of performance might be achieved.
    No preview · Chapter · Jan 2011
  • Source
    Richard Szeliski

    Preview · Article · Jan 2011
  • Richard Szeliski
    ABSTRACT: Stereo matching is the process of taking two or more images and estimating a 3D model of the scene by finding matching pixels in the images and converting their 2D positions into 3D depths. In Chapters 6-7, we described techniques for recovering camera positions and building sparse 3D models of scenes or objects. In this chapter, we address the question of how to build a more complete 3D model, e.g., a sparse or dense depth map that assigns relative depths to pixels in the input images. We also look at the topic of multi-view stereo algorithms that produce complete 3D volumetric or surface-based object models.
    No preview · Article · Jan 2011
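The conversion from matched 2D positions to 3D depths mentioned here follows the standard rectified-stereo relation Z = fB/d, where f is the focal length in pixels, B the camera baseline, and d the disparity:

```python
def disparity_to_depth(d, focal_px, baseline_m):
    """Convert a stereo disparity (pixels) to metric depth using the
    rectified-stereo relation Z = f * B / d: larger disparities mean
    nearer points."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d
```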
  • Richard Szeliski
    ABSTRACT: In this book, we have covered a broad range of computer vision topics. Starting with image formation, we have seen how images can be pre-processed to remove noise or blur, segmented into regions, or converted into feature descriptors. Multiple images can be matched and registered, with the results used to estimate motion, track people, reconstruct 3D models, or merge images into more attractive and interesting composites and renderings. Images can also be analyzed to produce semantic descriptions of their content. However, the gap between computer and human performance in this area is still large and is likely to remain so for many years.
    No preview · Chapter · Jan 2011
  • Richard Szeliski
    ABSTRACT: Before we can intelligently analyze and manipulate images, we need to establish a vocabulary for describing the geometry of a scene. We also need to understand the image formation process that produced a particular image given a set of lighting conditions, scene geometry, surface properties, and camera optics. In this chapter, we present a simplified model of such an image formation process.
    No preview · Chapter · Jan 2011
  • Source
    Richard Szeliski
    ABSTRACT: Over the last two decades, image-based rendering has emerged as one of the most exciting applications of computer vision (Kang, Li, Tong et al. 2006; Shum, Chan, and Kang 2007). In image-based rendering, 3D reconstruction techniques from computer vision are combined with computer graphics rendering techniques that use multiple views of a scene to create interactive photo-realistic experiences, such as the Photo Tourism system shown in Figure 13.1a. Commercial versions of such systems include immersive street-level navigation in on-line mapping systems and the creation of 3D Photosynths from large collections of casually acquired photographs.
    Preview · Chapter · Sep 2010
  • Source
    Richard Szeliski
    ABSTRACT: In the previous chapter, we saw how 2D and 3D point sets could be aligned and how such alignments could be used to estimate both a camera’s pose and its internal calibration parameters. In this chapter, we look at the converse problem of estimating the locations of 3D points from multiple images given only a sparse set of correspondences between image features. While this process often involves simultaneously estimating both 3D geometry (structure) and camera pose (motion), it is commonly known as structure from motion (Ullman 1979).
    Preview · Chapter · Sep 2010
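Once camera poses are known, the core primitive inside structure from motion is triangulating each 3D point from its image projections; a minimal linear (DLT) triangulation sketch for two cameras, assuming normalized image coordinates:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each observation (x, y) of a 3x4
    projection matrix P contributes two linear constraints on the
    homogeneous 3D point; the point is the null vector of the stack."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector = homogeneous 3D point
    return X[:3] / X[3]        # dehomogenize
```

With noisy correspondences this least-squares solution is only a starting point; bundle adjustment then refines points and poses jointly.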
  • Richard Szeliski
    ABSTRACT: As we saw in the previous chapter, a variety of stereo matching techniques have been developed to reconstruct high quality 3D models from two or more images. However, stereo is just one of the many potential cues that can be used to infer shape from images. In this chapter, we investigate a number of such techniques, which include not only visual cues such as shading and focus, but also techniques for merging multiple range or depth images into 3D models, as well as techniques for reconstructing specialized models, such as heads, bodies, or architecture.
    No preview · Chapter · Sep 2010

Publication Stats

10k Citations
97.36 Total Impact Points

Institutions

  • 1997-2013
    • Microsoft
      • Microsoft Research
      Washington, United States
  • 2007
    • Massachusetts Institute of Technology
      • Department of Electrical Engineering and Computer Science
      Cambridge, MA, United States
  • 1986-2007
    • Carnegie Mellon University
      • Robotics Institute
      • Computer Science Department
      Pittsburgh, Pennsylvania, United States
  • 2001
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, WA, United States
  • 1996
    • Cornell University
      • Computer Science
      Ithaca, NY, United States
    • McGill University
      Montréal, Quebec, Canada
  • 1989-1990
    • Palo Alto Research Center
      Palo Alto, California, United States