R. Szeliski

Microsoft, Redmond, Washington, United States


Publications (96) · 70.77 Total Impact

  • Dilip Krishnan, Raanan Fattal, Richard Szeliski
    ABSTRACT: We present a new multi-level preconditioning scheme for discrete Poisson equations that arise in various computer graphics applications such as colorization, edge-preserving decomposition for two-dimensional images, and geodesic distances and diffusion on three-dimensional meshes. Our approach interleaves the selection of fine- and coarse-level variables with the removal of weak connections between potential fine-level variables (sparsification) and the compensation for these changes by strengthening nearby connections. By applying these operations before each elimination step and repeating the procedure recursively on the resulting smaller systems, we obtain a highly efficient multi-level preconditioning scheme with linear time and memory requirements. Our experiments demonstrate that our new scheme outperforms or is comparable with other state-of-the-art methods, both in terms of operation count and wall-clock time. This speedup is achieved by the new method's ability to reduce the condition number of irregular Laplacian matrices as well as homogeneous systems. It can therefore be used for a wide variety of computational photography problems, as well as several 3D mesh processing tasks, without the need to carefully match the algorithm to the problem characteristics.
    ACM Transactions on Graphics (TOG). 07/2013; 32(4).
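    The sparsification-based multi-level preconditioner described above is involved; purely as a point of reference, the sketch below runs plain preconditioned conjugate gradients on a small 1-D Poisson system with a simple Jacobi (diagonal) preconditioner. All names and sizes here are illustrative. On a homogeneous system like this one, the diagonal preconditioner buys little, which is exactly the regime where multi-level schemes such as the one above pay off.

```python
import numpy as np

def poisson_1d(n):
    """Tridiagonal 1-D Poisson (Laplacian) matrix with Dirichlet boundaries."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def pcg(A, b, M_inv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients; M_inv approximates A^-1."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k + 1
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

A = poisson_1d(64)
b = np.random.default_rng(0).standard_normal(64)
M_inv = np.diag(1.0 / np.diag(A))  # Jacobi (diagonal) preconditioner
x, iters = pcg(A, b, M_inv)        # x solves A x = b
```

    A better preconditioner does not change the fixed point, only how quickly `pcg` reaches it: the papers above replace `M_inv` with a multi-level operator.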
  • Dilip Krishnan, Richard Szeliski
    ABSTRACT: This paper unifies multigrid and multilevel (hierarchical) preconditioners, two widely-used approaches for solving computational photography and other computer graphics simulation problems. It provides detailed experimental comparisons of these techniques and their variants, including an analysis of relative computational costs and how these impact practical algorithm performance. We derive both theoretical convergence rates based on the condition numbers of the systems and their preconditioners, and empirical convergence rates drawn from real-world problems. We also develop new techniques for sparsifying higher connectivity problems, and compare our techniques to existing and newly developed variants such as algebraic and combinatorial multigrid. Our experimental results demonstrate that, except for highly irregular problems, adaptive hierarchical basis function preconditioners generally outperform alternative multigrid techniques, especially when computational complexity is taken into account.
    ACM Transactions on Graphics 12/2011; 30:177. · 3.36 Impact Factor
  • R. Szeliski, M. Uyttendaele, D. Steedly
    ABSTRACT: We present a technique for fast Poisson blending and gradient domain compositing. Instead of using a single piecewise-smooth offset map to perform the blending, we associate a separate map with each input source image. Each individual offset map is itself smoothly varying and can therefore be represented using a low-dimensional spline. The resulting linear system is much smaller than either the original Poisson system or the quadtree spline approximation of a single (unified) offset map. We demonstrate the speed and memory improvements available with our system and apply it to large panoramas. We also show how robustly modeling the multiplicative gain rather than the offset between overlapping images leads to improved results, and how adding a small amount of Laplacian pyramid blending improves the results in areas of inconsistent texture.
    Computational Photography (ICCP), 2011 IEEE International Conference on; 05/2011
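    In one dimension, the least-squares Poisson solve at the heart of gradient-domain compositing collapses to cumulative summation, which makes the core idea easy to show. The toy below (not the paper's spline-based solver; signals and the seam position are made up) splices the gradients of two overlapping 1-D "exposures" and integrates them, removing a constant exposure offset between the inputs.

```python
import numpy as np

def composite_gradients(left, right, seam):
    """Splice the finite-difference gradients of two 1-D signals at a seam."""
    g = np.empty(len(left) - 1)
    g[:seam] = np.diff(left)[:seam]
    g[seam:] = np.diff(right)[seam:]
    return g

def integrate(gradients, anchor):
    """In 1-D the least-squares (Poisson) reconstruction from a gradient
    field reduces to cumulative summation from an anchor value."""
    return anchor + np.concatenate([[0.0], np.cumsum(gradients)])

# Two "exposures" of the same ramp, the second with a constant +0.25 offset.
n = 32
truth = np.linspace(0.0, 1.0, n)
left, right = truth.copy(), truth + 0.25
out = integrate(composite_gradients(left, right, seam=n // 2),
                anchor=left[0])
# The composite follows the shared gradient field, so the offset vanishes.
```

    In 2-D the integration step becomes a genuine sparse linear solve, which is what the paper accelerates with per-image low-dimensional offset maps.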
  • Xiangyang Lan, Larry Zitnick, Richard Szeliski
    ABSTRACT: In this paper, we describe a model-based approach to object recognition. Spatial relationships between matching primitives are modeled using a purely local bi-gram representation consisting of transition probabilities between neighboring primitives. For matching primitives, sets of one, two or three features are used. The addition of doublets and triplets provides a highly discriminative matching primitive and a reference frame that is invariant to similarity or affine transformations. The recognition of new objects is accomplished by finding trees of matching primitives in an image that obey the model learned for a specific object class. We propose a greedy approach based on best-first search expansion for creating trees. Experimental results are presented to demonstrate the ability of our method to recognize objects undergoing nonrigid transformations for both object instance and category recognition. Furthermore, we show results for both unsupervised and semi-supervised learning.
    01/2011;
  • Richard Szeliski
    ABSTRACT: As we saw in the previous chapter, a variety of stereo matching techniques have been developed to reconstruct high quality 3D models from two or more images. However, stereo is just one of the many potential cues that can be used to infer shape from images. In this chapter, we investigate a number of such techniques, which include not only visual cues such as shading and focus, but also techniques for merging multiple range or depth images into 3D models, as well as techniques for reconstructing specialized models, such as heads, bodies, or architecture.
    09/2010: pages 505-541;
  • Richard Szeliski
    ABSTRACT: Over the last two decades, image-based rendering has emerged as one of the most exciting applications of computer vision (Kang, Li, Tong et al. 2006; Shum, Chan, and Kang 2007). In image-based rendering, 3D reconstruction techniques from computer vision are combined with computer graphics rendering techniques that use multiple views of a scene to create interactive photo-realistic experiences, such as the Photo Tourism system shown in Figure 13.1a. Commercial versions of such systems include immersive street-level navigation in on-line mapping systems and the creation of 3D Photosynths from large collections of casually acquired photographs.
    09/2010: pages 543-573;
  • Richard Szeliski
    ABSTRACT: In the previous chapter, we saw how 2D and 3D point sets could be aligned and how such alignments could be used to estimate both a camera’s pose and its internal calibration parameters. In this chapter, we look at the converse problem of estimating the locations of 3D points from multiple images given only a sparse set of correspondences between image features. While this process often involves simultaneously estimating both 3D geometry (structure) and camera pose (motion), it is commonly known as structure from motion (Ullman 1979).
    09/2010: pages 303-334;
  •
    ABSTRACT: There are billions of photographs on the Internet, representing an extremely large, rich, and nearly comprehensive visual record of virtually every famous place on Earth. Unfortunately, these massive community photo collections are almost completely unstructured, making it very difficult to use them for applications such as the virtual exploration of our world. Over the past several years, advances in computer vision have made it possible to automatically reconstruct 3-D geometry - including camera positions and scene models - from these large, diverse photo collections. Once the geometry is known, we can recover higher level information from the spatial distribution of photos, such as the most common viewpoints and paths through the scene. This paper reviews recent progress on these challenging computer vision problems, and describes how we can use the recovered structure to turn community photo collections into immersive, interactive 3-D experiences.
    Proceedings of the IEEE 09/2010; · 6.91 Impact Factor
  •
    ABSTRACT: View interpolation and image-based rendering algorithms often produce visual artifacts in regions where the 3D scene geometry is erroneous, uncertain, or incomplete. We introduce ambient point clouds constructed from colored pixels with uncertain depth, which help reduce these artifacts while providing non-photorealistic background coloring and emphasizing reconstructed 3D geometry. Ambient point clouds are created by randomly sampling colored points along the viewing rays associated with uncertain pixels. Our realtime rendering system combines these with more traditional rigid 3D point clouds and colored surface meshes obtained using multiview stereo. Our resulting system can handle larger-range view transitions with fewer visible artifacts than previous approaches.
    Fraunhofer IGD. 07/2010;
  •
    ABSTRACT: Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for Manhattan World scenes, i.e., scenes that consist of piecewise-planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.
    Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 06/2009;
  •
    ABSTRACT: We address the problem of automatically aligning structure-from-motion reconstructions to overhead images, such as satellite images, maps and floor plans, generated from an orthographic camera. We compute the optimal alignment using an objective function that matches 3D points to image edges and imposes free space constraints based on the visibility of points in each camera. We demonstrate the accuracy of our alignment algorithm on several outdoor and indoor scenes using both satellite and floor plan images. We also present an application of our technique, which uses a labeled overhead image to automatically tag the input photo collection with textual information.
    2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; 06/2009
  • Grant Schindler, Matthew Brown, Richard Szeliski
    ABSTRACT: We look at the problem of location recognition in a large image dataset using a vocabulary tree. This entails finding the location of a query image in a large dataset containing 3 × 10^5 streetside images of a city. We investigate how the traditional invariant feature matching approach falls down as the size of the database grows. In particular we show that by carefully selecting the vocabulary using the most informative features, retrieval performance is significantly improved, allowing us to increase the number of database images by a factor of 10. We also introduce a generalization of the traditional vocabulary tree search algorithm which improves performance by effectively increasing the branching factor of a fixed vocabulary tree.
    2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA; 01/2007
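    A vocabulary tree quantizes feature descriptors hierarchically; dropping the tree, the retrieval scoring such systems build on is a flat TF-IDF bag-of-visual-words model, sketched below. The visual-word ids and "database images" here are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_index(db_images):
    """db_images: per-image lists of quantized visual-word ids.
    Returns inverse-document frequencies and L2-normalised tf-idf vectors."""
    df = Counter()
    for words in db_images:
        df.update(set(words))                      # document frequency
    idf = {w: math.log(len(db_images) / df[w]) for w in df}
    vecs = []
    for words in db_images:
        tf = Counter(words)                        # term frequency
        v = {w: tf[w] * idf[w] for w in tf}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({w: x / norm for w, x in v.items()})
    return idf, vecs

def best_match(query_words, idf, vecs):
    """Index of the database image with the highest cosine similarity."""
    tf = Counter(query_words)
    v = {w: tf[w] * idf.get(w, 0.0) for w in tf}
    norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
    scores = [sum((x / norm) * dv.get(w, 0.0) for w, x in v.items())
              for dv in vecs]
    return max(range(len(scores)), key=scores.__getitem__)

db = [[0, 1, 1, 2], [3, 4, 4, 5], [0, 2, 6, 6]]    # three "database images"
best = best_match([3, 4, 5, 5], *tf_idf_index(db))  # shares words with image 1
```

    The paper's informative-feature selection amounts to choosing which words enter this vocabulary in the first place, so that the tf-idf scores stay discriminative as the database grows.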
  • Frédo Durand, Richard Szeliski
    IEEE Computer Graphics and Applications 01/2007; 27(2):21-2. · 1.23 Impact Factor
  • M.F. Cohen, R. Szeliski
    ABSTRACT: Future cameras will be used to "capture the moment", not just the instant when the shutter opens. The moment camera gathers significantly more data than is needed for a single image. This data, coupled with computational photography and user-assisted algorithms, provides powerful new paradigms for image making. The camera constantly records time slices of imagery. Although the input to the moment camera creates a spacetime slab, the moment's output typically consists of a single image. Thus, the processing primarily selects the color for each output pixel given the set of input images in the spacetime slab.
    Computer 09/2006; · 1.68 Impact Factor
  •
    ABSTRACT: Most algorithms for 3D reconstruction from images use cost functions based on SSD, which assume that the surfaces being reconstructed are visible to all cameras. This makes it difficult to reconstruct objects which are partially occluded. Recently, researchers working with large camera arrays have shown it is possible to "see through" occlusions using a technique called synthetic aperture focusing. This suggests that we can design alternative cost functions that are robust to occlusions using synthetic apertures. Our paper explores this design space. We compare classical shape from stereo with shape from synthetic aperture focus. We also describe two variants of multi-view stereo based on color medians and entropy that increase robustness to occlusions. We present an experimental comparison of these cost functions on complex light fields, measuring their accuracy against the amount of occlusion.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 02/2006
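    One of the paper's robust variants replaces the per-pixel SSD cost with a cost on the color median across views. A one-pixel toy version (illustrative colour values only) shows why the median resists occluded samples while SSD does not:

```python
import statistics

def ssd_cost(samples, hypothesis):
    """Classic photo-consistency: sum of squared differences over all views."""
    return sum((s - hypothesis) ** 2 for s in samples)

def median_cost(samples, hypothesis):
    """Robust variant: penalise only the median colour across views."""
    return (statistics.median(samples) - hypothesis) ** 2

# Hypothesised surface colour 0.5 seen by 7 cameras; 2 views hit an
# occluder of colour 0.9 instead of the surface.
samples = [0.5, 0.51, 0.49, 0.5, 0.52, 0.9, 0.9]
robust = median_cost(samples, 0.5)   # tiny: the median ignores the occluder
classic = ssd_cost(samples, 0.5)     # inflated by the two occluded views
```

    As long as the surface is visible in a majority of views, the median cost stays near zero at the correct depth, which is the intuition behind synthetic-aperture "seeing through" occluders.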
  • Richard Szeliski
    ABSTRACT: This paper develops locally adapted hierarchical basis functions for effectively preconditioning large optimization problems that arise in computer graphics applications such as tone mapping, gradient-domain blending, colorization, and scattered data interpolation. By looking at the local structure of the coefficient matrix and performing a recursive set of variable eliminations, combined with a simplification of the resulting coarse level problems, we obtain bases better suited for problems with inhomogeneous (spatially varying) data, smoothness, and boundary constraints. Our approach removes the need to heuristically adjust the optimal number of preconditioning levels, significantly outperforms previously proposed approaches, and also maps cleanly onto data-parallel architectures such as modern GPUs.
    ACM Trans. Graph. 01/2006; 25:1135-1143.
  • Richard Szeliski
    ABSTRACT: This tutorial reviews image alignment and image stitching algorithms. Image alignment (registration) algorithms can discover the large-scale (parametric) correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of large-scale panoramic photographs. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It closes with a discussion of open research problems in the area.
    Foundations and Trends in Computer Graphics and Vision. 01/2006; 2.
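    The parametric motion models the tutorial surveys (translation, affine, homography) all act on points in homogeneous coordinates. A minimal sketch, with an illustrative 3×3 matrix (here a pure translation, the simplest model in the hierarchy):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points through a 3x3 homography using homogeneous
    coordinates: lift to (x, y, 1), multiply, divide by w."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# A translation by (3, -2) expressed as a homography.
H = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
out = apply_homography(H, np.array([[0.0, 0.0], [1.0, 1.0]]))
```

    Richer models (affine, full projective) change only the entries of `H`; the warp machinery stays the same, which is why stitchers can swap motion models freely.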
  • D. Steedly, C. Pal, R. Szeliski
    ABSTRACT: We present an automatic and efficient method to register and stitch thousands of video frames into a large panoramic mosaic. Our method preserves the robustness and accuracy of image stitchers that match all pairs of images while utilizing the ordering information provided by video. We reduce the cost of searching for matches between video frames by adaptively identifying key frames based on the amount of image-to-image overlap. Key frames are matched to all other key frames, but intermediate video frames are only matched to temporally neighboring key frames and intermediate frames. Image orientations can be estimated from this sparse set of matches in time quadratic to cubic in the number of key frames but only linear in the number of intermediate frames. Additionally, the matches between pairs of images are compressed by replacing measurements within small windows in the image with a single representative measurement. We show that this approach substantially reduces the time required to estimate the image orientations with minimal loss of accuracy. Finally, we demonstrate both the efficiency and quality of our results by registering several long video sequences.
    Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on; 11/2005
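    The core idea of adaptively spacing key frames by image-to-image overlap fits in a few lines. The overlap proxy below (a horizontal offset measured against a frame width) is a made-up stand-in for the paper's actual alignment estimates, not its method:

```python
def select_key_frames(offsets, frame_width, min_overlap=0.5):
    """Greedy key-frame selection: start a new key frame whenever the
    estimated overlap with the previous key frame drops below min_overlap.
    offsets[i] = horizontal displacement of frame i from frame 0."""
    keys = [0]
    for i in range(1, len(offsets)):
        shift = abs(offsets[i] - offsets[keys[-1]])
        overlap = max(0.0, 1.0 - shift / frame_width)
        if overlap < min_overlap:
            keys.append(i)
    return keys

# A camera panning at a varying rate: slow at first, then faster.
offsets = [0, 5, 10, 15, 60, 120, 180, 185]
keys = select_key_frames(offsets, frame_width=100)
```

    Slowly panning stretches land few key frames, fast pans many, which is what keeps the quadratic key-frame matching cost bounded while intermediate frames are matched only locally.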
  • M. Brown, R. Szeliski, S. Winder
    ABSTRACT: This paper describes a novel multi-view matching framework based on a new type of invariant feature. Our features are located at Harris corners in discrete scale-space and oriented using a blurred local gradient. This defines a rotationally invariant frame in which we sample a feature descriptor, which consists of an 8 × 8 patch of bias/gain normalised intensity values. The density of features in the image is controlled using a novel adaptive non-maximal suppression algorithm, which gives a better spatial distribution of features than previous approaches. Matching is achieved using a fast nearest neighbour algorithm that indexes features based on their low frequency Haar wavelet coefficients. We also introduce a novel outlier rejection procedure that verifies a pairwise feature match based on a background distribution of incorrect feature matches. Feature matches are refined using RANSAC and used in an automatic 2D panorama stitcher that has been extensively tested on hundreds of sample inputs.
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on; 07/2005
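    The adaptive non-maximal suppression step can be sketched directly from its definition: each corner's suppression radius is its distance to the nearest strictly stronger corner, and the corners with the largest radii are kept, giving an even spatial spread. (The paper also applies a robustness factor to the strength comparison, omitted here; the coordinates and strengths below are made up.)

```python
def anms(corners, n_keep):
    """Adaptive non-maximal suppression.
    corners: list of (x, y, strength). Keeps the n_keep corners whose
    suppression radii (distance to the nearest stronger corner) are largest."""
    radii = []
    for (x, y, s) in corners:
        r = float("inf")  # a global maximum is never suppressed
        for (x2, y2, s2) in corners:
            if s2 > s:
                # squared distance is monotonic, so fine for ranking
                r = min(r, (x2 - x) ** 2 + (y2 - y) ** 2)
        radii.append(r)
    order = sorted(range(len(corners)), key=lambda i: radii[i], reverse=True)
    return [corners[i] for i in order[:n_keep]]

# Three strong corners clustered at the left, one weaker corner far right.
pts = [(0, 0, 9.0), (1, 0, 8.0), (0, 1, 7.0), (50, 0, 5.0)]
kept = anms(pts, n_keep=2)
```

    A plain strength threshold would keep the crowded left-hand cluster and drop the isolated corner; ANMS keeps the isolated one instead, which is the better spatial distribution the abstract refers to.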
  • D. Steedly, C. Pal, R. Szeliski
    01/2005;

Publication Stats

7k Citations
70.77 Total Impact Points

Institutions

  • 1996–2013
    • Microsoft
      • Microsoft Research
      Redmond, Washington, United States
    • Cornell University
      • Computer Science
      Ithaca, NY, United States
  • 2007
    • Massachusetts Institute of Technology
      • Department of Electrical Engineering and Computer Science
      Cambridge, MA, United States
  • 2006
    • Stanford University
      Palo Alto, California, United States
  • 2004
    • The Hong Kong University of Science and Technology
      • Department of Computer Science and Engineering
      Kowloon, Hong Kong
  • 2003
    • Carnegie Mellon University
      • Robotics Institute
      Pittsburgh, PA, United States
    • Middlebury College
      Middlebury, Vermont, United States
  • 2001
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, WA, United States
  • 1989
    • Palo Alto Research Center
      Palo Alto, California, United States