R. Szeliski

Dechert LLP, New York City, NY, USA

Publications (51) · 42.2 Total impact

  • Article: Image Restoration by Matching Gradient Distributions
    ABSTRACT: The restoration of a blurry or noisy image is commonly performed with a MAP estimator, which maximizes the posterior probability to reconstruct a clean image from a degraded image. A MAP estimator, when used with a sparse gradient image prior, reconstructs piecewise smooth images and typically removes textures that are important for visual realism. We present an alternative deconvolution method called iterative distribution reweighting (IDR) which imposes a global constraint on gradients so that a reconstructed image should have a gradient distribution similar to a reference distribution. In natural images, a reference distribution not only varies from one image to another, but also within an image depending on texture. We estimate a reference distribution directly from an input image for each texture segment. Our algorithm is able to restore rich mid-frequency textures. A large-scale user study supports the conclusion that our algorithm improves the visual realism of reconstructed images compared to those of MAP estimators.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 05/2012; · 4.91 Impact Factor
  • Conference Proceeding: Structure from motion for scenes with large duplicate structures
    ABSTRACT: Most existing structure from motion (SFM) approaches for unordered images cannot handle multiple instances of the same structure in the scene. When image pairs containing different instances are matched based on visual similarity, the pairwise geometric relations as well as the correspondences inferred from such pairs are erroneous, which can lead to catastrophic failures in the reconstruction. In this paper, we investigate the geometric ambiguities caused by the presence of repeated or duplicate structures and show that disambiguating between multiple hypotheses requires more than purely geometric reasoning. We couple an expectation maximization (EM)-based algorithm that estimates camera poses and identifies the false match-pairs with an efficient sampling method to discover plausible data association hypotheses. The sampling method is informed by geometric and image-based cues. Our algorithm usually recovers the correct data association, even in the presence of large numbers of false pairwise matches.
    Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on; 07/2011
  • Conference Proceeding: Fast Poisson blending using multi-splines
    R. Szeliski, M. Uyttendaele, D. Steedly
    ABSTRACT: We present a technique for fast Poisson blending and gradient domain compositing. Instead of using a single piecewise-smooth offset map to perform the blending, we associate a separate map with each input source image. Each individual offset map is itself smoothly varying and can therefore be represented using a low-dimensional spline. The resulting linear system is much smaller than either the original Poisson system or the quadtree spline approximation of a single (unified) offset map. We demonstrate the speed and memory improvements available with our system and apply it to large panoramas. We also show how robustly modeling the multiplicative gain rather than the offset between overlapping images leads to improved results, and how adding a small amount of Laplacian pyramid blending improves the results in areas of inconsistent texture.
    Computational Photography (ICCP), 2011 IEEE International Conference on; 05/2011
  • Article: Scene Reconstruction and Visualization From Community Photo Collections
    ABSTRACT: There are billions of photographs on the Internet, representing an extremely large, rich, and nearly comprehensive visual record of virtually every famous place on Earth. Unfortunately, these massive community photo collections are almost completely unstructured, making it very difficult to use them for applications such as the virtual exploration of our world. Over the past several years, advances in computer vision have made it possible to automatically reconstruct 3-D geometry, including camera positions and scene models, from these large, diverse photo collections. Once the geometry is known, we can recover higher level information from the spatial distribution of photos, such as the most common viewpoints and paths through the scene. This paper reviews recent progress on these challenging computer vision problems, and describes how we can use the recovered structure to turn community photo collections into immersive, interactive 3-D experiences.
    Proceedings of the IEEE 09/2010; · 6.81 Impact Factor
  • Conference Proceeding: A content-aware image prior
    ABSTRACT: In image restoration tasks, the heavy-tailed gradient distribution of natural images has been extensively exploited as an image prior. Most image restoration algorithms impose a sparse gradient prior on the whole image, reconstructing an image with piecewise smooth characteristics. While the sparse gradient prior removes ringing and noise artifacts, it also tends to remove mid-frequency textures, degrading the visual quality. We can attribute such degradations to imposing an incorrect image prior. The gradient profile in fractal-like textures, such as trees, is close to a Gaussian distribution, and small gradients from such regions are severely penalized by the sparse gradient prior. To address this issue, we introduce an image restoration algorithm that adapts the image prior to the underlying texture. We adapt the prior to both low-level local structures as well as mid-level textural characteristics. Improvements in visual quality are demonstrated on deconvolution and denoising tasks.
    Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on; 07/2010
  • Article: Alignment of 3D point clouds to overhead images
    ABSTRACT: We address the problem of automatically aligning structure-from-motion reconstructions to overhead images, such as satellite images, maps and floor plans, generated from an orthographic camera. We compute the optimal alignment using an objective function that matches 3D points to image edges and imposes free space constraints based on the visibility of points in each camera. We demonstrate the accuracy of our alignment algorithm on several outdoor and indoor scenes using both satellite and floor plan images. We also present an application of our technique, which uses a labeled overhead image to automatically tag the input photo collection with textual information.
    2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; 06/2009
  • Article: Manhattan-world stereo
    ABSTRACT: Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for Manhattan World scenes, i.e., scenes that consist of piecewise planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.
    2009 IEEE Conference on Computer Vision and Pattern Recognition; 06/2009
  • Article: Automatic Estimation and Removal of Noise from a Single Image
    ABSTRACT: Image denoising algorithms often assume an additive white Gaussian noise (AWGN) process that is independent of the actual RGB values. Such approaches cannot effectively remove color noise produced by today's CCD digital cameras. In this paper, we propose a unified framework for two tasks: automatic estimation and removal of color noise from a single image using piecewise smooth image models. We introduce the noise level function (NLF), which is a continuous function describing the noise level as a function of image brightness. We then estimate an upper bound of the real NLF by fitting a lower envelope to the standard deviations of per-segment image variances. For denoising, the chrominance of color noise is significantly removed by projecting pixel values onto a line fit to the RGB values in each segment. Then, a Gaussian conditional random field (GCRF) is constructed to obtain the underlying clean image from the noisy input. Extensive experiments are conducted to test the proposed algorithm, which is shown to outperform state-of-the-art denoising algorithms.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 03/2008; 30(2):299-314. · 4.91 Impact Factor
  • Conference Proceeding: A Database and Evaluation Methodology for Optical Flow
    ABSTRACT: The quantitative evaluation of optical flow algorithms by Barron et al. led to significant advances in the performance of optical flow methods. The challenges for optical flow today go beyond the datasets and evaluation methods proposed in that paper and center on problems associated with nonrigid motion, real sensor noise, complex natural scenes, and motion discontinuities. Our goal is to establish a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. To that end, we contribute four types of data to test different aspects of optical flow algorithms: sequences with nonrigid motion where the ground-truth flow is determined by tracking hidden fluorescent texture; realistic synthetic sequences; high frame-rate video used to study interpolation error; and modified stereo sequences of static scenes. In addition to the average angular error used in Barron et al., we compute the absolute flow endpoint error, measures for frame interpolation error, improved statistics, and flow accuracy at motion boundaries and in textureless regions. We evaluate the performance of several well-known methods on this data to establish the current state of the art. Our database is freely available on the Web together with scripts for scoring and publication of the results at http://vision.middlebury.edu/flow/.
    Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on; 11/2007
  • Article: The Moment Camera
    M.F. Cohen, R. Szeliski
    ABSTRACT: Future cameras will let us "capture the moment," not just the instant when the shutter opens. The moment camera gathers significantly more data than is needed for a single image. This data, coupled with computational photography and user-assisted algorithms, provides powerful new paradigms for image making. The camera constantly records time slices of imagery. Although the input to the moment camera creates a spacetime slab, the moment's output typically consists of a single image. Thus, the processing primarily selects the color for each output pixel given the set of input images in the spacetime slab.
    Computer 09/2006; · 1.47 Impact Factor
  • Conference Proceeding: Noise Estimation from a Single Image
    ABSTRACT: In order to work well, many computer vision algorithms require that their parameters be adjusted according to the image noise level, making it an important quantity to estimate. We show how to estimate an upper bound on the noise level from a single image based on a piecewise smooth image prior model and measured CCD camera response functions. We also learn the space of noise level functions (how noise level changes with respect to brightness) and use Bayesian MAP inference to infer the noise level function from a single image. We illustrate the utility of this noise estimation for two algorithms: edge detection and feature-preserving smoothing through bilateral filtering. For a variety of different noise levels, we obtain good results for both these algorithms with no user-specified inputs.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 07/2006
  • Conference Proceeding: A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms
    ABSTRACT: This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 07/2006
  • Conference Proceeding: Reconstructing Occluded Surfaces Using Synthetic Apertures: Stereo, Focus and Robust Measures
    ABSTRACT: Most algorithms for 3D reconstruction from images use cost functions based on the sum of squared differences (SSD), which assume that the surfaces being reconstructed are visible to all cameras. This makes it difficult to reconstruct objects that are partially occluded. Recently, researchers working with large camera arrays have shown it is possible to "see through" occlusions using a technique called synthetic aperture focusing. This suggests that we can design alternative cost functions that are robust to occlusions using synthetic apertures. Our paper explores this design space. We compare classical shape from stereo with shape from synthetic aperture focus. We also describe two variants of multi-view stereo based on color medians and entropy that increase robustness to occlusions. We present an experimental comparison of these cost functions on complex light fields, measuring their accuracy against the amount of occlusion.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 02/2006
  • Conference Proceeding: Seamless Image Stitching of Scenes with Large Motions and Exposure Differences
    A. Eden, M. Uyttendaele, R. Szeliski
    ABSTRACT: This paper presents a technique to automatically stitch multiple images at varying orientations and exposures to create a composite panorama that preserves the angular extent and dynamic range of the inputs. The main contribution of our method is that it allows for large exposure differences, large scene motion or other misregistrations between frames and requires no extra camera hardware. To do this, we introduce a two-step graph cut approach. The purpose of the first step is to fix the positions of moving objects in the scene. In the second step, we fill in the entire available dynamic range. We introduce data costs that encourage consistency and higher signal-to-noise ratios, and seam costs that encourage smooth transitions. Our method is simple to implement and effective. We demonstrate the effectiveness of our approach on several input sets with varying exposures and camera orientations.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 02/2006
  • Conference Proceeding: Efficiently registering video into panoramic mosaics
    D. Steedly, C. Pal, R. Szeliski
    ABSTRACT: We present an automatic and efficient method to register and stitch thousands of video frames into a large panoramic mosaic. Our method preserves the robustness and accuracy of image stitchers that match all pairs of images while utilizing the ordering information provided by video. We reduce the cost of searching for matches between video frames by adaptively identifying key frames based on the amount of image-to-image overlap. Key frames are matched to all other key frames, but intermediate video frames are only matched to temporally neighboring key frames and intermediate frames. Image orientations can be estimated from this sparse set of matches in time quadratic to cubic in the number of key frames but only linear in the number of intermediate frames. Additionally, the matches between pairs of images are compressed by replacing measurements within small windows in the image with a single representative measurement. We show that this approach substantially reduces the time required to estimate the image orientations with minimal loss of accuracy. Finally, we demonstrate both the efficiency and quality of our results by registering several long video sequences.
    Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on; 11/2005
  • Conference Proceeding: Multi-image matching using multi-scale oriented patches
    M. Brown, R. Szeliski, S. Winder
    ABSTRACT: This paper describes a novel multi-view matching framework based on a new type of invariant feature. Our features are located at Harris corners in discrete scale-space and oriented using a blurred local gradient. This defines a rotationally invariant frame in which we sample a feature descriptor, which consists of an 8 × 8 patch of bias/gain normalised intensity values. The density of features in the image is controlled using a novel adaptive non-maximal suppression algorithm, which gives a better spatial distribution of features than previous approaches. Matching is achieved using a fast nearest neighbour algorithm that indexes features based on their low frequency Haar wavelet coefficients. We also introduce a novel outlier rejection procedure that verifies a pairwise feature match based on a background distribution of incorrect feature matches. Feature matches are refined using RANSAC and used in an automatic 2D panorama stitcher that has been extensively tested on hundreds of sample inputs.
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on; 07/2005
  • Article: Image-based interactive exploration of real-world environments
    ABSTRACT: Interactive scene walkthroughs have long been an important computer graphics application area. More recently, researchers have developed techniques for constructing photorealistic 3D architectural models from real-world images. We present an image-based rendering system that brings us a step closer to a compelling sense of being there. Whereas many previous systems have used still photography and 3D scene modeling, we avoid explicit 3D reconstruction because it tends to be brittle. Our system is not the first to propose interactive video-based tours. We believe, however, that our system is the first to deliver fully interactive, photorealistic image-based tours on a personal computer at or above broadcast video resolutions and frame rates. Moreover, to our knowledge, no other tour provides the same rich set of interactions or visually complex environments.
    IEEE Computer Graphics and Applications 06/2004; 24(3):52-63. · 1.41 Impact Factor
  • Article: Stereo reconstruction from multiperspective panoramas
    ABSTRACT: A new approach to computing a panoramic (360 degrees) depth map is presented in this paper. Our approach uses a large collection of images taken by a camera whose motion has been constrained to planar concentric circles. We resample regular perspective images to produce a set of multiperspective panoramas and then compute depth maps directly from these resampled panoramas. Our panoramas sample uniformly in three dimensions: rotation angle, inverse radial distance, and vertical elevation. The use of multiperspective panoramas eliminates the limited overlap present in the original input images and thus avoids the problems that arise in conventional multibaseline stereo. Our approach differs from stereo matching of single-perspective panoramic images taken from different locations, where the epipolar constraints are sine curves. For our multiperspective panoramas, the epipolar geometry, to a first-order approximation, consists of horizontal lines. Therefore, any traditional stereo algorithm can be applied to multiperspective panoramas with little modification. In this paper, we describe two reconstruction algorithms. The first is a cylinder sweep algorithm that uses a small number of resampled multiperspective panoramas to obtain dense 3D reconstruction. The second algorithm, in contrast, uses a large number of multiperspective panoramas and takes advantage of the approximate horizontal epipolar geometry inherent in multiperspective panoramas. It comprises a novel and efficient 1D multibaseline matching technique, followed by tensor voting to extract the depth surface. Experiments show that our algorithms are capable of producing comparably high-quality depth maps, which can be used for applications such as view interpolation.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 02/2004; 26(1):45-62. · 4.91 Impact Factor
  • Conference Proceeding: Stereo matching with reflections and translucency
    ABSTRACT: In this paper, we address the stereo matching problem in the presence of reflections and translucency, where image formation can be modeled as the additive superposition of layers at different depths. The presence of such effects violates the Lambertian assumption underlying traditional stereo vision algorithms, making it impossible to recover component depths using direct color matching based methods. We develop several techniques to estimate both depths and colors of the component layers. Depth hypotheses are enumerated in pairs, one from each layer, in a nested plane sweep. For each pair of depth hypotheses, we compute a component-color-independent matching error per pixel, using a spatial-temporal differencing technique. We then use graph cut optimization to solve for the depths of both layers. This is followed by an iterative color update algorithm whose convergence is proven in our paper. We show convincing results of depth and color estimates for both synthetic and real image sequences.
    Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on; 07/2003
  • Conference Proceeding: High-accuracy stereo depth maps using structured light
    D. Scharstein, R. Szeliski
    ABSTRACT: Progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.
    Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on; 07/2003
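
The optical flow benchmark listed above (A Database and Evaluation Methodology for Optical Flow) names two per-pixel error measures, angular error and endpoint error. The minimal sketch below shows both; it is not the benchmark's own scoring script. It assumes flow fields are given as NumPy arrays of horizontal (`u`) and vertical (`v`) components. It also uses the commonly cited definition of angular error as the angle between the space-time vectors (u, v, 1) and (u_gt, v_gt, 1). The function name `flow_errors` is hypothetical.

```python
import numpy as np


def flow_errors(u, v, u_gt, v_gt):
    """Return (mean angular error in degrees, mean endpoint error) for a flow field."""
    u, v, u_gt, v_gt = (np.asarray(a, dtype=float) for a in (u, v, u_gt, v_gt))
    # Endpoint error: Euclidean distance between estimated and ground-truth flow vectors.
    epe = np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2)
    # Angular error: angle between the 3-D vectors (u, v, 1) and (u_gt, v_gt, 1).
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u**2 + v**2) * np.sqrt(1.0 + u_gt**2 + v_gt**2)
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    ae = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return float(ae.mean()), float(epe.mean())
```

For example, `flow_errors([3.0], [4.0], [0.0], [0.0])` gives an endpoint error of 5.0 pixels. The constant third component keeps the angular error defined for zero flow. It also makes that error weight mistakes in small and large flows differently, which is one motivation for reporting endpoint error as well.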
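
"Multi-image matching using multi-scale oriented patches" controls feature density with adaptive non-maximal suppression. The sketch below follows a common description of that step, taken here as an assumption rather than the paper's exact procedure. Each interest point gets a suppression radius: the distance to the nearest point whose response is sufficiently stronger. The strength ratio is set by `c_robust`, assumed to be 0.9. The `n` points with the largest radii are then kept. It is a brute-force O(N²) illustration, and the name `adaptive_nms` is hypothetical.

```python
import numpy as np


def adaptive_nms(points, strengths, n, c_robust=0.9):
    """Return indices of the n points with the largest suppression radii.

    points: (N, 2) array-like of (x, y) positions.
    strengths: (N,) array-like of corner responses.
    """
    pts = np.asarray(points, dtype=float)
    f = np.asarray(strengths, dtype=float)
    # Points with no sufficiently stronger neighbor (e.g. the global maximum) keep radius inf.
    radii = np.full(len(pts), np.inf)
    for i in range(len(pts)):
        # Neighbors whose response is "sufficiently stronger" than point i.
        stronger = f[i] < c_robust * f
        if np.any(stronger):
            d = np.linalg.norm(pts[stronger] - pts[i], axis=1)
            radii[i] = d.min()
    # Sort by decreasing radius; a stable sort keeps input order among ties.
    order = np.argsort(-radii, kind="stable")
    return order[:n]
```

Take three points at x = 0, 1, 10 with responses 10, 5, 4. Asking for two points keeps indices 0 and 2. The weaker but isolated point at x = 10 survives, while the point at x = 1, which sits next to a much stronger neighbor, is suppressed. This produces a more even spatial spread of features than keeping the strongest responses alone.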
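
"Multi-image matching using multi-scale oriented patches" describes its descriptor as an 8 × 8 patch of bias/gain normalised intensity values. The sketch below assumes that normalization means subtracting the patch mean and dividing by the patch standard deviation, which is the usual reading. Sampling the patch from the oriented, blurred scale-space frame is omitted, and `normalize_patch` is a hypothetical name.

```python
import numpy as np


def normalize_patch(patch, eps=1e-8):
    """Bias/gain-normalize a patch to zero mean and (approximately) unit standard deviation."""
    p = np.asarray(patch, dtype=float)
    # Subtracting the mean removes bias; dividing by the std removes gain.
    # eps avoids division by zero on perfectly flat patches.
    return (p - p.mean()) / (p.std() + eps)
```

Consider an affine intensity change a·I + b with a > 0. It shifts the patch mean by b and scales its standard deviation by a. The normalized patch is therefore unchanged, up to the small `eps` guard, which is what makes such a descriptor insensitive to bias and gain.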
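
"Efficiently registering video into panoramic mosaics" uses a sparse matching pattern: key frames are matched against all other key frames, while intermediate frames are matched only to temporally nearby frames. The sketch below illustrates this as a pair-generation step. Key-frame selection is taken as given. The neighborhood used here, the nearest key frame on each side plus the immediately adjacent frames, is an assumption for illustration, and `match_pairs` is a hypothetical name.

```python
from bisect import bisect_left
from itertools import combinations


def match_pairs(num_frames, key_frames):
    """Return the set of frame-index pairs (i, j), with i < j, to be matched."""
    keys = sorted(set(key_frames))
    key_set = set(keys)
    # Every key frame is matched against every other key frame (quadratic in key frames).
    pairs = set(combinations(keys, 2))
    for f in range(num_frames):
        if f in key_set:
            continue
        neighbors = []
        # Assumed neighborhood: nearest key frame before and after f, if they exist.
        idx = bisect_left(keys, f)
        if idx > 0:
            neighbors.append(keys[idx - 1])
        if idx < len(keys):
            neighbors.append(keys[idx])
        # Plus the immediately adjacent frames in time.
        neighbors.extend(g for g in (f - 1, f + 1) if 0 <= g < num_frames)
        for g in neighbors:
            pairs.add((min(f, g), max(f, g)))
    return pairs
```

The number of key-frame pairs grows quadratically with the number of key frames. Each intermediate frame contributes at most four pairs, so that part of the matching grows only linearly with the length of the video.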

Institutions

  • 2012
    • Dechert LLP
      New York City, NY, USA
  • 2008
    • Massachusetts Institute of Technology
      Cambridge, MA, USA
  • 1996–2007
    • Microsoft
      • Microsoft Research
      Redmond, WA, USA
    • Cornell University
      • Computer Science
      Ithaca, NY, USA
  • 2006
    • Stanford University
      Palo Alto, CA, USA
  • 2004
    • The Hong Kong University of Science and Technology
      • Department of Computer Science and Engineering
      Kowloon, Hong Kong
  • 2003
    • Carnegie Mellon University
      • Robotics Institute
      Pittsburgh, PA, USA
    • Middlebury College
      Middlebury, VT, USA
  • 2001
    • University of Washington Seattle
      • Department of Computer Science and Engineering
      Seattle, WA, USA
  • 1998
    • Columbia University
      • Department of Computer Science
      New York City, NY, USA