High-accuracy stereo depth maps using structured light

Conference Paper · July 2003
DOI: 10.1109/CVPR.2003.1211354 · Source: IEEE Xplore
Conference: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, Volume: 1
Abstract
Progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.
    • "In binary coding a temporal sequence of dark and bright stripes is projected onto an object, so that a given spatial location has a unique binary code that can be distinguished from other codes. A popular representative is based on Gray Codes (GC) [14, 16, 34, 35, 36]. GC-based techniques usually have a lower spatial resolution; however, they may be used to unwrap the phase pattern obtained under a single period of sinusoidal illumination."
    ABSTRACT: Structured light illumination is a well-established technology for noncontact 3D surface measurements. A common challenge in those systems is to obtain the absolute surface information using few measurement frames. This work discusses techniques based on the projection of multiple sinusoidal fringe patterns with different fringe periods, as well as the projection of intensity-discrete Gray Code and grey-level coded patterns. The use of sinusoidal multi-frequency techniques has been an ongoing area of research for years, and various algorithms have been developed based on beats, look-up tables, or number-theoretical approaches. This work shows that a related technique, the so-called algebraic reconstruction technique borrowed from the area of multi-wavelength interferometry, can be used for this purpose. This approach provides a robust analytical solution to the phase-unwrapping problem. However, this work argues that despite these advances, the acquisition of additional phase maps obtained with different fringe periods requires too many measurement frames, and hence is inefficient. Motivated by that, this work proposes a new grey-level coding scheme that uses only a few measurement frames, overcomes typical defocus errors, and has an error-detecting feature. The latter feature removes the need for separate error-detecting algorithms. This so-called closed-loop space filling curve can be implemented with an arbitrary number of N grey-levels, enabling up to (2N) code-words to be coded. The performance of this closed-loop space filling curve is demonstrated using experimental data.
    Article · Sep 2016
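The Gray-code stripe coding described in the excerpt above can be illustrated with a minimal Python sketch. It is not taken from either paper: the function names (`gray_encode`, `gray_decode`, `stripe_patterns`, `decode_column`) are hypothetical, and it assumes an idealized setup in which each projector column is observed as a clean sequence of dark (0) and bright (1) values, one per projected frame.

```python
def gray_encode(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Invert gray_encode by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def stripe_patterns(width, bits):
    """Build one stripe pattern per frame, most significant bit first.

    Each pattern is a list of 0 (dark) / 1 (bright) values, one per
    projector column. Assumes width <= 2 ** bits.
    """
    codes = [gray_encode(x) for x in range(width)]
    return [[(c >> b) & 1 for c in codes] for b in reversed(range(bits))]


def decode_column(observed_bits):
    """Recover the projector column from the bit sequence seen at one pixel."""
    g = 0
    for bit in observed_bits:
        g = (g << 1) | bit
    return gray_decode(g)


patterns = stripe_patterns(width=8, bits=3)
column_5_bits = [p[5] for p in patterns]
print(decode_column(column_5_bits))
```

Neighbouring columns differ in exactly one bit of their Gray code, so a misread at a stripe boundary shifts the decoded column by at most one position rather than by an arbitrary amount.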
    • "These competitive algorithms belong to the upsampling type; consequently, the computational cost is low. We use five test images in Middlebury stereo datasets [23, 24]. For our simulation, we generate low-resolution depth maps from the ground truth by using nearest neighbor sampling. "
    Full-text · Conference Paper · Jul 2016 · International Journal of Computer Vision
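The nearest-neighbor sampling mentioned in the excerpt above can be sketched as follows. This is only an illustration under stated assumptions: the depth map is a plain list of rows, the downsampling factor is an integer, and `downsample_nearest` is a hypothetical helper name rather than anything defined by the cited work.

```python
def downsample_nearest(depth, factor):
    """Keep every `factor`-th sample along both axes.

    No averaging is done, so each low-resolution value is an actual
    ground-truth depth value rather than a blend across a depth edge.
    """
    return [row[::factor] for row in depth[::factor]]


# Toy 4x4 "ground truth" depth map with values 0..15.
ground_truth = [[4 * r + c for c in range(4)] for r in range(4)]
low_res = downsample_nearest(ground_truth, 2)
print(low_res)
```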
    • "The Γ is an Intelligent Cost Function (ICF) [17] which is a type of robust scoring function using Gaussian Processes to reflect the distribution of motion errors in real scenarios. This ICF is trained using the Middlebury stereo dataset [44]. Real correspondences are extracted using the ground truth, alongside deliberate "erroneous matches"."
    ABSTRACT: Action recognition “in the wild” is extremely challenging, particularly when complex 3D actions are projected down to the image plane, losing a great deal of information. The recent growth of 3D data in broadcast content and commercial depth sensors makes it possible to overcome this. However, there is little work examining the best way to exploit this new modality. In this paper we introduce the Hollywood 3D benchmark, which is the first dataset containing “in the wild” action footage including 3D data. This dataset consists of 650 stereo video clips across 14 action classes, taken from Hollywood movies. We provide stereo calibrations and depth reconstructions for each clip. We also provide an action recognition pipeline, and propose a number of specialised depth-aware techniques including five interest point detectors and three feature descriptors. Extensive tests allow evaluation of different appearance and depth encoding schemes. Our novel techniques exploiting this depth allow us to reach performance levels more than triple those of the best baseline algorithm using only appearance information. The benchmark data, code and calibrations are all made available to the community.
    Full-text · Article · Jun 2016