High-accuracy stereo depth maps using structured light
Middlebury Coll., VT, USA
DOI: 10.1109/CVPR.2003.1211354
Conference: Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, Volume: 1
Source: IEEE Xplore
Progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.
1. Introduction
The last few years have seen a resurgence of interest in the development of highly accurate stereo correspondence algorithms. Part of this interest has been spurred by fundamental breakthroughs in matching strategies and optimization algorithms, and part of the interest is due to the existence of image databases that can be used to test and compare such algorithms. Unfortunately, as algorithms have improved, the difficulty of the existing test images has not kept pace. The best-performing algorithms can now correctly match most of the pixels in data sets for which correct (ground truth) disparity information is available.
In this paper, we devise a method to automatically acquire high-complexity stereo image pairs with pixel-accurate correspondence information. Previous approaches have either relied on hand-labeling a small number of images consisting mostly of fronto-parallel planes, or setting up scenes with a small number of slanted planes that can be segmented and then matched reliably with parametric correspondence algorithms. Synthetic images have also been suggested for testing stereo algorithm performance [12, 9], but they typically are either too easy to solve if noise, aliasing, etc. are not modeled, or too difficult, e.g., due to complete lack of texture in parts of the scene.
Figure 1. Experimental setup, showing the digital camera mounted on a translation stage, the video projector, and the complex scene being acquired.
In this paper, we use structured light to uniquely label each pixel in a set of acquired images, so that correspondence becomes (mostly) trivial, and dense pixel-accurate correspondences can be automatically produced to act as ground-truth data. Structured-light techniques rely on projecting one or more special light patterns onto a scene, usually in order to directly acquire a range map of the scene, typically using a single camera and a single projector [1, 2, 3, 4, 5, 7, 11, 13, 18, 19, 20, 22, 23]. Random light patterns have sometimes been used to provide artificial texture to stereo-based range sensing systems. Another approach is to register range data with stereo image pairs, but the range data is usually of lower resolution than the images, and the fields of view may not correspond exactly, leading to areas of the image for which no range data is available.
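The idea of labeling each pixel with a unique code can be illustrated with a small sketch. The excerpt here does not specify which light patterns are projected, so the binary Gray code used below is an assumption chosen for illustration, and the function names `gray_code_patterns` and `decode_column` are hypothetical. Each projected frame contributes one bit per projector column; a camera pixel that records the on/off sequence over all frames can recover the column that illuminated it.

```python
def gray_code_patterns(width):
    """Build one stripe pattern per bit (most significant bit first).

    Each pattern is a list of 0/1 values, one per projector column.
    Assumption: a binary-reflected Gray code is used for the labels.
    """
    n_bits = max(1, (width - 1).bit_length())
    codes = [c ^ (c >> 1) for c in range(width)]  # binary -> Gray
    return [[(g >> bit) & 1 for g in codes]
            for bit in range(n_bits - 1, -1, -1)]


def decode_column(bits):
    """Recover the projector column from the bits seen at one pixel.

    `bits` is the on/off sequence observed over all frames, MSB first.
    Each binary bit is the XOR of all Gray bits at or above it.
    """
    value = 0
    running = 0
    for b in bits:
        running ^= b
        value = (value << 1) | running
    return value


patterns = gray_code_patterns(1024)          # 10 frames for 1024 columns
observed = [frame[300] for frame in patterns]  # what a pixel lit by column 300 sees
print(decode_column(observed))
```

Pixels in two camera views that decode to the same label were lit by the same projector column, which is what makes the correspondence search nearly trivial. Neighboring columns differ in exactly one bit under a Gray code, so a single misread bit at a stripe boundary shifts the label by only one column.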
2. Overview of our approach
The goal of our technique is to produce pairs of real-world images of complex scenes where each pixel is labeled with its correspondence in the other image. These image pairs can then be used to test the accuracy of stereo algorithms relative to the known ground-truth correspondences.
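As a rough sketch of how such ground truth might be used, the snippet below scores an estimated disparity map by the fraction of pixels whose error exceeds a threshold. The metric, the threshold of 1.0, the use of `None` for pixels without ground truth, and the name `bad_pixel_rate` are all assumptions made for illustration; the excerpt does not state which error measure is used.

```python
def bad_pixel_rate(estimated, ground_truth, threshold=1.0):
    """Fraction of labeled pixels whose disparity error exceeds `threshold`.

    Both maps are lists of rows. A ground-truth value of None marks a
    pixel with no known correspondence; such pixels are skipped.
    """
    bad = 0
    counted = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for d_est, d_gt in zip(est_row, gt_row):
            if d_gt is None:
                continue
            counted += 1
            if abs(d_est - d_gt) > threshold:
                bad += 1
    return bad / counted if counted else 0.0


estimated = [[10.0, 12.5, 7.0],
             [3.0, 3.4, 9.0]]
ground_truth = [[10.0, 11.0, None],
                [3.0, 3.0, 5.0]]
print(bad_pixel_rate(estimated, ground_truth))  # 2 of 5 labeled pixels are bad
```

Skipping unlabeled pixels keeps regions without ground truth from counting either for or against an algorithm.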
ABSTRACT: Structured light illumination is a well-established technology for noncontact 3D surface measurements. A common challenge in those systems is to obtain the absolute surface information using few measurement frames. This work discusses techniques based on the projection of multiple sinusoidal fringe patterns with different fringe period, as well as the projection of intensity discrete Gray Code and grey-level coded patterns. The use of sinusoidal multi-frequency techniques has been since years an on-going area of research, where various algorithms have been developed based on beats, look-up tables, or number-theoretical approaches. This work shows that a related technique, the so-called algebraic reconstruction technique that is borrowed from the area of multi-wavelength interferometry can be used for this purpose. This approach provides a robust analytical solution to the phase-unwrapping problem. However, this work argues that despite these advances, the acquisition of additional phase maps obtained with different fringe periods requires too many measurement frames, and hence is inefficient. Motivated by that, this work proposes a new grey level coding scheme that uses only few measurement frames, overcomes typical defocus errors, and has an error detecting feature. The latter feature makes the need of separate error detecting algorithms obsolete. This so-called closed-loop space filling curve can be implemented with an arbitrary number of N grey-levels enabling to code up to (2N) code-words. The performance of this so-called closed-loop space filling curve is demonstrated using experimental data.
- "In binary coding a temporal sequence of dark and bright stripes are projected onto an object, so that a given spatial location has a unique binary code that can be distinguished from other codes. A popular representative is based on Gray Codes (GC) [14,16,. GC bases techniques have usually a lower spatial resolution, however, they may be used to unwrap the phase pattern obtained under a single period of sinusoidal illumination. "
ABSTRACT: The seminal multiple-view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis (MVS) methodology. The somewhat small size and variability of these data sets, however, limit their scope and the conclusions that can be derived from them. To facilitate further development within MVS, we here present a new and varied data set consisting of 80 scenes, seen from 49 or 64 accurate camera positions. This is accompanied by accurate structured light scans for reference and evaluation. In addition all images are taken under seven different lighting conditions. As a benchmark and to validate the use of our data set for obtaining reasonable and statistically significant findings about MVS, we have applied the three state-of-the-art MVS algorithms by Campbell et al., Furukawa et al., and Tola et al. to the data set. To do this we have extended the evaluation protocol from the Middlebury evaluation, necessitated by the more complex geometry of some of our scenes. The data set and accompanying evaluation framework are made freely available online. Based on this evaluation, we are able to observe several characteristics of state-of-the-art MVS, e.g. that there is a tradeoff between the quality of the reconstructed 3D points (accuracy) and how much of an object’s surface is captured (completeness). Also, several issues that we hypothesized would challenge MVS, such as specularities and changing lighting conditions did not pose serious problems. Our study finds that the two most pressing issues for MVS are lack of texture and meshing (forming 3D points into closed triangulated surfaces).
- "The reference points, obtained from the structured light scans, are based on binary gray code, which is recommended as being one of the most precise structured light methods (Scharstein and Szeliski 2003; Salvi et al. 2004; Salvi et al. 2010). The scans are, however, not complete. "
ABSTRACT: Due to raising system complexity and higher "time to market'' demands in industry, hardware development for fast image processing applications is becoming more and more important. In order to ease and accelerate the design flow, special frameworks aim to hide the HDL code from the developer. On the one hand, many frameworks generate HDL code from a programming language like C++ to synthesize hardware from a higher abstraction level. On the other hand, HDL libraries, which instantiate predefined hardware components, are utilized. In contrast to high level synthesis, hardware designs, resulting from such a library, will lead to resource utilizations close to hand written implementations. Therefore, we propose a library of highly configurable IP blocks and demonstrate how they can be used on different Altera and Xilinx FPGAs. Our blocks are designed in a generic way, which makes the design very flexible in several functional parameters. At the current stage of our block library, it is possible to synthesize hardware for common local operations like Sobel, Laplacian or Median filter, but also complex operations like stereo matching and Canny edge detector. Moreover, we designed an XML based language interface, that gives users, who have only low specific hardware knowledge, access to predefined filter operations. With these features a rapid implementation of image processing operators for FPGA designs becomes possible.
- "Compared to the HLS approaches our resulting clock frequencies remain constant on a high level with 395 MHz. Stereo Block Matching: For testing the functionality of the block matching technique we used image pairs from the authors of , which are widely used for benchmarking stereo matching algorithms. The resolution has been set to 450 × 375. "