High-Accuracy Stereo Depth Maps Using Structured Light
Abstract

Recent progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.
1. Introduction

The last few years have seen a resurgence of interest in the development of highly accurate stereo correspondence algorithms. Part of this interest has been spurred by fundamental breakthroughs in matching strategies and optimization algorithms, and part is due to the existence of image databases that can be used to test and compare such algorithms. Unfortunately, as algorithms have improved, the difficulty of the existing test images has not kept pace. The best-performing algorithms can now correctly match most of the pixels in data sets for which correct (ground truth) disparity information is available [21].
In this paper, we devise a method to automatically acquire high-complexity stereo image pairs with pixel-accurate correspondence information. Previous approaches have either relied on hand-labeling a small number of images consisting mostly of fronto-parallel planes [17], or on setting up scenes with a small number of slanted planes that can be segmented and then matched reliably with parametric correspondence algorithms [21]. Synthetic images have also been suggested for testing stereo algorithm performance [12, 9], but they are typically either too easy to solve if noise, aliasing, etc. are not modeled, or too difficult, e.g., due to a complete lack of texture in parts of the scene.

Figure 1. Experimental setup, showing the digital camera mounted on a translation stage, the video projector, and the complex scene being acquired.
In this paper, we use structured light to uniquely label each pixel in a set of acquired images, so that correspondence becomes (mostly) trivial and dense pixel-accurate correspondences can be produced automatically to act as ground-truth data. Structured-light techniques rely on projecting one or more special light patterns onto a scene, usually in order to directly acquire a range map of the scene, typically using a single camera and a single projector [1, 2, 3, 4, 5, 7, 11, 13, 18, 19, 20, 22, 23]. Random light patterns have sometimes been used to provide artificial texture to stereo-based range sensing systems [14]. Another approach is to register range data with stereo image pairs, but the range data is usually of lower resolution than the images, and the fields of view may not correspond exactly, leading to areas of the image for which no range data is available.
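The specific pattern coding used in this excerpt is not shown; one standard way to realize the per-pixel labeling idea, used in the space-encoding literature (e.g., [20]), is binary Gray-code stripe patterns, where each projector column is identified by the on/off sequence a camera pixel observes. The sketch below assumes ideal thresholding of the captured images (in practice each bit is usually obtained by comparing a pattern against its inverse); function names are our own.

```python
import numpy as np

def gray_code_patterns(width, n_bits):
    """Stripe patterns (MSB first) assigning each projector column a unique
    binary-reflected Gray code.  Pattern k is a 1-D 0/1 array; broadcasting
    it over image rows gives the vertical-stripe image to project."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                 # Gray code of each column index
    return [(gray >> k) & 1 for k in reversed(range(n_bits))]

def decode_gray(bits):
    """Recover the projector column at each pixel from the stack of
    thresholded bit images (MSB first), inverting the Gray code."""
    code = np.zeros_like(bits[0])
    acc = np.zeros_like(bits[0])
    for b in bits:
        acc = acc ^ b                         # prefix XOR turns Gray bits into binary bits
        code = (code << 1) | acc
    return code

# Round trip: ideal projection and thresholding recover every column label.
patterns = gray_code_patterns(width=1024, n_bits=10)
assert np.array_equal(decode_gray(patterns), np.arange(1024))
```

Gray codes are attractive here because adjacent columns differ in only one bit, so a decoding error at a stripe boundary shifts the label by at most one column.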
2. Overview of our approach
The goal of our technique is to produce pairs of real-
world images of complex scenes where each pixel is labeled
with its correspondence in the other image. These image
pairs can then be used to test the accuracy of stereo algo-
rithms relative to the known ground-truth correspondences.
Table 1. Performance of SSD, dynamic programming, and graph cut stereo methods on our data sets. The table shows the percentage of pixels whose disparity error is greater than threshold t, for t = 1, 2.
In future work, we plan to fill in the pixels currently marked as unknown in our disparity maps, using a combination of semi-automated and manual methods. It may also be possible to co-locate the cameras and projectors using mirrors and to use Zickler et al.'s beautiful results on reciprocity to deal with highlights [25].
While doing our research, we became aware of concurrent work aimed at acquiring high-quality correspondences with active illumination that is applicable to both static and dynamic scenes [8, 24]. Instead of decoding the projected light patterns to yield pixel addresses in the projector, these alternative methods simply sum the correlation or error measures of all frames over time to directly compute stereo disparities. This results in a simpler approach that produces high-quality inter-camera correspondences. Unlike our method, however, these techniques are not able to fill in semi-occluded areas using projected disparities.
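A minimal sketch of the temporal cost aggregation used by these alternative methods, assuming rectified (T, H, W) image stacks, a single-pixel SSD matching cost, and winner-take-all selection (window support, subpixel refinement, and occlusion handling are omitted):

```python
import numpy as np

def spacetime_disparity(left_stack, right_stack, max_disp):
    """Winner-take-all disparity from per-pixel SSD costs summed over all
    frames of an actively illuminated sequence.  Both stacks have shape
    (T, H, W); left pixel x is assumed to match right pixel x - d."""
    T, H, W = left_stack.shape
    cost = np.full((max_disp + 1, H, W), np.inf)
    for d in range(max_disp + 1):
        diff = left_stack[:, :, d:] - right_stack[:, :, :W - d]
        cost[d, :, d:] = np.sum(diff ** 2, axis=0)   # accumulate SSD over time
    return np.argmin(cost, axis=0)

# Synthetic check: random "active illumination" textures shifted by a
# constant disparity of 3 are recovered exactly away from the border.
rng = np.random.default_rng(0)
right = rng.standard_normal((8, 4, 32))
left = np.roll(right, 3, axis=2)
disp = spacetime_disparity(left, right, max_disp=8)
assert (disp[:, 8:] == 3).all()
```

Summing the cost over many differently illuminated frames is what makes the single-pixel match reliable: the temporal intensity profile at each pixel acts as a highly discriminative signature, much like the decoded labels in our approach.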
We close the paper with the following challenge: can one devise a comparable (or any) technique to acquire high-quality ground truth data for real-time two-dimensional motion? The existence of such data would be of invaluable use to the motion estimation community, just as we hope the data presented here will aid in developing better stereo algorithms.
Acknowledgments. […] was instrumental in creating the scenes and capturing the images.
References

[1] G. J. Agin and T. O. Binford. Computer description of curved objects. IEEE Trans. Comp., C-25(4):439–449, 1976.
[2] J. Batlle, E. Mouaddib, and J. Salvi. Recent progress in coded structured light as a technique to solve the correspondence problem: a survey. Pat. Recog., 31(7):963–982, 1998.
[3] P. Besl. Active optical range imaging sensors. In Jorge L. C. Sanz, editor, Advances in Machine Vision, pp. 1–63, 1989.
[4] J.-Y. Bouguet and P. Perona. 3D photography on your desk. In ICCV'98, pp. 43–50, 1998.
[5] C. Chen, Y. Hung, C. Chiang, and J. Wu. Range data acquisition using color structured lighting and stereo vision. Image and Vision Computing, 15(6):445–456, 1997.
[6] Y.-Y. Chuang et al. Environment matting extensions: towards higher accuracy and real-time capture. In SIGGRAPH 2000, pp. 121–130, 2000.
[7] B. Curless and M. Levoy. Better optical triangulation through spacetime analysis. In ICCV'95, pp. 987–994, 1995.
[8] J. Davis, R. Ramamoorthi, and S. Rusinkiewicz. Spacetime stereo: a unifying framework for depth from triangulation. In CVPR 2003, 2003.
[9] T. Frohlinghaus and J. M. Buhmann. Regularizing phase-based stereo. In ICPR'96, vol. A, pp. 451–455, 1996.
[10] G. Golub and C. F. Van Loan. Matrix Computations, third edition. The Johns Hopkins University Press, 1996.
[11] G. Häusler and D. Ritter. Parallel three-dimensional sensing by color-coded triangulation. Applied Optics, 32(35):7164–.
[12] W. Hoff and N. Ahuja. Surfaces from stereo: integrating feature matching, disparity estimation, and contour detection. IEEE Trans. Pat. Anal. Mach. Int., 11(2):121–136, 1989.
[13] E. Horn and N. Kiryati. Toward optimal structured light patterns. In Intl. Conf. Recent Advances in 3D Digital Imaging and Modeling, pp. 28–35, 1997.
[14] S. B. Kang, J. Webb, L. Zitnick, and T. Kanade. A multibaseline stereo system with active illumination and real-time image acquisition. In ICCV'95, pp. 88–93, 1995.
[15] C. Loop and Z. Zhang. Computing rectifying homographies for stereo vision. In CVPR'99, vol. I, pp. 125–131, 1999.
[16] J. Mulligan, V. Isler, and K. Daniilidis. Performance evaluation of stereo for tele-presence. In ICCV 2001, vol. II.
[17] Y. Nakamura, T. Matsuura, K. Satoh, and Y. Ohta. Occlusion detectable stereo - occlusion patterns in camera matrix. In CVPR'96, pp. 371–378, 1996.
[18] M. Proesmans, L. Van Gool, and F. Defoort. Reading between the lines - a method for extracting dynamic 3D with texture. In ICCV'98, pp. 1081–1086, 1998.
[19] K. Pulli et al. Acquisition and visualization of colored 3D objects. In ICPR'98, pp. 11–15, 1998.
[20] K. Sato and S. Inokuchi. Three-dimensional surface measurement by space encoding range imaging. J. Robotic Systems, 2:27–39, 1985.
[21] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Intl. J. Comp. Vis., 47(1):7–42, 2002.
[22] E. Schubert. Fast 3D object recognition using multiple color coded illumination. In Proc. IEEE Conf. Acoustics, Speech, and Signal Processing, pp. 3057–3060, 1997.
[23] P. Vuylsteke and A. Oosterlinck. Range image acquisition with a single binary-encoded light pattern. IEEE Trans. Pat. Anal. Mach. Int., 12(2):148–164, 1990.
[24] L. Zhang, B. Curless, and S. M. Seitz. Spacetime stereo: shape recovery for dynamic scenes. In CVPR 2003, 2003.
[25] T. Zickler, P. N. Belhumeur, and D. J. Kriegman. Helmholtz stereopsis: exploiting reciprocity for surface reconstruction. In ECCV 2002, vol. III, pp. 869–884, 2002.