Stratified Dense Matching for Stereopsis
in Complex Scenes
Jana Kostková and Radim Šára
Center for Machine Perception, Czech Technical University, Prague, Czech Republic
{kostkova,sara}@cmp.felk.cvut.cz, http://cmp.felk.cvut.cz
Abstract
Local joint image modeling in stereo matching brings more discriminable and stable matching features. Such features reduce the need for strong prior models (continuity) and thus algorithms that are less prone to false positive artefacts in general complex scenes can be applied. One of the principal quality factors in area-based dense stereo is the matching window shape. As it cannot be selected without having any initial matching hypothesis, we propose a stratified matching approach. The window adapts to high-correlation structures in disparity space found in pre-matching, which is then followed by final matching. In a rigorous ground-truth experiment we show that Stratified Dense Matching is able to increase matching density 3×, matching accuracy 1.8×, and occlusion boundary detection 2× as compared to a fixed-size rectangular windows algorithm. Performance on real outdoor complex scenes is also evaluated.
1 Introduction
The core problem of computational stereopsis is computing the disparity map of the scene, which means (1) finding correspondences between binocularly visible points in the input 2-D images and (2) jointly segmenting the images into binocularly visible, half-occluded, and mutually occluded regions.¹ There exist two main classes of applications the stereo matching can be used for: view prediction and 3D structure reconstruction. In our research, we are interested in structure reconstruction, which imposes the following requirements on stereo matching results: they must not contain incorrect correspondences and the occluded regions have to be identified accurately. The price paid for these requirements is lower matching density (mainly in texture-less regions). Nevertheless, as our approach belongs to area-based stereo, the results are required to be as dense as possible.
¹ To visualize a pair of mutually occluded regions imagine taking a binocular peek through a keyhole. A real example of this phenomenon is shown in the experiments, Sec. 5.

In general, pixels having the most similar neighbourhoods (measured by various statistics: SSD, SAD, NCC) are assigned as the corresponding pairs. In order to produce accurate results, the matching features have to be as discriminable and stable as possible. By discriminability we mean the ability to recognize correct correspondences. By stability we mean independence of distortions introduced by image projection. Hence, the key problem is the selection of suitable matching windows over which the statistics are computed. The simplest approaches use centralized fixed-size rectangular windows. However, due to geometric distortions, this definition fails at curved surfaces and occlusion boundaries. Image similarity computed over independent rectangular image windows has been shown inferior to similarity computed over binocularly corresponding windows.
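For concreteness, the following sketch shows how such a window-based similarity statistic can be computed. It is an illustration in Python only; the 5×5 window and the normalized cross-correlation variant are common choices and our assumptions, not necessarily the exact implementation used in this paper.

import numpy as np

def ncc(left, right, r, i, j, half=2, eps=1e-9):
    """Normalized cross-correlation between the (2*half+1) x (2*half+1) window
    centred at (r, i) in the left image and at (r, j) in the right image.
    Border handling and sub-pixel issues are ignored in this sketch."""
    wl = left[r - half:r + half + 1, i - half:i + half + 1].astype(float)
    wr = right[r - half:r + half + 1, j - half:j + half + 1].astype(float)
    wl = wl - wl.mean()
    wr = wr - wr.mean()
    return float((wl * wr).sum() / (np.sqrt((wl * wl).sum() * (wr * wr).sum()) + eps))

A pair of pixels (i, j) on row r is then a tentative match scored by this statistic.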
There exist several approaches trying to cope with this problem. Kanade proposed a
method [10], where windows adapt their size: at hypothesized boundaries they are small,
otherwise they remain large. Various versions of windows adapting their shape have also
been introduced [21, 4, 14, 16]. The resulting non-rectangular windows cover the scene
by independent patches (based on the reference image). However, they assume constant
disparity within the window. The specific problem at occlusion boundaries is often solved
by shifting the window away from the boundary to minimize the non-corresponding background part [2, 8, 15]. Some authors have tried to avoid the occlusion boundary problem
by segmentation [20, 22]: The windows are then defined not to cross a segment boundary.
In [16, 5], a matching process starts by finding the seeds (high-probability matches),
from which the final solution is spread out. Since the seeds determine the matching accuracy, their correctness is crucial. We see the first stage as a kind of pre-matching. We
show this can be done in a much more general way. In [3], a disparity space has been
introduced. First, plausible disparities are computed for each pixel. Based on them, pixels
are assigned to disparity components, which are defined to be of constant disparity. The
final disparity at each pixel is selected by assigning that of its largest component.
We claim that in order to produce good results it is essential to establish the matching over jointly discriminable and stable matching features. By ‘jointly’ we mean that both
images contribute to their definition. This seems to be impossible without first having the
correspondences. Therefore, we have come to the conclusion that the matching process
should be algorithmically divided into two semi-independent stages, where the first one
(pre-matching) hypothesizes reliable matching features, while the second one resolves
ambiguities and establishes the final matching. This division follows from requirements
on these stages: the pre-matching is to produce dense results (low false-negative error) but
it need not be a one-to-one mapping. The final matching is to produce accurate results
(low mismatch and false-positive errors) and it is to be a one-to-one mapping.
In this paper, we follow this view and propose a straightforward method, the Stratified Dense Matching. We pose the matching problem in disparity space and design a full-generality pre-matching, which is used to define the windows: they adapt to 3D-connected structures of high similarity, which we call disparity components. Such windows are of various shapes and non-constant disparity, and they are not independent (they are consistent on their overlaps). The final matching problem is solved using the similarity statistics re-computed over these adaptive windows.
2 Disparity Components
In our approach, the windows collectively adapt to the 3D image of the scene in the disparity space. This increases the probability that a high similarity implies a correct match. As a consequence, the discriminability of matching features is improved, which results in higher matching quality. Due to projective distortions this would be impossible to do in the input images. The window definition in disparity space makes it possible to obtain not only windows covering the same scene patches, but also their symmetric left-right and right-left forms.
Disparity space T is the set of all possible matches between two or among more images. In the rectified binocular case it can be visualized as matching tables computed separately for each pair of image rows and stacked on top of each other. Each matching table consists of similarity values evaluated on the Cartesian product of left and right image pixels in the equivalent row (epipolar line). Matching table elements are called (tentative) pairs. A part of the disparity space (matching tables for rows r−1, r and r+1) is shown in Fig. 1. Matching table rows represent positions i in the left image, the columns represent positions j in the right image. Since the disparity space corresponds to all possible pairwise optical ray intersections, it is clear that a surface point neighbourhood in the scene maps onto a neighbourhood of a disparity space point.

Figure 1: Neighbourhoods of a disparity space point (r, i, j) (empty blue circle): the 4-neighbourhood for constant-disparity components (left) and the 20-neighbourhood for varying-disparity components (right).
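To make this layout concrete, the sketch below fills one matching table per rectified row with a similarity value for every tentative pair (r, i, j), so that the disparity of a pair is d = i − j. It is an illustration only: the dense cols×cols table, the border margin and the reuse of the ncc sketch above are our assumptions, and a practical implementation would store only the disparity search band.

import numpy as np

def disparity_space(left, right, d_min, d_max, similarity, margin=2):
    """Fill T[r, i, j] with the similarity of the tentative pair (r, i, j),
    restricted to the search range d_min <= i - j <= d_max; all other
    entries stay at -inf. Unoptimized sketch: the full cols x cols table
    per row is kept only for clarity."""
    rows, cols = left.shape
    T = np.full((rows, cols, cols), -np.inf)
    for r in range(margin, rows - margin):
        for i in range(margin, cols - margin):
            for j in range(max(margin, i - d_max), min(cols - margin, i - d_min + 1)):
                T[r, i, j] = similarity(left, right, r, i, j)
    return T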
Disparity components are defined in disparity space as connected structures of pairs with high similarity values. The connectedness within a disparity component is defined by a neighbourhood relation. Two high-similarity pairs (r, i, j) and (r′, i′, j′), where r, r′ denote corresponding image rows, i, i′ columns in the left image and j, j′ columns in the right image, are neighbours in the disparity space if and only if (1) they are neighbours to each other in the left or the right image, and (2) the difference of their disparities is smaller than or equal to a predefined value δ. In mathematical terms the neighbourhood relation can be formulated in the following way:
Definition 1  The pairs (r, i, j) and (r′, i′, j′) in the disparity space are neighbours to each other if and only if the following three conditions hold:
1. |r − r′| ≤ 1,
2. |i − i′| ≤ 1 or |j − j′| ≤ 1 for r = r′,
   i = i′ or j = j′ for r ≠ r′,
3. |d(r, i, j) − d(r′, i′, j′)| ≤ δ,
where r, r′ are the corresponding image rows, i, i′ are the left-image positions, j, j′ are the right-image positions, and d(r, i, j) = i − j is the disparity of the point (r, i, j).
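The relation is simple enough to transcribe directly; the following Python predicate (an illustration, not the authors' code) implements the three conditions of Definition 1.

def disparity(r, i, j):
    """Disparity of the disparity-space point (r, i, j)."""
    return i - j

def are_neighbours(p, q, delta=1):
    """Neighbourhood relation of Definition 1 between pairs p = (r, i, j) and
    q = (r', i', j'). With delta = 0 it yields the 4-neighbourhood (constant
    disparity), with delta = 1 the 20-neighbourhood (varying disparity)."""
    (r, i, j), (r2, i2, j2) = p, q
    if p == q:
        return False
    if abs(r - r2) > 1:                                   # condition 1
        return False
    if r == r2:                                           # condition 2, same row
        if not (abs(i - i2) <= 1 or abs(j - j2) <= 1):
            return False
    else:                                                 # condition 2, adjacent rows
        if not (i == i2 or j == j2):
            return False
    return abs(disparity(r, i, j) - disparity(r2, i2, j2)) <= delta   # condition 3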
Using the neighbourhood relation recursively, the disparity components are traced out.
For each pair (r,i,j), the corresponding disparity component can be identified uniquely (a
single match can be part of at most one component). Based on the component, the shape
of the adaptive window is found (it brings entire image patches into correspondence).
The parameter δ in the definition of the neighbourhood relation allows disparity variations within one disparity component. If we restrict our definition to points with the same disparity (δ = 0), we get constant disparity components with a 4-neighbourhood relation, see Fig. 1 (left). The 4-connected components correspond to the planparallelity assumption proposed by Boykov [3]. In our approach, the difference of neighbouring pixel disparities is allowed to be smaller than or equal to one (δ = 1). Consequently, we get varying-disparity components with a 20-neighbourhood relation, see Fig. 1 (right). This definition corresponds to a continuity assumption and makes it possible to capture even small variations in disparity.
3 Stratified Dense Matching
In this section we overview the Stratified Dense Matching Algorithm. The input to this
algorithm is a pair of left and right rectified images and the output is the disparity map of
a scene. The algorithm consists of four steps: (1) pre-matching, (2) disparity component
tracing, (3) similarity value re-computation, and (4) final matching.
In the first step, an algorithm which is able to produce multivalued disparity maps is run. These pre-matches segment the disparity space into a set of connected subsets by eliminating the least number of low-similarity pairs. The requirements on this step are as follows: (1) a multivalued disparity map (to obtain all prospective matching hypotheses), and (2) dense results (without unmatched rows or columns in the matching table). The pre-matching step can be based either on global energy minimization [9, 17, 11] or on local correlation methods [10, 18]. We have applied a local correlation method [18].
In the second step, connected disparity components are traced out on the pre-matches resulting from the first step. The tracing is based on recursively applying the above-defined 20-neighbourhood relation. For each pre-match, the unique disparity component is identified. Based on the disparity component, the pair of equivalent matching windows is found in the input images. Note that it is not necessary to trace out the entire disparity component; all we need is a set of n-th order neighbours for every pre-match (to keep the tracing computationally efficient).
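As an illustration of this bounded tracing (reusing the are_neighbours predicate from the sketch after Definition 1; the breadth-first formulation and the order limit are our choices), the neighbourhood of a single pre-match can be collected as follows.

from collections import deque

def component_neighbourhood(seed, prematches, max_order, delta=1):
    """Collect the pre-matches reachable from `seed` within `max_order`
    applications of the neighbourhood relation of Definition 1.
    `prematches` is a set of (r, i, j) triples surviving pre-matching."""
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        p, order = frontier.popleft()
        if order == max_order:
            continue
        r, i, j = p
        # candidate neighbours differ by at most 1 in r and at most 2 in i and j
        for r2 in (r - 1, r, r + 1):
            for i2 in range(i - 2, i + 3):
                for j2 in range(j - 2, j + 3):
                    q = (r2, i2, j2)
                    if q in prematches and q not in visited and are_neighbours(p, q, delta):
                        visited.add(q)
                        frontier.append((q, order + 1))
    return visited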
In the third step, for each pre-match, the similarity statistic is re-computed using the pair of equivalent matching windows resulting from the second step. In order to get similarity values comparable in their statistical properties, for each pre-match only a fixed-size disparity component neighbourhood is used to jointly define the windows. The match similarity is re-computed only if the corresponding disparity component is large enough (the minimal size is a parameter); otherwise the pre-match is discarded (to suppress mismatches caused by noise or weak textures).
In the fourth step, the final univalued disparity map is computed using the re-computed similarities. The most important requirement is a low error rate. Full density is not
strictly required. The results are desired to be as dense as possible, however. In principle,
various stereo matching algorithms can be used to compute the final matching. We have
selected Confidently Stable Matching [18] for its accuracy: it produces disparity maps
that are not necessarily dense but have very low error rates [13].
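Composing the sketches above, the four steps can be summarized in a short skeleton. This is only an outline under our assumptions: the three stage functions are supplied by the caller (they stand for the stable and confidently stable matching of [18] and for the similarity re-computation), the order limit is arbitrary, and the default minimal component size follows Sec. 4.

def stratified_dense_matching(left, right, d_min, d_max,
                              pre_matching, recompute_similarity, final_matching,
                              min_component_size=25, max_order=3):
    """Four-step skeleton of Stratified Dense Matching: (1) pre-matching,
    (2) component tracing, (3) similarity re-computation, (4) final matching.
    The stage functions are placeholders passed in by the caller."""
    T = disparity_space(left, right, d_min, d_max, ncc)   # similarity statistics
    prematches = pre_matching(T)                          # step 1: dense, multivalued
    recomputed = {}
    for p in prematches:                                  # steps 2 and 3
        # the order limit must be large enough to allow components of the minimal size
        component = component_neighbourhood(p, prematches, max_order)
        if len(component) < min_component_size:
            continue                                      # weakly supported pre-match: discard
        recomputed[p] = recompute_similarity(T, component)
    return final_matching(recomputed)                     # step 4: one-to-one disparity map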
4 Implementation
Confidently Stable Matching (CSM) solves an optimization task defined on a mutual occlusion graph G = (T, E), in which the vertex set T is the set of all tentative matches (pairs) and (t, s) is an edge in E if pairs t and s are mutually exclusive, i.e. cannot both be elements of the same matching due to occlusion. We use the uniqueness and ordering constraints as the occlusion model. Every pair t in T thus has a set of competitors (neighbours in G) N(t), which we call the inhibition zone of t.² Every pair t ∈ T has an associated value c(t) of the match similarity statistic. We say a pair t ∈ T is confidently excluded by another pair e ∈ T if (t, e) ∈ E and c(t) ≤ c(e) − ∆(t, e). The value of ∆(t, e) is related to the confidence interval widths of c(t) and c(e). The confidently stable subset M is the largest subset of T such that every pair not in M either has no unambiguous competitors in M or is confidently excluded by a competitor in M. Simply and somewhat imprecisely, all pairs not in M are either ambiguous in c(·) or confidently occluded by some strongly better match in M. If exclusion takes into account uniqueness, the stable subset is a (univalued) matching, but multivalued ‘matchings’ are also possible when the inhibition zone has a finite extent as in Fig. 2. For precise definitions, existence and uniqueness theorems, and the algorithm see [18]. The advantage of CSM is that it does not need a prior detection of matchable image features: it automatically recovers them in the matching process.

² The inhibition zone for matchings is as follows: if t = (i, j) then N(t) = {(k, l) | k = i or l = j, (k, l) ≠ (i, j)}.
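To make the inhibition zone and confident exclusion concrete, here is a small sketch. It is our paraphrase of the quoted definitions, not the algorithm of [18]: only the uniqueness part of the occlusion model is implemented, and the margin ∆ is supplied by the caller.

def inhibition_zone(t, pairs):
    """Competitors of t = (r, i, j) under the uniqueness constraint: pairs of
    the same row sharing the left column i or the right column j."""
    r, i, j = t
    return [(r2, i2, j2) for (r2, i2, j2) in pairs
            if r2 == r and (i2 == i or j2 == j) and (i2, j2) != (i, j)]

def confidently_excluded(t, pairs, c, delta_margin):
    """True if some competitor e of t satisfies c(t) <= c(e) - Delta(t, e),
    i.e. e is better than t by more than the confidence margin."""
    return any(c[t] <= c[e] - delta_margin(t, e) for e in inhibition_zone(t, pairs))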
To simplify the implementation we use stable matching for the pre-matching step. It is defined as confidently stable matching with ∆(·, ·) ≡ 0. The corresponding algorithm is considerably simpler. Multivalued disparity is achieved by using the smallest inhibition zone possible, shown in Fig. 2. For the component tracing we used a brute-force method: for each pre-match, the neighbours are found by directly testing all their possible positions (based on the defined neighbourhood) without any optimization, which is very time consuming, but we suppose it can be speeded up about 100-fold.

Figure 2: Inhibition zone for pre-matching (black circles) for a pair (i, j).
In the third step, we did not recompute the similarities c(t) from image data; we only averaged the values computed in the first step over neighbourhoods in the traced-out disparity component.
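In code, this shortcut amounts to a simple average over the traced-out neighbourhood (a sketch; the indexing into the disparity-space table T follows the earlier sketches and is our assumption).

import numpy as np

def averaged_similarity(T, component):
    """Implementation shortcut for the third step: average the first-step
    similarity values over the traced-out component neighbourhood instead
    of re-computing them from image data."""
    return float(np.mean([T[r, i, j] for (r, i, j) in component]))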
There are a few procedure parameters which can be adjusted: the disparity search range, the initial matching window size, the minimal disparity component size, and the confidence level. However, none of them (except for the search range) is critical for the matching process. The default values are a 5×5 initial matching window and a minimal component size of 25 pixels. The confidence level, parametrized by two constants α and β [18], determines the quality of the results. Its choice is left to the user (see the experiments).
On the Middlebury dataset [19] the running time of our current implementation averages 5.2 min, of which pre-matching takes 2.1 sec, correlation computation 0.63 sec, and final matching 0.64 sec; the rest is spent on component tracing.
5 Experiments
We demonstrate the disparity map improvement due to our adaptive windows over standard rectangular windows. Therefore, in the experiments, we compare the results of Stratified Dense Matching (SDM) with the plain Confidently Stable Matching (CSM) with
5×5 rectangular windows.
We divide the experiments reported here into two groups: the first one is based on a
rigorous ground-truth evaluation and focuses on matching failure mechanisms related to
insufficient image feature discriminability. The second one demonstrates the results on
complex outdoor scenes. For a comparison based on the Middlebury dataset (which is
omitted here due to space limitations) see [12].
Ground-Truth Evaluation
This evaluation method [13] is based on a designed artificial scene with known ground-truth, shown in Fig. 3. The scene consists of five thin textured stripes (foreground) in front of a textured plane (background). Twenty stereo images of the scene have been captured under 20 different texture contrast values, which emulate varying signal-to-noise ratio. Three of those images are shown in Fig. 3 (the dark stripes are shadows). The confidence level parameters were set to α = 20σ²/1000 and β = 0.05, where σ² is the image intensity variation. The goal of this test set is not to provide a complete cover of all possible stereo data but to expose weaknesses related to image feature discriminability.

Figure 3: A selection of the tested scene: texture contrasts of 1, 13 and 20 (lowest, medium and highest), and the ground-truth disparity map. The rightmost bar shows the disparity map colour coding: low disparities are dark blue, high disparities are red, half-occluded regions are gray.

Figure 4: Matching error results for the six error statistics (MIR, FNR, FPR, FR, OBI and B) plotted against texture contrast. The CSM algorithm: red solid, the SDM algorithm: blue dashed.
Types of Error  In the experiment, the following six types of error were distinguished: Mismatch Rate (MIR) measures the accuracy of matching, False Negative Rate (FNR) measures the disparity map sparsity, False Positive Rate (FPR) measures the quality of occluded region detection, Failure Rate (FR) measures the overall disparity map quality (for view prediction), Occlusion Boundary Inaccuracy (OBI) measures the precision of occlusion boundary detection, and Bias (B) measures the algorithm's bias towards large objects. The range of all errors is [0, 1]; only for the bias it is [−1, 1].
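For orientation, two of these statistics could be computed as in the sketch below. This is only one plausible reading (NaN-coded unmatched pixels, a disparity error threshold of 1 as the 'δ > 1' label in Fig. 4 suggests); the exact definitions are those of [13].

import numpy as np

def mir_and_fnr(d_est, d_gt, visible):
    """One possible reading of two of the statistics: MIR = fraction of
    matched, binocularly visible pixels whose disparity error exceeds 1;
    FNR = fraction of binocularly visible pixels left unmatched. Unmatched
    pixels in d_est are NaN; `visible` is the ground-truth visibility mask."""
    matched = visible & ~np.isnan(d_est)
    mir = float(np.mean(np.abs(d_est[matched] - d_gt[matched]) > 1)) if matched.any() else 0.0
    fnr = 1.0 - matched.sum() / visible.sum()
    return mir, float(fnr)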
Evaluation  The results are shown in Fig. 4. Texture contrast (horizontal axis in all plots) is directly related to signal-to-noise ratio. The respective error rates are shown on the vertical axes. Note that both axes have a logarithmic scale (except in the B and OBI plots). Both algorithms reach a low level of MIR very fast and then they stay constant.³ The MIR is normalized by the matching density; we can therefore conclude that the accuracy of both results is better than 99%, while the SDM improves it about 1.8×. The False Negative Rate in CSM is about 25%, in SDM about 9%, i.e. the matching density is improved about 3×. The FPR vanishes in both CSM and SDM except for the worst two contrasts in CSM (where the SDM results are better by about one order of magnitude), which demonstrates the ability of both algorithms to detect occlusions correctly. The FR error was improved about 3.3× by applying the SDM approach. Even the OBI results are improved: the occlusion boundary is detected about 2× more precisely. Neither of the algorithms exhibits bias B towards large objects. The CSM slightly “prefers” the background (larger object), unlike the SDM, which appears to prefer the foreground (smaller object). However, the bias in the SDM is about 2× smaller than in the CSM.
To conclude, we can say the SDM preserves the good property of CSM (a low false positive rate), while the density of the results is improved 3×, the occlusion boundary accuracy 2×, the matching accuracy 1.8× and the overall error 3.3×.
Real Outdoor Scenes
The purpose of this section is to demonstrate the ability of our method to cope even with complex outdoor scenes. We have selected scenes with a wide disparity range, thin objects (obstacles) in the foreground, and a slanted ground plane. We show that our approach is able to correctly detect all the objects in the scene without false positive artefacts (illusions). The confidence level parameters were fixed to the values of α = 20 and β = 0.05 in both SDM and CSM. For comparison, we show the results of the state-of-the-art algorithms: MAP matching via dynamic programming, DP (our reimplementation of Cox's algorithm [7]), and MAP matching via graph cuts, GC (the authors' implementation [11]). The parameters of the two MAP algorithms (the occlusion penalty λ of DP, and λ and penalty0 of GC) were manually adjusted to give the visually best results on each of the tested datasets.
The first selected image pair is a photo of a meadow with an apple tree in the foreground, a shrub at mid-range, and a forest in the background. The sky above the scene is completely
featureless. The input images together with the results are shown in Fig. 5. The main
difference between SDM and CSM is the density, which is about 1.4× greater in SDM (35% of pixels are matched in CSM and 50% in SDM), while the accuracy is preserved.
The SDM detects correctly even very fine features in the scene, e.g. the tall grasses at
the road side on the right of the tree. The results on the sky and on the tree trunk have also been improved. The DP exhibits the typical “streaking” artefacts around objects (tree
crown, shrub), and the tree trunk is disconnected. However, the meadow and the forest are
detected correctly. The GC disparity map consists of piecewise constant disparity regions,
which do not correspond to the scene at all. The top of the tree crown has been “cut” and
the trunk of the tree has not been detected precisely: a higher disparity is assigned to
it and this disparity is propagated to the neighbouring meadow, which means GC sees
a constant disparity “wall” standing in place of the meadow. A similar frontoparallel
wall is hallucinated in place of the meadow on the left of the tree. The forest at the
background has not been detected correctly either. Only the sky is correct with unassigned
correspondences, but it partially cuts the objects.
³ Which is a remarkable property confirmed in real scenes as well.

Figure 5: The Apple Tree image pair: the input images (left and right) are shown together with the results of Stratified Dense Matching, Confidently Stable Matching, dynamic programming, and graph cuts. The disparity map colour coding has been described in Fig. 3.

Figure 6: The Larch Grove image pair: the input images and the results of the same four methods. The gaps in some of the tree trunks appear due to ordering constraint violation in the scene. For colour coding see Fig. 3.

The second image pair is a photo of a larch grove, shown in Fig. 6. In this image pair, we can see a mutually occluded region: the area in between the two leftmost trees corresponds to different parts of the scene background, thus no correspondences can be found there.
This scene violates the ordering constraint, which results in discontinuities in some of the
trees. The SDM results are 2.5× denser (CSM matched 19% of all pixels, while SDM 48%); all the trees, the ferns on the ground, and the mutually occluded region are detected correctly, although there are a few more mismatches there than in the CSM. The
DP is not able to cope with occlusions and mutually occluded regions at all. The strong
continuity prior causes interpolation artefacts over low-texture (or ambiguous) regions.
In GC the parameter tuning was rather difficult. The disparity map shown here has been
selected from more than 100 results as visually the best one. The disparity map exhibits
similar artefacts as in the previous scene: Neither mutually occluded nor half-occluded
regions have been identified, and many important structural details are missing altogether.
6 Discussion and Conclusions
In this paper, we have proposed a new method, Stratified Dense Matching. Our approach improves the discriminability of matching features by selecting a suitable matching window shape. The windows are defined to adapt to high-correlation structures in disparity space that represent all matching hypotheses. Non-constant disparity within one disparity component is allowed. The window definition in disparity space allows the matching to adapt to slanted and curved surfaces, scale differences, and discontinuities. The stratified matching approach is independent of the selection of the matching algorithm, although it is desirable to use a pre-matching algorithm that guarantees a low false negative error and a final matching algorithm that guarantees low false positive and mismatch errors.
We have demonstrated in a quantitative experiment that not only the quality but also
the density of the disparity map is considerably determined by discriminable joint image
features. Further improvement might be achieved by proper local image modeling [6, 1].
Why is the density difference between CSM and SDM so large? The CSM prefers
to reject a tentative match if competitors of similar correlation exist. This happens when
there are broad correlation value maxima in disparity space due to statistical dependence
between neighbouring image pixels, e.g. due to image blur or long correlation length
of the visual texture in the scene. In SDM the pre-matching step traces out the exact positions of the maxima and removes their close competitors. So this step in effect breaks
the dependencies by removing the competing matching hypotheses that disturb CSM.
The improvement observed in accuracy (MIR and OBI) results from the adaptation of the
matching window shape, which improves the correlation discriminability.
Note that the SDM lies somewhere in between the algorithms that solve one optimization problem per rectified image row (like DP [7]) on the one hand and the algorithms that solve a global optimization problem on the entire disparity space (like GC [11]) on the other hand. The component tracing step is semi-local in its nature but still able to tie together solutions on neighbouring epipolar lines. Its semi-locality brings a great algorithmic advantage over the global optimization approaches.
Our work differs from the work of others in: (1) the definition of disparity components that represent a piecewise contiguous manifold in disparity space and thus (2) avoiding the definition of area-based matching over a collection of independent patches, (3) avoiding an explicit local surface model (e.g. planar) by (4) determining the window shape directly from the hypothesized solution (as opposed to just selecting the best one from a group of pre-defined windows), (5) splitting the matching problem into two semi-independent stages with different requirements, (6) re-computing similarity statistics from disparity components, and (7) joining the segmentation of reliable matching features and the correspondence search into a single procedure.
Acknowledgments
This work has been supported by the Grant Agency of the Czech Republic under project
GACR 102/01/1371, by the Grant Agency of the Czech Technical University under project
CTU 8306413, and by the Czech Ministry of Education under project MSM 212300013.
References
[1] Stan Birchfield and Carlo Tomasi. A pixel dissimilarity measure that is insensitive to image
sampling. IEEE PAMI, 20(4):401–406, 1998.
[2] Aaron F. Bobick and Stephen S. Intille. Large occlusion stereo. IJCV, 33(3):181–200, 1999.
[3] Yuri Boykov, Olga Veksler, and Ramin Zabih. Disparity component matching for visual correspondence. In CVPR, pages 470–475, 1997.
[4] Yuri Boykov, Olga Veksler, and Ramin Zabih. A variable window approach to early vision.
IEEE PAMI, 20(12):1283–1294, 1998.
[5] Qian Chen and Gérard Medioni. A volumetric stereo matching method: Application to image-based modeling. In CVPR, pages 29–34, 1999.
[6] Maureen Clerc. Wavelet-based correlation for stereopsis. In ECCV, pages 495–509, 2002.
[7] Ingemar J. Cox, Sunita L. Hingorani, Satish B. Rao, and Bruce M. Maggs. A maximum likelihood stereo algorithm. CVIU, 63(3):542–567, 1996.
[8] Davi Geiger, Bruce Ladendorf, and Alan Yuille. Occlusions and binocular stereo. IJCV,
14:211–226, 1995.
[9] Hiroshi Ishikawa and Davi Geiger. Occlusions, discontinuities, and epipolar lines in stereo.
In ECCV, pages 232–248, 1998.
[10] Takeo Kanade and Masatoshi Okutomi. A stereo matching algorithm with an adaptive window: Theory and experiment. IEEE PAMI, 16(9):920–932, 1994.
[11] Vladimir Kolmogorov and Ramin Zabih. Computing visual correspondence with occlusions
using graph cuts. In ICCV, pages 508–515, 2001.
[12] Jana Kostková. Stereoscopic matching: Problems and solutions. RR CTU–CMP–2002–13, Center for Machine Perception, Czech Technical University, 2002.
[13] Jana Kostková, Jan Čech, and Radim Šára. Dense stereomatching algorithm performance for view prediction and structure reconstruction. In SCIA, pages 101–107, 2003.
[14] R. A. Lane, N. A. Thacker, and N. L. Seed. Stretch-correlation as a real-time alternative to
feature-based stereo matching algorithms. IVC, 12(4):203–212, 1994.
[15] Masatoshi Okutomi, Yasuhiro Katayama, and Setsuko Oka. A simple stereo algorithm to
recover precise object boundaries and smooth surfaces. In Proc. of Workshop on Stereo and
Multi-Baseline Vision, pages 158–165, 2001.
[16] G. P. Otto and T. K. W. Chau. ‘Region-growing’ algorithm for matching of terrain images.
IVC, 7(2):83–94, 1989.
[17] Sébastien Roy. Stereo without epipolar lines: A maximum-flow formulation. IJCV, 34(2/3):147–161, 1999.
[18] Radim Šára. Finding the largest unambiguous component of stereo matching. In ECCV, pages 900–914, 2002.
[19] Daniel Scharstein, Richard Szeliski, and Ramin Zabih. A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms. IJCV, 47(1):7–42, 2002.
[20] Hai Tao, Harpreet S. Sawhney, and Rakesh Kumar. A global matching framework for stereo
computation. In ICCV, pages 532–539, 2001.
[21] Olga Veksler. Stereo matching by compact windows via minimum ratio cycle. In ICCV, pages
540–547, 2001.
[22] Ye Zhang and Chandra Kambhamettu. Stereo matching with segmentation-based cooperation.
In ECCV, pages 556–571, 2002.