The Panum Proxy Algorithm for Dense Stereo Matching over a Volume of Interest
A. Agarwal and A. Blake
Microsoft Research Ltd.
7 J J Thomson Ave, Cambridge, CB3 0FB, UK
http://research.microsoft.com/vision/cambridge
Abstract
Stereo matching algorithms conventionally match over a
range of disparities sufficient to encompass all visible 3D
scene points. Human vision however does not do this. It
works over a narrow band of disparities — Panum’s fu-
sional band — whose typical range may be as little as 1/20
of the full range of disparities for visible points. Points in-
side the band are fused visually and the remainder of points
are seen as “diplopic” — that is with double vision. The
Panum band restriction is important also in machine vision,
both with active (pan/tilt) cameras, and with high resolution
cameras and digital pan/tilt.
A probabilistic approach is presented for dense stereo matching under the Panum band restriction. First it is shown that existing dense stereo algorithms are inadequate in this problem setting. Secondly it is shown that the main problem is segmentation, separating the (left) image into the areas that fall respectively inside and outside the band. Thirdly, an approximation is derived that makes up for missing out-of-band information with a “proxy” based on image autocorrelation. Lastly it is shown that the Panum Proxy algorithm achieves accuracy close to what can be obtained when the full disparity band is available.
1. Introduction
In attentional stereo vision, the viewer steers a volume of interest around the scene. This problem has received a good deal of attention in the realms of oculomotor control [5, 8] and sparse stereo, e.g. [14]. In the area of dense stereo, however, e.g. [12, 6, 2, 3, 10, 15], the issue of restricting attention to a volume, with a limited range of depth or equivalently disparity, has not been addressed. Restricting computation to a volume of interest, which may be only a small fraction of the visible volume, is of considerable importance for efficiency, particularly with high resolution or head-mounted cameras. In principle also, it is most unsatisfying that conventional stereo algorithms need to explore an irrelevant background simply in order to establish significant properties of the foreground, a form of the celebrated “frame” problem of Artificial Intelligence.
1.1. The Panum band
The geometry of the situation is illustrated in figure 1. For a particular field of view of each camera, potential matches between left and right images form a diamond-shaped region in each epipolar plane. In human vision [13] the space of possible matches is restricted further to the “Panum band” (see figure). This is typically around 5 mrad wide, and cuts down the number of possible foveal matches by around an order of magnitude. High quality stereo cameras with narrow fields of view can also benefit from a Panum band restriction in a similar way.
The motivation for studying Panum band stereo is then threefold.
1. It is conceptually appealing to develop a stereo algorithm which focuses on a volume of interest, in the manner known to prevail in human vision. Why should a stereo algorithm expend needless attention on the entire background of a scene?
2. Computational cost for stereo matching grows linearly (or faster) with volume of interest. This is true for both main components of stereo matching: cost computation and global optimization (whether by graph-cut (GC), dynamic programming (DP) or belief propagation (BP)). Restricting the size of the matching volume is therefore critical for efficiency. For stereo geometry similar to human vision, the saving in computational cost is at least an order of magnitude, due to the reduced range of depth (disparity). Usually there is a further factor of saving, due to the concomitant restriction in image area over which matching occurs. The best stereo algorithms (GC, DP or BP [15]) do not currently come close to real time. This is not going to be solved any time soon by Moore’s law because camera resolution is increasing faster than processing power.
3. Computational cost, we have argued, necessitates the restriction of stereo matching to a Panum volume. However, existing dense stereo algorithms are not capable of satisfactory operation over a Panum band, as this paper will show. A new algorithm is needed.
Panum Proxy Dense Stereo Matching, Agarwal and Blake, Proc CVPR 2006
[Figure 1: panels a) and b); labels: Left, Right, m, n, disparity]
Figure 1. The space of possible matches restricted to a Panum band. a) View from above of rays in a single epipolar plane, forming a diamond-shaped match space; the Panum band forms a ribbon across the diamond and thus cuts down the set of possible matches. b) The match-space represents the situation in (a) in a standardised diagram, in which the diamond-shaped match-space becomes a square, whose sides are respectively a left and right epipolar line, and restricted to the Panum band as shown. A possible matching path is shown dashed.
1.2. The Panum Proxy algorithm
The principle of the Panum Proxy stereo algorithm is
therefore as follows.
1. Compute match scores or likelihoods for disparities within the Panum band.
2. Aggregate those scores to compute a total likelihood, at
each point, that there is a within-band (foreground) match.
3. The same cannot be done for the background likelihood,
as that would require match scores outside the band.
However, it is shown that an autocorrelation-like measure
can be used to estimate the background likelihood.
4. Use the true foreground likelihood and the estimated
background likelihood, in a graph cut algorithm, to achieve
a segmentation.
5. Once segmentation is complete, perform conventional
stereo matching e.g. [3], but restricted to the image regions
that have been labelled as in-band.
Note that the restriction to the Panum band in the seg-
mentation step 4 is indeed essential, because the complex-
ity of segmentation is dominated by the cost of computing
stereo match scores, and this is linear in match volume.
The resulting stereo disparity map can be used, for example, to synthesise a new view, as in figure 2, in which case the view is fused within the Panum band, but diplopic outside it, just as in human stereo vision.
Figure 2. Fusion and diplopia with the Panum Proxy algorithm. Results of the Panum Proxy algorithm are illustrated here for a frame from one of the six Microsoft stereo datasets. The matched stereogram shows fusion within the Panum band but diplopia (double vision) elsewhere.
2. Probabilistic framework for stereo matching
First we outline the notation for probabilistic stereo matching. Pixels in the rectified left and right images are $L = \{L_m\}$ and $R = \{R_n\}$ respectively, and jointly we denote the two images $z = (L, R)$. Left and right pixels are associated by any particular matching path (fig. 1). Frequently in stereo matching the so-called “ordering constraint” is imposed, and this means that each move in figure 1b) is allowed only in the positive quadrant [1, 12]. Stereo “disparity” is $d = \{d_m,\ m = 0, \ldots, N\}$ and disparity is simply related to image coordinates as $d_m = m - n$.
In algorithms that deal explicitly with occlusion [10, 7] an array of state variables $x = \{x_m\}$ takes values $x_m \in \{M, O\}$ according to whether the pixel is matched or occluded.
This sets up the notation for a path in epipolar match-space, which is a sequence $((d_1, x_1), (d_2, x_2), \ldots)$ of disparities and states. A Gibbs energy $E(z, d, x; \Theta, \Phi)$ can be defined for the posterior over all epipolar paths taken together and notated $(d, x)$, given the image data $z$. Parameters $\Phi$ and $\Theta$ relate respectively to prior and likelihood terms in the posterior. Then the Gibbs energy can be globally minimised to obtain a segmentation $x$ and disparities $d$.
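As a rough illustration of how a disparity band shrinks the match space, the sketch below enumerates the candidate matches (m, n) on a single epipolar line whose disparity d = m - n lies in a given range. The function name, the line width of 100 pixels and the band limits are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def in_band_matches(width, d_min, d_max):
    """Candidate (m, n) pairs on one epipolar line with d = m - n in [d_min, d_max].

    `width` is the number of pixels on the line; both m and n must index
    valid pixels, so pairs falling off either image are skipped.
    """
    matches = []
    for m in range(width):
        for d in range(d_min, d_max + 1):
            n = m - d
            if 0 <= n < width:
                matches.append((m, n))
    return matches


# Hypothetical comparison: every possible pairing versus a 5-disparity band.
all_pairs = in_band_matches(100, -99, 99)
band_pairs = in_band_matches(100, 0, 4)
```

With these illustrative numbers the band keeps 490 of the 10000 possible pairings, which is the kind of reduction in matching work that motivates the band restriction.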
2.1. Prior distribution over matching paths
A Bayesian model for the posterior distribution $p(x, d \mid z)$ is set up as a product of prior and likelihood:
$$p(x, d \mid z) \propto p(x, d)\, p(z \mid x, d). \tag{1}$$
The prior distribution $p(x, d) \propto \exp(-\lambda E_0(x, d))$ is frequently decomposed, in the interests of tractability, as a Markov model. An MRF (Markov Random Field) prior for $(x, d)$ is specified as a product of clique potentials $V_{m,m'}$ over all pixel pairs $(m, m') \in \mathcal{N}$ deemed to be neighbouring in the left image. The potentials are chosen to favour matches over occlusions, to impose limits on disparity change along an epipolar line, and to favour figural continuity between matching paths in adjacent epipolar line-pairs.
2.2. Stereo matching likelihood
The stereo likelihood is:
$$p(z \mid x, d) \propto \prod_m \exp\left(-U^M_m(x_m, d_m)\right) \tag{2}$$
where the pixelwise negative log-likelihood ratio, for match vs. non-match, is
$$U^M_m(x_m, d_m) = \begin{cases} M(L^P_m, R^P_n) & \text{if } x_m = M \\ M_0 & \text{if } x_m = O, \end{cases} \tag{3}$$
where $M(\ldots)$ is a suitable measure of goodness of match between two patches, often based on normalised sum-squared difference (SSD) or correlation scores [15].
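The pixelwise energy of equation (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: patches are taken as 1D windows along a single scanline, the match measure M is a mean squared difference standing in for the normalised SSD mentioned in the text, and the names `patch`, `match_energy`, `half_width` and the default value of `m0` are all hypothetical.

```python
def patch(row, centre, half_width):
    """Pixel values in a 1D window around `centre`, clamped at the row ends."""
    last = len(row) - 1
    return [row[min(max(centre + k, 0), last)]
            for k in range(-half_width, half_width + 1)]


def match_energy(left_row, right_row, m, d, state, half_width=1, m0=0.5):
    """U^M_m(x_m, d_m): patch cost if matched ("M"), constant m0 if occluded ("O")."""
    if state == "O":
        return m0
    n = m - d  # disparity relates the coordinates as d_m = m - n
    lp = patch(left_row, m, half_width)
    rp = patch(right_row, n, half_width)
    # Assumed stand-in for M(...): mean squared difference of the two patches.
    return sum((a - b) ** 2 for a, b in zip(lp, rp)) / len(lp)


# Toy scanlines: the right row is the left row displaced by a disparity of 1.
left = [0, 0, 1, 2, 1, 0, 0]
right = [0, 1, 2, 1, 0, 0, 0]
energy_at_true_disparity = match_energy(left, right, m=3, d=1, state="M")
energy_at_wrong_disparity = match_energy(left, right, m=3, d=0, state="M")
```

A low energy at the true disparity and a higher one elsewhere is what makes the exponentiated energy usable as a match likelihood in equation (2).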
3. Restricting conventional stereo matching to a Panum band
We looked at two dense stereo matching algorithms which are considered competitive [15]: one, referred to as BVZ [4], uses graph-cut optimization; the other, KZ, also uses graph-cut but with explicit allowance for occlusion [10]. The question is whether these algorithms can be applied to the Panum problem simply by reducing the disparity range available for matching. Following conventions for stereo testing, we took the four image pairs Tsukuba, sawtooth, venus and map from the Middlebury database (footnote 1), together with supplied ground truth, and calculated error measures. Over foreground, an error is counted wherever computed disparity is in error by more than 1 pixel. For background regions, the true disparity is of course out of range, so a disparity is counted as incorrect as follows:
BVZ: wherever the disparity is not at the endstop of the Panum band;
KZ: wherever the state is not occluded, $x_m \neq O$, and the disparity is not at the endstop of the Panum band.
Footnote 1: http://cat.middlebury.edu/stereo
Figure 3. Stereo matching error rates for the KZ algorithm constrained to the Panum band. The error rate data show that background error is greatly magnified when the Panum band constraint is imposed, while foreground error barely changes.
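The error-counting rules of this section can be sketched as below. This is an illustrative reading of the text rather than the evaluation code used for the experiments: the function names are hypothetical, disparities are plain lists of per-pixel values, and `endstop` is passed in as a single disparity value because the text does not say which end of the band is meant.

```python
def foreground_error_rate(computed, truth, tolerance=1):
    """Fraction of foreground pixels whose disparity is off by more than `tolerance`."""
    errors = sum(1 for c, t in zip(computed, truth) if abs(c - t) > tolerance)
    return errors / len(truth)


def background_error_rate_bvz(computed, endstop):
    """BVZ rule: a background pixel is wrong unless its disparity sits at the endstop."""
    errors = sum(1 for c in computed if c != endstop)
    return errors / len(computed)


def background_error_rate_kz(states, computed, endstop):
    """KZ rule: wrong only if the pixel is not labelled occluded ("O")
    and its disparity is not at the endstop."""
    errors = sum(1 for s, c in zip(states, computed)
                 if s != "O" and c != endstop)
    return errors / len(computed)
```

Under the KZ rule an occlusion label counts as a correct outcome for a background pixel, since the true disparity cannot be represented inside the band.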
In each case we used the operating parameters recom-
mended for the algorithms: for BVZ, disparity gradient
penalty [3] λ = 20, and for KZ, λ = 10 with occlusion
penalty [10] K = 50.
Results for the KZ algorithm are shown in figure 3.
Results for the (simpler) BVZ algorithm are similar, but
omitted here. In both cases, disparity error over foreground
regions is not much affected by the Panum band restriction
(in fact improved slightly because of the added constraint).
Over background regions, error for both algorithms rises
substantially. The conventional stereo algorithms simply
fail over the background, generating many random dispari-
ties.
The conclusion from this experiment is that the conventional algorithms, when restricted to the Panum band, work perfectly well over foreground regions. All that is required to make the algorithms usable is reliable identification of those pixels whose disparities fall within the band. In other words, successful Panum-band stereo could be achieved if only segmentation into foreground (within band) and background could be achieved reliably. Therefore the remainder of the paper considers the problem of foreground/background segmentation under the Panum band constraint.
3.1. Can graph-cut stereo be adapted for segmentation?
One possibility, for a more subtle adaptation of the existing KZ algorithm, is that its ability to label occlusions could be extended to label background points. This is reasonable because, given the restricted Panum band, both occlusions and background points represent failures to obtain a stereo match. In order to give the KZ algorithm every chance of success, parameter value K was explored to minimise labelling error rate and this yielded parameters λ = 10, K = 10, quite different from the optimal operating point for regular use of KZ for stereo matching. Results are given in figure 4, showing segmentation error for each of the six test videos in the Microsoft stereo-segmentation database (footnote 2). Labelling error-rates (equal error-rate) for the 6 datasets vary between 7% and 41%, and are in all cases many times worse than are obtainable from full, unconstrained stereo segmentation, in the form of LGC (Layered Graph Cut) [9]. Since the aim, with Panum-band stereo, is to approach the quality of full, unconstrained stereo, the performance of KZ in this mode is far from acceptable.
Figure 4. Segmentation error from conventional graph-cut stereo. The KZ algorithm is tuned here to use its occlusion labels to indicate background, but error rates are very high compared with what is attainable using LGC segmentation with the full range of disparities.
3.2. Segmentation of the in-band image region
Given the results and discussion so far, the aim of the remainder of the paper is to develop a segmentation algorithm, to label all “foreground” points with an accuracy approaching full LGC, but without any computation out of the Panum band. Segmentation could be done in one of two ways. Either it could proceed simultaneously with computation of disparity; or in a separate pass, preceding the computation of disparity. Simultaneous segmentation and disparity determination perhaps has the attraction of greater elegance. On the other hand, separate segmentation could be achieved by marginalising the stereo likelihood over disparities d, and then performing energy minimisation with respect to labels x only. A separate labelling pass should surely be more efficient, since full consideration of disparity need then only occur within the foreground region.
Footnote 2: research.microsoft.com/vision/cambridge/i2i
4. Stereo segmentation
First we summarise the full LGC (Layered Graph Cut)
algorithm [9] for segmentation by marginalisation of stereo
likelihoods. Then in the next section the full LGC energy
function is approximated to stay within the Panum band re-
striction.
For LGC, the matched state M is further subdivided into foreground match F and background match B. LGC determines segmentation x as the minimum of an energy function $E(z, x; \Theta)$, in which stereo disparity d does not appear explicitly. Instead, the stereo match likelihood (2) in section 2.2 is marginalised over disparity, aggregating support from each putative match, to give a likelihood $p(L \mid x, R)$ for each of the three label-types occurring in x: foreground, background and occlusion (F, B, O). Segmentation is therefore a ternary problem, and it can be solved (approximately) by iterative application of a binary graph-cut algorithm, augmented for a multi-label problem by so-called α-expansion [4]. The energy function for LGC is composed of two terms:
$$E(z, x; \Theta, \Phi) = V(z, x; \Theta) + U^S(z, x, \Phi) \tag{4}$$
representing energies for spatial coherence/contrast and stereo likelihood.
4.1. Encouraging coherence
The coherence energy $V(z, x; \Theta)$ is a sum, over cliques, of pairwise energies with potential coefficients $F_{m,m'}$ now defined as follows. Cliques consist of horizontal, vertical and diagonal neighbours on the square grid of pixels. For vertical and diagonal cliques the potential acts as a switch, active across a transition in or out of the foreground state: $F_{m,m'}[x, x'] = \gamma$ if exactly one of the variables $x, x'$ equals F, and $F_{m,m'}[x, x'] = 0$ otherwise. Horizontal cliques, along epipolar lines, inherit the same cost structure, except that certain transitions are disallowed on geometric grounds. These constraints are imposed via infinite cost penalties:
$$F_{m,m'}[x = F, x' = O] = \infty; \qquad F_{m,m'}[x = O, x' = B] = \infty.$$
Here [9] $\gamma = \log(2\sqrt{W_M W_O})$, and parameters $W_M$ and $W_O$ are the mean widths (in pixels) of matched and occluded regions respectively.
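The coherence potential described above can be written down directly. The sketch below is a plain transcription of the rules in this subsection; the function names and the example region widths are hypothetical, and `x` is taken to be the label at pixel m with `x_next` the label at its neighbour m'.

```python
import math


def gamma(w_m, w_o):
    """Switching cost: gamma = log(2 * sqrt(W_M * W_O))."""
    return math.log(2.0 * math.sqrt(w_m * w_o))


def coherence_potential(x, x_next, g, horizontal=False):
    """F_{m,m'}[x, x'] for labels in {"F", "B", "O"}.

    Costs g when exactly one of the two labels is foreground, 0 otherwise.
    On horizontal cliques the transitions F->O and O->B are forbidden and
    receive infinite cost.
    """
    if horizontal and (x, x_next) in {("F", "O"), ("O", "B")}:
        return math.inf
    return g if (x == "F") != (x_next == "F") else 0.0


# Hypothetical mean widths of matched and occluded regions, in pixels.
g = gamma(w_m=8.0, w_o=2.0)
```

Because the forbidden transitions are expressed as infinite costs, any finite-energy labelling automatically respects the geometric constraints.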
4.2. Encouraging boundaries where contrast is high
A tendency for segmentation boundaries in images to align with contours of high contrast is achieved by defining prior penalties $F_{k,k'}$ which are suppressed where image contrast is high [3, 4, 11], multiplying them by a discount factor $C^*_{m,m'}(L_m, L_{m'})$ which suppresses the penalty by a factor $\epsilon/(1 + \epsilon)$ wherever the contrast across $(L_m, L_{m'})$ is high; see [9] for details. Previously, maximal discounting has been obtained [3] by setting $\epsilon = 0$. Here, as in stereo segmentation [10], $\epsilon = 1$ tends to give the best results, though sensitivity to the precise value of $\epsilon$ is relatively mild.
4.3. Foreground likelihood
The remaining term in (4) is $U^S(z, x)$, which captures the influence of stereo matching likelihood on the probability of a particular segmentation. It is defined to be
$$U^S(z, x) = \sum_m U^S_m(x_m) \tag{5}$$
where
$$U^S_m(x_m) = -\log p(L_m \mid x_m = F, R). \tag{6}$$
Now, marginalising out disparity, foreground likelihood is
$$p(L_m \mid x_m = F, R) = \sum_d p(L_m \mid d_m = d, R)\, p(d_m = d \mid x_m = F) \tag{7}$$
where, from (2),
$$p(L_m \mid d_m = d, R) \propto f(L, d, R) = \exp\left(-U^M_m(x_m, d_m)\right), \tag{8}$$
using the log-likelihood ratio defined in (3). As a shorthand, we write:
$$p(L \mid F) = \sum_d p(L \mid d, R)\, p(d \mid F) \tag{9}$$
and, as before, in terms of likelihood ratios, this becomes:
$$\mathcal{L}(L \mid F) \equiv \frac{p(L \mid F)}{p(L \mid O)} = \sum_d f(L, d, R)\, p(d \mid F) \tag{10}$$
where $f(L, d, R)$ is the match/non-match likelihood ratio as above.
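The marginalisation in equation (10) amounts to a weighted sum of exponentiated match energies over the in-band disparities. The sketch below assumes the uniform prior p(d | F) = 1/|D_F| that section 5.1 adopts, and takes as input a list of match energies U^M, one per in-band disparity, for a single pixel. The function names are hypothetical, and the second function simply takes the negative log of the ratio as a per-pixel energy, which is an illustrative choice rather than something the text spells out.

```python
import math


def foreground_likelihood_ratio(match_energies):
    """Equation (10) with uniform p(d | F): the mean of f = exp(-U) over the band."""
    prior = 1.0 / len(match_energies)
    return sum(math.exp(-u) * prior for u in match_energies)


def foreground_energy(match_energies):
    """Negative log of the foreground likelihood ratio, usable as a per-pixel energy."""
    return -math.log(foreground_likelihood_ratio(match_energies))


# A pixel with one good in-band match (energy 0) and one poorer candidate.
ratio = foreground_likelihood_ratio([0.0, math.log(3.0)])
```

Every term in the sum uses only disparities inside the band, which is why this quantity remains computable under the Panum restriction.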
4.4. Background likelihood
Since the distribution $p(d_m = d \mid x_m = F)$ is defined to be zero outside the Panum fusional area, it is perfectly possible, under the Panum assumptions, to compute $\mathcal{L}(L \mid F)$ in (10). However, the same cannot be said for
$$\mathcal{L}(L \mid B) \equiv \frac{p(L \mid B)}{p(L \mid O)} = \sum_d \mathcal{L}(L \mid d)\, p(d \mid B) \tag{11}$$
since the corresponding summation is entirely outside the Panum band $D_F$ of disparities, in that $p(d \mid B)$ is non-zero only outside the Panum band. Each pixel $L_m$ would therefore have to be compared with pixels in the right image R that are unreachable because they are outside the band.
Figure 5. Segmentation error using a simple threshold in place
of background likelihood. Error curves are shown as a func-
tion of threshold θ for six subjects from the Microsoft database.
(Error-rates are total foreground and background error, averaged
over each sequence.) Horizontal dashed lines show corresponding
error rates for full (non-Panum) LGC segmentation, as a bench-
mark. The substantial shortfall suggests that it should be possible
to improve considerably on simple thresholding.
4.5. A simple threshold as proxy for the background likelihood?
Before going to some trouble to approximate the background likelihood, it is worth looking at the simplest possible approach, and treating the problem as novelty detection. In that view, we have a model $\mathcal{L}(L \mid F)$ for the positive class, and no model of the background class. Then the likelihood ratio classifier $\mathcal{L}(L \mid F) > \mathcal{L}(L \mid B)$ is simplified to a threshold rule, replacing the background likelihood by a constant $\mathcal{L}(L \mid B) = \theta$. Segmentation under this model, for variable threshold θ, is exhibited in figure 5. It appears that a constant threshold θ = 1 yields close to the best error for each of the 6 datasets, so there would be no need for an adaptive algorithm. However, the best error rate achieved is between 2 and 8 times higher than the error achieved (dashed lines) by full LGC. Again, therefore, there is strong motivation to look for a model and an algorithm that performs better under the Panum-band restriction.
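In its barest pixelwise form, the threshold rule compares each foreground likelihood ratio against the constant θ. The sketch below shows only that comparison; it deliberately leaves out the spatial coherence and contrast terms, so it should be read as a simplification of the segmentation evaluated in figure 5, with hypothetical names and input values.

```python
def threshold_labels(foreground_ratios, theta=1.0):
    """Label a pixel "F" when its foreground likelihood ratio exceeds theta, else "B"."""
    return ["F" if ratio > theta else "B" for ratio in foreground_ratios]


# Hypothetical per-pixel foreground likelihood ratios along one scanline.
labels = threshold_labels([0.2, 1.5, 3.0, 0.9], theta=1.0)
```

With θ = 1 a pixel is called foreground exactly when an in-band match is judged more likely than an occlusion, which gives some intuition for why that value is a natural operating point.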
5. The Panum Proxy algorithm
In the previous section, it was shown that the Panum-band constraint means that information required for computing background likelihood $\mathcal{L}(L \mid B)$ is missing, and that replacing $\mathcal{L}(L \mid B)$ with a simple threshold constant gives poor results. Therefore in this section an approximation for $\mathcal{L}(L \mid B)$ is developed.
5.1. Deriving the approximate likelihood
We assume that p(d | F) is uniform over the Panum band
so that p(d | F) = 1/|DF| and similarly, for the background,