
The Panum Proxy algorithm for dense stereo matching over a volume of interest

A. Agarwal and A. Blake

Microsoft Research Ltd.

7 J J Thomson Ave, Cambridge, CB3 0FB, UK

http://research.microsoft.com/vision/cambridge

Abstract

Stereo matching algorithms conventionally match over a range of disparities sufficient to encompass all visible 3D scene points. Human vision, however, does not do this. It works over a narrow band of disparities, Panum’s fusional band, whose typical range may be as little as 1/20 of the full range of disparities for visible points. Points inside the band are fused visually and the remaining points are seen as “diplopic”, that is, with double vision. The Panum band restriction is also important in machine vision, both with active (pan/tilt) cameras and with high-resolution cameras and digital pan/tilt.

A probabilistic approach is presented for dense stereo matching under the Panum band restriction. First, it is shown that existing dense stereo algorithms are inadequate in this problem setting. Secondly, it is shown that the main problem is segmentation: separating the (left) image into the areas that fall respectively inside and outside the band. Thirdly, an approximation is derived that makes up for missing out-of-band information with a “proxy” based on image autocorrelation. Lastly, it is shown that the Panum Proxy algorithm achieves accuracy close to what can be obtained when the full disparity band is available.

1. Introduction

In attentional stereo vision, the viewer steers a volume of interest around the scene. This problem has received a good deal of attention in the realms of oculomotor control [5, 8] and sparse stereo, e.g. [14]. In the area of dense stereo, however (e.g. [12, 6, 2, 3, 10, 15]), the issue of restricting attention to a volume with a limited range of depth, or equivalently disparity, has not been addressed. Restricting computation to a volume of interest, which may be only a small fraction of the visible volume, is of considerable importance for efficiency, particularly with high-resolution or head-mounted cameras. In principle also, it is most unsatisfying that conventional stereo algorithms need to explore an irrelevant background simply in order to establish significant properties of the foreground: a form of the celebrated “frame” problem of Artificial Intelligence.

1.1. The Panum band

The geometry of the situation is illustrated in figure 1. For a particular field of view of each camera, potential matches between left and right images form a diamond-shaped region in each epipolar plane. In human vision [13] the space of possible matches is restricted further to the “Panum band” (see figure). This is typically around 5 mrad wide, and cuts down the number of possible foveal matches by around an order of magnitude. High-quality stereo cameras with narrow fields of view can also benefit from a Panum band restriction in a similar way.

The motivation for studying Panum band stereo is then threefold.

1. It is conceptually appealing to develop a stereo algorithm which focuses on a volume of interest, in the manner known to prevail in human vision. Why should a stereo algorithm expend needless attention on the entire background of a scene?

2. Computational cost for stereo matching grows linearly (or faster) with the volume of interest. This is true for both main components of stereo matching: cost computation and global optimization (whether by graph cut (GC), dynamic programming (DP) or belief propagation (BP)). Restricting the size of the matching volume is therefore critical for efficiency. For stereo geometry similar to human vision, the saving in computational cost is at least an order of magnitude, due to the reduced range of depth (disparity). Usually there is a further factor of saving, due to the concomitant restriction in the image area over which matching occurs. The best stereo algorithms (GC, DP or BP [15]) do not currently come close to real time. This is not going to be solved any time soon by Moore’s law, because camera resolution is increasing faster than processing power.

3. Computational cost, we have argued, necessitates the restriction of stereo matching to a Panum volume. However, existing dense stereo algorithms are not capable of satisfactory operation over a Panum band, as this paper will show. A new algorithm is needed.

Panum Proxy Dense Stereo Matching — Agarwal and Blake, Proc CVPR 2006

[Figure 1 image: panel (a) Left and Right camera rays in one epipolar plane; panel (b) match space with left index m, right index n and the disparity axis.]

Figure 1. The space of possible matches restricted to a Panum band. a) View from above of rays in a single epipolar plane, forming a diamond-shaped match space; the Panum band forms a ribbon across the diamond and thus cuts down the set of possible matches. b) The match space represents the situation in (a) in a standardised diagram, in which the diamond-shaped match space becomes a square whose sides are respectively a left and a right epipolar line, restricted to the Panum band as shown. A possible matching path is shown dashed.

1.2. The Panum Proxy algorithm

The principle of the Panum Proxy stereo algorithm is therefore as follows.

1. Compute match scores or likelihoods for disparities within the Panum band.

2. Aggregate those scores to compute a total likelihood, at each point, that there is a within-band (foreground) match.

3. The same cannot be done for the background likelihood, as that would require match scores outside the band. However, it is shown that an autocorrelation-like measure can be used to estimate the background likelihood.

4. Use the true foreground likelihood and the estimated background likelihood in a graph-cut algorithm to achieve a segmentation.

5. Once segmentation is complete, perform conventional stereo matching, e.g. [3], but restricted to the image regions that have been labelled as in-band.

Note that the restriction to the Panum band in segmentation step 4 is indeed essential, because the complexity of segmentation is dominated by the cost of computing stereo match scores, and this is linear in match volume.
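Step 1, and the linear-cost remark above, can be illustrated with a minimal sketch. It assumes rectified greyscale images stored as NumPy arrays and uses a plain absolute-difference score as a stand-in for the match measure; the function name `band_cost_volume` and the band limits `d_min`, `d_max` are hypothetical. The number of score evaluations is |D_F| × H × W, so it shrinks in proportion to the band width.

```python
import numpy as np


def band_cost_volume(L, R, d_min, d_max):
    """Absolute-difference match cost for every disparity d in [d_min, d_max].

    Returns an array of shape (d_max - d_min + 1, H, W). Entries whose
    right-image column n = m - d falls outside the image are set to +inf.
    """
    L = np.asarray(L, dtype=float)
    R = np.asarray(R, dtype=float)
    H, W = L.shape
    disparities = np.arange(d_min, d_max + 1)
    cost = np.full((len(disparities), H, W), np.inf)
    m = np.arange(W)  # left-image column indices
    for k, d in enumerate(disparities):
        n = m - d  # d_m = m - n, so column m in L pairs with column m - d in R
        valid = (n >= 0) & (n < W)
        cost[k][:, valid] = np.abs(L[:, m[valid]] - R[:, n[valid]])
    return cost
```

Halving the band halves the size of `cost`, and hence the work done in step 1; no disparity outside [d_min, d_max] is ever evaluated.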

The resulting stereo disparity map can be used, for example, to synthesise a new view, as in figure 2, in which case the view is fused within the Panum band but diplopic outside it, just as in human stereo vision.

Figure 2. Fusion and diplopia with the Panum Proxy algorithm. Results of the Panum Proxy algorithm are illustrated here for a frame from one of the six Microsoft stereo datasets. The matched stereogram shows fusion within the Panum band but diplopia (double vision) elsewhere.

2. Probabilistic framework for stereo matching

First we outline the notation for probabilistic stereo matching. Pixels in the rectified left and right images are $L = \{L_m\}$ and $R = \{R_n\}$ respectively, and jointly we denote the two images $z = (L, R)$. Left and right pixels are associated by any particular matching path (fig. 1). Frequently in stereo matching the so-called “ordering constraint” is imposed, which means that each move in figure 1b) is allowed only in the positive quadrant [1, 12]. Stereo “disparity” is $d = \{d_m, m = 0, \ldots, N\}$, and disparity is simply related to image coordinates as $d_m = m - n$.

In algorithms that deal explicitly with occlusion [10, 7], an array $x = \{x_m\}$ of state variables takes values $x_m \in \{M, O\}$ according to whether the pixel is matched or occluded.
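The ordering constraint can be made concrete with a small check on one epipolar path. This is a sketch under one reading of the text: a path is valid if, among matched pixels taken in increasing m, the partner column n = m − d_m never decreases (moves with non-negative increments in both m and n). The name `satisfies_ordering` is hypothetical.

```python
def satisfies_ordering(d, x):
    """Check the ordering constraint for one epipolar path.

    d[m] is the disparity of left pixel m and x[m] is its state, "M" or "O".
    Matched pixels pair m with n = m - d[m]; every move through match space
    must stay in the positive quadrant, so n may never decrease as m increases.
    """
    last_n = None
    for m, (dm, xm) in enumerate(zip(d, x)):
        if xm != "M":
            continue  # occluded pixels have no partner in the right image
        n = m - dm
        if last_n is not None and n < last_n:
            return False
        last_n = n
    return True
```

For example, disparities `[0, 0, 1, 1]` with all pixels matched give n = 0, 1, 1, 2 and pass, while `[0, 2]` gives n = 0, −1 and fails.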

This sets up the notation for a path in epipolar match space, which is a sequence $((d_1, x_1), (d_2, x_2), \ldots)$ of disparities and states. A Gibbs energy $E(z, d, x; \Theta, \Phi)$ can be defined for the posterior over all epipolar paths taken together, notated $(d, x)$, given the image data $z$. Parameters $\Phi$ and $\Theta$ relate respectively to prior and likelihood terms in the posterior. The Gibbs energy can then be globally minimised to obtain a segmentation $x$ and disparities $d$.

2.1. Prior distribution over matching paths

A Bayesian model for the posterior distribution $p(x, d \mid z)$ is set up as a product of prior and likelihood:

$$p(x, d \mid z) \propto p(x, d)\, p(z \mid x, d). \qquad (1)$$

The prior distribution $p(x, d) \propto \exp\{-\lambda E_0(x, d)\}$ is frequently decomposed, in the interests of tractability, as a Markov model. An MRF (Markov Random Field) prior for $(x, d)$ is specified as a product of clique potentials $V_{m,m'}$ over all pixel pairs $(m, m') \in \mathcal{N}$ deemed to be neighbouring in the left image. The potentials are chosen to favour matches over occlusions, to impose limits on disparity change along an epipolar line, and to favour figural continuity between matching paths in adjacent epipolar line-pairs.

2.2. Stereo matching likelihood

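The stereo likelihood is:

$$p(z \mid x, d) \propto \prod_m \exp\{-U^M_m(x_m, d_m)\} \qquad (2)$$

where the pixelwise negative log-likelihood ratio, for match vs. non-match, is

$$U^M_m(x_m, d_m) = \begin{cases} M(L^P_m, R^P_n) & \text{if } x_m = M \\ M_0 & \text{if } x_m = O, \end{cases} \qquad (3)$$

where $L^P_m$ and $R^P_n$ are patches centred on pixels $L_m$ and $R_n$, with $n = m - d_m$, and $M(\ldots)$ is a suitable measure of goodness of match between two patches, often based on normalised sum-squared difference (SSD) or correlation scores [15].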
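Eq. (3) leaves the patch measure $M$ generic. The sketch below assumes one concrete choice, a normalised SSD between zero-mean, unit-norm patches, halved so that it equals one minus the normalised cross-correlation; the patch radius `r` and occlusion cost `M0` are illustrative values, not parameters from the paper, and the names `normalised_patch` and `U_M` are hypothetical.

```python
import numpy as np


def normalised_patch(img, row, col, r):
    """(2r+1) x (2r+1) patch centred at (row, col), made zero-mean and unit-norm."""
    p = np.asarray(img, dtype=float)[row - r:row + r + 1, col - r:col + r + 1]
    p = p - p.mean()
    norm = np.linalg.norm(p)
    return p / norm if norm > 0 else p


def U_M(L, R, row, m, d, x, r=2, M0=1.0):
    """Pixelwise negative log-likelihood ratio of eq. (3), as a sketch.

    Assumes the patches lie fully inside both images (no border handling).
    """
    if x == "O":
        return M0  # constant cost for an occluded pixel
    n = m - d  # right-image column matched to left column m
    diff = normalised_patch(L, row, m, r) - normalised_patch(R, row, n, r)
    # For unit-norm patches, 0.5 * SSD = 1 - (normalised cross-correlation).
    return 0.5 * float(np.sum(diff ** 2))
```

Normalising each patch makes the score insensitive to brightness and contrast differences between the two cameras, which is the usual reason for preferring normalised SSD over raw SSD.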

3. Restricting conventional stereo matching to a Panum band

We looked at two dense stereo matching algorithms which are considered competitive [15]: one, referred to as BVZ [4], uses graph-cut optimization; the other, KZ, also uses graph cut but with explicit allowance for occlusion [10]. The question is whether these algorithms can be applied to the Panum problem simply by reducing the disparity range available for matching. Following conventions for stereo testing, we took the four image pairs Tsukuba, Sawtooth, Venus and Map from the Middlebury database¹, together with the supplied ground truth, and calculated error measures. Over the foreground, an error is counted wherever the computed disparity is in error by more than 1 pixel. For background regions the true disparity is of course out of range, so a disparity is considered incorrect as follows:

BVZ: wherever the disparity is not at the endstop of the Panum band;

KZ: wherever the state is not occluded, $x_m \neq O$, and the disparity is not at the endstop of the Panum band.

In each case we used the operating parameters recommended for the algorithms: for BVZ, disparity gradient penalty [3] $\lambda = 20$, and for KZ, $\lambda = 10$ with occlusion penalty [10] $K = 50$.

¹ http://cat.middlebury.edu/stereo

Figure 3. Stereo matching error rates for the KZ algorithm constrained to the Panum band. The error-rate data show that background error is greatly magnified when the Panum band constraint is imposed, while foreground error barely changes.
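The two error measures above can be sketched as follows. Two assumptions are made that the text leaves open: the foreground is taken to be the pixels whose ground-truth disparity lies inside the band [d_min, d_max], and a background pixel counts as correct at either endstop of the band. Passing `occluded=None` gives the BVZ measure; passing KZ's occlusion mask gives the KZ measure. The name `panum_error_rates` is hypothetical.

```python
import numpy as np


def panum_error_rates(d_est, d_true, d_min, d_max, occluded=None):
    """Foreground and background error rates under a Panum band [d_min, d_max].

    Foreground: ground-truth disparity inside the band; an error is a
    computed disparity off by more than 1 pixel.
    Background: an error is any pixel whose computed disparity is not at an
    endstop of the band (and, for KZ-style output, not labelled occluded).
    """
    d_est = np.asarray(d_est)
    d_true = np.asarray(d_true)
    fg = (d_true >= d_min) & (d_true <= d_max)
    fg_wrong = np.abs(d_est - d_true) > 1
    bg_wrong = (d_est != d_min) & (d_est != d_max)
    if occluded is not None:
        bg_wrong &= ~np.asarray(occluded, dtype=bool)
    fg_rate = float(fg_wrong[fg].mean()) if fg.any() else 0.0
    bg_rate = float(bg_wrong[~fg].mean()) if (~fg).any() else 0.0
    return fg_rate, bg_rate
```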

Results for the KZ algorithm are shown in figure 3. Results for the (simpler) BVZ algorithm are similar, but are omitted here. In both cases, disparity error over foreground regions is not much affected by the Panum band restriction (in fact it improves slightly because of the added constraint). Over background regions, error for both algorithms rises substantially. The conventional stereo algorithms simply fail over the background, generating many random disparities.

The conclusion from this experiment is that the conventional algorithms, when restricted to the Panum band, work perfectly well over foreground regions. All that is required to make the algorithms usable is reliable identification of those pixels whose disparities fall within the band. In other words, successful Panum-band stereo could be achieved if only segmentation into foreground (within band) and background could be achieved reliably. The remainder of the paper therefore considers the problem of foreground/background segmentation under the Panum band constraint.

3.1. Can graph-cut stereo be adapted for segmentation?

One possibility, for a more subtle adaptation of the existing KZ algorithm, is that its ability to label occlusions could be extended to label background points. This is reasonable because, given the restricted Panum band, both occlusions and background points represent failures to obtain a stereo match. In order to give the KZ algorithm every chance of success, the parameter K was varied to minimise the labelling error rate, and this yielded parameters $\lambda = 10$, $K = 10$, quite different from the optimal operating point for regular use of KZ for stereo matching. Results are given in figure 4, showing segmentation error for each of the six test videos in the Microsoft stereo-segmentation database². Labelling error rates (equal error rate) for the 6 datasets vary between 7% and 41%, and are in all cases many times worse than those obtainable from full, unconstrained stereo segmentation in the form of LGC (Layered Graph Cut) [9]. Since the aim, with Panum-band stereo, is to approach the quality of full, unconstrained stereo, the performance of KZ in this mode is far from acceptable.

Figure 4. Segmentation error from conventional graph-cut stereo. The KZ algorithm is tuned here to use its occlusion labels to indicate background, but error rates are very high compared with what is attainable using LGC segmentation with the full range of disparities.

3.2. Segmentation of the in-band image region

Given the results and discussion so far, the aim of the remainder of the paper is to develop a segmentation algorithm that labels all “foreground” points with an accuracy approaching full LGC, but without any computation outside the Panum band. Segmentation could be done in one of two ways: either simultaneously with the computation of disparity, or in a separate pass preceding the computation of disparity. Simultaneous segmentation and disparity determination perhaps has the attraction of greater elegance. On the other hand, separate segmentation could be achieved by marginalising the stereo likelihood over disparities $d$, and then performing energy minimisation with respect to labels $x$ only. A separate labelling pass should surely be more efficient, since full consideration of disparity then need only occur within the foreground region.

² research.microsoft.com/vision/cambridge/i2i

4. Stereo segmentation

First we summarise the full LGC (Layered Graph Cut) algorithm [9] for segmentation by marginalisation of stereo likelihoods. Then, in the next section, the full LGC energy function is approximated to stay within the Panum band restriction.

For LGC, the matched state M is further subdivided into foreground match F and background match B. LGC determines segmentation $x$ as the minimum of an energy function $E(z, x; \Theta)$, in which stereo disparity $d$ does not appear explicitly. Instead, the stereo match likelihood (2) in section 2.2 is marginalised over disparity, aggregating support from each putative match, to give a likelihood $p(L \mid x, R)$ for each of the three label types occurring in $x$: foreground, background and occlusion (F, B, O). Segmentation is therefore a ternary problem, and it can be solved (approximately) by iterative application of a binary graph-cut algorithm, augmented for a multi-label problem by so-called α-expansion [4]. The energy function for LGC is composed of two terms:

$$E(z, x; \Theta, \Phi) = V(z, x; \Theta) + U^S(z, x; \Phi) \qquad (4)$$

representing energies for spatial coherence/contrast and stereo likelihood respectively.
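To make eq. (4) concrete, the sketch below only evaluates the energy of a given labelling; it does not perform the α-expansion minimisation used by LGC, and it omits the contrast discount. It assumes the unary term is supplied as one cost image per label and the pairwise term as a callable that is told whether the clique is horizontal (so that the geometric constraints on epipolar lines can be applied). The names `lgc_energy` and `OFFSETS` are hypothetical.

```python
import numpy as np

# Clique offsets on the square grid: horizontal, vertical and the two diagonals.
OFFSETS = [((0, 1), True), ((1, 0), False), ((1, 1), False), ((1, -1), False)]


def lgc_energy(labels, unary, pairwise):
    """E(z, x) = V(z, x) + U^S(z, x) for a label image (sketch of eq. (4)).

    labels:   H x W array of labels, e.g. "F", "B", "O".
    unary:    dict mapping each label to an H x W array of U^S_m costs.
    pairwise: function (x, x_prime, horizontal) -> clique cost.
    """
    H, W = labels.shape
    U = sum(unary[labels[i, j]][i, j] for i in range(H) for j in range(W))
    V = 0.0
    for (di, dj), horizontal in OFFSETS:
        for i in range(H):
            for j in range(W):
                i2, j2 = i + di, j + dj
                if 0 <= i2 < H and 0 <= j2 < W:
                    V += pairwise(labels[i, j], labels[i2, j2], horizontal)
    return V + U
```

Each neighbouring pair is visited once per clique type, so the cost of evaluating the energy is linear in the number of pixels.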

4.1. Encouraging coherence

The coherence energy $V(z, x; \Theta)$ is a sum, over cliques, of pairwise energies with potential coefficients $F_{m,m'}$, now defined as follows. Cliques consist of horizontal, vertical and diagonal neighbours on the square grid of pixels. For vertical and diagonal cliques the potential acts as a switch, active across a transition into or out of the foreground state: $F_{m,m'}[x, x'] = \gamma$ if exactly one of the variables $x, x'$ equals F, and $F_{m,m'}[x, x'] = 0$ otherwise. Horizontal cliques, along epipolar lines, inherit the same cost structure, except that certain transitions are disallowed on geometric grounds. These constraints are imposed via infinite cost penalties:

$$F_{m,m'}[x = F, x' = O] = \infty; \qquad F_{m,m'}[x = O, x' = B] = \infty,$$

where [9] $\gamma = \log\left(2\sqrt{W_M W_O}\right)$, and parameters $W_M$ and $W_O$ are the mean widths (in pixels) of matched and occluded regions respectively.
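The potentials just defined can be written directly as a function. This is a sketch; it assumes that for a horizontal clique `x` belongs to the left pixel m and `x_prime` to its right-hand neighbour m', an ordering the text implies but does not state. The names `coherence_potential` and `gamma_from_widths` are hypothetical.

```python
import math


def coherence_potential(x, x_prime, gamma, horizontal):
    """F_{m,m'}[x, x'] from section 4.1 (sketch).

    For horizontal cliques, x is assumed to be the left pixel m and x_prime
    its right-hand neighbour m' on the same epipolar line.
    """
    if horizontal and (x, x_prime) in {("F", "O"), ("O", "B")}:
        return math.inf  # transitions ruled out on geometric grounds
    # Switch: pay gamma only when exactly one of the two labels is foreground.
    return gamma if (x == "F") != (x_prime == "F") else 0.0


def gamma_from_widths(W_M, W_O):
    """gamma = log(2 * sqrt(W_M * W_O)), with widths in pixels."""
    return math.log(2.0 * math.sqrt(W_M * W_O))
```

For example, mean widths W_M = 8 and W_O = 2 give γ = log 8 ≈ 2.08.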

4.2. Encouraging boundaries where contrast is high

A tendency for segmentation boundaries in images to align with contours of high contrast is achieved by defining prior penalties $F_{k,k'}$ which are suppressed where image contrast is high [3, 4, 11], multiplying them by a discount factor $C^*_{m,m'}(L_m, L_{m'})$ which suppresses the penalty by a factor $\epsilon/(1+\epsilon)$ wherever the contrast across $(L_m, L_{m'})$ is high; see [9] for details. Previously, maximal discounting has been obtained [3] by setting $\epsilon = 0$. Here, as in stereo segmentation [10], $\epsilon = 1$ tends to give the best results, though sensitivity to the precise value of $\epsilon$ is relatively mild.
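The text states only the limiting behaviour of the discount factor and refers to [9] for its definition. The sketch below therefore assumes the form $C^* = (\epsilon + \exp(-\|L_m - L_{m'}\|^2/\beta))/(1+\epsilon)$, which equals 1 where there is no contrast and tends to $\epsilon/(1+\epsilon)$ where contrast is high; the contrast scale `beta` is a hypothetical parameter, and `contrast_discount` a hypothetical name.

```python
import numpy as np


def contrast_discount(Lm, Lm_prime, eps=1.0, beta=100.0):
    """Assumed discount factor C*_{m,m'}: 1 at zero contrast, eps/(1+eps) at high contrast."""
    diff = np.asarray(Lm, dtype=float) - np.asarray(Lm_prime, dtype=float)
    contrast_sq = float(np.sum(diff ** 2))
    return (eps + np.exp(-contrast_sq / beta)) / (1.0 + eps)
```

With ε = 1 the penalty across a strong edge is halved, whereas ε = 0 removes it entirely, which corresponds to the maximal discounting mentioned above.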

4.3. Foreground likelihood

The remaining term in (4) is $U^S(z, x)$, which captures the influence of stereo matching likelihood on the probability of a particular segmentation. It is defined to be

$$U^S(z, x) = \sum_m U^S_m(x_m) \qquad (5)$$

where

$$U^S_m(x_m) = -\log p(L_m \mid x_m, R). \qquad (6)$$

Now, marginalising out disparity, the foreground likelihood is

$$p(L_m \mid x_m = F, R) = \sum_d p(L_m \mid d_m = d, R)\, p(d_m = d \mid x_m = F) \qquad (7)$$

where, from (2),

$$p(L_m \mid d_m = d, R) \propto f(L, d, R) = \exp\{-U^M_m(x_m = M, d_m = d)\}, \qquad (8)$$

using the log-likelihood ratio defined in (3). As a shorthand, we write:

$$p(L \mid F) = \sum_d p(L \mid d, R)\, p(d \mid F) \qquad (9)$$

and, as before, in terms of likelihood ratios, this becomes:

$$\mathcal{L}(L \mid F) \equiv \frac{p(L \mid F)}{p(L \mid O)} = \sum_d f(L, d, R)\, p(d \mid F) \qquad (10)$$

where $f(L, d, R)$ is the match/non-match likelihood ratio as above.
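Eq. (10) reduces to a weighted sum over the disparity axis of an in-band cost volume. The sketch assumes the costs $U^M_m(M, d)$ have already been computed for every $d \in D_F$ and, by default, uses the uniform prior $p(d \mid F) = 1/|D_F|$ that section 5.1 adopts; the name `foreground_likelihood_ratio` is hypothetical.

```python
import numpy as np


def foreground_likelihood_ratio(U_band, p_d_given_F=None):
    """Eq. (10): L(L|F) = sum_d f(L, d, R) p(d | F), with f = exp(-U^M).

    U_band: array of shape (|D_F|, H, W) holding U^M_m(M, d) for the
            disparities d in the Panum band D_F only.
    p_d_given_F: prior over the band; uniform 1/|D_F| if omitted.
    """
    U_band = np.asarray(U_band, dtype=float)
    n_band = U_band.shape[0]
    if p_d_given_F is None:
        p_d_given_F = np.full(n_band, 1.0 / n_band)
    f = np.exp(-U_band)  # match / non-match likelihood ratio per disparity
    # Weighted sum over the disparity axis; no out-of-band disparity is touched.
    return np.tensordot(p_d_given_F, f, axes=(0, 0))
```

For large costs it may be preferable to work in the log domain (a log-sum-exp over d) to avoid underflow; the direct form is kept here for readability.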

4.4. Background likelihood

Since the distribution $p(d_m = d \mid x_m = F)$ is defined to be zero outside the Panum fusional area, it is perfectly possible, under the Panum assumptions, to compute $\mathcal{L}(L \mid F)$ in (10). However, the same cannot be said for

$$\mathcal{L}(L \mid B) \equiv \frac{p(L \mid B)}{p(L \mid O)} = \sum_d f(L, d, R)\, p(d \mid B) \qquad (11)$$

since the corresponding summation lies entirely outside the Panum band $D_F$ of disparities, in that $p(d \mid B)$ is non-zero only outside the Panum band. Each pixel $L_m$ would therefore have to be compared with pixels in the right image $R$ that are unreachable because they are outside the band.

Figure 5. Segmentation error using a simple threshold in place of background likelihood. Error curves are shown as a function of threshold θ for six subjects from the Microsoft database. (Error rates are total foreground and background error, averaged over each sequence.) Horizontal dashed lines show corresponding error rates for full (non-Panum) LGC segmentation, as a benchmark. The substantial shortfall suggests that it should be possible to improve considerably on simple thresholding.

4.5. A simple threshold as proxy for the background likelihood?

Before going to some trouble to approximate the background likelihood, it is worth looking at the simplest possible approach and treating the problem as novelty detection. In that view, we have a model $\mathcal{L}(L \mid F)$ for the positive class, and no model of the background class. The likelihood ratio classifier $\mathcal{L}(L \mid F) > \mathcal{L}(L \mid B)$ is then simplified to a threshold rule, replacing the background likelihood by a constant $\mathcal{L}(L \mid B) = \theta$. Segmentation under this model, for variable threshold $\theta$, is exhibited in figure 5. It appears that a constant threshold $\theta = 1$ yields close to the best error for each of the 6 datasets, so there would be no need for an adaptive algorithm. However, the best error rate achieved is between 2 and 8 times higher than the error achieved (dashed lines) by full LGC. Again, therefore, there is strong motivation to look for a model and an algorithm that perform better under the Panum-band restriction.
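The threshold rule can be sketched pixelwise, together with a sweep over θ in the spirit of figure 5. This is a simplification: it applies $\mathcal{L}(L \mid F) > \theta$ independently at each pixel, without the coherence and contrast terms that the graph-cut segmentation would add, and it scores a labelling by its plain fraction of mislabelled pixels, which may differ from the exact error measure behind figure 5. All function names are hypothetical.

```python
import numpy as np


def threshold_segment(lik_F, theta=1.0):
    """Pixelwise novelty-detection rule: foreground where L(L|F) > theta."""
    return np.asarray(lik_F) > theta


def labelling_error(pred_fg, true_fg):
    """Fraction of pixels whose foreground/background label is wrong."""
    return float(np.mean(np.asarray(pred_fg) != np.asarray(true_fg)))


def sweep_thresholds(lik_F, true_fg, thetas):
    """Error rate as a function of theta, one value per threshold."""
    return [labelling_error(threshold_segment(lik_F, t), true_fg) for t in thetas]
```

Replacing the fixed θ with a per-pixel estimate of $\mathcal{L}(L \mid B)$ is exactly the gap the Panum Proxy approximation is meant to fill.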

5. The Panum Proxy algorithm

In the previous section, it was shown that the Panum-band constraint means that information required for computing the background likelihood $\mathcal{L}(L \mid B)$ is missing, and that replacing $\mathcal{L}(L \mid B)$ with a simple threshold constant gives poor results. Therefore, in this section, an approximation for $\mathcal{L}(L \mid B)$ is developed.

5.1. Deriving the approximate likelihood

We assume that $p(d \mid F)$ is uniform over the Panum band, so that $p(d \mid F) = 1/|D_F|$ and similarly, for the background,