Scene Flow Estimation by Growing Correspondence Seeds
Jan Čech, Jordi Sanchez-Riera, Radu Horaud
INRIA Rhône-Alpes, 38330 Montbonnot, France
{jan.cech,jordi.sanchez-riera,radu.horaud}@inrialpes.fr
Abstract
A simple seed growing algorithm for estimating scene
flow in a stereo setup is presented. Two calibrated and
synchronized cameras observe a scene and output a se-
quence of image pairs. The algorithm simultaneously com-
putes a disparity map between the image pairs and opti-
cal flow maps between consecutive images. This, together
with calibration data, is an equivalent representation of
the 3D scene flow, i.e. a 3D velocity vector is associated
with each reconstructed point. The proposed method starts
from correspondence seeds and propagates these corre-
spondences to their neighborhood. It is accurate for com-
plex scenes with large motions and produces temporally-
coherent stereo disparity and optical flow results. The al-
gorithm is fast due to inherent search space reduction. An
explicit comparison with recent methods of spatiotemporal
stereo and variational optical and scene flow is provided.
1. Introduction
A sequence of image pairs gathered with calibrated and
synchronized cameras contains more information to esti-
mate depth and 3D motion than a single stereopair or a
single image sequence. There are approaches [17, 15, 14] which exploit the extra temporal information to estimate disparity maps but do not estimate the motion explicitly; we call them spatiotemporal stereo.
Other methods estimate a complete scene flow benefit-
ing from a coupled stereo and optical flow correspondence
problem. Scene flow was introduced in [16] as a dense
3D motion field. It can be estimated with: (1) variational
methods [1, 6, 13], which are usually well suited for simple
scenes with a dominant surface; (2) discrete MRF formulations [10, 7], which involve expensive discrete optimization; and (3) local methods finding the correspondences greedily, which are efficient [5] but not so accurate.
We propose a seed growing algorithm to estimate the
scene flow in a binocular-video setup. A basic principle of
the seed growing methods is that correspondences are found
in a small neighborhood around an initial set of seed correspondences.

Figure 1: Output of the proposed algorithm on the ETH dataset as color-coded maps. Panels: (a), (b) disparity, (c) horizontal optical flow, (d) vertical optical flow. For disparity, warmer colors are closer to the camera. In optical flow, green is zero motion, warmer colors are left and up motion, and colder colors are right and down motion, respectively. Black denotes unmatched pixels.

This idea has been adopted in stereo [3, 4, 9, 8],
but to the best of our knowledge, it has not been used for
scene flow. The advantage of such approaches is a fast per-
formance compared to global variational and MRF meth-
ods, and a good accuracy compared to purely local meth-
ods, since neighboring pixel relations are not ignored com-
pletely.
Our proposed algorithm can simultaneously estimate accurate, temporally coherent disparity and optical flow maps of a scene with a rich 3D structure and large motion between time instances. Small local variations of disparity and flow are captured by the growing process, while large displacements are found thanks to the seeds. Boundaries between objects and different motions are naturally well preserved without smoothing artifacts. Nevertheless, the algorithm produces semi-dense (unambiguous) results only, but
inria-00590274, version 1 - 14 Jul 2011
Author manuscript, published in "IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11) (2011)"
Figure 2: Overview of the proposed algorithm (GCSFs). Blocks: prematcher, predictor, grow stereo (GCS), grow scene flow (GCSF). Inputs: images $I^0_l$, $I^0_r$, $I^1_l$, $I^1_r$. Seed sets: $S_s$, $S$, $\hat{S}_s$, $\hat{S}$. Maps: $D^0$, $D^1$, $F_h$, $F_v$.
they are dense enough for many potential applications, see
Fig. 1.
The rest of the paper is organized as follows. In Sec. 2, we present the proposed algorithm in detail. In Sec. 3, we describe the evaluation and comparison with state-of-the-art methods. Sec. 4 concludes the paper.
2. Algorithm Description
The proposed algorithm for growing correspondences of scene flow in a sequence of stereo images (GCSFs) is summarized in Fig. 2. At each time instance $t$, it takes as input two epipolarly rectified image pairs, a pair $I^0_l, I^0_r$ for time $t-1$ (last frame), and the consecutive pair $I^1_l, I^1_r$ for time $t$ (current frame). The output at each time instance is a disparity map $D^0$ holding the stereo correspondences from the last frame $t-1$, a disparity map $D^1$ holding correspondences found between $I^1_l$ and $I^1_r$, and horizontal and vertical optical flow maps $F_h$ and $F_v$ respectively, encoding the correspondences between consecutive images $I^0_l$ and $I^1_l$.
Notice that, having full camera calibration, this representation fully determines the scene flow, since $D^0$ gives a reconstruction of 3D points $\mathcal{X}^0$, $D^1$ a reconstruction of 3D points $\mathcal{X}^1$ (after the motion), and $F_h, F_v$ give the mapping between these two sets.
First, a prematcher is run to deliver initial correspondences, the seeds. They are used in subsequent growing processes. The prematcher finds sparse correspondences of interest points between left and right images and between consecutive images. Each seed $s = (x^0_l, x^0_r, y^0, x^1_l, x^1_r, y^1) \in S$ represents a correspondence of 4 pixels, i.e. projections of a 3D point $X^0 \in \mathcal{X}^0$ into $I^0_l, I^0_r$ and of the same 3D point after the motion $X^1 \in \mathcal{X}^1$ into $I^1_l, I^1_r$. The seed encapsulates both stereo and optical flow correspondences, see Fig. 3. Besides the set of these scene flow seeds, the prematcher also outputs the stereo seeds $s_s = (x^0_l, x^0_r, y^0) \in S_s$, which is a set of two-pixel correspondences between $I^0_l$ and $I^0_r$.

Figure 3: A sequence of consecutive epipolarly rectified stereo images. A seed correspondence $s$ is sketched by filled circles, its right neighborhood $N_1$ by empty circles.

Then, the stereo seeds $S_s$ are grown by a stereo algorithm (GCS), which computes a disparity map $D^0$ between $I^0_l$ and $I^0_r$. Disparity map $D^0$ together with seeds $S$ and the input images are an input of the subsequent algorithm (GCSF), which jointly grows disparity map $D^1$ and the optical flow maps $F_h$, $F_v$.
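To illustrate the remark that the maps together with calibration determine the scene flow, here is a minimal sketch of how a 3D velocity vector could be recovered at one pixel. It assumes an ideal rectified pinhole pair with focal length `f` (pixels), baseline `b`, and principal point `(cx, cy)`; these calibration names, the NaN convention for unmatched pixels, and both function names are illustrative assumptions and not part of the paper.

```python
import numpy as np

def triangulate(x, y, d, f, b, cx, cy):
    """Back-project left-image pixel (x, y) with disparity d = x_l - x_r."""
    z = f * b / d
    return np.array([(x - cx) * z / f, (y - cy) * z / f, z])

def scene_flow_at(x1, y1, D0, D1, Fh, Fv, f, b, cx, cy):
    """3D displacement of the point seen at pixel (x1, y1) of the current
    left image. Maps are indexed [row, column]; NaN marks unmatched pixels
    (an assumed convention). Returns None if any map is undefined."""
    d1, fh, fv = D1[y1, x1], Fh[y1, x1], Fv[y1, x1]
    if np.isnan([d1, fh, fv]).any():
        return None
    # Eq. (2) read backwards: pixel of the same point in the previous left image.
    x0, y0 = int(round(x1 - fh)), int(round(y1 - fv))
    if not (0 <= y0 < D0.shape[0] and 0 <= x0 < D0.shape[1]):
        return None
    d0 = D0[y0, x0]
    if np.isnan(d0) or d0 <= 0 or d1 <= 0:
        return None
    after = triangulate(x1, y1, d1, f, b, cx, cy)    # point X1 (after the motion)
    before = triangulate(x0, y0, d0, f, b, cx, cy)   # point X0
    return after - before
```

With constant maps `D0 = 10`, `D1 = 20`, `Fh = 2`, `Fv = 0` and `f = 100`, `b = 0.1`, `cx = cy = 0`, the pixel (4, 3) moves from depth 1.0 to depth 0.5, i.e. towards the cameras.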
The solution at time t contains a lot of information about the solution at time t+1, i.e. when a new frame is available. This information is exploited in the proposed algorithm by predicting the seeds for the growing processes in the next time instance. Considering the motion of pixels from the previous solution, the predictor estimates new correspondence seeds $\hat{S}$ and $\hat{S}_s$. These seeds are unified with the current seeds given by the prematcher. This means that, starting from the second frame, the growing processes work with larger and richer sets of seeds. The prematcher remains connected for all frames in order to capture dynamic scene events in which objects suddenly appear. This process is repeated with each subsequent frame.
Details of the algorithm are described below. First, we
describe in detail the procedure for growing the scene flow,
since it is the essential part. Afterward, we give further de-
tails on the rest of the algorithm.
2.1. Growing scene flow (GCSF)
The algorithm is presented in pseudocode as Alg. 1. It takes as input two rectified image pairs $I^0_l, I^0_r$ and the consecutive pair $I^1_l, I^1_r$, a set of initial correspondence seeds $S$, a disparity map $D^0$ for a previous frame $t-1$, and the parameters α (temporal consistency enforcement), β (optical flow regularization), and τ (growing threshold). The outputs are maps of disparity $D^1$ and optical flows $F_h$, $F_v$.

Algorithm 1 Growing the scene flow (GCSF)
Require: rectified images $I^0_l$, $I^0_r$, $I^1_l$, $I^1_r$,
    initial correspondence seeds $S$,
    disparity map $D^0$,
    parameters α, β, τ.
 1: Compute similarity s.c = corr(s) + α for all seeds s ∈ S.
 2: repeat
 3:   Draw the seed s ∈ S of the best similarity s.c.
 4:   if s.c ≥ τ then Update output maps. end if
 5:   for each of the four best neighbors i ∈ {1,2,3,4},
        $t^*_i = (x^0_l, x^0_r, y^0, x^1_l, x^1_r, y^1) = \arg\max_{t \in N_i(s|D^0)} \mathrm{corr}^\beta_s(t)$, do
 6:     $t^*_i.c = \mathrm{corr}^\beta_s(t^*_i)$,
 7:     if $t^*_i.c \geq \tau$ and all pixels in $t^*_i$ not matched yet then
 8:       Update output maps.
 9:       Update the seed queue $S = S \cup \{t^*_i\}$.
10:     end if
11:   end for
12: until S is empty.
13: return disparity map $D^1$, flow maps $F_h$, $F_v$.

First, the algorithm computes a photometric consistency statistic of the 4-pixel correspondence by average correlation
$$\mathrm{corr}(s) = \frac{c^{11}_{lr}(x^1_l, y^1; x^1_r, y^1) + c^{01}_{ll}(x^0_l, y^0; x^1_l, y^1) + c^{01}_{rr}(x^0_r, y^0; x^1_r, y^1)}{3}. \tag{1}$$
The left-right correlation $c^{11}_{lr}$ is between small windows centered at pixels $I^1_l(x^1_l, y^1)$ and $I^1_r(x^1_r, y^1)$. Similarly, the correlations $c^{01}_{ll}$ and $c^{01}_{rr}$ are between consecutive images in the left and right sequences. All the correlations are MNCC statistics [12] on 5 × 5 pixel windows. Seed correlation s.c is enhanced by a small positive α to enforce temporal consistency, Step 1. The set S is organized as a correlation priority queue. The seed s ∈ S is removed from the top of the queue, Step 3. If its consistency exceeds threshold τ in Step 4, output maps are updated by
$$\begin{aligned} D^1(x^1_l, y^1) &= x^1_l - x^1_r, \\ F_h(x^1_l, y^1) &= x^1_l - x^0_l, \\ F_v(x^1_l, y^1) &= y^1 - y^0. \end{aligned} \tag{2}$$
For all four neighbors (right, left, up, down) of seed s, the best correlating candidate in $N_i(s|D^0)$ is found, Step 5. For instance
$$N_1(s|D^0) = \bigcup_{k \in L} \left\{ \left(x^0_l + 1,\; x^0_l + 1 - D^0(x^0_l + 1, y^0),\; y^0,\; x^1_l + 1,\; x^1_r + 1,\; y^1\right) + (0,0,0,k) \right\}, \tag{3}$$
where $L = \{(0,0,0), (\pm 1,0,0), (0,\pm 1,0), (0,0,\pm 1)\}$ is a set of seven local search vectors having the stereo or temporal disparity less than or equal to one, see Fig. 3. Notice the candidates depend on the previous disparity $D^0$. The other neighbors $N_2$, $N_3$, $N_4$ are defined similarly.
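As a small illustration of Eq. (3), the seven candidates of the right neighborhood can be enumerated as below. The function name `right_neighbors` and the `[row, column]` indexing of `D0` are assumptions of this sketch, and it presumes that `D0` holds a valid integer disparity at the neighboring pixel.

```python
import numpy as np

# The seven local search vectors k of the set L, applied to (x1_l, x1_r, y1).
L = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def right_neighbors(seed, D0):
    """Candidate correspondences N_1(s | D0) for the pixel right of the seed."""
    x0l, x0r, y0, x1l, x1r, y1 = seed
    xn = x0l + 1                                   # right neighbor in I0_l
    # Previous-frame part is fixed by the known disparity D0.
    base = (xn, xn - int(D0[y0, xn]), y0, x1l + 1, x1r + 1, y1)
    return [base[:3] + tuple(b + k for b, k in zip(base[3:], kv)) for kv in L]
```

Only the current-frame coordinates vary across the candidates, which is what keeps the local search at seven correlation evaluations per neighbor.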
The optical flow generally suffers from the well-known aperture problem. This is not completely avoided in a joint stereo setup. Therefore we regularize: assuming the seed has a correct flow, new candidates having a different flow are penalized by a lower correlation
$$\mathrm{corr}^\beta_s(t) = \mathrm{corr}(t) - \beta\, \| s.f - t.f \|_1, \tag{4}$$
where the notation $.f = (x^1_l - x^0_l,\; x^1_r - x^0_r,\; y^1 - y^0)$ means a vector of optical flows of the respective seeds s and t, and β is a small positive constant.
If the highest correlation exceeds the threshold τ and none of the pixels in t has been matched so far, then a new match is found, Step 7. Output maps are updated by (2) in Step 8, and the found match becomes a new seed, Step 9. Up to four seeds are created in each growing step. The process continues until there are no seeds in the queue, Step 12.
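The growing loop of Alg. 1 together with the penalty of Eq. (4) can be sketched with a binary heap as the priority queue. This is a simplified reading of the listing and not the authors' implementation: the correlation `corr` and the candidate generator `neighbors` are passed in as callables, the output is a dictionary instead of image-sized maps, and the toy world at the end (constant disparity 1, constant flow (2, 0)) exists only to exercise the loop.

```python
import heapq
import itertools

L = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def flow(t):
    # The vector .f of Eq. (4): (x1l - x0l, x1r - x0r, y1 - y0).
    return (t[3] - t[0], t[4] - t[1], t[5] - t[2])

def pixels(t):
    # The four image pixels occupied by a correspondence, tagged by image.
    return {("I0l", t[0], t[2]), ("I0r", t[1], t[2]),
            ("I1l", t[3], t[5]), ("I1r", t[4], t[5])}

def grow(seeds, corr, neighbors, alpha=0.05, beta=0.05, tau=0.6):
    """Returns {(x1l, y1): (D1, Fh, Fv)} following the step numbers of Alg. 1."""
    tick = itertools.count()          # tie-breaker: equal scores pop in insertion order
    queue = [(-(corr(s) + alpha), next(tick), s) for s in seeds]   # Step 1
    heapq.heapify(queue)
    matched, maps = set(), {}

    def update(t):                    # Eq. (2)
        matched.update(pixels(t))
        maps[(t[3], t[5])] = (t[3] - t[4], t[3] - t[0], t[5] - t[2])

    while queue:                      # Steps 2 and 12
        neg_c, _, s = heapq.heappop(queue)   # Step 3: best similarity first
        if -neg_c >= tau:             # Step 4 (idempotent for grown matches)
            update(s)
        for cands in neighbors(s):    # Step 5: one candidate list per neighbor
            def score(t):             # Eq. (4)
                return corr(t) - beta * sum(abs(a - b) for a, b in zip(flow(s), flow(t)))
            best = max(cands, key=score)
            c = score(best)           # Step 6
            if c >= tau and matched.isdisjoint(pixels(best)):      # Step 7
                update(best)          # Step 8
                heapq.heappush(queue, (-c, next(tick), best))      # Step 9
    return maps

# Toy world (hypothetical): a 6 x 4 patch with disparity 1 and flow (2, 0).
def toy_corr(t):
    x0l, x0r, y0, x1l, x1r, y1 = t
    ok = (1 <= x0l <= 6 and 0 <= y0 < 4 and x0r == x0l - 1
          and x1l == x0l + 2 and x1r == x1l - 1 and y1 == y0)
    return 1.0 if ok else 0.0

def toy_neighbors(s, d0=1):
    for dx, dy in ((1, 0), (-1, 0), (0, -1), (0, 1)):   # right, left, up, down
        x0l, y0 = s[0] + dx, s[2] + dy
        base = (x0l, x0l - d0, y0, s[3] + dx, s[4] + dx, s[5] + dy)
        yield [base[:3] + tuple(b + k for b, k in zip(base[3:], kv)) for kv in L]

result = grow([(2, 1, 1, 4, 3, 1)], toy_corr, toy_neighbors)
```

Starting from a single seed, the loop covers the whole toy patch. It terminates because every accepted match consumes four previously unmatched pixels, so the queue cannot grow indefinitely.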
Default values of the algorithm parameters were found empirically and set to α = β = 0.05, τ = 0.6 in all our real-data experiments. The value of the temporal consistency parameter α in Step 1 is a trade-off between the temporal coherence of the results and the ability to capture fast changes in the motion. We observed that for α = 0 the results are less temporally coherent: certain matches on the 3D surface were randomly disappearing and reappearing due to noise or various degradations in the image sequence. A small α > 0 gives already matched points a better position in the priority queue and a higher chance to be matched. On the other hand, when α is too high, we observed matching errors at sudden changes of an object's motion, since wrong (incorrectly predicted) seeds were accepted in Step 4.
Parameter β in (4) regularizes the growing process to handle the aperture problem. When β = 0, we observed artifacts of the optical flow estimation in edge-like structures. The growing process finds the matches based on local maxima of correlation, which need not correspond to the correct solution because of noise in the images. A very small β > 0 helps. However, when β is too large, the solution is biased towards the seeds and locally flat around them.
The last parameter τ directly controls the trade-off be-
tween the density of the solution and mismatch rate.
Note that the MNCC statistic in (1) is not invariant to deformation of local image neighborhoods between corresponding pixels related by optical flow, which occurs due to camera or scene motion. The underlying assumption, which rarely holds exactly, is a fronto-parallel surface undergoing a fronto-parallel motion [17]. Nevertheless, the statistic is sufficiently insensitive to violations of this assumption. We show in the experiments that the algorithm works well under non-trivial motion and non-planar or slanted surfaces. In cases where this could be a problem, a simple extension would be to associate a set of parameters capturing the local affine transformations with the seed, as in [3, 8] in the context of wide-baseline stereo matching.
2.2. Growing stereo (GCS)
A seed growing algorithm [4] for stereo matching between images $I^0_l$ and $I^0_r$ is used. The growing procedure is similar in spirit to Alg. 1, however the neighborhoods $N_i$ are different. This algorithm is reported to be not very sensitive to wrong seeds, which is achieved by a robust matching which selects the final solution among competing correspondence hypotheses from the growing process. In the experiments, we compare this algorithm when run frame-by-frame with the same algorithm integrated in the proposed pipeline shown in Fig. 2.
2.3. Prematcher
The task of the prematcher is to deliver sparse correspondences of interest points. This is achieved in our implementation by matching Harris points and tracking them using a multi-level version of the LK tracker [11]. The stereo seeds $S_s$ are simply those Harris points which satisfy the epipolar constraint, and whose 5×5 MNCC correlation exceeds threshold τ. The scene flow seeds $S$ are obtained by tracking the stereo seeds from $I^0_l$ to $I^1_l$ and from $I^0_r$ to $I^1_r$. The point matches which violate the epipolar constraint between $I^1_l$ and $I^1_r$ are discarded from the set.
The algorithm is not limited to Harris seeds. Any other
seeds, e.g. from wide-baseline matching of distinguished
regions, or other more sophisticated tracking techniques,
could be used.
2.4. Predictor
The predictor estimates seeds for processing of the next frame based on the current solution and other assumptions on the motion of points. In our implementation, we use the simple assumption that the point moves constantly in the image plane, i.e. its optical flow remains the same in a subsequent frame. For each matched pixel $(x^1_l, y^1)$ in $D^1$, the predicted seed $\hat{s} = (\hat{x}^0_l, \hat{x}^0_r, \hat{y}^0, \hat{x}^1_l, \hat{x}^1_r, \hat{y}^1)$ is
$$\begin{aligned} \hat{x}^0_l &= x^1_l, & \hat{x}^1_l &= x^1_l + F_h(x^1_l, y^1), \\ \hat{x}^0_r &= x^1_l - D^1(x^1_l, y^1), & \hat{x}^1_r &= \hat{x}^0_r + (\hat{x}^0_r - x^0_r), \\ \hat{y}^0 &= y^1, & \hat{y}^1 &= y^1 + F_v(x^1_l, y^1), \end{aligned} \tag{5}$$
where $x^0_r = x^0_l - D^0(x^0_l, y^0)$ and $x^0_l = x^1_l - F_h(x^1_l, y^1)$, $y^0 = y^1 - F_v(x^1_l, y^1)$. It follows from the output maps in (2). Notice that for the stereo seed $\hat{s}_s = (\hat{x}^0_l, \hat{x}^0_r, \hat{y}^0)$, the disparity map $D^1$ is only ‘translated’ into the seed representation and subsequently grown again by stereo [4] to provide the new disparity map $D^0$. This is important since certain pixels may not be matched in $D^1$ due to motion occlusions, and they are hereby recovered.
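A direct transcription of Eq. (5) may make the bookkeeping easier to follow. The sketch assumes integer-valued maps stored as NumPy arrays indexed `[row, column]` and takes the vertical coordinate of the previous frame as $y^0 = y^1 - F_v$, in agreement with Eq. (2); the function name `predict_seed` is hypothetical.

```python
import numpy as np

def predict_seed(x1l, y1, D0, D1, Fh, Fv):
    """Constant-motion prediction of Eq. (5) for one matched pixel (x1l, y1)."""
    fh, fv = int(Fh[y1, x1l]), int(Fv[y1, x1l])
    # Where the point was in the previous frame, from the maps of Eq. (2).
    x0l, y0 = x1l - fh, y1 - fv
    x0r = x0l - int(D0[y0, x0l])
    # The current frame becomes the "previous" frame of the predicted seed.
    px0l = x1l
    px0r = x1l - int(D1[y1, x1l])
    py0 = y1
    # The same image motion is applied once more.
    px1l = x1l + fh
    px1r = px0r + (px0r - x0r)
    py1 = y1 + fv
    return (px0l, px0r, py0, px1l, px1r, py1)
```

For example, with constant maps `D0 = 3`, `D1 = 4`, `Fh = 2`, `Fv = 1`, the pixel (5, 5) yields the predicted seed (5, 1, 5, 7, 2, 6): the disparity grew by one between the frames, so the prediction lets it grow by one again.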
The constant motion assumption is rather naïve. It would be more correct to use more sophisticated dynamic motion models and Kalman filtering. Nevertheless, despite its simplicity, the predictor usually helps to produce enough correct seeds. When the assumption of constant motion is violated, the affected seeds become wrong, with low correlation, and they are placed in an unfavorable position in the priority queue. Such regions are grown from other correct seeds (sparse Harris seeds from the prematcher, or other seeds where the assumption holds).
2.5. Complexity of the algorithm
The algorithm has low complexity. Assuming n × n images, any algorithm searching the correspondences exhaustively has complexity at least $O(n^5)$ per frame [5], which is the size of the search space without limiting the ranges for disparity and horizontal and vertical flow. However, the proposed algorithm has complexity $O(n^2)$ per frame, since it searches the correspondences in a neighborhood of the seeds, tracing discrete manifolds of high correlation defined above the pixels of the reference image.
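As a rough worked example of the two bounds, take n = 500. The count of 28 candidates per drawn seed is read off Alg. 1 and Eq. (3) (four neighbors, seven candidates each) and the assumption that the number of drawn seeds is of order $n^2$ is ours, so the figures below are only indicative:
$$\underbrace{n^2}_{\text{pixels}} \cdot \underbrace{n}_{\text{disparity}} \cdot \underbrace{n}_{F_h} \cdot \underbrace{n}_{F_v} = n^5 = 500^5 \approx 3.1 \times 10^{13}, \qquad 4 \cdot 7 \cdot n^2 = 28 \cdot 500^2 = 7 \times 10^{6}.$$
Under these assumptions the growing procedure evaluates roughly six to seven orders of magnitude fewer candidates than an unrestricted exhaustive search.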
3. Experiments
The experiments demonstrate that the proposed algo-
rithm produces accurate semi-dense results and that it ben-
efits from a joint disparity – optical flow formulation in a
sequence of stereo images. The proposed method is com-
pared with a recent spatiotemporal stereo algorithm by Siz-
intsev et al. [15], with a variational scene flow algorithm by
Huguet and Devernay [6], and with a recent optical flow by
Brox and Malik [2]. The experiments show that our algo-
rithm is more precise in disparity than [15] and [6], and in
optical flow comparable to [6], and slightly inferior to [2].
3.1. Synthetic Data
To quantitatively evaluate and compare the methods, we
carried out an experiment with simulated data. The syn-
thetic scene consists of three moving objects: a sphere per-
forming a complicated rotation while moving slowly to the
right and away from the cameras, a small vertical bar mov-
ing very fast to the left (30 pixels/frame), and a slanted
background plane moving towards the cameras. The scene
was textured randomly with a white noise, see Fig. 4. The
scene was synthesized using Blender. The resulting se-
quence has 25 frames of stereopair images and each frame
has associated ground-truth disparity, optical flow maps,
and maps of stereo and motion occlusions.
The algorithms were tested under noise perturbation of the data. Independent Gaussian noise was added to each image of the stereo sequence. The experiment was performed with several noise levels, starting from σ = 0 (no noise) up to σ = 1, where the variance of the noise is the same as that of the image signal.
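One way to read this noise model is that σ scales the noise standard deviation relative to the standard deviation of each image. The sketch below follows that reading; it is an interpretation, and the function name `add_relative_noise` is hypothetical.

```python
import numpy as np

def add_relative_noise(img, sigma, rng):
    """Add independent Gaussian noise whose standard deviation is sigma times
    the standard deviation of the image, so sigma = 1 gives equal variances."""
    img = img.astype(float)
    return img + rng.normal(0.0, sigma * img.std(), size=img.shape)
```

Each image of the sequence would receive its own independent noise draw, for example `add_relative_noise(frame, 0.4, np.random.default_rng(seed))` with a different `seed` per image.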
Figure 4: Synthetic experiment. Disparity and optical flow maps of the 6th frame of the sequence: ground-truth maps with marked occlusions, results of tested algorithms. Columns: images, $D^0$, $F_h$, $F_v$. Rows: (a) $I^0_l$ with (b)-(d) ground truth; (e) $I^1_l$ with (f)-(h) GCSFs; (i) $I^0_r$ with (j) Sizintsev-2009, (k)-(l) Brox-2010; (m) $I^1_r$ with (n)-(p) Huguet-2007.

For all the experiments, we measured the average ratio of correctly matched pixels in non-occluded regions, i.e. the number of pixels that are neither mismatched (error ≥ 1 pixel) nor unmatched, divided by the total number of pixels, averaged over all frames in the sequence. Notice that this evaluation is very strict for algorithms which do not give fully dense results, like ours. However, it is an easy way to compare semi-dense and fully dense results. On the other hand, since mismatches are counted the same as unmatched pixels, we relax the correlation threshold to τ = 0 for all synthetic experiments; the other parameters keep their default values (α = β = 0.05). This is the only exception in all the experiments in this paper.
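The per-frame statistic described above can be sketched for a scalar map such as disparity as follows. The NaN convention for unmatched pixels, the boolean occlusion mask, and the function name `correct_ratio` are assumptions of this sketch; the reported number would be the mean of this ratio over all frames.

```python
import numpy as np

def correct_ratio(est, gt, occluded):
    """Fraction of non-occluded pixels that are matched with error < 1 pixel.
    Unmatched pixels (NaN in est) count as failures, like mismatches."""
    valid = ~occluded
    matched = ~np.isnan(est)
    # Replace NaN by +inf so unmatched pixels never pass the error test.
    err = np.abs(np.where(matched, est, np.inf) - gt)
    correct = valid & matched & (err < 1.0)
    return correct.sum() / valid.sum()
```

For the optical flow, the same function could be applied with `err` replaced by a per-pixel combination of the horizontal and vertical errors; which combination was used is not stated in the text.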
This statistic was measured for both disparity and optical flow errors. Optical flow is usually evaluated by the average angular error; however, the proposed algorithm has pixel-level accuracy, and therefore this usual evaluation would not be suitable. We understand optical flow as a pixel matching problem, similar to stereo without the epipolar constraint. It is important to capture gross errors of the optical flow estimates, i.e. mismatches with an error of at least 1 pixel. This evaluation is again fair for classical sub-pixel optical flow methods, since the ground truth is provided with sub-pixel precision.
Results of the experiment are shown in Fig. 5a. In case of stereo, we compared the proposed algorithm (GCSFs), which jointly estimates disparity and optical flow, with: a seed growing algorithm which computes disparity maps frame-by-frame independently [4] (GCS), the scene flow algorithm [6] (Huguet-2007), and the spatiotemporal stereo [15] (Sizintsev-2009). There is not much difference between GCSFs and GCS for a low level of noise; however, GCSFs is more stable for higher levels of noise. Algorithm [15], while performing well in slow moving regions, has severe difficulties with the quickly moving bar even without noise, see Fig. 4j, which causes its inferior performance compared to the proposed method. Algorithm [6] also has severe difficulties with this scene. The corresponding disparity map of GCSFs is shown in Fig. 4f. We can see no significant mismatches in either part of the scene; object boundaries are well preserved except for small artifacts due to fluctuations of the window similarity statistic. There are also small mismatches in occluded regions, since the threshold τ is relaxed, but they are not included in the evaluation.

Figure 5: Algorithm accuracy under contamination with Gaussian noise. The signal has the same variance as the noise for σ = 1. Each plot shows the ratio of correct pixels (vertical axis) against the noise level σ from 0 to 1 (horizontal axis), for disparity (Sizintsev-2009, Huguet-2007, GCS, GCSFs) and for optical flow (Brox-2010, Huguet-2007, GCF, GCSFs). (a) The error statistics evaluated over the entire scene. (b) The error statistics evaluated only in the area of the thin vertical bar.
In case of optical flow, we compared the flow provided by the proposed GCSFs algorithm with another seed growing algorithm which frame-by-frame independently searches the stereo correspondences without the epipolar constraint (GCF). This growing mechanism was used in [3]. Additionally, we compare this with a recent variational method which can handle large displacements [2] (Brox-2010) and with the scene flow [6] (Huguet-2007). We can see that, without noise, the results are even slightly better for GCF than for GCSFs. This is because GCF allows non-bijective matching, while GCSFs insists on uniqueness, which may cause small 1-pixel gaps of unmatched pixels between different motion layers. However, with an increasing level of noise GCSFs outperforms its frame-by-frame seed growing counterpart. Results of [2] and [6] are comparable with GCSFs for a low level of noise. For stronger noise these methods are significantly better than GCSFs. This is natural, since these global methods have reported excellent properties under perturbation by this kind of noise. Optical flow maps of GCSFs are shown in Fig. 4g–4h. Object and motion boundaries are well preserved, there are no clear mismatches, and there are a few 1-pixel gaps as mentioned above. Notice that the motion occlusion on the bar, which is due to its motion behind the sphere in the next frame, has a ‘correct’ motion estimate, although there is no evidence in the data. This is a side effect of the prediction. Optical flow maps of [2] are shown in Fig. 4k–4l. They are very precise inside the objects; however, visually, there are some imperfections at the motion boundaries of the objects.
Although the plot of [6] suggests its good overall perfor-
mance, there are strong artifacts around the quickly moving
bar, see Fig. 4o–4p. Since the bar is relatively small with re-
spect to the rest of the image, where the algorithm performs
excellently, the error statistics do not reflect visually dis-
turbing artifacts. Therefore, we evaluated the error statistics
additionally in the area of the vertical bar only, see Fig. 5b.
Then, we can see the low performance of [6] compared to
other algorithms.
The favorable results of the proposed GCSFs algorithm compared to the frame-by-frame independent seed growing methods are a consequence of: (1) joint disparity and optical flow estimates which constrain each other, and (2) good temporal consistency and coherence. The mechanism is the following. When data are weak due to noise, there is a lack of correctly matched seeds and the growing process either stops early (by the condition in Step 7 of Alg. 1) for a conservative choice of threshold τ, or produces mismatches if τ is relaxed. However, if we feed partially grown disparity and optical flow maps as the seeds to the GCSF algorithm (using the predictor), it grows them further if they were correct. This effect is repeated, and after a certain number of frames, high quality seeds are accumulated.
3.2. Real data
The proposed algorithm was tested on real data as well. For all these experiments, we used the default values of the parameters of the proposed GCSFs algorithm, α = β = 0.05, τ = 0.6. We show results on the CAVA dataset of INRIA¹, where the stereo camera is static, and on the dataset of ETH Zürich² acquired by a mobile stereo platform. The results of the tested algorithms are shown in Fig. 6 and 7 as disparity $D^1$ and optical flow $F_h$, $F_v$ maps.
For the INRIA dataset, the results of the proposed GCSFs algorithm, Fig. 6b–6d, are sufficiently dense even for a weakly textured office environment. Important scene structures are matched. Notice the sharply preserved boundaries between objects in both disparity and optical flow. We can see a left-down motion of the man coming through the door, which is closing afterward, performing a slower left motion. One of the women is walking to the right to reach the chair, while moving her arm down. We can also recognize a hand gesticulation of the sitting man.
¹ http://perception.inrialpes.fr/CAVA_Dataset/
² http://www.vision.ee.ethz.ch/~aess/dataset/
The ETH dataset represents a complex scene with both camera forward motion and motion of pedestrians. There are up to 30 pixel displacements between consecutive frames. In our results, Fig. 7b–7d, we can see the motion of the planar sidewalk close to the cameras and well captured depth and motion boundaries of the people walking. There are only a few small mismatches visible in the disparity map. They are in the region of the leftmost building, which exhibits complicated non-Lambertian, mirror-like reflections. Some small mismatches can be found in the optical flow in edge-like structures, which are a consequence of an improperly handled aperture problem.
Results of the spatiotemporal stereo [15] can be seen in Fig. 6f and Fig. 7f. Disparity maps were thresholded according to a stequel significance map to remove spurious matches. The threshold was set to 0.4 according to the authors' recommendation. After the thresholding, the disparity map on INRIA has roughly the same density as our result. However, the results are not so precise. It seems that all objects are fattened, especially those which move in front of the weakly textured regions, see the walking woman and the man coming through the door in Fig. 6f. These artifacts are probably caused by the large spatiotemporal extent of the matching elements (stequels). The method has severe difficulties with the ETH sequence. The part of the scene which is close to the cameras and hereby undergoes a fast motion is not captured by this algorithm, Fig. 7f. Matching of stequels probably does not work well for large displacements between frames.
Results of the large displacement optical flow [2] are
shown in Fig. 6g–6h and Fig. 7g–7h. They are more or
less consistent with our results, but they are fully dense.
The motion boundaries seem to be a little bit fuzzy, but this
could be only in the motion occluded regions, where there
is no evidence in data. There are a few small patchy mis-
matches in ETH.
Results of the variational scene flow algorithm [6] are shown in Fig. 6i–6k and Fig. 7i–7k. The disparity maps
are erratic, the algorithm fails dramatically in stereo for
these scenes. This failure is probably due to a complexity
of the scene (many occlusions, complicated motions, and
varying strength of the texture), and perhaps also due to im-
proper initialization and consequent problems with conver-
gence. The optical flow given by this method is surprisingly
much better than the stereo disparity. Nevertheless, we can
see typical artifacts of smoothed motion boundaries, which
is a consequence of the prior term winning over the data.
Figure 6: Real experiments: Results on INRIA dataset. Columns: images, $D^1$ (disparity), $F_h$ (horizontal motion), $F_v$ (vertical motion). Panels: (a) $I^0_l$; (b)-(d) GCSFs; (e) $I^1_l$; (f) Sizintsev-2009 [15]; (g)-(h) Brox-2010 [2]; (i)-(k) Huguet-2007 [6]. This figure is better seen in the electronic version of the paper.
Method                 Running time per frame
GCSFs                  1.5 seconds
Sizintsev-2009 [15]    35 seconds
Brox-2010 [2]          3 minutes
Huguet-2007 [6]        3 hours
Table 1: Average running time per frame of VGA images.
For both sequences, our results are temporally coherent without flickering artifacts, which is not the case for the results of [15] and [6]. Results of [2] are fairly stable temporally, despite being computed frame-by-frame.
3.3. Running time of tested algorithms
An average running time per frame of the tested algorithms is shown in Tab. 1. These times were measured on our synthetic sequence of 640 × 480 images, using a standard PC (Intel Core 2 2.6 GHz, 6 GB memory, Linux). Our GCSFs algorithm is faster than the other tested methods by one to almost four orders of magnitude (Tab. 1). Our implementation is not optimized and is partially written in Matlab. For the other algorithms we had binaries.
4. Conclusions
We presented an algorithm which jointly estimates semi-dense disparity and optical flow of a stereo sequence by growing correspondence seeds. We showed experimentally that the results are more accurate and temporally coherent than those of frame-by-frame independent algorithms. We tested with two different publicly available datasets and performed a quantitative ground-truth experiment. We made a fair comparison with state-of-the-art methods spanning spatiotemporal stereo and variational methods for optical and scene flow.
The proposed algorithm is a practically well working
trade-off between simple local methods and theoretically
sound global MRF algorithms, since local relations between
adjacent pixels are considered. It can be also viewed as a
‘semi-supervised’ matching algorithm, where a few initial
seeds are propagated. We plan to investigate properties of
this propagation (growing) as a diffusion process on the cor-
respondence manifold.
Acknowledgement. The research was supported by EC
project FP7-ICT-247525-HUMAVIPS.
Figure 7: Real experiments: Results on ETH dataset. Columns: images, $D^1$ (disparity), $F_h$ (horizontal motion), $F_v$ (vertical motion). Panels: (a) $I^0_l$; (b)-(d) GCSFs; (e) $I^1_l$; (f) Sizintsev-2009 [15]; (g)-(h) Brox-2010 [2]; (i)-(k) Huguet-2007 [6]. This figure is better seen in the electronic version of the paper.
References
[1] T. Basha, Y. Moses, and N. Kiryati. Multi-view scene flow
estimation: A view centered variational approach. In CVPR,
2010.
[2] T. Brox and J. Malik. Large displacement optical flow: de-
scriptor matching in variational motion estimation. IEEE
Trans. on PAMI, 2010. In press.
[3] J. Čech, J. Matas, and M. Perďoch. Efficient sequential correspondence selection by cosegmentation. IEEE Trans. on PAMI, 32(9), 2010.
[4] J. Čech and R. Šára. Efficient sampling of disparity space for fast and accurate matching. In BenCOS Workshop, CVPR, 2007.
[5] M. Gong. Real-time joint disparity and disparity flow esti-
mation on programmable graphics hardware. CVIU, 113(1),
2009.
[6] F. Huguet and F. Devernay. A variational method for scene
flow estimation from stereo sequences. In ICCV, 2007.
[7] M. Isard and J. MacCormick. Dense motion and disparity
estimation via loopy belief propagation. In ACCV, 2006.
[8] J. Kannala and S. S. Brandt. Quasi-dense wide baseline matching using match propagation. In CVPR, 2007.
[9] M. Lhuillier and L. Quan. Match propagation for image-
based modeling and rendering. IEEE Trans. on PAMI, 24(8),
2002.
[10] F. Liu and V. Philomin. Disparity estimation in stereo se-
quences using scene flow. In BMVC, 2009.
[11] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, 1981.
[12] H. P. Moravec. Towards automatic visual obstacle avoidance. In IJCAI, page 584, 1977.
[13] J.-P. Pons, R. Keriven, O. Faugeras, and G. Hermosillo. Variational stereovision and 3D scene flow estimation with statistical similarity measures. In ICCV, 2003.
[14] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodg-
son. Real-time spatiotemporal stereo matching using the
dual-cross-bilateral grid. In ECCV, 2010.
[15] M. Sizintsev and R. P. Wildes. Spatiotemporal stereo via spa-
tiotemporal quadratic element (stequel) matching. In CVPR,
2009.
[16] S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade.
Three-dimensional scene flow. In ICCV, 1999.
[17] L. Zhang, B. Curless, and S. M. Seitz. Spacetime stereo:
Shape recovery for dynamic scenes. In CVPR, 2003.