
Motion Detail Preserving Optical Flow Estimation∗

Li Xu Jiaya Jia

The Chinese University of Hong Kong

{xuli,leojia}@cse.cuhk.edu.hk

Yasuyuki Matsushita

Microsoft Research Asia

yasumat@microsoft.com

Abstract

We discuss the cause of a severe optical flow estima-

tion problem that fine motion structures cannot always be

correctly reconstructed in the commonly employed multi-

scale variational framework. Our major finding is that sig-

nificant and abrupt displacement transition wrecks small-

scale motion structures in the coarse-to-fine refinement. A

novel optical flow estimation method is proposed in this

paper to address this issue, which reduces the reliance of

the flow estimates on their initial values propagated from

the coarser level and enables recovering many motion de-

tails in each scale. The contribution of this paper also in-

cludes adaption of the objective function and development

of a new optimization procedure. The effectiveness of our

method is borne out by experiments for both large- and

small-displacement optical flow estimation.

1. Introduction

The variational framework [13], together with the

coarse-to-fine refinement [1], is widely adopted in optical

flow estimation [7, 8]. In the Middlebury optical flow eval-

uation website [2], almost all top-ranked methods use this

strategy.

Brox et al. [6], in computing large-displacement optical

flow, pointed out that if the flow structures are smaller than

their displacements, the latter may not be well estimated.

In this paper, we show that this issue also applies to small-displacement motion. Taking Figure 1 as an example, due to the camera motion, the foreground toy deer has its motion significantly differing from that of the background (average displacements d = −2 and d = 21 respectively). This

example is in fact very challenging for the coarse-to-fine

variational optical flow estimation.

As shown in Figure 1(e), in a coarse level, the nar-

row neck entirely disappears and only the significant background motion is estimated. This leaves the emerging foreground pixels in the finer scale (Figure 1(f)) with their

actual motion significantly different from the initial esti-

∗The work described in this paper was fully supported by a grant from

the Research Grants Council of the Hong Kong Special Administrative

Region (Project No. 412708).


Figure 1. Motion detail preserving problem. (a)-(b) Two input

patches. (c) Flow estimate using the coarse-to-fine variational set-

ting. (d) Our flow estimate. (e)-(f) Two consecutive levels in the

pyramid. Flow maps are visualized using the color code in (g).

mate from the background, violating the linearization as-

sumption and accordingly leading to a highly unstable mo-

tion estimation process. The final flow result shown in Fig-

ure 1(c) includes considerable errors. This example dis-

closes one problem of the general coarse-to-fine variational

model – that is, the inclination to diminish small motion

structures when spatially significant and abrupt change of

the displacements exists.

We address the motion detail preserving problem in this

paper and propose a unified framework for high-quality

flow estimation in both large and small displacement set-

tings. Central to our method is a novel selection scheme to

compute extensive initial flow vectors in each image level.

This makes the following optimization not rely completely on the result from the previous scale, and thus capable of correcting estimation errors in the top-down refinement. Our flow

result shown in Figure 1(d) contains fine structures. More

examples are included in Section 5 and in the technical re-

port [24].

This paper also contributes in the following ways. First,

we use robust sparse feature matching to produce extended

flow initialization, which helps enforce the linearization

condition in the variational setting. Second, in the flow esti-

mation model, we propose the selective combination of the



color and gradient constraints in defining the data term, ro-

bust to outliers. Third, we propose a fast variable-splitting-

based optimization method to refine flow maps. It is highly

parallel, compatible with modern GPU computation archi-

tecture.

Finally, we employ the Mean Field approximation to enable solving the objective function, which involves both discrete and continuous variables and is commonly regarded as challenging to solve. Extensive experiments visually and

quantitatively validate the performance of our approach in

maintaining details for both large- and small-displacement

motion.

2. Related Work

Modern optical flow estimation is usually posed as an

energy minimization problem. Black and Anandan [4] re-

placed the quadratic penalty functions in [13] with non-

convex robust functions to reject outliers. Sun et al. [21]

used a learning-based framework for both the matching cost

(data term) and flow derivatives (smoothness term).

Efforts have also been put into improving the optical flow

constraints. Haussecker and Fleet [12] proposed a phys-

ical constraint to model brightness change. Lempitsky et

al. [14] computed the matching cost only using high fre-

quency components. Pre-filtering on the input images was

suggested in [21] and [17] to handle illumination variation.

These models are flexible, but at the same time require solving highly non-convex objective functions.

In [7], Brox et al. introduced the gradient constancy constraint to complement the brightness one. The L1 norm is used

as the penalty function for both the data and smoothness

terms so that the energy is convex after linearization. A similar compromise between robustness and complexity was also made in [8, 26]. However, we will show later that direct

addition of the brightness and gradient terms is not optimal

and propose a selection model to improve it.

Almost all the above methods rely on the coarse-to-fine

warping to deal with motion larger than one pixel [1, 3]. As

discussed in Section 1, this strategy is inherently unable to

recover small-scale structures in many situations. Adaptive

window is used in stereo matching [19] to handle incorrect

initialization near depth boundary. It assumes at least the

nearby disparities are correctly initialized, which might not

be true for small-scale structures that are totally eliminated

in the coarse level.

Using discrete optimization, Lempitsky et al. [14] pro-

posed fusing flow proposals obtained from different flow

estimation methods with various parameter settings. This is

proven effective to find the optimal values among the given

proposals. But the sufficiency and optimality of the propos-

als cannot be controlled. Also, the methods [16, 13] that

generate the proposals still employ the conventional coarse-

to-fine warping. So it is possible that none of the proposals


(a) (b) Data costs for P1. (c) Data costs for P2.

Figure 2. Data cost distributions for two points. (a) shows a patch

of the “RubberWhale” example, where two points P1 (138,278)

and P2 (141,299) are highlighted. (b) and (c) plot different data

costs (vertical axis) for P1 and P2. The ground truth displacement

is moved to 0 (horizontal axis) for illustration.

preserve small-scale motion structures. In comparison, our

method computes a few high confidence flow candidates in

each level, and thus is not entirely dependent on the flow

initialization from the previous scale.

In recent large displacement optical flow estimation,

Brox et al. [6] performed region-based descriptor matching. This method can effectively recover large-displacement flow

by adjusting the objective to favor matching results, albeit

sometimes vulnerable to matching outliers. Steinbrücker

and Pock [20] extended the numerical scheme of [25] and

searched over all possible values for the large displacement

flow. As discrete labels are used in the search step, results

can lack sub-pixel accuracy.

3. Optical Flow Model

The Total Variation/L1 model [7, 8, 25] has proven very effective in flow estimation. We base our data penalty function on the L1 norm to reject outliers and use the Total Variation (TV) for regularization.

3.1. Robust Data Function

As the color constancy constraint is often violated when

illumination or exposure changes, adding gradient con-

stancy constraints was proposed [7, 8]. Denoting by u =

(u, v)^T the flow vector representing the displacement between frames I1 and I2, the data term for flow estimation

can generally be written as

E_D(u) = Σ_x (1/2)‖I2(x + u) − I1(x)‖ + (τ/2)‖∇I2(x + u) − ∇I1(x)‖,   (1)

where τ is a weight. Because it adds the two terms, this function models the confidence of pixel correspondence less accurately than using only the more appropriate of the two.

Figure 2 shows an example where the patch in (a) con-

tains two points P1 and P2. Their data cost distributions

with respect to different displacement values are plotted

in (b) and (c) respectively (ground truth displacements are


shifted to 0). It is noticeable that the color constraint (blue

curve in (b)) does not produce the minimum energy near the

ground truth value because the color constancy is violated

given point P1 moving out of the shadow. Adding the color

and gradient terms using Eq. (1) also results in an undesir-

able distribution (dashed magenta curve) as the cost at the

ground truth point is not even a local minimum. Similarly,

in Figure 2(c), only the color constancy holds as point P2

undergoes rotational motion which alters image gradients.

So adding the two constraints in the data function definition is not ideal either.

The above analysis indicates that a good model should

only incorporate the more informative constraint, but not

both of them. We accordingly define a binary weight map

α(x) : Z² ↦ {0, 1} to switch between the two terms. The

new data function is expressed as

E_D(u, α) = Σ_x α(x)‖I2(x + u) − I1(x)‖ + (1 − α(x))τ‖∇I2(x + u) − ∇I1(x)‖.   (2)

When α(x) = 1, the color constraint is selected. Otherwise, we implement gradient constancy. Our empirical investi-

gation provided in Section 5 shows that this model can lead

to higher quality results than various alternatives.

3.2. Edge-Preserving Regularization

The regularization term for optical flow estimation is

generally edge preserving [21, 22]. We define our smooth-

ness term as

E_S(u) = Σ_x ω(x)‖∇u(x)‖,   (3)

where x ∈ Z² indexes the 2D coordinates and ‖∇u(x)‖

is the common TV regularizer. ω(x) is the simple structure

adaptive map that maintains motion discontinuity [22]:

ω(x) = exp(−‖∇I1‖^κ),

(4)

where κ = 0.8 in our experiments. The final objective

function is thus defined as

E(u,α) = ED(u,α) + λES(u),

(5)

where λ is the regularization weight.
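As a concrete illustration, the structure adaptive weight of Eq. (4) can be evaluated per pixel from the image gradient. A minimal numpy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def structure_adaptive_weights(I1, kappa=0.8):
    """Smoothness weight map of Eq. (4): w(x) = exp(-||grad I1(x)||^kappa).

    I1 is a 2-D grayscale float image. Weights stay near 1 in flat
    regions and drop at strong edges, letting the TV regularizer relax
    at likely motion discontinuities.
    """
    gy, gx = np.gradient(I1)              # finite-difference gradients per axis
    mag = np.sqrt(gx ** 2 + gy ** 2)      # ||grad I1|| per pixel
    return np.exp(-mag ** kappa)

# On a step-edge image, the weight is 1 in flat areas and below 1 along the edge.
I = np.zeros((8, 8)); I[:, 4:] = 1.0
w = structure_adaptive_weights(I)
```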

3.3. Mean Field Approximation

Minimizing Eq. (5) involves simultaneously computing

two fields: continuous u and binary α, which is commonly

regarded as computationally intractable. We employ the

Mean Field (MF) Approximation [9] to simplify the prob-

lem by first canceling out the binary process by integration

over α [24]. The probability of a particular state of the sys-

tem is given by

P(u, α) = (1/Z) exp(−βE(u, α)),   (6)

where β is the inverse temperature and Z is the partition

function, defined as

Z = Σ_{u} Σ_{α∈{0,1}} exp(−βE(u, α)).   (7)

We then compute the sum over all possible αs (as described

in [24]) with the saddle point approximation, yielding

Z ≈ max_u exp(−β{λE_S(u) − Σ_x (1/β) ln(exp(−βD_I(u, x)) + exp(−βD_∇I(u, x)))}),   (8)

and the effective potential

E^eff(u) = λE_S(u) − Σ_x (1/β) ln(exp(−βD_I(u, x)) + exp(−βD_∇I(u, x))),   (9)

where D_I(u, x) = ‖I2(x + u) − I1(x)‖ and D_∇I(u, x) = τ‖∇I2(x + u) − ∇I1(x)‖. It indicates that the flow estimate

by minimizing Eq. (9) is actually the Mean Field (MF) ap-

proximation of minimizing Eq. (5). The effective energy is

therefore written as

E^eff(u) = E^eff_D(u) + λE_S(u),   (10)

where the effective data function is

E^eff_D(u) = −Σ_x (1/β) ln(exp(−βD_I(u, x)) + exp(−βD_∇I(u, x))).   (11)

The optimality of Eq. (10) does not depend on the estimate

of α. Moreover, although Eq. (10) is non-convex and is

not easy to solve using continuous optimization, there is no

obstacle to applying discrete optimization if candidate labels

can be obtained. We propose a robust algorithm, described

in the next section, to estimate u.

Note that the effective data term can also be deemed as

a robust function which selectively combines the color and

gradient constancy constraints. This can be clarified by tak-

ing the partial derivative with respect to the variable u on

the data term, which yields

∂_u E^eff_D(u) = Σ_x ᾱ(x) ∂_u D_I + (1 − ᾱ(x)) ∂_u D_∇I,

where ¯ α(x) is the flow-dependent weight, written as

ᾱ(x) = 1 / (1 + exp(β(D_I(u, x) − D_∇I(u, x)))).   (12)

ᾱ(x) is the MF-approximation of α(x). So its effect equates that of α(x) (Eq. (2)) in constraint selection. The cost distributions of the new effective data function are plotted in Figures 2(b) and (c) using green crossed curves. They indicate that the effective energy approximates the lower envelope of the two data costs (α = 0 and α = 1), which is exactly what we need for accurate flow estimation.
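The effective data cost of Eq. (11) and the soft weight ᾱ of Eq. (12) are simple to evaluate per pixel. A minimal numpy sketch (function names are ours), using the log-sum-exp trick for numerical stability:

```python
import numpy as np

def effective_data_cost(D_I, D_gI, beta=5.0):
    """Eq. (11) per pixel: -(1/beta) * ln(exp(-beta*D_I) + exp(-beta*D_gI)).

    Approximates the lower envelope min(D_I, D_gI) of the color and
    gradient costs; factoring out the minimum avoids underflow.
    """
    m = np.minimum(D_I, D_gI)
    return m - np.log(np.exp(-beta * (D_I - m)) + np.exp(-beta * (D_gI - m))) / beta

def alpha_bar(D_I, D_gI, beta=5.0):
    """Eq. (12): a sigmoid in beta*(D_I - D_gI); near 1 when the color
    cost is much smaller, near 0 when the gradient cost is smaller."""
    return 1.0 / (1.0 + np.exp(beta * (D_I - D_gI)))
```

When one cost dominates, the effective cost approaches the smaller of the two, which is the envelope behavior described above.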


Input: a pair of images for optical flow estimation
1. Construct pyramids for both images and set the initial level l = 0 and u_l = 0 for all pixels.
2. Propagate u_l to level l + 1.
3. Extended Flow Initialization (Section 4.1)
   3.1. Detect and match SIFT features in level l + 1.
   3.2. Generate multiple flow vectors as candidates.
   3.3. Optimize flow using QPBO (Eq. (10)).
4. Continuous Flow Optimization (Section 4.2)
   4.1. Compute the ᾱ map (Eq. (12)).
   4.2. Solve the TV/L1 energy function in Eq. (14).
5. If l ≠ n − 1, where n is the total number of levels, set l = l + 1 and go to Step 2.
6. Occlusion-aware Refinement (Section 4.3)
Output: optical flow map
Table 1. Overview of our method.

4. Optimization Framework

Traditional optical flow estimation, due to the use of the

variational setting, relies excessively on the coarse-to-fine

refinement. As discussed in Section 1, this process could

fail to recover ubiquitous fine motion details due to the pos-

sible large discrepancy between the initial flow estimates

and the ground truth displacements in each level.

In this section, based on E^eff and ᾱ, we propose an unconventional method to optimize Eq. (5). Specifically, because E^eff_D(u) does not depend on ᾱ, we first infer multiple high-confidence flow candidates and apply discrete optimization to select the optimal ones. With this result, ᾱ in

Eq. (12) is then quickly estimated. We finally improve the

subpixel accuracy of the flow estimates with the estimated

ᾱ using continuous optimization. This procedure is found surprisingly effective in dampening estimation errors caused by the occasionally biased flow results from the coarse-level computation.

Our overall algorithm is sketched in Table 1 based on

iteratively processing images in a top-down fashion. The

steps are detailed further below.

4.1. Extended Flow Initialization

We address the general flow initialization problem in each image level by estimating multiple displacements from the reference to target images using SIFT feature detection and matching [15]. The displacement vectors are denoted as {u^v_n}, as shown in Figure 3(a). They are new potential flow candidates, except those that already exist in the flow map u^c propagated from the immediate coarser scale (Figure 3(b)). To robustly screen out the duplicated vectors, we compute the Euclidean distance between each u^v_i and all u^c_j, where pixel j is within a 5 × 5 window centered at the reference feature of u^v_i. If all results are greater than 1 (pixel), we regard u^v_i as a genuine flow candidate. We repeat this process for all i, and denote the m new candidates as u^v_{k_0}, ..., u^v_{k_{m−1}}. Figure 3(c) shows an example.

Figure 3. Extended flow initialization. (a) One of the images overlaid with the computed feature motion vectors. (b) Flow field u^c propagated from the coarser level. (c) New displacements computed using (a) and (b); they are candidate flow vectors for all pixels. (d) Optimized flow map u_0 with respect to the candidates in the current image level. (e)-(f) Close-ups of (b) and (d).

This strategy significantly reduces the system's dependence on the coarse-scale flow estimation. It is notable as well that feature matching initially produces a considerable number of vectors distributed over the whole image, as shown in Figure 3(a); but they reduce to fewer than 15 candidates after local comparison with u^c in the given example. Only the most distinctive flow vectors are retained.

The m new vectors u^v_{k_0}, ..., u^v_{k_{m−1}}, together with the original u^c, represent possible motion in the present image scale. We model the selection of the optimal flow among the m + 1 candidates for each pixel as a labeling problem, where the objective function is given in Eq. (10). The upper part of Figure 3(d) demonstrates the color-coded labels. This problem can be solved efficiently by discrete optimization because, on the one hand, the number of candidates is small thanks to the screening; on the other hand, Eq. (10) does not involve α, simplifying the computation.

We adopt the Quadratic Pseudo-Boolean Optimization (QPBO) [18] to solve this MRF problem. The fusion move step [14] is used to repeatedly fuse the candidates until each gets visited twice. Also, to avoid the checkerboard-like artifacts commonly produced near motion boundaries in discrete optimization, we employ the anisotropic representation of the TV regularizer ‖∇u‖ = ‖∇u‖₁ + ‖∇v‖₁ with 8-neighbor discretization [10]. The output is the flow map

denoted as u0. One result is shown in Figure 3(d), which

contains better recovered motion structure compared to the

map u^c in Figure 3(b). Close-ups are shown in Figures 3(e)

and (f).

Our method can work directly on the input images without employing the multiscale framework. But it would suffer from expensive and possibly unstable computation, because hundreds or more labels might be produced simultaneously at the original resolution.


(a) u_0 map. (b) ᾱ(x) map. (c) u_r map. (d)-(e) Close-ups.

Figure 4. Continuous optimization. Errors are further reduced in

this step. (d) and (e) show close-ups of (a) and (c).

4.2. Continuous Flow Optimization

The flow estimates from the previous step are taken into

Eq. (12) to compute ᾱ. One result is shown in Figure 4(b). Considering that Eq. (11) is highly non-convex, we take ᾱ back to Eq. (5) for optimization in the variational model.

As color images are used, we denote by I^k ∈ {I_r, I_g, I_b, ∂_x I, ∂_y I} the set of channels to be included in the data term and use α^k ∈ {ᾱ, ᾱ, ᾱ, (1 − ᾱ)τ, (1 − ᾱ)τ} to represent the corresponding weights. Then the energy in Eq. (5) is written as

E(u) = Σ_x Σ_k α^k(x) |I^k_2(x + u) − I^k_1(x)| + λ(x)‖∇u(x)‖,   (13)

where λ(x) := λω(x). With the initially computed flow u_0 from the previous step, we solve for the flow increments du = (du, dv)^T by minimizing Eq. (13). The final flow vector is u = u_0 + du. By convention, the Taylor expansion of Eq. (13) at point x + u_0 yields the linearized function

E(u) = Σ_x Σ_k α^k(x) |I^k_x du + I^k_y dv + I^k_t| + λ(x)‖∇(u_0 + du)(x)‖   (14)

given small du. In Eq. (14), I_x = ∂_x I_2(x + u_0), I_y = ∂_y I_2(x + u_0), and I_t = I_2(x + u_0) − I_1(x). To preserve

motion discontinuity, we employ the rotational invariant

isotropic form of the TV regularizer, written as

?∇u? =

?(∂xu)2+ (∂yu)2+ (∂xv)2+ (∂yv)2.

(15)

Our Solver. We propose decomposing the optimization problem into three simpler ones, each of which can have the globally optimal solution. The key technique is a variable-splitting method with auxiliary variables p and w, representing the substituted data cost and flow derivatives respectively, to move a few terms out of the non-differentiable L1 norm expression. This scheme is found very efficient and essential to produce high-quality results.

The derivatives of each flow vector comprise four elements, i.e., ∇du = (∂_x du, ∂_y du, ∂_x dv, ∂_y dv)^T. For each element, we introduce a corresponding auxiliary variable. The set of the variables is denoted as w = (w_{du_x}, w_{du_y}, w_{dv_x}, w_{dv_y})^T. Then Eq. (14) can be transformed into

E = Σ_x Σ_k (1/(2η))‖I^k_x du + I^k_y dv + I^k_t − p^k‖² + α^k‖p^k‖ + (1/(2θ))‖∇du − w‖² + λ‖∇u_0 + w‖.   (16)

In this function, (1/(2η))‖I^k_x du + I^k_y dv + I^k_t − p^k‖² + α^k‖p^k‖ encourages that p^k approaches I^k_x du + I^k_y dv + I^k_t, and (1/(2θ))‖∇du − w‖² + λ‖∇u_0 + w‖ makes w similar to ∇du. It can also be observed that Eq. (16) is equivalent to Eq. (14) upon convergence when θ → 0 and η → 0.

Besides efficiency and reliability, Eq. (16) makes optimization highly parallel, fully compatible with GPU acceleration. The result optimality is guaranteed in each step. Our algorithm proceeds with the following iterations, where the initial u = u_0.

1. Fix u to estimate p. The simplified objective function is

min_p Σ_x Σ_k (1/(2η))‖I^k_x du + I^k_y dv + I^k_t − p^k‖² + α^k‖p^k‖.   (17)

Single-variable optimization can be achieved in this step. The optimal solution is given by the shrinkage formula [11]:

p^k = sign(o^k) max(|o^k| − ηα^k, 0),   (18)

where o^k := I^k_x du + I^k_y dv + I^k_t is the optical flow constraint.
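The shrinkage step of Eq. (18) is a per-element soft-thresholding. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def shrink_p(o, alpha_k, eta):
    """Eq. (18): p^k = sign(o^k) * max(|o^k| - eta*alpha_k, 0), applied
    elementwise, where o^k is the linearized optical flow constraint."""
    return np.sign(o) * np.maximum(np.abs(o) - eta * alpha_k, 0.0)

# Entries below the eta*alpha_k threshold are zeroed; the rest shrink toward 0.
p = shrink_p(np.array([0.5, -2.0, 0.05]), 1.0, 0.1)
```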

2. Fix u and solve for w. The function reduces to

min_w Σ_x (1/(2θ))‖∇du − w‖² + λ‖∇u_0 + w‖.   (19)

Similarly, a unique solution is guaranteed by the shrinkage formula

w_{du_x} = max(‖∇u‖ − θλ, 0) · (∂_x u / ‖∇u‖) − ∂_x u_0,   (20)

where u = u_0 + du and ‖∇u‖ follows the isotropic form of Eq. (15). Solutions for w_{du_y}, w_{dv_x}, and w_{dv_y} can similarly be derived. The computation in this step is also quick and highly parallel in nature.
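Eq. (20) is a vector shrinkage on the four stacked derivatives. A per-pixel sketch assuming the isotropic norm of Eq. (15) (function name and guard `eps` are ours):

```python
import numpy as np

def shrink_w(grad_u, grad_u0, theta, lam, eps=1e-12):
    """Vector shrinkage of Eq. (20) for one pixel.

    grad_u:  length-4 array (du_x, du_y, dv_x, dv_y) of u = u0 + du
    grad_u0: length-4 array of derivatives of the initial flow u0
    Returns w = max(||grad_u|| - theta*lam, 0) * grad_u/||grad_u|| - grad_u0.
    """
    n = np.linalg.norm(grad_u)
    scale = max(n - theta * lam, 0.0) / max(n, eps)   # eps guards n == 0
    return scale * grad_u - grad_u0
```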

3. Fix w, p and solve for u. The objective function is

min_{du} Σ_x Σ_k (1/(2η))‖I^k_x du + I^k_y dv + I^k_t − p^k‖² + (1/(2θ))‖∇du − w‖².   (21)

It is quadratic and thus the corresponding Euler-Lagrange

equations of Eq. (21) are linear w.r.t. du and dv. The globally optimal solution can be directly obtained by solving the lin-

ear system [13] in this step.

Our method iterates among optimizing (18), (20), and (21) until convergence. Note that θ and η are critical parameters that should have very small values. It was found, however, that fixing them as constants typically results in slow convergence. We thus adopt the continuation scheme [11], which initially sets θ and η to relatively large values to allow warm-starting, and then decreases them over the iterations toward the desired convergence. Our algorithm is sketched in Table 2, where η_min and θ_min are set to 0.1 and 0.01 respectively. η_0 and θ_0 are the respective initial values, configured as η_0 = 3^n × η_min and θ_0 = 3^n × θ_min, where n denotes the number of iterations. More explanations are given in [24].


Continuous Flow Optimization
η ← η_0
repeat
    Compute p^k using Eq. (18).
    θ ← θ_0
    repeat
        Compute w using Eq. (20).
        Compute du by solving Eq. (21).
        θ ← θ/3
    until θ < θ_min
    η ← η/3
until η < η_min
Table 2. Algorithm for continuous flow optimization.
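The nested continuation loops of Table 2 can be sketched structurally as follows. The callback names are ours; η and θ are generated directly as 3^(n−i) times their minima, which matches starting at η_0 = 3^n × η_min and repeatedly dividing by 3:

```python
def continuous_refinement(update_p, update_w_du, n=3,
                          eta_min=0.1, theta_min=0.01):
    """Structure-only sketch of the continuation scheme in Table 2.

    update_p(eta)      recomputes p via the shrinkage of Eq. (18);
    update_w_du(theta) recomputes w (Eq. (20)) and du (Eq. (21)).
    Both parameters warm-start large and shrink by 3x per pass.
    """
    for i in range(n + 1):                   # eta = eta_0 / 3^i >= eta_min
        eta = eta_min * 3 ** (n - i)
        update_p(eta)
        for j in range(n + 1):               # theta = theta_0 / 3^j >= theta_min
            theta = theta_min * 3 ** (n - j)
            update_w_du(theta)
```

Generating the schedule by powers of 3 avoids floating-point drift in the `until` comparisons while visiting exactly the same parameter values.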

Figure 5. Occlusion-aware refinement. (a) Flow estimate overlaid with the occlusion map (o(x) > 0.5). (b) u_r. (c) Final result. (b) and (c) show results before and after the final refinement in an image scale.

Figures 4(d) and (e) show flow maps before and after the continuous refinement in an image scale. We denote by u_r the refined flow map.

4.3. Occlusion-Aware Refinement

Our final step is for handling large occlusion in the com-

puted flow map. Although cross-checking is effective in

occlusion detection, it needs to compute flow fields bidirec-

tionally. Our strategy is based on an observation that mul-

tiple pixels mapping to the same point in the target image

using forward warping are possibly occluded by each other.

Thus, we detect occlusion using the mapping uniqueness

criterion [5], expressed as

o(x) = T(m(x + u(x)) − 1,0,1),

(22)

where m(y) is the count of reference pixels mapped to position y in the target view using forward warping.

T(a,l,h) is a function that truncates the value of a if it is

out of [l,h]. Eq. (22) indicates that if more than one reference pixel maps to x + u(x), the occlusion label for the reference pixel x is set. Although this simple method

sometimes fattens the occlusion region, it seldom leaves out

true occluded pixels, and thus does not harm the final flow

estimation.

Our measure of the data confidence based on the occlu-

sion detection is expressed as

c(x) = max(1 − o(x),0.01).

(23)


Figure 6. Flow estimation errors w.r.t. different αs.

The larger o(x) is, the less we trust the data term. The constant 0.01 keeps c(x) always larger than 0. The final energy used to refine the flow map is

E′(u) = c(x)ED(u) + λES(u),

(24)

which can be efficiently optimized with our continuous

solver. The final result of the “Grove” example in one im-

age scale is shown in Figure 5 where the detected occlusion

map overlays the flow estimate. (b) and (c) compare the ur

map computed from the continuous flow optimization step

with the final occlusion-aware refinement result.

5. Evaluation and Experimental Results

In this section, we present our results and comparison in

both small- and large-displacement settings. τ in Eq. (2) is

set to 1/1.4, which is learned from the Middlebury training

image set by equating the color and gradient costs. In order

to reduce the sampling artifacts in Eq. (12), we filter DI

and D∇I with a small Gaussian kernel with the standard

deviation 1.0. β, λ, η, and θ are empirically set to 5, 12,

0.1, and 0.01 respectively. For feature matching, we use the implementation of Lowe [15] with default parameter values.

5.1. Evaluation of the Data Term

We first evaluate the effectiveness of our selective com-

bination strategy in defining the data function. We compare

our method with those that set α = 0.5, α = 1, and α = 0 respectively. For fairness, we do not use the complete

framework to generate our results, which would otherwise

produce flow estimates with even smaller error. Instead, we

optimize Eq. (11) simply by fusing the two flow maps com-

puted with α = 1 and α = 0 respectively using graph cuts.

Figure 6 shows the error comparison on the Middlebury

training data [2] where the ground truth flow map is avail-

able. The two representative examples are “RubberWhale”

(“R.W.” for short) and “Urban2”. It can be noticed that the

average angular error (AAE) for “Urban2” is small when

using the color constraint, while the gradient constraint is fa-

vored in “RubberWhale” due to illumination change. Sim-

ply adding these two constraints, as in [8], always produces an AAE in between. In comparison, our method locally selects the

optimal term and is more effective for energy minimization.



Figure 7. Visual comparison with different α settings. (a) and (b)

show two image patches. (c) and (d) show flow results computed

using the color and gradient constraints respectively. (e) is the

ground truth flow field. (f) shows the result with α = 0.5. (g) is

the flow map obtained using our selective combination model. (h)

shows the corresponding ᾱ map.

(a) Ours. (b) g.t. (c) [21]. (d) [7]. (e) Ours.

Figure 8. Visual comparison of the small-displacement optical

flow results. (b) shows the ground truth flow map.

Figure 7 shows the visual comparison. Red arrows in

(a) and (f) indicate pixels violating the color constancy as-

sumption. The blue arrows highlight the edge of the wheel,

whose gradient varies. (c) and (d) show results by

respectively setting α = 1 and α = 0. (f) shows the result

with α = 0.5, where problems caused by using either of the constraints are still present. Our selective combination model

helps more robustly reject outliers, as shown in Figure 7(g).

5.2. Middlebury Optical Flow Benchmarking

In this subsection, we evaluate our optical flow esti-

mation method using the traditional small-displacement data. Table 3 lists the average ranks of the top-performing flow estimation methods on the Middlebury evaluation website.

Many of the small-scale motion structures can be recovered

by our method.

Regarding the running time, the extended flow initializa-

tion uses about 3 minutes in the fine image level (resolution

640×480), taking the Urban sequence as an example. The

continuous flow refinement takes about one minute using

the single thread CPU implementation. With GPU acceler-

ation, this process can be further sped up.

We visually compare flow results for one example in Fig-

ure 8, which shows that our flow estimate contains more

motion details. In Figure 9, we show an extensive compar-

ison of results produced by a number of state-of-the-art op-

tical flow methods, including those employing non-convex

Figure 9. Extensive visual comparison. (a), (b) g.t., (c) [4], (d) [21], (e) [7], (f) [14], (g) [26], (h) [22], (i) [23], (j) Ours.

Figure 10. Visual comparison on a large-displacement optical flow example [6]. (a) Inputs. (b) Warping. (c) g.t. (d) Ours. (e) [6]. (f) C2F [7]. (g) [6]. (h) [20]. (i) Ours. (j) Flow.

penalty functions [4, 21], using the TV/L1 model to reject outliers [7], minimizing energy in a continuous-discrete fash-

ion [14], and applying advanced smoothness terms to han-

dle motion discontinuity [26, 22, 23]. The inadequate abil-

ity to handle large motion discrepancy on narrow objects

in the traditional multiscale variational framework makes

many results still lack a few details.

5.3. Large-Displacement Optical Flow Estimation

Our method can naturally deal with large-displacement

flow estimation without any modification of the framework.

Figure 10 shows an example containing significant artic-

ulated motion of a running person (published in [6]). (a)

shows a two-object-overlaid image from the HumanEva-II

benchmark dataset. Note that the fast foot movement can-

not be estimated correctly in the conventional coarse-to-fine

scheme [7], as shown in Figure 10(f). (b) shows the back-

ward warping result using our dense flow estimate. The

close-ups are shown in (d) and (e) for comparison. Our

method successfully recovers the shape of the left foot.

Note that the pixels in the occluded region are unrecover-

able for all optical flow estimation methods. The flow mag-

nitude maps are shown in the second row. The maps in (g)

and (h) are produced by the methods of [6] and [20], both

of which are dedicated to large-displacement flow estimation and


Method         | Our Method | Adaptive [22] | Comp. OF [26] | Aniso. Huber-L1 [23] | Brox et al. [7]
Avg. AAE Rank  | 4.5        | 6.0           | 5.9           | 7.6                  | 11.5
Avg. EPE Rank  | 4.0        | 5.9           | 7.3           | 7.8                  | 11.1
Table 3. The average ranking of the methods with top performance on the Middlebury optical flow evaluation website (at the time of submission). The two types of ranking are based on average angular errors (AAEs) and average end-point errors (EPEs) respectively.

do not perform best in handling small-displacement motion. Several other examples are included in [24].

6. Concluding Remarks

In this paper, we have presented a novel optical flow

estimation method to reduce the reliance on the coarse

level flow estimation in the variational setting for small-size

salient motion structure estimation. Other main contribu-

tions include the selective combination of color and gradi-

ent constraints, feature matching to find appropriate motion

candidates, the mean field approximation to simplify opti-

mization, and a variable splitting technique to enable fast

and reliable flow estimation.

It is notable that although the sparse feature matching is

a useful strategy to find novel flow candidates, occasion-

ally it might not perform well enough, especially when a very small region is entirely textureless. Exhaustive search can be used to solve this problem at higher computational cost.

References

[1] P. Anandan. A computational framework and an algorithm

for the measurement of visual motion. IJCV, 2:283–310, 1989.

[2] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and

R. Szeliski. A database and evaluation methodology for op-

tical flow. In ICCV, 2007.

[3] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hi-

erarchical model-based motion estimation. In ECCV, pages

237–252, 1992.

[4] M. J. Black and P. Anandan. The robust estimation of mul-

tiple motions: Parametric and piecewise-smooth flow fields.

CVIU, 63(1):75–104, 1996.

[5] M. Z. Brown, D. Burschka, and G. D. Hager. Advances in

computational stereo. IEEE Trans. Pattern Anal. Mach. In-

tell., 25(8):993–1008, 2003.

[6] T. Brox, C. Bregler, and J. Malik. Large displacement optical

flow. In CVPR, 2009.

[7] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High ac-

curacy optical flow estimation based on a theory for warping.

In ECCV (4), pages 25–36, 2004.

[8] A. Bruhn and J. Weickert. Towards ultimate motion esti-

mation: Combining highest accuracy with real-time perfor-

mance. In ICCV, pages 749–755, 2005.

[9] D. Geiger and F. Girosi. Parallel and deterministic algo-

rithms from MRFs: surface reconstruction and integration. A.I.

Memo 1114, MIT, 1989.

[10] D. Goldfarb and W. Yin. Parametric maximum flow algo-

rithmsfor fast total variation minimization. Technical Report

07-09, Rice University, 2007.

[11] E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation

for l1-minimization: Methodology and convergence. SIAM

Journal on Optimization, 19(3):1107–1130, 2008.

[12] H. W. Haussecker and D. J. Fleet. Computing optical flow

with physical models of brightness variation. IEEE Trans.

Pattern Anal. Mach. Intell., 23(6):661–673, 2001.

[13] B. K. P. Horn and B. G. Schunck. Determining optical flow.

Artif. Intell., 17(1-3):185–203, 1981.

[14] V. Lempitsky, S. Roth, and C. Rother. Fusionflow: Discrete-

continuous optimization for optical flow estimation. In CVPR, 2008.

[15] D. G. Lowe. Distinctive image features from scale-invariant

keypoints. IJCV, 60(2):91–110, 2004.

[16] B. D. Lucas and T. Kanade. An iterative image registra-

tion technique with an application to stereo vision. In IJCAI,

pages 674–679, 1981.

[17] S. M. Seitz and S. Baker. Filter flow. In ICCV, 2009.

[18] C. Rother, V. Kolmogorov, V. S. Lempitsky, and M. Szum-

mer. Optimizing binary MRFs via extended roof duality. In

CVPR, 2007.

[19] M. Sizintsev and R. P. Wildes. Efficient stereo with accurate

3-d boundaries. In BMVC, pages 237–246, 2006.

[20] F. Steinbrücker and T. Pock. Large displacement optical flow

computation without warping. In ICCV, 2009.

[21] D. Sun, S. Roth, J. P. Lewis, and M. J. Black. Learning

optical flow. In ECCV (3), pages 83–97, 2008.

[22] A. Wedel, D. Cremers, T. Pock, and H. Bischof. Structure-

and motion-adaptive regularization for high accuracy optic

flow. In ICCV, 2009.

[23] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers,

and H. Bischof. Anisotropic huber-l1 optical flow. In BMVC,

2009.

[24] L. Xu, J. Jia, and Y. Matsushita. A unified framework for

large- and small-displacement optical flow estimation. Tech-

nical report, The Chinese University of Hong Kong, 2010.

www.cse.cuhk.edu.hk/%7eleojia/projects/flow/index.html.

[25] C. Zach, T. Pock, and H. Bischof. A duality based approach

for realtime tv-l1 optical flow. Pattern Recognition (Proc.

DAGM), pages 214–223, 2007.

[26] H. Zimmer, A. Bruhn, J. Weickert, L. Valgaerts, A. Salgado, B. Rosenhahn, and H.-P. Seidel. Complementary optic

flow. In EMMCVPR, 2009.
