
Motion Detail Preserving Optical Flow Estimation∗

Li Xu Jiaya Jia

The Chinese University of Hong Kong

{xuli,leojia}@cse.cuhk.edu.hk

Yasuyuki Matsushita

Microsoft Research Asia

yasumat@microsoft.com

Abstract

We discuss the cause of a severe optical flow estimation problem: fine motion structures cannot always be correctly reconstructed in the commonly employed multi-scale variational framework. Our major finding is that significant and abrupt displacement transitions wreck small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue; it reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale. The contribution of this paper also includes adaptation of the objective function and development of a new optimization procedure. The effectiveness of our method is borne out by experiments on both large- and small-displacement optical flow estimation.

1. Introduction

The variational framework [13], together with the

coarse-to-fine refinement [1], is widely adopted in optical

flow estimation [7, 8]. In the Middlebury optical flow eval-

uation website [2], almost all top-ranked methods use this

strategy.

Brox et al. [6], in computing large-displacement optical

flow, pointed out that if the flow structures are smaller than

their displacements, the latter may not be well estimated.

In this paper, we show that this issue also applies to small-displacement motion. Taking Figure 1 as an example, due to the camera motion, the foreground toy deer has its motion significantly differing from that of the background (average displacements d = −2 and d = 21, respectively). This example is in fact very challenging for the coarse-to-fine variational optical flow estimation.

As shown in Figure 1(e), in a coarse level the narrow neck entirely disappears and only the significant background motion is estimated. The emerging foreground pixels in the finer scale (Figure 1(f)) therefore have actual motion significantly different from the initial estimate propagated from the background, violating the linearization assumption and accordingly leading to a highly unstable motion estimation process. The final flow result shown in Figure 1(c) includes considerable errors. This example discloses one problem of the general coarse-to-fine variational model; that is, the inclination to diminish small motion structures when a spatially significant and abrupt change of the displacements exists.

∗The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. 412708).

Figure 1. Motion detail preserving problem. (a)-(b) Two input patches. (c) Flow estimate using the coarse-to-fine variational setting. (d) Our flow estimate. (e)-(f) Two consecutive levels in the pyramid. Flow maps are visualized using the color code in (g).

We address the motion detail preserving problem in this paper and propose a unified framework for high-quality flow estimation in both large- and small-displacement settings. Central to our method is a novel selection scheme to compute extensive initial flow vectors in each image level. This makes the following optimization not completely rely on the result from the previous scale, and thus capable of correcting estimation errors in the top-down refinement. Our flow result shown in Figure 1(d) contains fine structures. More examples are included in Section 5 and in the technical report [24].

This paper also contributes in the following ways. First, we use robust sparse feature matching to produce extended flow initialization, which helps enforce the linearization condition in the variational setting. Second, in the flow estimation model, we propose the selective combination of the color and gradient constraints in defining the data term, robust to outliers. Third, we propose a fast variable-splitting-based optimization method to refine flow maps. It is highly parallel, compatible with modern GPU computation architecture.

Finally, we employ the Mean Field approximation to enable solving the objective function, which involves both discrete and continuous variables and is commonly regarded as challenging to solve. Extensive experiments visually and quantitatively validate the performance of our approach in maintaining details for both large- and small-displacement motion.

2. Related Work

Modern optical flow estimation is usually posed as an energy minimization problem. Black and Anandan [4] replaced the quadratic penalty functions in [13] with non-convex robust functions to reject outliers. Sun et al. [21] used a learning-based framework for both the matching cost (data term) and flow derivatives (smoothness term).

Efforts have also been put into improving the optical flow constraints. Haussecker and Fleet [12] proposed a physical constraint to model brightness change. Lempitsky et al. [14] computed the matching cost only using high-frequency components. Pre-filtering of the input images was suggested in [21] and [17] to handle illumination variation. These models are flexible, but at the same time require solving highly non-convex objective functions.

In [7], Brox et al. introduced the gradient constancy constraint to complement the brightness one. The L1 norm is used as the penalty function for both the data and smoothness terms so that the energy is convex after linearization. A similar compromise between robustness and complexity was also made in [8, 26]. However, we will show later that direct addition of the brightness and gradient terms is not optimal, and we propose a selection model to improve it.

Almost all the above methods rely on coarse-to-fine warping to deal with motion larger than one pixel [1, 3]. As discussed in Section 1, this strategy has an inherent problem in recovering small-scale structures in many situations. An adaptive window is used in stereo matching [19] to handle incorrect initialization near depth boundaries. It assumes that at least the nearby disparities are correctly initialized, which might not be true for small-scale structures that are totally eliminated in the coarse level.

Using discrete optimization, Lempitsky et al. [14] proposed fusing flow proposals obtained from different flow estimation methods with various parameter settings. This is proven effective in finding the optimal values among the given proposals. But the sufficiency and optimality of the proposals cannot be controlled. Also, the methods [16, 13] that generate the proposals still employ the conventional coarse-to-fine warping. So it is possible that none of the proposals preserve small-scale motion structures. In comparison, our method computes a few high-confidence flow candidates in each level, and thus is not entirely dependent on the flow initialization from the previous scale.

Figure 2. Data cost distributions for two points. (a) shows a patch of the "RubberWhale" example, where two points P1 (138,278) and P2 (141,299) are highlighted. (b) and (c) plot different data costs (vertical axis; curves for α = 0, α = 0.5, α = 1, and the effective energy) for P1 and P2. The ground truth displacement is moved to 0 (horizontal axis) for illustration.

In recent large-displacement optical flow estimation, Brox et al. [6] performed region-based descriptor matching. This method can effectively recover large-displacement flow by adjusting the objective to favor matching results, albeit sometimes vulnerable to matching outliers. Steinbrücker and Pock [20] extended the numerical scheme of [25] and searched over all possible values for the large-displacement flow. As discrete labels are used in the search step, results can lack sub-pixel accuracy.

3. Optical Flow Model

The Total Variation/L1 model [7, 8, 25] was proven very effective in flow estimation. We base our data penalty function on the L1 norm to reject outliers and use Total Variation (TV) for regularization.

3.1. Robust Data Function

As the color constancy constraint is often violated when illumination or exposure changes, adding a gradient constancy constraint was proposed [7, 8]. Denoting by u = (u, v)^T the flow vector representing the displacement between frames I_1 and I_2, the data term for flow estimation can generally be written as

E_D(\mathbf{u}) = \sum_{\mathbf{x}} \frac{1}{2}\|I_2(\mathbf{x}+\mathbf{u}) - I_1(\mathbf{x})\| + \frac{1}{2}\tau\|\nabla I_2(\mathbf{x}+\mathbf{u}) - \nabla I_1(\mathbf{x})\|,   (1)

where τ is a weight. Due to the addition of the two terms, this function models the confidence of pixel correspondence less accurately than using only the more appropriate of the two.

Figure 2 shows an example where the patch in (a) contains two points P1 and P2. Their data cost distributions with respect to different displacement values are plotted in (b) and (c) respectively (ground truth displacements are shifted to 0). It is noticeable that the color constraint (blue curve in (b)) does not produce the minimum energy near the ground truth value, because color constancy is violated as point P1 moves out of the shadow. Adding the color and gradient terms using Eq. (1) also results in an undesirable distribution (dashed magenta curve), as the cost at the ground truth point is not even a local minimum. Similarly, in Figure 2(c), only color constancy holds, as point P2 undergoes rotational motion which alters image gradients. So it is likewise not ideal to add the two constraints in the data function definition.

The above analysis indicates that a good model should only incorporate the more informative constraint, not both. We accordingly define a binary weight map α(x) : Z² → {0, 1} to switch between the two terms. The new data function is expressed as

E_D(\mathbf{u},\alpha) = \sum_{\mathbf{x}} \alpha(\mathbf{x})\|I_2(\mathbf{x}+\mathbf{u}) - I_1(\mathbf{x})\| + (1 - \alpha(\mathbf{x}))\,\tau\|\nabla I_2(\mathbf{x}+\mathbf{u}) - \nabla I_1(\mathbf{x})\|.   (2)

When α(x) = 1, the color constraint is favored; otherwise, gradient constancy is used. Our empirical investigation provided in Section 5 shows that this model can lead to higher quality results than various alternatives.
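To make the selection idea in Eq. (2) concrete: given the flow, the per-pixel optimal binary α simply picks the cheaper of the two costs, so the data term traces the lower envelope of the color and gradient costs. A minimal NumPy sketch, assuming grayscale images already warped by the current flow and precomputed gradient fields (all names here are illustrative):

```python
import numpy as np

def selective_data_cost(I1, I2_warped, gx1, gy1, gx2w, gy2w, tau=1.0):
    """Per-pixel data cost of Eq. (2) with alpha chosen optimally per pixel.

    D_I   : color-constancy cost |I2(x+u) - I1(x)|
    D_grad: gradient-constancy cost tau * ||grad I2(x+u) - grad I1(x)||_1
    alpha = 1 selects the color term; the resulting cost equals
    min(D_I, D_grad), the lower envelope of the two constraints.
    """
    D_I = np.abs(I2_warped - I1)
    D_grad = tau * (np.abs(gx2w - gx1) + np.abs(gy2w - gy1))
    alpha = (D_I <= D_grad).astype(float)
    cost = alpha * D_I + (1.0 - alpha) * D_grad
    return cost, alpha
```

This hard selection is what the Mean Field treatment in Section 3.3 softens into a differentiable weight.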

3.2. Edge-Preserving Regularization

The regularization term for optical flow estimation is generally edge-preserving [21, 22]. We define our smoothness term as

E_S(\mathbf{u}) = \sum_{\mathbf{x}} \omega(\mathbf{x})\|\nabla \mathbf{u}(\mathbf{x})\|,   (3)

where x ∈ Z² indexes the 2D coordinates and ‖∇u(x)‖ is the common TV regularizer. ω(x) is the simple structure-adaptive map that maintains motion discontinuity [22]:

\omega(\mathbf{x}) = \exp(-\|\nabla I_1\|^{\kappa}),   (4)

where κ = 0.8 in our experiments. The final objective function is thus defined as

E(\mathbf{u},\alpha) = E_D(\mathbf{u},\alpha) + \lambda E_S(\mathbf{u}),   (5)

where λ is the regularization weight.
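The weight map of Eq. (4) is cheap to precompute once per level. A small sketch, assuming a grayscale image and simple forward differences for the gradient (the discretization choice is an assumption, not specified above):

```python
import numpy as np

def structure_adaptive_weight(I1, kappa=0.8):
    """Edge-aware smoothness weight of Eq. (4): exp(-||grad I1||^kappa).

    Near image edges the gradient magnitude is large, so omega is small
    and the TV regularizer is relaxed, preserving motion discontinuity.
    """
    gx = np.diff(I1, axis=1, append=I1[:, -1:])  # forward x-difference
    gy = np.diff(I1, axis=0, append=I1[-1:, :])  # forward y-difference
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return np.exp(-mag ** kappa)
```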

3.3. Mean Field Approximation

Minimizing Eq. (5) involves simultaneously computing two fields: continuous u and binary α, which is commonly regarded as computationally intractable. We employ the Mean Field (MF) approximation [9] to simplify the problem by first canceling out the binary process by integration over α [24]. The probability of a particular state of the system is given by

P(\mathbf{u},\alpha) = \frac{1}{Z} e^{-\beta E(\mathbf{u},\alpha)},   (6)

where β is the inverse temperature and Z is the partition function, defined as

Z = \sum_{\{\mathbf{u}\}} \sum_{\{\alpha=0,1\}} e^{-\beta E(\mathbf{u},\alpha)}.   (7)

We then compute the sum over all possible αs (as described in [24]) with the saddle point approximation, yielding

Z \approx \max_{\mathbf{u}} e^{-\beta\{\lambda E_S(\mathbf{u}) - \sum_{\mathbf{x}} \frac{1}{\beta}\ln(e^{-\beta D_I(\mathbf{u},\mathbf{x})} + e^{-\beta D_{\nabla I}(\mathbf{u},\mathbf{x})})\}},   (8)

and the effective potential

E_{\mathrm{eff}}(\mathbf{u}) = \lambda E_S(\mathbf{u}) - \sum_{\mathbf{x}} \frac{1}{\beta}\ln(e^{-\beta D_I(\mathbf{u},\mathbf{x})} + e^{-\beta D_{\nabla I}(\mathbf{u},\mathbf{x})}),   (9)

where D_I(u, x) = ‖I_2(x + u) − I_1(x)‖ and D_∇I(u, x) = τ‖∇I_2(x + u) − ∇I_1(x)‖. This indicates that the flow estimate obtained by minimizing Eq. (9) is actually the Mean Field (MF) approximation of minimizing Eq. (5). The effective energy is therefore written as

E_{\mathrm{eff}}(\mathbf{u}) = E^{\mathrm{eff}}_D(\mathbf{u}) + \lambda E_S(\mathbf{u}),   (10)

where the effective data function is

E^{\mathrm{eff}}_D(\mathbf{u}) = \sum_{\mathbf{x}} -\frac{1}{\beta}\ln(e^{-\beta D_I(\mathbf{u},\mathbf{x})} + e^{-\beta D_{\nabla I}(\mathbf{u},\mathbf{x})}).   (11)

The optimality of Eq. (10) does not depend on the estimate of α. Moreover, although Eq. (10) is non-convex and not easy to solve using continuous optimization, there is no obstacle to applying discrete optimization if candidate labels can be obtained. We propose a robust algorithm, described in the next section, to estimate u.

Note that the effective data term can also be deemed a robust function which selectively combines the color and gradient constancy constraints. This can be clarified by taking the partial derivative of the data term with respect to the variable u, which yields

\partial_{\mathbf{u}} E^{\mathrm{eff}}_D(\mathbf{u}) = \sum_{\mathbf{x}} \bar{\alpha}(\mathbf{x})\,\partial_{\mathbf{u}} D_I + (1 - \bar{\alpha}(\mathbf{x}))\,\partial_{\mathbf{u}} D_{\nabla I},

where ᾱ(x) is the flow-dependent weight, written as

\bar{\alpha}(\mathbf{x}) = \frac{1}{1 + e^{\beta(D_I(\mathbf{u},\mathbf{x}) - D_{\nabla I}(\mathbf{u},\mathbf{x}))}}.   (12)

ᾱ(x) is the MF approximation of α(x), so its effect equates that of α(x) (Eq. (2)) in constraint selection. The cost distributions of the new effective data function are plotted in Figures 2(b) and (c) using green crossed curves. They indicate that the effective energy approximates the lower envelope of the two data costs (α = 0 and α = 1), which is exactly what we need for accurate flow estimation.
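Numerically, Eq. (11) is a soft minimum (a negative log-sum-exp) of the two per-pixel costs, and Eq. (12) is the corresponding logistic weight. A minimal sketch of both, with a stabilized log-sum-exp; the value of β is a free parameter here, not one prescribed above:

```python
import numpy as np

def effective_data_cost(D_I, D_grad, beta=10.0):
    """Eq. (11): -(1/beta) * ln(exp(-beta*D_I) + exp(-beta*D_grad)).

    Subtracting the elementwise minimum first keeps the exponentials
    stable; as beta grows, the result approaches min(D_I, D_grad),
    i.e. the lower envelope of the two data costs.
    """
    m = np.minimum(D_I, D_grad)
    return m - np.log(np.exp(-beta * (D_I - m)) +
                      np.exp(-beta * (D_grad - m))) / beta

def mean_field_alpha(D_I, D_grad, beta=10.0):
    """Eq. (12): flow-dependent weight; near 1 where the color cost
    D_I is the smaller of the two."""
    return 1.0 / (1.0 + np.exp(beta * (D_I - D_grad)))
```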


Input: a pair of images for optical flow estimation

1. Construct pyramids for both of the images and set the initial level l = 0 and u_l = 0 for all pixels.
2. Propagate u_l to level l + 1.
3. Extended Flow Initialization (Section 4.1)
   3.1. Detect and match SIFT features in level l + 1.
   3.2. Generate multiple flow vectors as candidates.
   3.3. Optimize flow using QPBO (Eq. (10)).
4. Continuous Flow Optimization (Section 4.2)
   4.1. Compute the ᾱ map (Eq. (12)).
   4.2. Solve the TV/L1 energy function in Eq. (14).
5. If l ≠ n − 1, where n is the total number of levels, set l = l + 1 and go to Step 2.
6. Occlusion-aware Refinement (Section 4.3)

Output: optical flow map

Table 1. Overview of our method.
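The control flow of Table 1 can be sketched as a short driver loop. The three per-level components are hypothetical stand-in callables (not the paper's implementation), and a dyadic pyramid ordered coarse to fine is assumed:

```python
import numpy as np

def coarse_to_fine_flow(pyr1, pyr2, init_candidates, fuse, refine):
    """Driver loop following Table 1.

    init_candidates(I1, I2, u) -> extra candidate flow fields (Section 4.1),
    fuse(I1, I2, candidates)   -> one fused flow field (QPBO on Eq. (10)),
    refine(I1, I2, u)          -> continuous TV/L1 refinement (Section 4.2).
    """
    u = np.zeros(pyr1[0].shape[:2] + (2,))       # u = 0 at the coarsest level
    for l, (I1, I2) in enumerate(zip(pyr1, pyr2)):
        if l > 0:
            # propagate: upsample the flow field and rescale its magnitude
            u = 2.0 * u.repeat(2, axis=0).repeat(2, axis=1)
            u = u[:I1.shape[0], :I1.shape[1]]
        candidates = [u] + list(init_candidates(I1, I2, u))
        u = fuse(I1, I2, candidates)             # discrete candidate selection
        u = refine(I1, I2, u)                    # sub-pixel continuous step
    return u
```

The key difference from the conventional pipeline is that each level fuses new candidates rather than trusting the propagated flow alone.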

4. Optimization Framework

Traditional optical flow estimation, due to the use of the

variational setting, relies excessively on the coarse-to-fine

refinement. As discussed in Section 1, this process could

fail to recover ubiquitous fine motion details due to the pos-

sible large discrepancy between the initial flow estimates

and the ground truth displacements in each level.

In this section, based on E_eff and ᾱ, we propose an unconventional method to optimize Eq. (5). Specifically, because E^eff_D(u) does not depend on ᾱ, we first infer multiple high-confidence flow candidates and apply discrete optimization to select the optimal ones. With this result, ᾱ in Eq. (12) is then quickly estimated. We finally improve the subpixel accuracy of the flow estimates with the estimated ᾱ using continuous optimization. This procedure is found surprisingly effective in dampening estimation errors caused by the occasionally biased flow results from the coarse-level computation.

Our overall algorithm is sketched in Table 1 based on

iteratively processing images in a top-down fashion. The

steps are detailed further below.

4.1. Extended Flow Initialization

We address the general flow initialization problem in each image level by estimating multiple displacements from the reference to target images using SIFT feature detection and matching [15]. The displacement vectors are denoted as {u^v_n}, as shown in Figure 3(a). They are new potential flow candidates, except those that already exist in the flow map u^c propagated from the immediately coarser scale (Figure 3(b)). To robustly screen out the duplicated vectors, we compute the Euclidean distance between each u^v_i and all u^c_j, where pixel j is within a 5 × 5 window centered at the reference feature of u^v_i. If all results are greater than 1 (pixel), we regard u^v_i as a genuine flow candidate. We repeat this process for all i's, and denote the m new candidates as u^v_{k_0}, ..., u^v_{k_{m−1}}. Figure 3(c) shows an example.

Figure 3. Extended flow initialization. (a) One of the images overlaid with the computed feature motion vectors. (b) Flow field u^c propagated from the coarser level. (c) New displacements computed using (a) and (b). They are candidate flow vectors for all pixels. (d) Optimized flow map u_0 with respect to the candidates in the current image level. (e)-(f) show close-ups of (b) and (d).

This strategy significantly reduces the system's dependence on the coarse-scale flow estimation. It is notable as well that feature matching initially produces a considerable number of vectors distributed over the whole image, as shown in Figure 3(a); but they reduce to fewer than 15 candidates after local comparison with u^c in the given example. Only the most distinctive flow vectors are retained.
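The screening rule above (keep a feature-derived vector only if it differs from every propagated flow vector in its 5 × 5 neighborhood by more than one pixel) can be sketched as follows; the function and argument names are illustrative:

```python
import numpy as np

def screen_candidates(feature_vectors, feature_pos, u_c, radius=2, thresh=1.0):
    """Screen feature-derived displacement vectors against the propagated
    flow u_c (H x W x 2). A vector survives only if its Euclidean distance
    to every flow vector in the (2*radius+1)^2 window around its reference
    feature exceeds `thresh` pixels, i.e. it carries genuinely new motion.
    """
    H, W = u_c.shape[:2]
    kept = []
    for (y, x), uv in zip(feature_pos, feature_vectors):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        window = u_c[y0:y1, x0:x1].reshape(-1, 2)
        dists = np.linalg.norm(window - np.asarray(uv), axis=1)
        if np.all(dists > thresh):        # distinct from propagated flow
            kept.append(tuple(uv))
    return sorted(set(kept))              # drop repeated candidate vectors
```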

The m new vectors u^v_{k_0}, ..., u^v_{k_{m−1}}, together with the original u^c, represent possible motion in the present image scale. We model the selection of the optimal flow among the m + 1 candidates for each pixel as a labeling problem, where the objective function is given in Eq. (10). The upper part of Figure 3(d) demonstrates the color-coded labels. This problem can be solved efficiently by discrete optimization because, on the one hand, the number of candidates is small thanks to the screening; on the other hand, Eq. (10) does not involve α, simplifying the computation.

We adopt Quadratic Pseudo-Boolean Optimization (QPBO) [18] to solve this MRF problem. The fusion move step [14] is used to repeatedly fuse the candidates until each gets visited twice. Also, to avoid the checkerboard-like artifacts commonly produced near motion boundaries in discrete optimization, we employ the anisotropic representation of the TV regularizer ‖∇u‖ = ‖∇u‖₁ + ‖∇v‖₁ with 8-neighbor discretization [10]. The output is the flow map denoted as u₀. One result is shown in Figure 3(d), which contains better-recovered motion structure compared to the map u^c in Figure 3(b). Close-ups are shown in Figures 3(e) and (f).

Our method can work directly on the input images without employing the multiscale framework. But it would suffer from expensive and possibly unstable computation, because hundreds or more labels might be produced simultaneously at the original resolution.


Figure 4. Continuous optimization. Errors are further reduced in this step. (a) u₀ map. (b) ᾱ(x) map. (c) u_r map. (d)-(e) Close-ups of (a) and (c).

4.2. Continuous Flow Optimization

The flow estimates from the previous step are taken into Eq. (12) to compute ᾱ. One result is shown in Figure 4(b). Considering that Eq. (11) is highly non-convex, we take ᾱ back to Eq. (5) for optimization in the variational model.

As color images are used, we denote by I^k ∈ {I_r, I_g, I_b, ∂_x I, ∂_y I} the set of channels to be included in the data term and use α^k ∈ {ᾱ, ᾱ, ᾱ, (1 − ᾱ)τ, (1 − ᾱ)τ} to represent the corresponding weights. Then the energy in Eq. (5) is written as

E(\mathbf{u}) = \sum_{\mathbf{x}} \sum_{k} \alpha^k(\mathbf{x})\,|I^k_2(\mathbf{x}+\mathbf{u}) - I^k_1(\mathbf{x})| + \lambda(\mathbf{x})\|\nabla \mathbf{u}(\mathbf{x})\|,   (13)

where λ(x) := λω(x). With the initially computed flow u₀ from the previous step, we solve for the flow increments du = (du, dv)^T by minimizing Eq. (13). The final flow vector is u = u₀ + du. By convention, the Taylor expansion of Eq. (13) at point x + u₀ yields the linearized function

E(\mathbf{u}) = \sum_{\mathbf{x}} \sum_{k} \alpha^k(\mathbf{x})\,|I^k_x\,du + I^k_y\,dv + I^k_t| + \lambda(\mathbf{x})\|\nabla (\mathbf{u}_0 + d\mathbf{u})(\mathbf{x})\|   (14)

given small du. In Eq. (14), I_x = ∂_x I_2(x + u₀), I_y = ∂_y I_2(x + u₀), and I_t = I_2(x + u₀) − I_1(x). To preserve motion discontinuity, we employ the rotationally invariant isotropic form of the TV regularizer, written as

\|\nabla \mathbf{u}\| = \sqrt{(\partial_x u)^2 + (\partial_y u)^2 + (\partial_x v)^2 + (\partial_y v)^2}.   (15)

Our Solver. We propose decomposing the optimization problem into three simpler ones, each of which has a globally optimal solution. The key technique is a variable-splitting method with auxiliary variables p and w, representing the substituted data cost and flow derivatives respectively, to move a few terms out of the non-differentiable L1-norm expression. This scheme is found to be very efficient and essential to produce high-quality results.

The derivatives of each flow vector comprise four elements, i.e., ∇du = (∂_x du, ∂_y du, ∂_x dv, ∂_y dv)^T. For each element, we introduce a corresponding auxiliary variable. The set of variables is denoted as w = (w_{du_x}, w_{du_y}, w_{dv_x}, w_{dv_y})^T. Then Eq. (14) can be transformed into

E(\mathbf{u}) = \sum_{\mathbf{x}} \Big\{ \sum_{k} \frac{1}{2\eta}\|I^k_x\,du + I^k_y\,dv + I^k_t - p^k\|^2 + \alpha^k\|p^k\| \Big\} + \frac{1}{2\theta}\|\nabla d\mathbf{u} - \mathbf{w}\|^2 + \lambda\|\nabla \mathbf{u}_0 + \mathbf{w}\|.   (16)

In this function, \frac{1}{2\eta}\|I^k_x du + I^k_y dv + I^k_t - p^k\|^2 + \alpha^k\|p^k\| encourages p^k to approach I^k_x du + I^k_y dv + I^k_t, and \frac{1}{2\theta}\|\nabla d\mathbf{u} - \mathbf{w}\|^2 + \lambda\|\nabla \mathbf{u}_0 + \mathbf{w}\| makes w similar to ∇du. It can also be observed that Eq. (16) is equivalent to Eq. (14) upon convergence, when θ → 0 and η → 0.

Besides efficiency and reliability, Eq. (16) makes the optimization highly parallel and fully compatible with GPU acceleration. The result optimality is guaranteed in each step. Our algorithm proceeds with the following iterations, where the initial u = u₀.

1. Fix u to estimate p. The simplified objective function is

\min_{p} \sum_{\mathbf{x}} \sum_{k} \frac{1}{2\eta}\|I^k_x\,du + I^k_y\,dv + I^k_t - p^k\|^2 + \alpha^k\|p^k\|.   (17)

Single-variable optimization can be achieved in this step. The optimal solution is given by the shrinkage formula [11]

p^k = \mathrm{sign}(o^k)\max(|o^k| - \eta\alpha^k, 0),   (18)

where o^k := I^k_x du + I^k_y dv + I^k_t is the optical flow constraint.

2. Fix u and solve for w. The function reduces to

\min_{\mathbf{w}} \sum_{\mathbf{x}} \frac{1}{2\theta}\|\nabla d\mathbf{u} - \mathbf{w}\|^2 + \lambda\|\nabla \mathbf{u}_0 + \mathbf{w}\|.   (19)

Similarly, a unique solution is guaranteed by the shrinkage formula

w_{du_x} = \max(\|\nabla u\|_2 - \theta\lambda, 0)\,\frac{\partial_x u}{\|\nabla u\|_2} - \partial_x u_0,   (20)

where u = u₀ + du. Solutions for w_{du_y}, w_{dv_x}, and w_{dv_y} can be derived similarly. The computation in this step is also quick and highly parallel in nature.

3. Fix w, p and solve for u. The objective function is

\min_{d\mathbf{u}} \sum_{\mathbf{x}} \sum_{k} \frac{1}{2\eta}\|I^k_x\,du + I^k_y\,dv + I^k_t - p^k\|^2 + \frac{1}{2\theta}\|\nabla d\mathbf{u} - \mathbf{w}\|^2.   (21)

It is quadratic, and thus the corresponding Euler-Lagrange equations of Eq. (21) are linear w.r.t. du and dv. The globally optimal solution can be obtained directly by solving the linear system [13] in this step.
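The per-pixel solution of step 1 (Eq. (18)) is the standard soft-thresholding operator. A minimal sketch, with a brute-force check of its optimality in the usage note below:

```python
import numpy as np

def shrink(o, t):
    """Shrinkage (soft-thresholding) of Eq. (18): sign(o) * max(|o| - t, 0).

    With t = eta * alpha^k, this is the closed-form minimizer of the
    scalar problem  min_p  (o - p)^2 / (2*eta) + alpha^k * |p|.
    Works elementwise on NumPy arrays as well as scalars.
    """
    return np.sign(o) * np.maximum(np.abs(o) - t, 0.0)
```

For example, with η = 0.5 and α = 1, an optical flow constraint value o = 2 gives p = shrink(2.0, 0.5) = 1.5, which a dense grid search over the scalar objective confirms.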

Our method iterates among optimizing (18), (20) and (21) until convergence. Note that θ and η are critical parameters that should have very small values. It was found, however, that fixing them as constants typically results in slow convergence. We thus adopt the continuation scheme [11], which initially sets θ and η to relatively large values to allow warm-starting, and then decreases them over the iterations toward the desired convergence. Our algorithm is sketched in Table 2, where η_min and θ_min are set to 0.1 and 0.01 respectively. η₀ and θ₀ are the respective initial values, configured as η₀ = 3^n × η_min and θ₀ = 3^n × θ_min, where n denotes the number of iterations. More explanations are given in [24].
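The continuation schedule can be sketched as follows; a geometric decrease by a factor of 3 down to the stated minima is assumed, and the clamping is an implementation choice rather than something specified above:

```python
def continuation_schedule(eta_min=0.1, theta_min=0.01, factor=3.0, n=4):
    """Penalty schedule for the variable-splitting solver: start eta and
    theta at factor**n times their minima (loose coupling, easy warm
    start) and divide by `factor` before each outer pass, so the last
    pass runs at (eta_min, theta_min), where Eq. (16) tightly
    approximates Eq. (14)."""
    eta, theta = factor ** n * eta_min, factor ** n * theta_min
    schedule = []
    for _ in range(n + 1):
        schedule.append((eta, theta))
        eta = max(eta / factor, eta_min)      # clamp at the minimum value
        theta = max(theta / factor, theta_min)
    return schedule
```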