
Online and Batch Supervised Background Estimation via L1 Regression

Aritra Dutta

KAUST

aritra.dutta@kaust.edu.sa

Peter Richtárik

KAUST, Edinburgh, MIPT

peter.richtarik@kaust.edu.sa

Abstract

We propose a surprisingly simple model for supervised video background estimation. Our model is based on ℓ1 regression. As existing methods for ℓ1 regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.

1. Introduction

Video background estimation and moving object detection is a classic problem in computer vision. Among several existing approaches, one of the most prevalent is to solve it in a matrix decomposition framework [6,8]. Let A ∈ R^{m×n0} be a matrix encoding n0 video frames, each represented as a vector of size m. Our task is to decompose all frames of the video into background and foreground frames: A = B + F.

As described above, the problem is ill-posed, and more information about the structure of the decomposition is needed. In practice, backgrounds are often static or close to static, which typically means that B is of low rank [39]. On the other hand, the foreground usually represents objects occasionally moving across the scene, which typically means that F is sparse. These and similar observations lead to the development of models of the form [8,6,55,31,14]:

min_B f_rank(B) + f_spar(A − B),   (1)

where f_rank is a suitable function that encourages the rank of B to be low, and f_spar is a suitable function that encourages the foreground F = A − B to be sparse.

Xin et al. [56] recently proposed a background estimation model—generalized fused lasso (GFL)—arising as a special case of (1) with the choice f_rank(B) = rank(B) and f_spar(F) = λ‖F‖_GFL:

min_B rank(B) + λ‖A − B‖_GFL.   (2)

In this model, ‖·‖_GFL is the "generalized fused lasso" norm, which arises from the combination of the ℓ1 norm (to encourage sparsity) and a local spatial total variation norm (to encourage connectivity of the foreground).

Supervised background estimation. In the modern world, supervised background estimation models play an important role in the analysis of data captured by surveillance cameras. As the name suggests, these models rely on the prior availability of some "training" background frames, B1 ∈ R^{m×r}. Without loss of generality, assume that the training background frames correspond to the first r frames of B, i.e., B = [B1 B2], where B1 ∈ R^{m×r} is known and B2 ∈ R^{m×n} is to be determined, with n0 = r + n. Let A = [A1 A2] be partitioned accordingly, and let F2 = A2 − B2 ∈ R^{m×n}. In this setting, [56] further specialized the model (2) by adding the extra assumption that rank(B) = rank(B1). As a result, the columns of the unknown matrix B2 can be written as linear combinations of the columns of B1. Specifically, B2 can be written as B1 S0, where S0 ∈ R^{r×n} is a coefficient matrix. Thus, problem (2) can be written in the form

min_{S0} rank(B1 [I S0]) + λ‖A2 − B1 S0‖_GFL.   (3)

While (3) is the problem Xin et al. [56] wanted to solve, they did not tackle it directly; instead they further assumed that S0 is sparse, and solved the modified problem

min_{S0} ‖S0‖_1 + λ‖A2 − B1 S0‖_GFL,   (4)

where ‖·‖_1 denotes the ℓ1 norm of matrices.

2. New Model

In this paper we propose a new supervised background estimation model, one that we argue is much better than (4) in several respects. Moreover, our model and the methods we propose significantly outperform other state-of-the-art methods.


Figure 1: ROC curve to compare between our proposed ℓ1 regression algorithms on Basic video, frame size 144×176. Areas under the curve: IRLS 0.9488, Homotopy 0.9510, SGD 1 0.9246, SGD 2 0.9504.

L1 regression. As in (4), our model is also based on a modified version of (3). We do not need to assume any sparsity on S0; instead we make the trivial observation that rank(B1 [I S0]) = rank(B1). Since B1 is known, the first term in the objective function of (3) is constant, and hence does not contribute to the optimization problem, so we may drop it. Moreover, we suggest replacing the GFL norm by the ℓ1 norm. This leads to a very simple L1 (robust) regression problem:

min_{S0 ∈ R^{r×n}} ‖A2 − B1 S0‖_1.   (5)
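Problem (5) is a linear program in disguise: introducing auxiliary variables t ≥ |A2 − B1 S0| entrywise turns it into an LP that any solver can handle. Below is a minimal sketch for a single column (one frame) using SciPy's `linprog`; the data, sizes, and tolerances are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression_lp(A, b):
    """Solve min_x ||Ax - b||_1 as an LP: min sum(t) s.t. -t <= Ax - b <= t."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])         # objective: sum of t
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])  # Ax - b <= t and b - Ax <= t
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m         # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

# Robustness to outliers: fit y = 2 + 3x from five points, one grossly corrupted.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 2.0 + 3.0 * np.arange(5.0)
y[2] = 100.0                                              # outlier
coef = l1_regression_lp(X, y)                             # close to [2, 3]
```

The ℓ1 fit passes through the four consistent points and ignores the outlier, which is exactly the robustness property the model exploits: foreground pixels act as sparse gross corruptions of the background.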

Dimension reduction. The above model can be further simplified. It may be the case that the rank of B1 ∈ R^{m×r} is smaller¹ (or much smaller) than r. In such a situation, we can replace B1 in (5) by a thinner matrix, which allows us to reduce the dimension of the optimization variable S0. In particular, let B1 = QR be the QR decomposition of B1, where Q ∈ R^{m×k}, R ∈ R^{k×r}, k = rank(B1), and Q has orthonormal columns. Since the column space of B1 is the same as the column space of Q, by using the substitution B1 S0 = QS, we can reformulate (5) as the lower-dimensional L1 regression problem:

min_{S ∈ R^{k×n}} f(S) := ‖A2 − QS‖_1.   (6)
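The substitution B1 S0 = QS only requires some orthonormal basis of col(B1). A small numpy sketch (all sizes hypothetical); here a rank-revealing SVD stands in for the QR decomposition, since it yields the same column space and exposes k = rank(B1) directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, k_true = 100, 10, 4                     # frame dim, training frames, true rank
B1 = rng.standard_normal((m, k_true)) @ rng.standard_normal((k_true, r))

U, s, Vt = np.linalg.svd(B1, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))             # numerical rank of B1
Q = U[:, :k]                                  # orthonormal basis of col(B1)

# col(Q) = col(B1): projecting B1 onto col(Q) reproduces it exactly.
assert k == k_true
assert np.allclose(Q.T @ Q, np.eye(k))
assert np.allclose(Q @ (Q.T @ B1), B1)
```

Working with Q instead of B1 shrinks each per-frame unknown from r to k coefficients, which matters when many training frames are near-duplicates.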

Decomposition. Let A2 = [a1, . . . , an] and S = [s1, . . . , sn], where ai ∈ R^m and si ∈ R^k for all i ∈ [n] := {1, 2, . . . , n}. Our model (6) can be decomposed into n parts, one for each frame:

f(S) = Σ_{i=1}^{n} fi(si),  fi(si) := ‖ai − Q si‖_1,   (7)

where ‖·‖_1 is the vector ℓ1 norm. Therefore, (6) reduces to n small (k-dimensional) and independent ℓ1 regression problems:

min_{si ∈ R^k} fi(si),  i ∈ [n].   (8)

¹If this is not the case, it may still be that the column space of B1 can be very well approximated by a space of dimension less (or much less) than r.
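The separability in (7) is easy to verify numerically: the matrix ℓ1 norm of the residual equals the sum of the per-column vector ℓ1 norms. The sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 50, 5, 8
Q = np.linalg.qr(rng.standard_normal((m, k)))[0]   # orthonormal basis, as in (6)
A2 = rng.standard_normal((m, n))
S = rng.standard_normal((k, n))

f_total = np.abs(A2 - Q @ S).sum()                 # f(S) in (6)
f_cols = sum(np.abs(A2[:, i] - Q @ S[:, i]).sum()  # sum of f_i(s_i) in (7)
             for i in range(n))
assert np.isclose(f_total, f_cols)
```

This identity is what licenses solving each frame independently, in parallel or as frames arrive.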

Advantages of our model. We now list some advantages of our model (6) as compared to (4): 1) our model does not involve the unnecessary sparsity-inducing term ‖S0‖_1; 2) our model does not include the trade-off parameter λ, and hence issues with tuning this parameter disappear; 3) our model involves a simple ℓ1 norm as opposed to the more complicated GFL norm; 4) the dimension of S is smaller (and possibly much smaller) than that of S0; 5) our objective is separable across the n columns of S corresponding to frames, which means that we can solve for each column of S in parallel (for instance on a GPU); and 6) for the same reason, we can solve for each frame as it arrives, in an online fashion.

Further contributions. Our model works well with just a few training background frames (e.g., r = 10). This should be compared with the 200 training frames used by the GFL model. We propose 5 methods for solving the model, of which 4 can work online and all 5 can work in batch mode. Our model handles all of the following challenges: static and semi-static foreground, newly added static foreground, shadows already present in the background and newly created by moving foreground, occlusion and disocclusion of static and dynamic foreground, and the ghosting effect of the foreground in the background. To the best of our knowledge, no other algorithm can solve all the above challenges in a single framework.

3. Scalable Algorithms for L1 Regression

The separable (across frames) structure of our model allows us to devise both batch and online background estimation algorithms. To the best of our knowledge, this is the first formulation which can operate in both batch and online mode. Since our problem decomposes across frames i ∈ [n], it suffices to describe algorithms for solving the ℓ1 regression problem (8) for a single i. This problem has the form

min_{x ∈ R^k} φ(x) := ‖Qx − b‖_1 = Σ_{j=1}^{m} |q_j^T x − b_j|,   (9)

where x ∈ R^k corresponds to one of the reconstruction vectors si, and b ∈ R^m corresponds to the related frame ai. We write b = (b1, . . . , bm) ∈ R^m, and let q_j ∈ R^k be the jth row of Q for j ∈ [m].

Figure 2: ROC curve to compare between IRLS, iEALM, GRASTA, and ReProCS on Basic video, frame size 144×176. Areas under the curve: IRLS 0.9488, iEALM 0.9241, ReProCS 0.8755, GRASTA (s = 10%) 0.7435.

Five methods. In this work we propose to solve (9) via five algorithms: (a) iteratively reweighted least squares (IRLS), (b) a homotopy method, (c) stochastic subgradient descent (variant 1), (d) stochastic subgradient descent (variant 2), and (e) the augmented Lagrangian method of multipliers (ALM) (see Appendix 2).

The first four algorithms can be used in both batch and online settings and can deal with grayscale and color images. If we assume the camera is static and the illumination is constant throughout the video sequence, then our online methods can provide a good estimate of the background. Moreover, all algorithms are robust to intermittent object motion artifacts, that is, static foreground (whenever a foreground object stops moving for a few frames), which poses a big challenge to the state-of-the-art methods. Additionally, our online methods are fast, as we perform neither conventional nor incremental principal component analysis (PCA). In contrast, conventional PCA [29] is an essential subproblem in numerically solving both the RPCA and GFL problems: each iteration involves computing a PCA, which costs O(mn²) due to the SVD of an m×n matrix. We also note that the state-of-the-art online, semi-online, or batch incremental algorithms, such as Grassmannian robust adaptive subspace estimation (GRASTA) [27], the recursive projected compressive sensing algorithm (ReProCS) [24,25,41], and incremental principal component pursuit (incPCP) [46,44,45], use either thin or partial PCA as well.

The need for simpler solvers for ℓ1 regression. It is natural to ask: why do we need a new set of algorithms to solve the classical ℓ1 regression problem when there are several well-known solvers, for example, CVX [22,21], ℓ1-magic [47], and SparseLab 2.1-core [1]? It turns out that a high-resolution video sequence (characterized by very large m) is computationally extremely expensive for the above-mentioned classic solvers. Moreover, we do not need highly

Figure 3: Comparison of mean SSIM (MSSIM) of IRLS, iEALM, GRASTA, and ReProCS on Basic video: IRLS 0.9524, iEALM 0.9301, ReProCS 0.7980, GRASTA (s = 10%) 0.5226. IRLS has the best MSSIM. To process 600 frames each of size 144×176, iEALM takes 164.03 seconds, GRASTA takes 20.25 seconds, ReProCS takes 14.20 seconds, and our IRLS takes 7.51 seconds.

accurate solutions. Hence, simple and scalable methods are preferable to more involved and computationally demanding methods. The ℓ1-magic software, for example, took 126 minutes in our experiments (on a computer with an Intel i7 processor and 16 GB memory) to estimate the background on the Waving Tree dataset with A2 ∈ R^{19,200×66}. In contrast, our IRLS method took only 0.59 seconds for the 66 frames.

3.1. Iteratively Reweighted Least Squares (IRLS)

In the past decade, IRLS has been used in various domains, ranging from reconstruction of sparse signals from underdetermined systems to low-rank and sparse matrix minimization problems in face clustering, motion segmentation, filter design, and automatic target detection, to mention just a few applications [43,11,12,13,40,32,36]. We find that the IRLS algorithm is a good fit for solving (9). Moreover, each iteration of IRLS reduces to a single weighted ℓ2 regression problem for an overdetermined system. To the best of our knowledge, we are the first to use IRLS in a background estimation model.

We now briefly describe IRLS for solving (9). First note that the cost function φ in (9) can be written in the form

φ(x) = Σ_{j=1}^{m} |q_j^T x − b_j| = Σ_{j=1}^{m} (q_j^T x − b_j)² / |q_j^T x − b_j|.   (10)

For x ∈ R^k and δ > 0 define a diagonal weight matrix via W_δ(x) := Diag(1/max{|q_j^T x − b_j|, δ}). Given a current iterate x_k, we may fix the denominator in (10) by substituting x_k for x, which makes φ depend on x only through the x appearing in the numerator. The problem of minimizing the resulting function in x is a weighted least squares problem. The normal equations for this problem have the form

Q^T W_0(x_k) Q x = Q^T W_0(x_k) b.   (11)

IRLS is obtained by setting x_{k+1} to be the solution of (11). For stability purposes, however, we shall use the weight matrices W_δ(x_k) for some threshold parameter δ > 0 instead. This leads to the IRLS method:

x_{k+1} = (Q^T W_δ(x_k) Q)^{−1} Q^T W_δ(x_k) b.   (12)

Osborne [40] and, more recently, [49] performed a comprehensive analysis of the performance of IRLS for ℓp minimization with 1 < p < 3.
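The update (12) fits in a few lines of numpy. The sketch below uses illustrative values of δ and the iteration count (the paper's settings may differ), and demonstrates the robustness of the ℓ1 fit on synthetic data with sparse gross corruptions:

```python
import numpy as np

def irls_l1(Q, b, delta=1e-6, iters=20):
    """IRLS for min_x ||Qx - b||_1: repeat the stabilized update (12),
    a weighted least squares solve with weights 1/max(|q_j^T x - b_j|, delta)."""
    x = np.zeros(Q.shape[1])
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(Q @ x - b), delta)  # diagonal of W_delta(x_k)
        QW = Q * w[:, None]                             # W_delta(x_k) Q, row-scaled
        x = np.linalg.solve(Q.T @ QW, QW.T @ b)         # normal equations, as in (11)
    return x

# Sparse gross corruptions (foreground pixels) are ignored by the ell_1 fit.
rng = np.random.default_rng(2)
Q = np.linalg.qr(rng.standard_normal((200, 3)))[0]
x_true = np.array([1.0, -2.0, 0.5])
b = Q @ x_true
b[:10] += 50.0                                          # 10 corrupted "pixels"
x = irls_l1(Q, b)                                       # close to x_true
```

Each iteration costs one k×k solve plus a pass over the data, which is what makes the method scale to high-resolution frames.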

3.2. Homotopy Method

In this section we generalize the IRLS method (12) by introducing a homotopy [11] parameter 1 ≤ p ≤ 2. We set p_0 = 2 and choose x_0 ∈ R^k (in our experiments, random initialization will do). Consider the function

φ_p(x, y) := Σ_{j=1}^{m} (q_j^T x − b_j)² / |q_j^T y − b_j|^{2−p}.

Note that φ_1(x, x) is identical to the ℓ1 regression function φ appearing in (10). Given the current iterate x_k, consider the function φ_{p_k}(x, x_k). This is a weighted least squares function of x. Our homotopy method is defined by setting

x_{k+1} = arg min_x φ_{p_k}(x, x_k),

and subsequently decreasing the homotopy parameter as p_{k+1} = max{p_k η, 1}, where 0 < η < 1 is a constant reduction factor.

As in the case of IRLS, the normal equations for the above problem have the form

Q^T W_{0,p_k}(x_k) Q x = Q^T W_{0,p_k}(x_k) b,   (13)

where W_{δ,p}(x) := Diag(1/max{|q_j^T x − b_j|^{2−p}, δ}). The (stabilized) solution of (13) is given by

x_{k+1} = (Q^T W_{δ,p_k}(x_k) Q)^{−1} Q^T W_{δ,p_k}(x_k) b.   (14)

As mentioned above, one step of the homotopy scheme (14) is identical to one step of IRLS (12) when p_k = 1. In practice, however, the homotopy method sometimes performs better (see Figures 1 and 7, and Table 4).
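Under the same caveats as the IRLS sketch above, the homotopy scheme (14) differs only in the exponent of the weights and in the schedule driving p_k from 2 down to 1. The values of η, δ, and the iteration count are hypothetical choices for illustration:

```python
import numpy as np

def homotopy_l1(Q, b, eta=0.7, delta=1e-6, iters=20):
    """Homotopy method (14): IRLS-type updates with weights
    1/max(|q_j^T x - b_j|^(2-p), delta), p_0 = 2, p_{k+1} = max(p_k * eta, 1)."""
    x = np.zeros(Q.shape[1])
    p = 2.0
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(Q @ x - b) ** (2.0 - p), delta)
        QW = Q * w[:, None]
        x = np.linalg.solve(Q.T @ QW, QW.T @ b)  # normal equations (13), stabilized
        p = max(p * eta, 1.0)                    # drive p toward 1 (ell_1)
    return x

# Same outlier test as for IRLS: the ell_1 fit ignores sparse corruptions.
rng = np.random.default_rng(2)
Q = np.linalg.qr(rng.standard_normal((200, 3)))[0]
x_true = np.array([1.0, -2.0, 0.5])
b = Q @ x_true
b[:10] += 50.0
x = homotopy_l1(Q, b)
```

Note that the first step (p = 2) is a plain least squares solve, so the method starts from the smooth ℓ2 problem and deforms it into the ℓ1 problem; once p reaches 1, every step is the IRLS update (12).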

3.3. Stochastic Subgradient Descent

In this section we propose the use of two variants of stochastic subgradient descent (SGD) to solve (9), written as

min_{x ∈ R^k} φ(x) := (1/m) Σ_{j=1}^{m} φ_j(x),   (15)

Figure 4: Background recovered on the Stuttgart, Wallflower, and I2R datasets (columns: BG+FG, iEALM, GRASTA, ReProCS, ground-truth background, IRLS (ours)). Comparing with the ground truth, IRLS recovers the best quality background.

where φ_j(x) := m|q_j^T x − b_j|. The functions φ_j are convex, but not differentiable. However, they are subdifferentiable. A classical result from convex analysis says that the subdifferential of a sum of convex functions is the sum of the subdifferentials. Therefore, the subdifferential ∂φ of φ is given by the formula ∂φ(x) = (1/m) Σ_{j=1}^{m} ∂φ_j(x). In particular, if we choose j ∈ [m] uniformly at random, and pick g_j(x) ∈ ∂φ_j(x), then E[g_j(x)] ∈ ∂φ(x). That is, g_j(x) is an unbiased estimator of a subgradient of φ at x.

A generic SGD method applied to (15) (or, equivalently, to (9)) has the form

x_{k+1} = x_k − η_k g_j(x_k).   (16)

An easy calculation using the chain rule for subdifferentials of convex functions gives the following formula for ∂φ_j(x) = m q_j ∂|q_j^T x − b_j| (see, for instance, [38]):

∂φ_j(x) = { m q_j, if q_j^T x − b_j > 0;  −m q_j, if q_j^T x − b_j < 0;  0, otherwise }.   (17)

When q_j^T x_k − b_j is nonzero, each iterate of SGD moves in the direction of either the vector q_j or −q_j, with an appropriate stepsize. The initialization of the method (i.e., the choice of x_0 ∈ R^k and the learning rate parameters η_k) plays an important role in its convergence.
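Formula (17) is a single line of numpy (`np.sign` returns 0 at zero, matching the "otherwise" case), and averaging over all j recovers a full subgradient of φ, illustrating the unbiasedness claim above. The data below is synthetic:

```python
import numpy as np

def subgrad_phi_j(Q, b, x, j):
    """A subgradient of phi_j(x) = m |q_j^T x - b_j|, per formula (17)."""
    m = Q.shape[0]
    return m * np.sign(Q[j] @ x - b[j]) * Q[j]

rng = np.random.default_rng(3)
Q = rng.standard_normal((40, 4))
b = rng.standard_normal(40)
x = rng.standard_normal(4)

# E[g_j(x)] over uniform j is a subgradient of phi(x) = (1/m) sum_j phi_j(x),
# which equals sum_j |q_j^T x - b_j|; one such subgradient is Q^T sign(Qx - b).
mean_g = np.mean([subgrad_phi_j(Q, b, x, j) for j in range(40)], axis=0)
full_g = Q.T @ np.sign(Q @ x - b)
assert np.allclose(mean_g, full_g)
```

The factor m in φ_j cancels the 1/m in the average, which is why single-row updates remain unbiased.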

Figure 5: Qualitative and quantitative comparison with supervised GFL and inWLR on the Waving Tree and Basic videos (columns: BG+FG, BG, recovered FG, ground truth, SSIM map). GFL and IRLS construct better backgrounds on the Waving Tree video. On the Basic video all the methods have similar performance. However, supervised GFL takes 117.11 seconds and 6.25 seconds on Waving Tree and Basic, respectively, to process one frame, whereas inWLR takes 3.39 seconds and 17.83 seconds, respectively, on those two sequences. In contrast, IRLS takes 0.59 seconds and 7.02 seconds, respectively, and recovers a similar SSIM map.

Figure 6: Background and foreground recovered by online methods on the SBI dataset (Candela, Caviar1, Caviar2, Cavignal, HumanBody, HallandMonitor; columns: BG+FG, incPCP BG, IRLS BG (ours), Homotopy BG (ours), incPCP FG, Homotopy FG (ours), IRLS FG (ours)). The videos have static and semi-static foreground, newly added static foreground, shadows already present in the background and newly created by moving foreground, and occlusion and disocclusion of static and dynamic foreground. For a comprehensive review of the dataset we refer the readers to [34].

Figure 7: Qualitative and quantitative comparison on the Toscana-HD video (frame 6, with occluded FG; rows: BG+FG; Homotopy BG (ours), mean CQM 43.9386; IRLS BG (ours), mean CQM 32.1315; incPCP BG, mean CQM 20.3904; Homotopy FG (ours); IRLS FG (ours); incPCP FG). Besides IRLS and Homotopy, the two best methods on Toscana, that is, Photomontage [3] and SOBS1 [33], have MSSIM 0.9616 and 0.9892 and CQM 50.2416 and 43.3002, respectively [7].

method | η_k | output
SGD 1 | R / (√k ‖g_j(x_k)‖) | x_K (last iterate)
SGD 2 | B / (ρ√K) | x̂_K = (1/K) Σ_{k=0}^{K−1} x_k

Table 1: Two variants of SGD.

We consider two variants of SGD depending on the choice of η_k and on the vector that we output. In SGD 1 we always normalize each stochastic subgradient, and multiply the resulting vector by R/√k, where k is the iteration counter, for some constant R > 0 which needs to be tuned. This method is a direct extension of the subgradient descent method in [38]. The output is the last iterate. While we provide no theoretical guarantees for this method, it performs well in our experiments. On the other hand, SGD 2 is a more principled method. It arises as a special case of the SGD method described and analyzed in [48]. In this method, one needs to decide on the number of iterations K to be performed in advance. The method ultimately outputs the average of the iterates. The stepsize η_k is set to B/(ρ√K), where B > 0 and ρ > 0 are parameters whose values can be derived from the following iteration complexity result:

Theorem 1 ([48]). Let x* be a solution of (15) and let B > 0 be such that ‖x*‖ ≤ B. Further, assume that ‖g_j(x)‖ ≤ ρ for all x ∈ R^k and j ∈ [m]. If SGD 2 runs for K iterations with η = B/(ρ√K), then E[φ(x̂_K)] − φ(x*) ≤ Bρ/√K, where x̂_K is given as in Table 1. Moreover, for any ε > 0, to achieve E[φ(x̂_K)] − φ(x*) ≤ ε it suffices to run SGD 2 for K iterations where K ≥ B²ρ²/ε².
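SGD 2 from Table 1 can be sketched directly: a fixed stepsize B/(ρ√K), with the average of the first K iterates as output. The values of B, ρ, K, and the data below are illustrative stand-ins for the tuned, problem-specific values the paper uses:

```python
import numpy as np

def sgd2(Q, b, B, rho, K, seed=0):
    """SGD 2: eta = B/(rho*sqrt(K)); return the average of x_0, ..., x_{K-1}."""
    rng = np.random.default_rng(seed)
    m, k = Q.shape
    x = np.zeros(k)
    avg = np.zeros(k)
    eta = B / (rho * np.sqrt(K))
    for _ in range(K):
        avg += x / K                                       # average pre-update iterates
        j = rng.integers(m)
        x = x - eta * m * np.sign(Q[j] @ x - b[j]) * Q[j]  # step (16) with (17)
    return avg

rng = np.random.default_rng(4)
Q = np.linalg.qr(rng.standard_normal((50, 3)))[0]
x_true = np.array([0.5, -1.0, 1.5])
b = Q @ x_true
rho = 50 * np.max(np.linalg.norm(Q, axis=1))  # bound on ||g_j(x)|| for this data
x_hat = sgd2(Q, b, B=2.0, rho=rho, K=20000)

phi = lambda x: np.abs(Q @ x - b).sum()
# phi(x_hat) should be far below phi(0), consistent with the Bp/sqrt(K) bound
```

Here ρ is computed from the data (‖g_j(x)‖ = m‖q_j‖ for this objective) and B is chosen to bound ‖x*‖, as Theorem 1 requires.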

4. Numerical Experiments

To validate the robustness of our proposed algorithms, we tested them on challenging real-world and synthetic video sequences containing occlusion, dynamic background, and static and semi-static foreground. For this purpose, we extensively use 19 grayscale and RGB videos from the Stuttgart, I2R, Wallflower, and SBI datasets [10,30,34,2,50]. We refer the readers to Table 2 for an overall idea of the number of frames in each video sequence, the video type, and the resolution.

For quantitative measures, we use the receiver operating characteristic (ROC) curve, the recall and precision (RP) curve, the structural similarity index (SSIM) and SSIM map [52], the multi-scale structural similarity index (MSSSIM) [53], and the color image quality measure (CQM) [7,57]. Due to the availability of ground truth (GT) frames, we use the Stuttgart artificial dataset (which has foreground GT) and the SBI dataset (which has background GTs) to analyze the results quantitatively and qualitatively. To calculate the average computational time, we ran each algorithm five times on the same dataset and computed the average. Throughout this section, the best and the second best results are colored red and blue, respectively.

4.1. Comparison between our proposed algorithms

First we compare the performance of our proposed algorithms in batch mode on the Basic scenario. Figure 1 shows that all four algorithms are very competitive, and we note that IRLS has the least computational time. We ran each of IRLS and the Homotopy method for five iterations, and SGD 1 and SGD 2 for 5000 iterations. IRLS takes 7.02 seconds, Homotopy takes 8.47 seconds, SGD 1 takes 17.81 seconds, and SGD 2 takes 17.67 seconds. We mention that the choices of R in SGD 1 and of B and ρ in SGD 2 are problem specific. Due to its computational efficiency, we compare IRLS with other batch methods in the next section.

4.2. Comparison with RPCA, GFL, and other state-of-the-art methods

In this section we compare IRLS with other state-of-the-art batch background estimation methods, such as iEALM [31] for RPCA, GRASTA, and ReProCS, on the Basic scenario. Figure 2 shows that IRLS sweeps the maximum area under the ROC curve. Additionally, in Figure 3, IRLS has the best mean SSIM (MSSIM) among all the methods. Moreover, in batch mode, IRLS takes the least computational time.

Next, in Figure 4, we present the background recovered by each method on the Stuttgart, Wallflower, and I2R datasets. The video sequences have occlusion, dynamic background, and static foreground. IRLS can detect the static foreground and is also robust to sudden illumination changes.

Finally, we compare our IRLS with the supervised GFL model of Xin et al. [56] and inWLR of Dutta et al. [18] (see Figure 5). For the Waving Tree scenario, supervised GFL uses 200 training frames and takes 117.11 seconds to compute the background and foreground from one training

Dataset | Video | No. of frames | Resolution
Stuttgart [10] | Basic (Grayscale) | 600 | 144×176
 | Basic (RGB-HD) | 600 | 600×800
 | Lightswitch (RGB-HD) | 600 | 600×800
SBI [34] | IBMTest2 (RGB) | 91 | 320×240
 | Candela (RGB) | 351 | 352×288
 | Caviar1 (RGB) | 610 | 384×288
 | Caviar2 (RGB) | 461 | 384×288
 | Cavignal (RGB) | 258 | 200×136
 | HumanBody (RGB) | 741 | 320×240
 | HallandMonitor (RGB) | 296 | 352×240
 | Highway1 (RGB) | 440 | 320×240
 | Highway2 (RGB) | 500 | 320×240
 | Toscana (RGB-HD) | 6 | 600×800
Wallflower [50] | Waving Tree (Grayscale) | 66 | 120×160
 | Camouflage (Grayscale) | 52 | 120×160
I2R/Li dataset [30] | Meeting Room (Grayscale) | 1209 | 64×80
 | Watersurface (Grayscale) | 162 | 128×160
 | Lightswitch (Grayscale) | 1430 | 120×160
 | Lake (Grayscale) | 80 | 72×90

Table 2: Data used in this paper.

Algorithm | Abbreviation | Appearing in Experiment | Reference
Iteratively Reweighted Least Squares | IRLS | Figures 1–11, Tables 4, 5 | This paper
Homotopy | Homotopy | Figures 1, 6–11, Tables 4, 5 | This paper
Stochastic Subgradient Descent 1 | SGD 1 | Figure 1, Table 1 | This paper
Stochastic Subgradient Descent 2 | SGD 2 | Figure 1, Table 1 | This paper
Inexact Augmented Lagrange Method of Multipliers | iEALM | Figures 2–4 | [31]
Supervised Generalized Fused Lasso | GFL | Figures 5, 8 | [56]
Grassmannian Robust Adaptive Subspace Tracking Algorithm | GRASTA | Figures 2–4 | [27]
Recursive Projected Compressive Sensing | ReProCS | Figures 2–4 | [24,25,41]
Incremental Weighted Low-Rank | inWLR | Figure 5 | [18]
Incremental Principal Component Pursuit | incPCP | Figures 6–11, Tables 4, 5 | [46,44,45]
Background Estimation by Weightless Neural Networks | BEWIS | Table 4 | [23]
Independent Multimodal Background Subtraction Multi-Thread | IMBS-MT | Table 4 | [5]
RSL2011 | – | Table 4 | [42]
Color Median | – | Table 4 | [28]
Photomontage | – | Table 4 | [3]
Self-Organizing Background Subtraction 1 | SOBS1 | Table 4 | [33]

Table 3: Algorithms compared in this paper.

frame, and the SSIM of the FG is 0.9996. inWLR does not use any training frames and takes 3.39 seconds to compute the background and foreground from the video sequence of 66 frames, with an MSSIM of 0.9592. In contrast, IRLS uses 15 training frames and takes 0.59 seconds to process the entire video, with an MSSIM of 0.9398. For the Basic scenario, supervised GFL again uses 200 training frames and takes 6.25 seconds to process one training frame, and the SSIM of the FG is 0.9462. inWLR does not require any training frames and takes 17.83 seconds to process 600 frames in a batch-incremental mode, with an MSSIM of 0.9463. In contrast, IRLS uses only 15 training frames and takes 7.02 seconds to process the entire video, with an MSSIM of 0.9524.

4.3. Online implementation on RGB videos

In this section we show the robustness of two of our algorithms on RGB videos in online mode. Due to space limitations we only provide results for the IRLS and homotopy algorithms (these two methods were also the fastest in batch mode). Primarily, we compare our results with incPCP and GFL [46,44,45,56]. We should mention that, besides incPCP, probabilistic robust matrix factorization (PRMF) [51] and RPCA bilinear projection (RPCA-BL) [35] have online extensions. However, PRMF uses the entire available data in its batch normalization step, and there is no available implementation of online RPCA-BL. To the best of our knowledge, incPCP is the only state-of-the-art online method which deals with HD RGB videos in fully online mode. The incPCP code was downloaded from the author's

Video | SOBS1 | RSL2011 | IMBS-MT | BEWIS | Color Median | IRLS | Homotopy | incPCP
IBMTest2 | 0.9954 | 0.9303 | 0.9721 | 0.9602 | 0.9939 | 0.9950 | 0.9953 | 0.9670
Candela | 0.9775 | 0.9916 | 0.9893 | 0.9852 | 0.9382 | 0.9995 | 0.9992 | 0.9412
Caviar1 | 0.9781 | 0.9947 | 0.9967 | 0.9813 | 0.9918 | 0.9994 | 0.9993 | 0.8649
Caviar2 | 0.9994 | 0.9962 | 0.9986 | 0.9994 | 0.9994 | 0.9999 | 0.9998 | 0.9935
Cavignal | 0.9947 | 0.9973 | 0.9982 | 0.9984 | 0.7984 | 0.9989 | 0.9975 | 0.8312
HumanBody | 0.9980 | 0.9959 | 0.9958 | 0.9866 | 0.9970 | 0.9996 | 0.9990 | 0.9360
HallandMonitor | 0.9832 | 0.9377 | 0.9954 | 0.9626 | 0.9640 | 0.9991 | 0.9992 | 0.9355
Highway1 | 0.9968 | 0.9899 | 0.9939 | 0.9886 | 0.9924 | 0.9980 | 0.9985 | 0.8847
Highway2 | 0.9991 | 0.9907 | 0.9960 | 0.9942 | 0.9961 | 0.9994 | 0.9997 | 0.9819
Toscana | 0.9616 | 0.0662 | 0.8903 | 0.8878 | 0.8707 | 0.9853 | 0.9996 | 0.8416
Average | 0.9814 | 0.9491 | 0.9929 | 0.9745 | 0.9542 | 0.9975 | 0.9987 | 0.9177

Table 4: Comparison of average MSSSIM of the different methods on the SBI dataset. Source: [2,7,34].

Figure 8: Qualitative comparison on the Basic scenario HD scene (frame 200 with dynamic FG and frame 600 with static FG; columns: BG+FG, incPCP BG, IRLS BG (ours), Homotopy BG (ours), GFL BG, incPCP FG, IRLS FG (ours), Homotopy FG (ours), GFL FG). The SSIMs are (the first number indicates frame 200 and the second frame 600): incPCP 0.021 and 0.0173, IRLS 0.9089 and 0.9731, Homotopy 0.9327 and 0.9705, GFL 0.9705 and 0.9310. The MSSSIMs are: incPCP 0.6315 and 0.4208, IRLS 0.8777 and 0.9746, Homotopy 0.9166 and 0.9725, GFL 0.9175 and 0.9645.

website². As mentioned in the software package, we use the standard PCP (fixed camera) mode for the incPCP [46,44] implementation.

Discussions. We use Basic-HD and the SBI dataset to provide extensive qualitative and quantitative comparisons. The online mode of our algorithm only uses the available pure background frames to learn the basis Q for each color channel, and then operates on each test frame in a completely online fashion. Note that we only use 10 training frames, and we strongly believe that one can use even fewer training frames to obtain almost the same performance. Homotopy uses fewer iterations than IRLS to produce a comparable background, and hence is faster than IRLS in online mode. In Figures 6 and 9 we compare IRLS and Homotopy against incPCP on the SBI dataset. Compared to the ghosting appearances in the incPCP backgrounds, our online methods construct a clean background for each video sequence. We also removed the static foreground, occluded foreground, and the foreground shadows. In Figures 7 and 8 we show our performance on HD video sequences. In addition to incPCP, we compared with supervised GFL on Basic-HD (see Figure 8). Supervised GFL uses 200 training frames (the average processing time of the training frames is 7.31 seconds) and takes 431.78 seconds to process each test frame to produce a quantitative result comparable to online IRLS and Homotopy. For a computational time comparison with incPCP we refer to Table 5. Finally, we provide the results of online IRLS and Homotopy on one of the most challenging HD video sequences, namely Lightswitch from the Stuttgart dataset. This is a nighttime scenario with varying illumination effects throughout the video sequence; starting from frame 125 the illumination suddenly changes. Additionally, it has reflections, traffic light changes, and movement of the tree leaves. We used 10 daytime pure background frames for training, and using them we estimated the nighttime scene. As expected, in Figure 10 both IRLS and Homotopy perform well under the changing illumination, which can be verified against the pure Lightswitch BG frame (Figure 10, third column). Additionally, we compare our quantitative results on the SBI dataset against other state-of-the-art algorithms, such as the adaptive neural background algorithm Self-Organizing Background Subtraction 1 (SOBS1) [33], Photomontage [3], Color Median, RSL2011 [42], Independent Multimodal Background Subtraction Multi-Thread (IMBS-MT) [5], and background estimation by weightless neural networks (BEWIS) [23]. We refer to Table 4 (source: [7]) and Figure 7. Finally, in Figure 11 we provide the mean CQM of the online methods on the SBI dataset and the Basic-HD video. In online mode, IRLS and Homotopy outperform incPCP in mean CQM and mean MSSSIM on each video.

²https://sites.google.com/a/istec.net/prodrig/Home/en/pubs/incpcp

Figure 9: Background and foreground recovered by online methods on the SBI dataset (Highway1, Highway2, IBMTest; columns: BG+FG, incPCP BG, Homotopy BG (ours), IRLS BG (ours), incPCP FG, Homotopy FG (ours), IRLS FG (ours)). The videos have shadows already present in the background and newly created by moving foreground, and occlusion and disocclusion of dynamic foreground. For a comprehensive review of the dataset we refer the readers to [34].

Figure 10: Background and foreground recovered by our proposed online methods on the Lightswitch video (frames 210 and 600; columns: BG+FG, training frame, original BG with no FG, IRLS BG, IRLS FG, Homotopy BG, Homotopy FG). Both IRLS and Homotopy capture the effect of the change in illumination, the irregular movements of the tree leaves, and the reflections. Compared with the no-FG image, both of our proposed methods do quite well.

5. Conclusion

We proposed a novel and fast model for supervised video background estimation that is robust to several background estimation challenges. We used the simple and well-known ℓ1 regression technique and provided several online and batch background estimation methods that can process high-resolution videos accurately. Our extensive qualitative and quantitative comparisons on real and synthetic video sequences demonstrated that our supervised model outperforms the state-of-the-art online and batch methods in almost all cases.

6. Appendix 1: Historical Comments

We start by making a connection between the supervised GFL model proposed by Xin et al. [56] and the constrained low-rank approximation problem of Golub et al. [20].

6.1. Golub's constrained low-rank approximation problem

In 1987, Golub et al. [20] formulated the following constrained low-rank approximation problem: given A = [A1 A2] ∈ R^{m×n0} with A1 ∈ R^{m×r} and A2 ∈ R^{m×n}, find AG = [B̃1 B̃2], with B̃1 ∈ R^{m×r} and B̃2 ∈ R^{m×n}, that solves

[B̃1 B̃2] = arg min_{B=[B1 B2], B1=A1, rank(B)≤r} ‖A − B‖²_F,   (18)

where ‖·‖_F denotes the Frobenius norm of matrices. Motivated by [20], Dutta et al. recently proposed more general weighted low-rank (WLR) approximation problems and showed their application to the background estimation problem [15,16,17].

Figure 11: Mean CQM of online methods on the SBI dataset (IBMTest, Candela, Caviar1, Caviar2, Cavignal, HumanBody, HallandMonitor, Highway1, Highway2) and the Basic-HD (Stuttgart-HD) video. The higher the CQM value, the better the recovered image.

Video (No. of frames) | IRLS | Homotopy | incPCP
IBMTest2 (91) | 37.28 | 21.84 | 22.45
Candela (351) | 163.80 | 133.6 | 72.15
Caviar1 (610) | 279.99 | 213.99 | 120.58
Caviar2 (461) | 199.16 | 158.1 | 85.68
Cavignal (258) | 71.26 | 70.77 | 39
HumanBody (741) | 261.94 | 227.25 | 134.83
HallandMonitor (296) | 116.86 | 88.99 | 59.63
Highway1 (440) | 155.84 | 134.03 | 81.44
Highway2 (500) | 181.85 | 156.92 | 87
Basic-HD (600) | 599.06 | 457.2464 | 382.41
Toscana-HD (6) | 7.73 | 5.13 | 3.1

Table 5: Computational time (in seconds) comparison for online methods.

Connection with (18). Recall that the generalized fused lasso (GFL) background estimation model proposed by Xin et al. [56], obtained with the choices $f_{\text{rank}}(B) = \operatorname{rank}(B)$ and $f_{\text{spar}}(F) = \lambda\|F\|_{\text{GFL}}$, can be written as
$$\min_{B}\ \operatorname{rank}(B) + \lambda\|A - B\|_{\text{GFL}}. \qquad (19)$$
In this model, $\|\cdot\|_{\text{GFL}}$ is the "generalized fused lasso" norm. With the extra assumption that $\operatorname{rank}(B) = \operatorname{rank}(B_1)$, and using the $\|\cdot\|_{\text{GFL}}$ norm, problem (19) becomes a constrained low-rank approximation problem as in (18) and can be written as
$$\min_{B = [B_1\ B_2]}\ \|A - B\|_{\text{GFL}} \quad \text{subject to} \quad \operatorname{rank}(B) \le r,\ B_1 = A_1.$$

7. Appendix 2: Augmented Lagrangian Method of Multipliers (ALM)

In this section, we present an additional background estimation method based on the decomposition model used in the main paper. This method was not described in the main paper. As mentioned in the Further contributions section, we devise a batch background estimation method (our fifth method) by using the augmented Lagrangian method of multipliers (ALM).

7.1. The algorithm

The augmented Lagrangian method of multipliers is one of the most popular classes of algorithms in convex programming. In our setup, the proposed method does not provide an incremental algorithm; instead, it relies on fast batch processing of the video sequence. We can write (6) as an equality-constrained problem by introducing the variable $F_2$ as follows:
$$\min_{F_2, S}\ \|F_2\|_{\ell_1} \quad \text{subject to} \quad A_2 = QS + F_2. \qquad (20)$$

We now form the augmented Lagrangian of (20):
$$\mathcal{L}(S, F_2, Y, \mu) = \|F_2\|_{\ell_1} + \langle Y, A_2 - QS - F_2\rangle + \frac{\mu}{2}\|A_2 - QS - F_2\|_F^2, \qquad (21)$$
where $Y \in \mathbb{R}^{m \times n}$ is the Lagrange multiplier, $\langle Y, X\rangle = \operatorname{Trace}(Y^\top X)$ is the trace inner product, and $\mu > 0$ is a penalty parameter. Completing the square and keeping only the relevant terms in (21), for the given iterates $\{S^{(k)}, F_2^{(k)}, Y^{(k)}, \mu_k\}$ we have
$$S^{(k+1)} = \operatorname*{arg\,min}_{S}\ \mathcal{L}(S, F_2^{(k)}, Y^{(k)}, \mu_k) = \operatorname*{arg\,min}_{S}\ \frac{\mu_k}{2}\Big\|A_2 - QS - F_2^{(k)} + \frac{1}{\mu_k}Y^{(k)}\Big\|_F^2,$$
$$F_2^{(k+1)} = \operatorname*{arg\,min}_{F_2}\ \mathcal{L}(S^{(k+1)}, F_2, Y^{(k)}, \mu_k) = \operatorname*{arg\,min}_{F_2}\ \|F_2\|_{\ell_1} + \frac{\mu_k}{2}\Big\|A_2 - QS^{(k+1)} - F_2 + \frac{1}{\mu_k}Y^{(k)}\Big\|_F^2.$$

The solution to the first subproblem is obtained by setting the gradient of $\mathcal{L}(S, F_2^{(k)}, Y^{(k)}, \mu_k)$ with respect to $S$ to 0, and using the fact that $Q^\top Q = I$:
$$S^{(k+1)} = Q^\top\Big(A_2 - F_2^{(k)} + \frac{1}{\mu_k}Y^{(k)}\Big). \qquad (22)$$
The second subproblem is the classic sparse recovery problem and its solution is given by
$$F_2^{(k+1)} = \mathcal{S}_{\frac{1}{\mu_k}}\Big(A_2 - QS^{(k+1)} + \frac{1}{\mu_k}Y^{(k)}\Big), \qquad (23)$$
where $\mathcal{S}_{\frac{1}{\mu_k}}(\cdot)$ is the elementwise shrinkage function [26, 4]. We update $Y_k$ and $\mu_k$ via
$$Y^{(k+1)} = Y^{(k)} + \mu_k\big(A_2 - QS^{(k+1)} - F_2^{(k+1)}\big), \qquad \mu_{k+1} = \rho\mu_k, \qquad (24)$$
for a fixed $\rho > 1$.
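The shrinkage operator in (23) is the proximal operator of the $\ell_1$ norm and has the following elementwise form (a minimal NumPy sketch):

```python
import numpy as np

def shrink(X, tau):
    """Elementwise shrinkage S_tau(X): the proximal operator of tau * ||.||_1.
    Each entry is moved toward zero by tau; entries smaller than tau vanish."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

X = np.array([[2.5, -0.3],
              [-1.0, 0.7]])
print(shrink(X, 0.5))   # entries shrunk by 0.5; |entries| below 0.5 become 0
```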

Algorithm 1: ALM
1  Input: $A = [A_1\ A_2] \in \mathbb{R}^{m \times n_0}$ (data matrix), threshold $\epsilon > 0$, $\rho > 1$, $\mu_0 > 0$;
2  Initialize: $A_1 = QR$, $Y^{(0)} = A_2/\|A_2\|_\infty$, $S^{(0)}$, $F_2^{(0)}$;
3  while not converged do
4      $S^{(k+1)} = Q^\top\big(A_2 - F_2^{(k)} + \frac{1}{\mu_k}Y^{(k)}\big)$;
5      $F_2^{(k+1)} = \mathcal{S}_{\frac{1}{\mu_k}}\big(A_2 - QS^{(k+1)} + \frac{1}{\mu_k}Y^{(k)}\big)$;
6      $Y^{(k+1)} = Y^{(k)} + \mu_k\big(A_2 - QS^{(k+1)} - F_2^{(k+1)}\big)$;
7      $\mu_{k+1} = \rho\mu_k$;
8      $k = k + 1$;
   end
9  Output: $S^{(k)}$, $F_2^{(k)}$
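A compact NumPy sketch of Algorithm 1 follows; the parameter values ($\rho = 1.5$, $\mu_0 = 10^{-3}$) and the stopping rule based on the relative residual are illustrative choices, not the tuned settings from our experiments:

```python
import numpy as np

def shrink(X, tau):
    """Elementwise shrinkage (soft-thresholding) operator S_tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def alm(A1, A2, rho=1.5, mu=1e-3, eps=1e-7, max_iter=300):
    """Sketch of Algorithm 1: min ||F2||_1  s.t.  A2 = Q S + F2.

    A1 holds the training (background) frames and A2 the frames to be
    processed.  Returns (S, F2); the recovered background is Q @ S.
    """
    Q, _ = np.linalg.qr(A1)                 # initialization A1 = QR
    Y = A2 / np.abs(A2).max()               # Y^(0) = A2 / ||A2||_inf
    F2 = np.zeros_like(A2)
    for _ in range(max_iter):
        S = Q.T @ (A2 - F2 + Y / mu)        # closed-form S update (Q^T Q = I)
        R = A2 - Q @ S
        F2 = shrink(R + Y / mu, 1.0 / mu)   # sparse-recovery subproblem
        Y = Y + mu * (R - F2)               # multiplier update
        mu *= rho                           # mu_{k+1} = rho * mu_k
        if np.linalg.norm(R - F2) < eps * np.linalg.norm(A2):
            break
    return S, F2
```

On a synthetic sequence with a low-rank background and a sparse foreground, the iterates reach a feasible pair in a few dozen iterations, since the geometric growth of $\mu$ quickly drives the residual below the threshold.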

Dual problem. Next we formulate the Lagrangian dual of (6) to gain insight into the choice of the Lagrange multiplier $Y$. Using standard arguments, we obtain
$$\min_{F_2, S:\, A_2 = QS + F_2} \|F_2\|_{\ell_1} = \min_{F_2, S}\ \sup_{Y}\ \|F_2\|_{\ell_1} + \langle Y, A_2 - QS - F_2\rangle \ge \sup_{Y}\ \min_{F_2, S}\ \|F_2\|_{\ell_1} + \langle Y, A_2 - QS - F_2\rangle = \sup_{Y:\, \|Y\|_\infty \le 1,\ Q^\top Y = 0} \langle Y, A_2\rangle. \qquad (25)$$
The last problem above is the dual of (6). Clearly, the dual is a linear program. Note that the constraint $Q^\top Y = 0$ dictates that the columns of $Y$ be orthogonal to all columns of $Q$; that is, the columns of $Y$ must lie in the nullspace of $Q^\top$. If we relax this constraint, the resulting problem has a simple closed-form solution, namely
$$Y^{(0)} = A_2/\|A_2\|_\infty.$$
This is a good choice for the initial value of $Y$ in Algorithm 1.

7.2. Grassmannian robust adaptive subspace estimation (GRASTA)

Due to its close connection with our ALM, we describe Grassmannian robust adaptive subspace estimation (GRASTA) in this section. In 2012, He et al. [27] proposed GRASTA, a robust subspace tracking algorithm, and showed its application to the background estimation problem. Unlike robust PCA [31, 55], GRASTA is not a batch video background estimation algorithm. GRASTA solves the background estimation problem in an incremental manner, considering one frame at a time. At each time step $i$, it observes a subsampled video frame $a_{i\Omega_s}$. That is, each video frame $a_i \in \mathbb{R}^m$ is subsampled over the index set $\Omega_s \subset \{1, 2, \dots, m\}$ to produce $a_{i\Omega_s}$, where $s$ is the subsample percentage. Similarly, denote the foreground as $F_2 = (f_1, \dots, f_n)$. Therefore, $f_{i\Omega_s} \in \mathbb{R}^{|\Omega_s|}$ is a vector whose entries are indexed by $\Omega_s$. Assuming each video frame $a_{i\Omega_s}$ has a low-rank (say, $r$) plus sparse structure, GRASTA models the video frame as
$$a_{i\Omega_s} = U_{\Omega_s}x + f_{i\Omega_s} + \epsilon_{\Omega_s},$$
where $U \in \mathbb{R}^{m \times r}$ is an orthonormal basis of the low-dimensional subspace, $x \in \mathbb{R}^r$ is a weight vector, and $\epsilon_{\Omega_s} \in \mathbb{R}^{|\Omega_s|}$ is a Gaussian noise vector. The matrix $U_{\Omega_s} \in \mathbb{R}^{|\Omega_s| \times r}$ results from choosing the rows of $U$ corresponding to the index set $\Omega_s$. With the notation above, at each time step $i$, GRASTA solves the following optimization problem: for a given orthonormal basis $U_{\Omega_s} \in \mathbb{R}^{|\Omega_s| \times r}$, solve
$$\min_{x}\ \|U_{\Omega_s}x - a_{i\Omega_s}\|_{\ell_1}. \qquad (26)$$
Problem (26) is the classic least absolute deviations problem, similar to (7), and can be rewritten as
$$\min_{x,\, f_{i\Omega_s}}\ \|f_{i\Omega_s}\|_{\ell_1} \quad \text{subject to} \quad U_{\Omega_s}x + f_{i\Omega_s} - a_{i\Omega_s} = 0. \qquad (27)$$
Problem (27) can be solved by the augmented Lagrangian method of multipliers (ALM) [9]. In GRASTA, after updating $x$ and $f_{i\Omega_s}$, one also has to update the orthonormal basis $U_{\Omega_s}$. The rank-one $U_{\Omega_s}$ update step is done first by finding a gradient of the augmented Lagrangian dual of (27), and then by using the classic gradient descent algorithm. In summary, at each time step $i$, given $U^{(i)} \in \mathbb{R}^{m \times r}$ and $\Omega_s \subset \{1, 2, \dots, m\}$, GRASTA finds $x$ and $f_{i\Omega_s}$ via (27) and then updates $U^{(i+1)}_{\Omega_s}$. This process continues until the video frames are exhausted.
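Problem (26) is exactly the kind of least absolute deviations problem that iteratively reweighted least squares can handle. A minimal IRLS sketch for $\min_x \|Ux - a\|_{\ell_1}$ (the damping constant `delta`, which keeps the weights finite, is an assumption, not a value from our experiments):

```python
import numpy as np

def lad_irls(U, a, n_iter=100, delta=1e-8):
    """Iteratively reweighted least squares (IRLS) for min_x ||U x - a||_1.

    Each pass solves a weighted least-squares problem; the weights
    1 / max(|residual|, delta) reproduce the l1 objective at convergence.
    """
    x = np.linalg.lstsq(U, a, rcond=None)[0]      # ordinary l2 start
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(U @ x - a), delta)
        UW = U * w[:, None]                       # rows of U scaled by weights
        # Weighted normal equations: (U^T W U) x = U^T W a.
        x = np.linalg.solve(U.T @ UW, UW.T @ a)
    return x
```

Because the $\ell_1$ loss downweights large residuals, the fit ignores gross outliers (sparse foreground pixels) that would dominate an ordinary least-squares fit.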

Comparison between ALM and GRASTA. 1. At each step of GRASTA, the background and the sparse foreground are given as $U_{\Omega_s}x$ and $a_{i\Omega_s} - U_{\Omega_s}x$, respectively, and then one has to update the basis $U_{\Omega_s}$. In contrast, (20) solves a supervised batch video background estimation problem. In our model, once we obtain the basis set from the QR decomposition of the background matrix $A_1$, we do not update the basis further. 2. GRASTA lacks a convergence analysis, which is harder to obtain because the objective function (26) in their setup is only convex in each component [27]. Our objective functions in (6) and (21) are convex and therefore allow us to provide a thorough convergence analysis for ALM.

7.3. Cost of One Iteration

We discuss the complexity of one iteration of Algorithm 1 when $A_1$ is of full rank, that is, $\operatorname{rank}(A_1) = r$. The complexity of the QR decomposition at the initialization step is $O(2mr^2 - \frac{2}{3}r^3)$. Because $r \le r_{\max}$, the maximum number of available training frames, this cost can be controlled by the user. Next, the complexity of one iteration of Algorithm 1 is $O(mnr)$. In contrast, the cost of each iteration of GRASTA is $O(|\Omega_s|r^3 + Kr|\Omega_s| + mr^2)$, where $K$ is the number of inner iterations and $|\Omega_s|$ is the cardinality of the index set $\Omega_s \subset \{1, 2, \dots, m\}$ from which each video frame $a_i \in \mathbb{R}^m$ is subsampled at a percentage $s$ (see Section 7.2).

7.4. Stopping Criteria

Define $\mathcal{L}_k := \mathcal{L}(S^{(k)}, F_2^{(k)}, Y^{(k-1)}, \mu_{k-1})$. With the notation above, for a given $\epsilon > 0$, Algorithm 1 is declared converged if $\|A_2 - QS^{(k)} - F_2^{(k)}\|_F/\|A_2\|_F < \epsilon$, or $|\mathcal{L}_k - \mathcal{L}_{k-1}| < \epsilon$, or if the maximum number of iterations is reached.

7.5. Remarks on the Behaviour of ALM

In this section, we establish the convergence properties of Algorithm 1.

Lemma 1. The sequence $\{Y^{(k)}\}$ is bounded.

Proof. By the optimality condition for $F_2^{(k+1)}$ we have
$$0 \in \partial_{F_2}\mathcal{L}(S^{(k+1)}, F_2, Y^{(k)}, \mu_k).$$
Therefore,
$$0 \in \partial\|F_2^{(k+1)}\|_{\ell_1} - \mu_k\Big(A_2 - QS^{(k+1)} - F_2^{(k+1)} + \frac{1}{\mu_k}Y^{(k)}\Big),$$
which implies $Y^{(k+1)} \in \partial\|F_2^{(k+1)}\|_{\ell_1}$. By Theorem 4 in [31] (see also [54]), we conclude that the sequence $\{Y^{(k)}\}$ is bounded in the dual norm of $\|\cdot\|_{\ell_1}$, that is, the $\|\cdot\|_\infty$ norm.

Theorem 2. There is a constant $\gamma$ such that
$$\|A_2 - QS^{(k)} - F_2^{(k)}\|_F \le \frac{\gamma}{\mu_k}, \qquad k = 1, 2, \dots$$

Proof. Using (24) we have
$$A_2 - QS^{(k)} - F_2^{(k)} = \frac{1}{\mu_{k-1}}\big(Y^{(k)} - Y^{(k-1)}\big).$$
The result follows by applying Lemma 1; since $\mu_k = \rho\mu_{k-1}$, the factor $\rho$ is absorbed into $\gamma$.

Theorem 3. The sequence $\{\mathcal{L}_k\}$ is bounded above and
$$\mathcal{L}_{k+1} - \mathcal{L}_k \le O\Big(\frac{1}{\mu_{k-1}}\Big), \qquad k = 1, 2, \dots$$

Proof. We have
$$\begin{aligned}
\mathcal{L}_{k+1} &= \mathcal{L}(S^{(k+1)}, F_2^{(k+1)}, Y^{(k)}, \mu_k)\\
&\le \mathcal{L}(S^{(k+1)}, F_2^{(k)}, Y^{(k)}, \mu_k)\\
&\le \mathcal{L}(S^{(k)}, F_2^{(k)}, Y^{(k)}, \mu_k)\\
&= \|F_2^{(k)}\|_{\ell_1} + \langle Y^{(k)}, A_2 - QS^{(k)} - F_2^{(k)}\rangle + \frac{\mu_k}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2\\
&= \|F_2^{(k)}\|_{\ell_1} + \langle Y^{(k-1)}, A_2 - QS^{(k)} - F_2^{(k)}\rangle + \frac{\mu_{k-1}}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2\\
&\quad + \langle Y^{(k)} - Y^{(k-1)}, A_2 - QS^{(k)} - F_2^{(k)}\rangle + \frac{\mu_k - \mu_{k-1}}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2\\
&= \mathcal{L}_k + \mu_{k-1}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2 + \frac{\mu_k - \mu_{k-1}}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2 \qquad \text{(using (24))}\\
&= \mathcal{L}_k + \frac{\mu_k + \mu_{k-1}}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2.
\end{aligned}$$
Therefore,
$$\mathcal{L}_{k+1} - \mathcal{L}_k \le \frac{\mu_k + \mu_{k-1}}{2}\|A_2 - QS^{(k)} - F_2^{(k)}\|_F^2, \qquad k = 1, 2, \dots$$
Using (24) we have, for $k = 1, 2, \dots$,
$$\mathcal{L}_{k+1} - \mathcal{L}_k \le \frac{\mu_k + \mu_{k-1}}{\mu_{k-1}^2}\|Y^{(k)} - Y^{(k-1)}\|_F^2 = \frac{1 + \rho}{\mu_{k-1}}\|Y^{(k)} - Y^{(k-1)}\|_F^2.$$
Next, by using the boundedness of $\{Y^{(k)}\}$ we find
$$\mathcal{L}_{k+1} - \mathcal{L}_k \le O\Big(\frac{1}{\mu_{k-1}}\Big), \qquad k = 1, 2, \dots,$$
which is what we set out to prove.

Theorem 4.
$$f^* - \|F_2^{(k)}\|_{\ell_1} \le O\Big(\frac{1}{\mu_k}\Big),$$
where $f^* = \min_{A_2 = QS + F_2} \|F_2\|_{\ell_1}$.

Proof. Using the triangle inequality we have
$$\|F_2^{(k)}\|_{\ell_1} \ge \|A_2 - QS^{(k)}\|_{\ell_1} - \|A_2 - QS^{(k)} - F_2^{(k)}\|_{\ell_1} \ge f^* - \frac{1}{\mu_{k-1}}\|Y^{(k)} - Y^{(k-1)}\|_{\ell_1} \qquad \text{(using (24))}. \qquad (28)$$
The result follows by applying the boundedness of the multipliers $Y^{(k)}$.

[Figure 12 legends: (a) GRASTA, s = 10%, area 0.7439; iEALM, area 0.9241; ReProCS, area 0.8755; ALM, area 0.9512. (b) GRASTA, s = 10%, mean 0.5226; iEALM, mean 0.9301; ReProCS, mean 0.7980; ALM, mean 0.9524.]

Figure 12: (a) ROC curves comparing ALM, iEALM, GRASTA, and ReProCS on the Basic video, frame size 144 × 176. (b) Comparison of the mean SSIM (MSSIM) of ALM, iEALM, GRASTA, and ReProCS on the Basic video. ALM has the best MSSIM. To process 600 frames, each of size 144 × 176, iEALM takes 164.03 seconds, GRASTA takes 20.25 seconds, ReProCS takes 14.20 seconds, and ALM takes 13.13 seconds.

[Figure 13 panels: BG+FG, ALM BG, ALM FG.]

Figure 13: Background and foreground recovered by ALM. The videos have a static foreground and a dynamic background.

8. Smooth Optimization of $\ell_1$ Regression with Parallel Coordinate Descent Methods [19]

Imagine a situation in which one processes a very low-resolution video sequence with a huge number of available training frames. That is, when there are more training frames $r$ than pixels $m$, the method used in [19] to solve (8) for each $i$ could be more effective. In this scenario we propose to solve each $\ell_1$ regression problem in (8) by applying parallel coordinate descent methods to its smooth variant [19]. Note that each $f_i(s_i)$ is a non-smooth continuous convex function on a compact set $E_1$. By using Nesterov's smoothing technique [37], one can find a smooth approximation $f_i^{\mu}(s_i)$ of $f_i(s_i)$ for any $\mu > 0$. Fercoq et al. [19] minimized $f_i^{\mu}(s_i)$ to approximately solve the original $\ell_1$ regression problem containing $f_i(s_i)$.
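To illustrate the smoothing step only (not the parallel coordinate descent method of [19] itself), note that the Nesterov smooth approximation of $|t|$ is the Huber function; the smoothed objective can then be minimized by any gradient method. A minimal sketch with plain gradient descent (the value of $\mu$ and the robust-regression data are illustrative assumptions):

```python
import numpy as np

def huber(r, mu):
    """Nesterov smooth approximation of |r|: quadratic on [-mu, mu], linear outside."""
    return np.where(np.abs(r) <= mu, r**2 / (2 * mu), np.abs(r) - mu / 2)

def huber_grad(r, mu):
    """Gradient of the smoothed absolute value."""
    return np.clip(r / mu, -1.0, 1.0)

def smooth_l1_regression(U, a, mu=0.5, n_iter=3000):
    """Gradient descent on f_mu(x) = sum_j huber((U x - a)_j, mu),
    a smooth surrogate for the l1 regression objective ||U x - a||_1."""
    L = np.linalg.norm(U, 2) ** 2 / mu      # Lipschitz constant of the gradient
    x = np.zeros(U.shape[1])
    for _ in range(n_iter):
        x = x - (1.0 / L) * (U.T @ huber_grad(U @ x - a, mu))
    return x
```

A smaller $\mu$ gives a tighter approximation of the $\ell_1$ objective at the cost of a larger Lipschitz constant, and hence a smaller step size.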

9. Additional numerical experiments demonstrating the effectiveness of ALM

To demonstrate the robustness of ALM in batch mode, we compare ALM with other state-of-the-art batch background estimation methods, namely iEALM [31] for RPCA, GRASTA [27], and ReProCS [24], on the Basic scenario. We use 15 training frames for ALM. Figure 12a shows that ALM covers the maximum area under the ROC curve. Additionally, in Figure 12b, ALM has the best mean SSIM (MSSIM) among all methods. Moreover, in batch mode, ALM takes the least computational time. The background and foreground recovered by ALM in batch mode also show its effectiveness in supervised background estimation (see Figure 13).

References

[1] https://sparselab.stanford.edu/.
[2] http://sbmi2015.na.icar.cnr.it/SBIdataset.html.
[3] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen. Interactive digital photomontage. ACM Transactions on Graphics, 23:294–302, 2004.
[4] T. Boas, A. Dutta, X. Li, K. P. Mercier, and E. Niderman. Shrinkage function and its applications in matrix approximation. Electronic Journal of Linear Algebra, 32:163–171, 2017.
[5] D. D. Bolci, A. Pennisi, and L. Iocchi. Parallel multi-modal background modeling. Pattern Recognition Letters, 96:45–54, 2017.
[6] T. Bouwmans. Traditional and recent approaches in background modeling for foreground detection: An overview. Computer Science Review, 11–12:31–66, 2014.
[7] T. Bouwmans, L. Maddalena, and A. Petrosino. Scene background initialization: A taxonomy. Pattern Recognition Letters, 2017.
[8] T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, and E.-H. Zahzah. Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset. Computer Science Review, 2016.
[9] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[10] S. Brutzer, B. Höferlin, and G. Heidemann. Evaluation of background subtraction techniques for video surveillance. IEEE Computer Vision and Pattern Recognition, pages 1568–1575, 2012.
[11] C. S. Burrus, J. A. Barreto, and I. W. Selesnick. Iterative reweighted least-squares design of FIR filters. IEEE Transactions on Signal Processing, 42(11):2926–2936, 1994.
[12] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted $\ell_1$ minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.
[13] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63:1–38, 2010.
[14] A. Dutta, B. Gong, X. Li, and M. Shah. Weighted singular value thresholding and its application to background estimation, 2017. arXiv:1707.00133.
[15] A. Dutta and X. Li. A fast algorithm for a weighted low rank approximation. In 15th IAPR International Conference on Machine Vision Applications (MVA), pages 93–96, 2017.
[16] A. Dutta and X. Li. On a problem of weighted low-rank approximation of matrices. SIAM Journal on Matrix Analysis and Applications, 38(2):530–553, 2017.
[17] A. Dutta and X. Li. Weighted low rank approximation for background estimation problems. In The IEEE International Conference on Computer Vision (ICCV), pages 1853–1861, 2017.
[18] A. Dutta, X. Li, and P. Richtárik. A batch-incremental video background estimation model using weighted low-rank approximation of matrices. In The IEEE International Conference on Computer Vision (ICCV), pages 1835–1843, 2017.
[19] O. Fercoq and P. Richtárik. Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv:1309.5885, 2013.
[20] G. H. Golub, A. Hoffman, and G. W. Stewart. A generalization of the Eckart-Young-Mirsky matrix approximation theorem. Linear Algebra and its Applications, 88(89):317–327, 1987.
[21] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, 2008.
[22] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, 2014.
[23] M. D. Gregorio and M. Giordano. Background estimation by weightless neural networks. Pattern Recognition Letters, 96:55–65, 2017.
[24] H. Guo, C. Qiu, and N. Vaswani. An online algorithm for separating sparse and low-dimensional signal sequences from their sum. IEEE Transactions on Signal Processing, 62(16):4284–4297, 2014.
[25] H. Guo, C. Qiu, and N. Vaswani. Practical ReProCS for separating sparse and low-dimensional signal sequences from their sum, part 1. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4161–4165, 2014.
[26] E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation for $\ell_1$-minimization: methodology and convergence. SIAM Journal on Optimization, 19:1107–1130, 2008.
[27] J. He, L. Balzano, and A. Szlam. Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video. IEEE Computer Vision and Pattern Recognition, pages 1937–1944, 2012.
[28] T. Huang, G. Yang, and G. Tang. A fast two-dimensional median filtering algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(1):13–18, 1979.
[29] I. T. Jolliffe. Principal component analysis, 2002. Second edition.
[30] L. Li, W. Huang, I.-H. Gu, and Q. Tian. Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11):1459–1472, 2004.
[31] Z. Lin, M. Chen, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, 2010. arXiv:1009.5055.
[32] C. Lu, Z. Lin, and S. Yan. Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization. IEEE Transactions on Image Processing, 24(2):646–654, 2015.
[33] L. Maddalena and A. Petrosino. The SOBS algorithm: What are the limits? In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 21–26, 2012.
[34] L. Maddalena and A. Petrosino. Towards benchmarking scene background initialization. In New Trends in Image Analysis and Processing, ICIAP 2015 Workshops, pages 469–476, 2015.
[35] G. Mateos and G. Giannakis. Robust PCA as bilinear decomposition with outlier-sparsity regularization. IEEE Transactions on Signal Processing, 60(10):5176–5190, 2012.
[36] B. Millikan, A. Dutta, N. Rahnavard, Q. Sun, and H. Foroosh. Initialized iterative reweighted least squares for automatic target recognition. In Proceedings of the IEEE Military Communications Conference, pages 506–510, 2015.
[37] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
[38] Y. Nesterov. Introductory lectures on convex optimization: A basic course, 2014. First edition.
[39] N. Oliver, B. Rosario, and A. Pentland. A Bayesian computer vision system for modeling human interactions. In International Conference on Computer Vision Systems, pages 255–272, 1999.
[40] M. Osborne. Finite algorithms in optimization and data analysis, 1985. John Wiley & Sons, Inc.
[41] C. Qiu and N. Vaswani. Support predicted modified-CS for recursive robust principal components pursuit. In IEEE International Symposium on Information Theory, pages 668–672, 2011.
[42] V. Reddy, C. Sanderson, and B. C. Lovell. A low-complexity algorithm for static background estimation from cluttered image sequences in surveillance contexts. Journal of Image and Video Processing, pages 1:1–1:14, 2011.
[43] P. Richtárik. Some algorithms for large-scale convex and linear minimization in relative scale. PhD thesis, Cornell University, 2007.
[44] P. Rodriguez and B. Wohlberg. A Matlab implementation of a fast incremental principal component pursuit algorithm for video background modeling. In IEEE International Conference on Image Processing, pages 3414–3416, 2014.
[45] P. Rodriguez and B. Wohlberg. Translational and rotational jitter invariant incremental principal component pursuit for video background modeling. In 2015 IEEE International Conference on Image Processing, pages 537–541, 2015.
[46] P. Rodriguez and B. Wohlberg. Incremental principal component pursuit for video background modeling. Journal of Mathematical Imaging and Vision, 55(1):1–18, 2016.
[47] J. Romberg. https://statweb.stanford.edu/~candes/l1magic/.
[48] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms, 2014. Cambridge University Press.
[49] J. Sigl. Nonlinear residual minimization by iteratively reweighted least squares, 2015. arXiv:1504.06815.
[50] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. Seventh International Conference on Computer Vision, pages 255–261, 1999.
[51] N. Wang, T. Yao, J. Wang, and D.-Y. Yeung. A probabilistic approach to robust matrix factorization. In Proceedings of the 12th European Conference on Computer Vision, pages 126–139, 2012.
[52] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[53] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multi-scale structural similarity for image quality assessment. In 37th IEEE Asilomar Conference on Signals, Systems, and Computers, pages 1398–1402, 2003.
[54] G. A. Watson. Characterization of the subdifferential of some matrix norms. Linear Algebra and its Applications, 170:33–45, 1992.
[55] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao. Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization. Proceedings of the 22nd Advances in Neural Information Processing Systems, pages 2080–2088, 2009.
[56] B. Xin, Y. Tian, Y. Wang, and W. Gao. Background subtraction via generalized fused lasso foreground modeling. IEEE Computer Vision and Pattern Recognition, pages 4676–4684, 2015.
[57] Y. Yalman and I. Erturk. A new color image quality measure based on YUV transformation and PSNR for human vision system. Turkish Journal of Electrical Engineering and Computer Sciences, 21(2):603–612, 2013.