
Inertial Majorization-Minimization Algorithm for Minimum-Volume NMF

Olivier Vu Thanh1, Andersen Ang2,1, Nicolas Gillis1, Le Thi Khanh Hien1

1Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
{olivier.vuthanh, manshun.ang, nicolas.gillis, thikhanhhien.le}@umons.ac.be

2Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Canada

Abstract—Nonnegative matrix factorization with the minimum-volume criterion (min-vol NMF) guarantees that, under some mild and realistic conditions, the factorization has an essentially unique solution. This result has been successfully leveraged in many applications, including topic modeling, hyperspectral image unmixing, and audio source separation. In this paper, we propose a fast algorithm to solve min-vol NMF which is based on a recently introduced block majorization-minimization framework with extrapolation steps. We illustrate the effectiveness of our new algorithm compared to the state of the art on several real hyperspectral images and document data sets.

Index Terms—nonnegative matrix factorization, minimum volume, fast gradient method, majorization-minimization, hyperspectral imaging

I. INTRODUCTION

Nonnegative Matrix Factorization (NMF) has been an active field of research since the seminal paper by Lee and Seung [1]. The success of NMF comes from many specific applications, since many types of data are nonnegative: for example, amplitude spectrograms in audio source separation, images, evaluations in recommendation systems, and documents represented by vectors of word counts; see [2] and the references therein. Compared to unconstrained factorization models such as PCA/SVD, NMF requires the factors to be nonnegative. This constraint naturally leads to factors that are more easily interpretable [1]. Nonetheless, NMF has two drawbacks: computability and identifiability.

Computability. As opposed to PCA/SVD, solving NMF is NP-hard in general [3]. Hence most NMF algorithms rely on standard nonlinear optimization schemes without global optimality guarantees.

Identifiability. NMF solutions are typically not unique, that is, they are not unique even after removing the trivial scaling and permutation ambiguities of the rank-one factors; see [4] and the references therein. For NMF to have a unique solution, also known as identifiability, one needs to add additional structure to the sought solution. One way to ensure identifiability is the min-vol criterion, which minimizes the volume of one of the factors. If the sufficiently scattered condition (SSC) is satisfied, then identifiability holds for min-vol NMF [5]–[7].

(NG and LTKH acknowledge the support of the European Research Council (ERC starting grant No 679515), the Fonds de la Recherche Scientifique - FNRS, and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project O005318F-RG47. The names of the last three authors are in alphabetical order.)

Identifiability of min-vol NMF is a strong result that has been used successfully in many applications such as topic modeling and hyperspectral imaging [8], and audio source separation [7]. However, min-vol NMF is computationally hard to solve. In this paper, after introducing the considered min-vol NMF model in Section II, we propose a fast method to solve min-vol NMF in Section III. Our method is an application of a recent inertial block majorization-minimization framework called TITAN [9]. Experimental results on real data sets show that the proposed method performs better than the state of the art; see Section IV.

II. MINIMUM-VOLUME NMF

In the noiseless case, the exact NMF model is $M = WH$, where $M \in \mathbb{R}^{m \times n}_+$ denotes the measured data, and $W \in \mathbb{R}^{m \times r}_+$ (resp. $H \in \mathbb{R}^{r \times n}_+$) denotes the left (resp. right) factor. The idea behind the min-vol criterion, a.k.a. Craig's belief [10], is that the convex hull spanned by the columns of $W$, denoted $\operatorname{conv}(W)$, should embrace all the data points as tightly as possible. In the absence of noise, min-vol NMF is formulated as follows:
$$\min_{W,H}\ \det(W^\top W) \quad (1a)$$
$$\text{s.t.}\quad M = WH, \quad (1b)$$
$$H \geq 0,\ W \geq 0,\ 1^\top H = 1^\top, \quad (1c)$$
where $1$ is a vector of appropriate size containing only ones.

The constraint (1c) ensures that every data point lies within the convex hull spanned by the columns of $W$, that is, $M(:,j) \in \operatorname{conv}(W)$ for all $j$. The volume of the convex hull of the columns of $W$ and the origin, within the subspace spanned by the columns of $W$, is proportional to $\det(W^\top W)$; see for example [5]. Under the sufficiently scattered condition (SSC), which requires the columns of $M$ to be sufficiently spread within $\operatorname{conv}(W)$ or, equivalently, $H$ to be sufficiently sparse, min-vol NMF has an essentially unique solution [5], [6]. A drawback of (1c) is that it requires the entries in each column of $H$ to sum to one, which is not without loss of generality: it imposes that the columns of $M$ belong to the convex hull of the columns of $W$, as opposed to the conical hull when the equality constraints of (1c) are absent; see for example [2, Chapter 4].
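To make the geometry of (1c) concrete, here is a minimal Python/NumPy sketch (our own illustration, not from the paper; the dimensions are arbitrary) that builds an exact NMF with simplex-constrained $H$ and checks that each column of $M$ is a convex combination of the columns of $W$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 8, 3

W = rng.random((m, r))               # nonnegative left factor
H = rng.random((r, n))
H /= H.sum(axis=0, keepdims=True)    # columns of H on the unit simplex, i.e., 1^T H = 1^T
M = W @ H                            # noiseless data: M = WH

# Each column M(:, j) = W @ H(:, j) with H(:, j) >= 0 summing to one,
# i.e., a convex combination of the columns of W, so M(:, j) lies in conv(W).
assert np.allclose(H.sum(axis=0), 1.0) and (H >= 0).all()
assert np.allclose(M, W @ H)
```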

It was recently shown that the same model in which the constraint $1^\top H = 1^\top$ is replaced with $1^\top W = 1^\top$ retains identifiability [7]. The sum-to-one constraint on the columns of $W$, that is, $1^\top W = 1^\top$, can be assumed w.l.o.g. via the scaling ambiguity of the rank-one factors $W(:,k)H(k,:)$ in any NMF decomposition. Moreover, the model with the constraint on $W$ was shown to be numerically much more stable, as it makes $W$ better conditioned; this matters because computing the derivative of $\det(W^\top W)$ requires computing the inverse of $W^\top W$. We refer the interested reader to [2, Chapter 4.3.3] for a discussion of these models.
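The w.l.o.g. claim can be seen by rescaling the rank-one factors; a short sketch (our own illustration, with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 8, 3
W = rng.random((m, r)) + 0.1   # keep column sums strictly positive
H = rng.random((r, n))
M = W @ H

# Rescale each rank-one factor W(:,k)H(k,:) so that the columns of W sum to one.
s = W.sum(axis=0)              # positive column sums of W
W_scaled = W / s               # now 1^T W_scaled = 1^T
H_scaled = H * s[:, None]      # compensate so each rank-one term is unchanged

assert np.allclose(W_scaled.sum(axis=0), 1.0)
assert np.allclose(W_scaled @ H_scaled, M)   # same factorization of M
```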

In the presence of noise, min-vol NMF is typically formulated via penalization. In this paper, we consider the following min-vol NMF model:
$$\min_{W,H}\ \frac{1}{2}\|M - WH\|_F^2 + \frac{\lambda}{2}\log\det(W^\top W + \delta I_r) \quad \text{s.t.}\quad H \geq 0,\ W \geq 0,\ 1^\top W = 1^\top, \quad (2)$$
where $\|\cdot\|_F$ is the Frobenius norm, $\lambda > 0$ is a parameter balancing the two terms in the objective function, $I_r$ is the $r \times r$ identity matrix, and $\delta > 0$ is a small parameter that prevents $\log\det(W^\top W)$ from going to $-\infty$ if $W$ is rank deficient [11]. Using the logarithm of the determinant makes the penalty less sensitive to very disparate singular values of $W$, leading to better practical performance [8], [12].
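For reference, a minimal sketch of evaluating the objective of (2) in Python/NumPy (our own illustration; the function name and the use of slogdet for numerical stability are our choices, this is not the MATLAB code released with the paper):

```python
import numpy as np

def minvol_objective(M, W, H, lam, delta):
    """Objective of model (2): 0.5*||M - WH||_F^2 + (lam/2)*logdet(W^T W + delta*I_r)."""
    r = W.shape[1]
    fit = 0.5 * np.linalg.norm(M - W @ H, 'fro') ** 2
    # slogdet is more robust to under/overflow than log(det(.)).
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return fit + 0.5 * lam * logdet
```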

Applications. In hyperspectral unmixing (HU), each column of $M$ contains the spectral reflectance of a pixel, each row of $M$ corresponds to the reflectance of a spectral band across all pixels, each column of $W$ is the spectral signature of an endmember (a pure material in the image), and each column of $H$ contains the proportion of each identified pure material in the corresponding pixel; see [13]. Geometrically, min-vol NMF (2) applied to HU consists of finding endmembers such that the convex hull spanned by them and the origin embraces every pixel of $M$ as tightly as possible. This is the so-called Craig's belief [10]. In document classification, $M$ is a word-by-document matrix, so that the columns of $W$ correspond to topics (that is, sets of words found simultaneously in several documents) while the columns of $H$ assign each document to the topics it discusses [8].

III. NEW ALGORITHM FOR MIN-VOL NMF

As far as we know, all algorithms for min-vol NMF rely on two-block coordinate descent methods that update each block ($W$ or $H$) by using some iterative optimization algorithm to solve the subproblems formed by restricting the min-vol NMF problem to each block. For example, the state-of-the-art method from [11] uses Nesterov's fast gradient method to update each factor matrix, one at a time.

Our proposed algorithm for (2) is based on the TITAN framework from [9]. TITAN is an inertial block majorization-minimization framework for nonsmooth nonconvex optimization. It updates one block at a time while fixing the values of the other blocks, as previous min-vol NMF algorithms do. To update a block, TITAN chooses a block surrogate function of the corresponding objective function (a.k.a. a majorizer), adds an inertial term to this surrogate function, and then minimizes the resulting inertial surrogate function. When a Lipschitz gradient surrogate is used, TITAN reduces to a Nesterov-type accelerated gradient descent step for each block of variables [9, Section 4.2]. The difference between TITAN and previous min-vol NMF algorithms is threefold:

1) The inertial force (also known as extrapolation, or momentum) is carried over between block updates. This is a crucial aspect that makes our proposed algorithm faster: when we start the update of a block of variables (here, $W$ or $H$), we can use the inertial force (using the previous iterate) even though the other block has been updated in the meantime.

2) TITAN allows the surrogate to be updated after each update of $W$ and $H$, which was not possible with the algorithm from [11] because it applied the fast gradient method from convex optimization to a fixed surrogate.

3) It has a subsequential convergence guarantee, that is, every limit point of the generated sequence is a stationary point of Problem (2). Note that the state-of-the-art algorithm from [11] does not have convergence guarantees.

Remark. The block prox-linear (BPL) method from [14] can be used to solve (2) since the block functions $W \mapsto \frac{1}{2}\|M - WH\|_F^2$ and $H \mapsto \frac{1}{2}\|M - WH\|_F^2$ have Lipschitz continuous gradients. However, BPL applies extrapolation to the Lipschitz gradient surrogate of these block functions and requires computing the proximal point of the regularizer $\frac{\lambda}{2}\log\det(W^\top W + \delta I_r)$, which does not have a closed form. In contrast, TITAN applies extrapolation to a surrogate of $W \mapsto f(W,H)$ that includes a surrogate of the regularizer $\frac{\lambda}{2}\log\det(W^\top W + \delta I_r)$ (see Section III-A1). This allows TITAN to have closed-form solutions for the subproblems, an acceleration effect, and a convergence guarantee.

A. Surrogate functions

An important step of TITAN is to define a surrogate function for each block of variables. These surrogate functions are upper approximations of the objective function at the current iterate. Denote
$$f(W,H) = \frac{1}{2}\|M - WH\|_F^2 + \frac{\lambda}{2}\log\det(W^\top W + \delta I_r),$$
and suppose we are cyclically updating $(W,H)$. Let us denote by $u_{W_k}(W)$ the surrogate function of $W \mapsto f(W,H_k)$ used to update $W_k$, that is,
$$f(W,H_k) \leq u_{W_k}(W) \quad \text{for all } W \in \mathcal{X}_W, \quad (3)$$
where $u_{W_k}(W_k) = f(W_k,H_k)$ and $\mathcal{X}_W$ is the feasible domain of $W$. Similarly, let us denote by $u_{H_k}(H)$ the surrogate function of $H \mapsto f(W_{k+1},H)$ used to update $H_k$, that is,
$$f(W_{k+1},H) \leq u_{H_k}(H) \quad \text{for all } H \in \mathcal{X}_H, \quad (4)$$
where $u_{H_k}(H_k) = f(W_{k+1},H_k)$ and $\mathcal{X}_H$ is the feasible domain of $H$.

1) Surrogate function and update of W: Denote $A = W^\top W + \delta I_r$, $B_k = W_k^\top W_k + \delta I_r$, and $P_k = (B_k)^{-1}$. Since $\log\det$ is concave, its first-order Taylor expansion around $B_k$ leads to $\log\det(A) \leq \log\det(B_k) + \langle (B_k)^{-1}, A - B_k \rangle$. Hence,
$$f(W,H_k) \leq \tilde{f}_{W_k}(W) := \frac{1}{2}\|M - WH_k\|_F^2 + \frac{\lambda}{2}\langle P_k, W^\top W \rangle + C_1, \quad (5)$$
where $C_1$ is a constant independent of $W$.
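This majorization is easy to sanity-check numerically; the following sketch (our own illustration, not from the paper) verifies $\log\det(A) \leq \log\det(B) + \langle B^{-1}, A - B \rangle$ on random positive definite matrices of the form $W^\top W + \delta I_r$:

```python
import numpy as np

rng = np.random.default_rng(2)
r, delta = 4, 0.1

def logdet(X):
    # log(det(X)), computed stably via slogdet
    return np.linalg.slogdet(X)[1]

def random_gram(m=6):
    # random positive definite matrix of the form W^T W + delta*I_r
    G = rng.standard_normal((m, r))
    return G.T @ G + delta * np.eye(r)

for _ in range(1000):
    A, B = random_gram(), random_gram()
    # first-order Taylor expansion of the concave function logdet around B, as in (5)
    upper = logdet(B) + np.trace(np.linalg.inv(B) @ (A - B))
    assert logdet(A) <= upper + 1e-9
```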

Note that the gradient of $W \mapsto \tilde{f}_{W_k}(W)$, equal to
$$(WH_k - M)H_k^\top + \lambda W P_k,$$
is $L_W^k$-Lipschitz continuous with $L_W^k = \|H_k H_k^\top + \lambda P_k\|_2$. Hence, from (5) and the descent lemma (see [15, Section 2.1]),
$$f(W,H_k) \leq u_{W_k}(W) := \langle \nabla \tilde{f}_{W_k}(W_k), W \rangle + \frac{L_W^k}{2}\|W - W_k\|_F^2 + C_2, \quad (6)$$

where $C_2$ is a constant depending on $W_k$. We use the surrogate $u_{W_k}(W)$ defined in (6) to update $W_k$. As TITAN recovers Nesterov-type acceleration for the update of each block of variables [9, Section 4.2], we have the following update for $W$:
$$W_{k+1} = \operatorname*{argmin}_{W \in \mathcal{X}_W}\ \langle \nabla \tilde{f}_{W_k}(\overline{W}_k), W \rangle + \frac{L_W^k}{2}\|W - \overline{W}_k\|_F^2 = \mathcal{P}\left(\overline{W}_k + \frac{(M - \overline{W}_k H_k)H_k^\top - \lambda \overline{W}_k P_k}{L_W^k}\right), \quad (7)$$

where $\mathcal{P}$ performs column-wise projection onto the unit simplex, as in [16], in order to satisfy the constraint on $W$ in (2), and where $\overline{W}_k$ is an extrapolated point, that is, the current point $W_k$ plus some momentum:
$$\overline{W}_k = W_k + \beta_W^k (W_k - W_{k-1}), \quad (8)$$
where the extrapolation parameter $\beta_W^k$ is chosen as
$$\beta_W^k = \min\left(\frac{\alpha_k - 1}{\alpha_{k+1}},\ 0.9999\sqrt{\frac{L_W^{k-1}}{L_W^k}}\right), \quad (9)$$
with $\alpha_0 = 1$ and $\alpha_k = (1 + \sqrt{1 + 4\alpha_{k-1}^2})/2$. This choice of parameter satisfies the conditions that guarantee subsequential convergence of TITAN; see Section III-C.
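Putting (7)-(9) together, a minimal Python/NumPy sketch of one extrapolated update of $W$ (our own illustration; the sort-based simplex projection is a standard alternative to the faster method of [16], and all names are our own):

```python
import numpy as np

def project_columns_to_simplex(X):
    """Euclidean projection of each column of X onto the unit simplex (sort-based)."""
    U = -np.sort(-X, axis=0)                      # columns sorted in decreasing order
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, X.shape[0] + 1)[:, None]
    rho = (U - css / idx > 0).sum(axis=0)         # size of the support per column
    theta = css[rho - 1, np.arange(X.shape[1])] / rho
    return np.maximum(X - theta, 0.0)

def update_W(M, W, W_old, H, lam, delta, L_prev, alpha, alpha_next):
    """One inertial step (7)-(9) for W; alpha, alpha_next follow the alpha_k sequence."""
    r = W.shape[1]
    P = np.linalg.inv(W.T @ W + delta * np.eye(r))
    L = np.linalg.norm(H @ H.T + lam * P, 2)      # Lipschitz constant L_W^k (spectral norm)
    beta = min((alpha - 1.0) / alpha_next, 0.9999 * np.sqrt(L_prev / L))
    W_bar = W + beta * (W - W_old)                # extrapolated point (8)
    grad = (W_bar @ H - M) @ H.T + lam * W_bar @ P
    return project_columns_to_simplex(W_bar - grad / L), L
```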

2) Surrogate function and update of H: Since
$$\nabla_H f(W_{k+1},H) = W_{k+1}^\top (W_{k+1}H - M),$$
the gradient of $f$ with respect to $H$ is $L_H^k$-Lipschitz continuous with $L_H^k = \|W_{k+1}^\top W_{k+1}\|_2$. Hence, we use the following Lipschitz gradient surrogate to update $H_k$:
$$u_{H_k}(H) = \langle \nabla_H f(W_{k+1},H_k), H \rangle + \frac{L_H^k}{2}\|H - H_k\|_F^2 + C_3, \quad (10)$$
where $C_3$ is a constant depending on $H_k$. We derive our update rule for $H$ by minimizing the surrogate function from Equation (10) embedded with extrapolation:
$$H_{k+1} = \operatorname*{argmin}_{H \in \mathcal{X}_H}\ \langle \nabla_H f(W_{k+1},\overline{H}_k), H \rangle + \frac{L_H^k}{2}\|H - \overline{H}_k\|_F^2 = \left[\overline{H}_k + \frac{1}{L_H^k} W_{k+1}^\top (M - W_{k+1}\overline{H}_k)\right]_+, \quad (11)$$
where $[\cdot]_+$ denotes the projection setting all negative values to zero, and $\overline{H}_k$ is the extrapolation of $H_k$:
$$\overline{H}_k = H_k + \beta_H^k (H_k - H_{k-1}), \quad (12)$$
where, as for the update of $W$,
$$\beta_H^k = \min\left(\frac{\alpha_k - 1}{\alpha_{k+1}},\ 0.9999\sqrt{\frac{L_H^{k-1}}{L_H^k}}\right). \quad (13)$$
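Analogously, a minimal sketch of the extrapolated update (11)-(13) of $H$ (again our own illustration, with our own names):

```python
import numpy as np

def update_H(M, W, H, H_old, L_prev, alpha, alpha_next):
    """One inertial step (11)-(13) for H: gradient step at the extrapolated point,
    followed by projection onto the nonnegative orthant."""
    L = np.linalg.norm(W.T @ W, 2)               # Lipschitz constant L_H^k
    beta = min((alpha - 1.0) / alpha_next, 0.9999 * np.sqrt(L_prev / L))
    H_bar = H + beta * (H - H_old)               # extrapolation (12)
    return np.maximum(H_bar + W.T @ (M - W @ H_bar) / L, 0.0), L
```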

B. Algorithm

The updates of $W$ in (7) and of $H$ in (11) were described for the cyclic update rule. Since TITAN also allows an essentially cyclic rule [9, Section 5], we can update $W$ several times before switching to updating $H$, and vice versa. This leads to our proposed method, TITANized min-vol; see Algorithm 1 for the pseudocode. The stopping criteria in lines 4 and 15 are the same as in [11]. The way $\lambda$ and $\delta$ are computed is also identical to [11]. Let us mention that, technically, the main difference with [11] resides in how the extrapolation is embedded. In [11], the Nesterov sequence is restarted and evolves within each inner loop that solves the subproblem corresponding to each block. In our algorithm, the extrapolation parameter $\beta_W$ (resp. $\beta_H$) for updating the block $W$ (resp. $H$) is updated continuously, without restarting. This means we are accelerating the global convergence of the sequence rather than trying to accelerate the convergence of the subproblem solves. Moreover, TITAN allows the surrogate function to be updated at each step, while the algorithm from [11] can only update it before each subproblem is solved, as it relies on Nesterov's acceleration for convex optimization.

C. Convergence guarantee

In order to have a convergence guarantee, TITAN requires the update of each block to satisfy the nearly sufficiently decreasing property (NSDP); see [9, Section 2]. By [9, Section 4.2.1], the update for $H$ of TITANized min-vol satisfies the NSDP condition, since it uses a Lipschitz gradient surrogate for $H \mapsto f(W,H)$ combined with Nesterov-type extrapolation, and the bounds on the extrapolation parameters in the update of $H$ are derived as in [9, Section 6.1]. However, it is important to note that the update for $W$ of TITANized min-vol does not directly use a Lipschitz gradient surrogate for $W \mapsto f(W,H)$. We thus need to verify the NSDP condition for the update of $W$ by another argument, presented in the following.

Algorithm 1 TITANized min-vol
1:  initialize W0 and H0
2:  α1 = 1, α2 = 1, Wold = W0, Hold = H0,
    LprevW = ‖H0 H0ᵀ + λ(W0ᵀ W0 + δ Ir)⁻¹‖2, LprevH = ‖W0ᵀ W0‖2
3:  repeat
4:    while stopping criteria not satisfied do
5:      α0 = α1, α1 = (1 + √(1 + 4α0²))/2
6:      P ← (Wᵀ W + δ Ir)⁻¹
7:      LW ← ‖H Hᵀ + λ P‖2
8:      βW = min[(α0 − 1)/α1, 0.9999 √(LprevW/LW)]
9:      W ← W + βW (W − Wold)
10:     Wold ← W
11:     W ← 𝒫(W + (M Hᵀ − W(H Hᵀ + λ P))/LW)
12:     LprevW ← LW
13:   end while
14:   LH ← ‖Wᵀ W‖2
15:   while stopping criteria not satisfied do
16:     α0 = α2, α2 = (1 + √(1 + 4α0²))/2
17:     βH = min[(α0 − 1)/α2, 0.9999 √(LprevH/LH)]
18:     H ← H + βH (H − Hold)
19:     Hold ← H
20:     H ← [H + Wᵀ(M − W H)/LH]+
21:     LprevH ← LH
22:   end while
23: until some stopping criterion is satisfied
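For concreteness, here is a condensed Python/NumPy translation of Algorithm 1 (our own sketch, not the released MATLAB code: we use a fixed number of inner and outer iterations in place of the stopping criteria of [11], purely illustrative default values of λ and δ, and a basic sort-based simplex projection instead of the method of [16]):

```python
import numpy as np

def simplex_proj_cols(X):
    """Project each column of X onto the unit simplex (sort-based)."""
    U = -np.sort(-X, axis=0)
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, X.shape[0] + 1)[:, None]
    rho = (U - css / idx > 0).sum(axis=0)
    theta = css[rho - 1, np.arange(X.shape[1])] / rho
    return np.maximum(X - theta, 0.0)

def titanized_minvol(M, W, H, lam=0.1, delta=0.1, outer=100, inner=10):
    """Sketch of Algorithm 1; W, H are the initial factors W0, H0."""
    r = W.shape[1]
    a1 = a2 = 1.0
    W_old, H_old = W.copy(), H.copy()
    Lw_prev = np.linalg.norm(H @ H.T + lam * np.linalg.inv(W.T @ W + delta * np.eye(r)), 2)
    Lh_prev = np.linalg.norm(W.T @ W, 2)
    for _ in range(outer):
        for _ in range(inner):                    # lines 4-13: inner loop on W
            a0, a1 = a1, (1 + np.sqrt(1 + 4 * a1 ** 2)) / 2
            P = np.linalg.inv(W.T @ W + delta * np.eye(r))
            Lw = np.linalg.norm(H @ H.T + lam * P, 2)
            beta = min((a0 - 1) / a1, 0.9999 * np.sqrt(Lw_prev / Lw))
            W = W + beta * (W - W_old)            # extrapolation (line 9)
            W_old = W                             # bookkeeping as written in line 10
            W = simplex_proj_cols(W + (M @ H.T - W @ (H @ H.T + lam * P)) / Lw)
            Lw_prev = Lw
        Lh = np.linalg.norm(W.T @ W, 2)           # line 14
        for _ in range(inner):                    # lines 15-22: inner loop on H
            a0, a2 = a2, (1 + np.sqrt(1 + 4 * a2 ** 2)) / 2
            beta = min((a0 - 1) / a2, 0.9999 * np.sqrt(Lh_prev / Lh))
            H = H + beta * (H - H_old)            # extrapolation (line 18)
            H_old = H                             # line 19
            H = np.maximum(H + W.T @ (M - W @ H) / Lh, 0.0)
            Lh_prev = Lh
    return W, H
```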

The function $u_{W_k}(W)$ is a Lipschitz gradient surrogate of $\tilde{f}_{W_k}(W)$, and we apply Nesterov-type extrapolation to obtain the update in (7). Note that the feasible set of $W$ is convex. Hence, it follows from [9, Remark 4.1] that
$$\tilde{f}_{W_k}(W_k) + \frac{L_W^k (\beta_W^k)^2}{2}\|W_k - W_{k-1}\|_F^2 \geq \tilde{f}_{W_k}(W_{k+1}) + \frac{L_W^k}{2}\|W_{k+1} - W_k\|_F^2. \quad (14)$$
Furthermore, we note that $\tilde{f}_{W_k}(W_k) = f(W_k,H_k)$ and $\tilde{f}_{W_k}(W_{k+1}) \geq f(W_{k+1},H_k)$. Therefore, from (14) we have
$$f(W_k,H_k) + \frac{L_W^k (\beta_W^k)^2}{2}\|W_k - W_{k-1}\|_F^2 \geq f(W_{k+1},H_k) + \frac{L_W^k}{2}\|W_{k+1} - W_k\|_F^2,$$
which is the required NSDP condition of TITAN. Consequently, the choice of $\beta_W^k$ in (9) satisfies the required condition to guarantee subsequential convergence [9, Proposition 3.1]. On the other hand, we note that the error function $W \mapsto e_1(W) := u_{W_k}(W) - f(W,H_k)$ is continuously differentiable and $\nabla_W e_1(W_k) = 0$; similarly for the error function $H \mapsto e_2(H) := u_{H_k}(H) - f(W_{k+1},H)$. Hence, it follows from [9, Lemma 2.3] that Assumption 2.2 in [9] is satisfied. Applying [9, Theorem 3.2], we conclude that every limit point of the generated sequence is a stationary point of Problem (2). It is worth noting that, as TITANized min-vol does not apply a restarting step, [9, Theorem 3.5] for global convergence is not applicable.

IV. NUMERICAL EXPERIMENTS

In this section, we compare TITANized min-vol to the algorithm of [11], an accelerated version of the method from [8] (for p = 2), on two NMF applications: hyperspectral unmixing and document clustering, involving dense and sparse data sets, respectively. All tests are performed in MATLAB R2018a, on a PC with an Intel® Core™ i7 6700HQ and 24GB RAM. The code is available from https://github.com/vuthanho/titanized-minvol.

The data sets used are shown in Table I. For each data set, each algorithm is launched with the same random initializations, for the same amount of CPU time. In order to derive some statistics, for both hyperspectral unmixing and document clustering, 20 random initializations are used (each entry of W and H is drawn from the uniform distribution on [0,1]). The CPU time allotted to each data set is adjusted manually, and corresponds to the maximum displayed value on the respective time axes in Fig. 1; see also Table II.

data set       m      n       r
Urban          162    94249   6
Indian Pines   200    21025   16
Pavia Univ.    103    207400  9
San Diego      158    160000  7
Terrain        166    153500  5
20News         61188  7505    20
Sports         14870  8580    7
Reviews        18483  4069    5

TABLE I: Data sets used in our experiments and their respective dimensions.

For display purposes, for each data set we compare the averages over time of the scaled objective functions, that is, the average of $(f(W,H) - e_{\min})/\|M\|_F$, where $e_{\min}$ is the minimum error obtained among the 20 different runs and both methods. The results are presented in Fig. 1.
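As a side note, the scaled curves can be computed as in the following sketch (our own illustration; errors_a and errors_b are hypothetical arrays of recorded objective values, one row per run):

```python
import numpy as np

def scaled_average_curves(errors_a, errors_b, norm_M):
    """Average of (f(W,H) - e_min)/||M||_F over runs, for two methods.
    errors_a, errors_b: arrays of shape (n_runs, n_timesteps)."""
    e_min = min(errors_a.min(), errors_b.min())  # best error over all runs and both methods
    curve_a = ((errors_a - e_min) / norm_M).mean(axis=0)
    curve_b = ((errors_b - e_min) / norm_M).mean(axis=0)
    return curve_a, curve_b
```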

On both the hyperspectral and document data sets, TITANized min-vol converges on average faster than [11], except on the San Diego data set (although TITANized min-vol converges faster initially). For most tested data sets, min-vol [11] cannot reach the same error as TITANized min-vol within the allocated time. In particular, TITANized min-vol achieves a lower error in 94 out of the 100 runs for the hyperspectral images (5 images with 20 random initializations each), and in 55 out of 60 runs for the document data sets (3 sets of documents with 20 random initializations each).

We also report in Table II TITANized min-vol's lead time over [11], for the cases where the latter reaches its minimum error only after the maximum allotted CPU time. The lead time is the time TITANized min-vol saves to achieve the error that the method from [11] attains using the maximum allotted CPU time. On average, TITANized min-vol is twice as fast as [11], with an average CPU time saving above 50%.

To summarize, our experimental results show that TITANized min-vol converges faster and reaches solutions with a smaller final error than [11].

data set       lead time (s)   CPU time for [11] (s)   saved CPU time
Urban          44              60                      73%
Indian Pines   25              30                      83%
Pavia Univ.    68              90                      76%
San Diego      NaN             120                     0%
Terrain        44              60                      73%
20News         221             300                     74%
Reviews        26              30                      80%
Sports         15              30                      50%

TABLE II: TITANized min-vol's lead time over min-vol [11] to obtain the same minimum error.

V. CONCLUSION AND DISCUSSION

We developed a new algorithm to solve min-vol NMF (2) based on the inertial block majorization-minimization framework of [9]. This framework, under conditions that hold for our method, guarantees subsequential convergence. Experimental results show that this acceleration strategy performs better than the state-of-the-art accelerated min-vol NMF algorithm from [11]. Future work will focus on different types of acceleration, such as Anderson acceleration [17], and on different constraints on W and/or H to address specific applications.

REFERENCES

[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.
[2] N. Gillis, Nonnegative Matrix Factorization. SIAM, 2020. [Online]. Available: https://doi.org/10.1137/1.9781611976410
[3] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2010.
[4] X. Fu, K. Huang, N. D. Sidiropoulos, and W.-K. Ma, "Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications," IEEE Signal Process. Mag., vol. 36, pp. 59–80, 2019.
[5] X. Fu, W.-K. Ma, K. Huang, and N. D. Sidiropoulos, "Blind separation of quasi-stationary sources: Exploiting convex geometry in covariance domain," IEEE Trans. Signal Process., vol. 63, pp. 2306–2320, 2015.
[6] C.-H. Lin, W.-K. Ma, W.-C. Li, C.-Y. Chi, and A. Ambikapathi, "Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5530–5546, 2015.
[7] V. Leplat, N. Gillis, and M. S. Ang, "Blind audio source separation with minimum-volume beta-divergence NMF," IEEE Trans. Signal Process., vol. 68, pp. 3400–3410, 2020.
[8] X. Fu, K. Huang, B. Yang, W.-K. Ma, and N. D. Sidiropoulos, "Robust volume minimization-based matrix factorization for remote sensing and document clustering," IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6254–6268, 2016.
[9] L. T. K. Hien, D. N. Phan, and N. Gillis, "An inertial block majorization minimization framework for nonsmooth nonconvex optimization," 2020.
[10] M. D. Craig, "Minimum-volume transforms for remotely sensed data," IEEE Trans. Geosci. Remote Sens., vol. 32, no. 3, pp. 542–552, 1994.
[11] V. Leplat, A. M. S. Ang, and N. Gillis, "Minimum-volume rank-deficient nonnegative matrix factorizations," in ICASSP, 2019, pp. 3402–3406.
[12] A. M. S. Ang and N. Gillis, "Algorithms and comparisons of nonnegative matrix factorizations with volume regularization for hyperspectral unmixing," IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 12, no. 12, pp. 4843–4853, 2019.
[13] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, "Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches," IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 5, no. 2, pp. 354–379, 2012.
[14] Y. Xu and W. Yin, "A globally convergent algorithm for nonconvex optimization based on block coordinate update," Journal of Scientific Computing, vol. 72, no. 2, pp. 700–734, 2017.
[15] Y. Nesterov, Lectures on Convex Optimization, 2nd ed. Springer, 2018.
[16] L. Condat, "Fast projection onto the simplex and the ℓ1 ball," Mathematical Programming, vol. 158, no. 1, pp. 575–585, 2016.
[17] D. G. Anderson, "Iterative procedures for nonlinear integral equations," Journal of the ACM, vol. 12, no. 4, pp. 547–560, 1965.

Fig. 1: Evolution w.r.t. time of the average of $(f(W,H) - e_{\min})/\|M\|_F$ for the different data sets: (a) Urban, (b) Indian Pines, (c) Pavia Univ., (d) San Diego, (e) Terrain, (f) 20News, (g) Reviews, (h) Sports. Each panel compares TITANized min-vol with min-vol from [11]; the x-axis is time in seconds, and the y-axis is the scaled objective on a logarithmic scale.