
Int J Comput Vis

DOI 10.1007/s11263-014-0781-x

Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization

Yi Yang · Zhigang Ma · Feiping Nie · Xiaojun Chang · Alexander G. Hauptmann

Received: 11 September 2013 / Accepted: 15 October 2014

© Springer Science+Business Media New York 2014

Abstract As a way to relieve the tedious work of manual annotation, active learning plays an important role in many applications of visual concept recognition. In typical active learning scenarios, the number of labelled examples in the seed set is usually small. However, most existing active learning algorithms exploit only the labelled data, and therefore often suffer from over-fitting due to the small number of labelled examples. Moreover, while much progress has been made in binary-class active learning, little research attention has been paid to multi-class active learning. In this paper, we propose a semi-supervised batch-mode multi-class active learning algorithm for visual concept recognition. Our algorithm exploits the whole active pool to evaluate the uncertainty of the data. Considering that uncertain samples are often similar to each other, we propose to make the selected data as diverse as possible, for which we explicitly impose a diversity constraint on the objective function. As a multi-class active learning algorithm, our method is able to exploit uncertainty across multiple classes. An efficient algorithm is used to optimize the objective function. Extensive experiments on action recognition, object classification, scene recognition, and event detection demonstrate its advantages.

Communicated by Kristen Grauman.

Y. Yang · X. Chang
Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, Sydney, NSW, Australia
e-mail: yiyang@cs.cmu.edu

Z. Ma · A. G. Hauptmann
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

F. Nie (corresponding author)
The Center for OPTical IMagery Analysis and Learning, Northwestern Polytechnical University, Xi'an, China
e-mail: feipingnie@gmail.com

Keywords Active learning · Uncertainty sampling · Diversity maximization

1 Introduction

Typical visual concept recognition methods first train a classifier on the labelled training data via a statistical approach, and then use the classifier to recognize visual concepts. In real-world applications, it is usually easy to automatically obtain huge volumes of unlabelled data. Labels, however, are difficult to obtain, as they require much human labour. Generally speaking, there are three types of approaches to relieve the tedious work of labelling the training data. The first is semi-supervised learning, which combines both the labelled and unlabelled data to train the classifier for recognition (Zhu 2008). The second is to borrow knowledge from related domain(s), such as transfer learning (Ma et al. 2014; Shen et al. 2014) and multi-task learning (Yang et al. 2013). The third, which makes the most use of scarce human labelling resources, is active learning, which selects the most informative data from a candidate set (usually referred to as the active pool) for labelling. Instead of being a passive recipient of label information, the learning algorithm actively decides which data are more useful and then asks humans to label them for training.

As a different but complementary way to reduce the labelling cost in supervised learning, active learning has received much research attention. In recent years, researchers have proposed several active learning algorithms and applied them to different computer vision applications, e.g., image classification (Jain and Kapoor 2009; Joshi et al. 2009), concept detection (Li et al. 2010), object recognition (Gong et al. 2014), 3D reconstruction (Kowdle et al. 2011), tracking (Vondrick and Ramanan 2011), correspondence mapping (Jegelka et al. 2014), etc. The key issue in active learning is how to decide whether a sample point is "useful" or "informative". For example, in Chattopadhyay et al. (2012) this is realized by specifically selecting a set of query samples that minimize the difference in distribution between the labelled and the unlabelled data. In the literature, representativeness sampling and uncertainty sampling are the two widely used criteria for selecting the training data to be labelled from the active pool. An uncertainty sampling active learning algorithm is usually associated with a classifier, which is used to evaluate the uncertainty of each sample in the active pool. Despite the substantial progress made in uncertainty sampling, there are still several aspects to be improved.

First, as discussed in Jain and Kapoor (2009), most of the existing research in active learning is based on binary classifiers. Relatively few approaches have been proposed for multi-class active learning (e.g., Li et al. 2004; Yan et al. 2003), and many of them are direct extensions of binary active learning methods to the multi-class scenario. However, many real-world applications of visual concept recognition are multi-class problems. Decomposing a multi-class problem into several independent binary classification subproblems may degrade the performance of active learning. If we use a series of binary classifiers in active learning, as in Li et al. (2004), Yan et al. (2003), etc., the model is not able to evaluate the uncertainty of a sample across multiple classes. For example, if a sample is uncertain for one class but certain for another, it is tricky for the algorithm to evaluate its overall uncertainty. Besides, given that the multiple binary classifiers are independent from each other, the algorithm cannot identify the classes that need more labelled training data (Jain and Kapoor 2009).

Second, uncertainty sampling algorithms tend to suffer from the problem of insufficient training data. Active learning algorithms usually start with a seed set, which contains only a small number of labelled data. Based on the seed set, a classifier is trained to evaluate the uncertainty of the candidate data in the active pool. The goal of active learning is to select the data to be labelled for training; thus, at the beginning, the number of labelled data is very small, which is the nature of active learning. The performance of the classifier can be poor due to the small number of labelled data (Hoi et al. 2008; Yang et al. 2012). Based on SVM active learning (Tong and Chang 2001), Hoi et al. (2008) proposed a min-max optimization algorithm to evaluate the informativeness of data points, in which the unlabelled data are employed as complementary information. Compared with SVM active learning (Tong and Chang 2001), the min-max optimization algorithm is able to select training data in batch mode and is more robust to over-fitting. Empirical study shows that the min-max criterion proposed in Hoi et al. (2008) outperforms SVM active learning (Tong and Chang 2001). Hoi's algorithm calls a QP solver to optimize the objective function, resulting in a high computational complexity of O(n³). In a later work (Hoi et al. 2009), Hoi et al. proposed a solution to speed up the optimization, which makes the algorithm more applicable. Hoi's algorithm (Hoi et al. 2008, 2009) has improved the performance of active learning because it uses all the data in the active pool to evaluate the importance of each candidate. However, as the algorithm is based on the binary SVM classifier, it may become less effective when the data are multi-class.

Motivated by the state of the art of active learning, particularly the semi-supervised active learning algorithm (Hoi et al. 2008, 2009), we propose a new multi-class active learning algorithm, namely Uncertainty Sampling with Diversity Maximization (USDM), which carefully addresses the small seed set problem by leveraging all the data in the active pool for uncertainty evaluation. Our algorithm is able to globally evaluate the informativeness of the pool data across multiple classes. Different from other multi-class active learning algorithms, e.g., Jain and Kapoor (2009), our algorithm exploits all the active pool data to train the classifier, making the uncertainty evaluation more accurate. Further, most of the existing uncertainty sampling algorithms merely consider the uncertainty score for active learning, i.e., they select the active pool data which are closest to the classification boundaries. However, the data close to classification boundaries may be very similar to each other. If similar data are selected for supervision, the performance of active learning may degrade. In light of this, we propose to select the most uncertain data which are, at the same time, as diverse as possible, meaning that the data selected for labelling should be sufficiently different from each other. Compared to Jain and Kapoor (2009), USDM simultaneously utilizes both the labelled and unlabelled data in the active pool. While Hoi's algorithm (Hoi et al. 2008, 2009) also exploits the entire active pool, the classifier embedded in USDM is more capable of evaluating uncertainty, partially because it is a multi-class classifier and partially because it explicitly exploits the manifold structure of the active pool. USDM has several merits: it works in batch mode, handles multiple classes, is semi-supervised and efficient, and explicitly guarantees the diversity of the selected data.

2 Related Work

In this section, we briefly review the related work. This paper is closely related to active learning and semi-supervised learning.

Active learning has been shown to be effective in many applications such as 3D reconstruction (Kowdle et al. 2011) and image retrieval (Wang et al. 2003). Existing active learning algorithms can be roughly divided into two categories: representativeness sampling and uncertainty sampling. As it is important to exploit the data distribution when selecting the data to be labelled (Cohn et al. 1996), representativeness sampling tries to select the most representative data points according to the data distribution. A typical approach of this kind is clustering based active learning, which employs a clustering algorithm to exploit the data distribution and evaluate representativeness. The performance of these algorithms directly depends on the clustering algorithm. Clustering algorithms are unsupervised and only converge to local optima, so their results may deviate severely from the true labels. It remains unclear how the clustering based algorithms will perform when the clustering is not sufficiently accurate. The other well-known approach of representativeness sampling is optimal experiment design (Yu et al. 2006). Based on optimal experiment design, a variety of active learning algorithms have been proposed, in which different Laplacian matrices have been utilized, e.g., He et al. (2007). A limitation of optimal experiment design is that the optimization of the objective function is usually NP-hard, so certain relaxation is required, after which semi-definite programming (SDP) or a sequential method (usually a greedy method) is applied. However, SDP has a high computational complexity, and the greedy methods may converge to poor local optima.

Uncertainty sampling, also known as classifier based sampling (Campbell et al. 2000; Li and Sethi 2006), is the most frequently adopted strategy in active learning, and builds upon the notions of uncertainty in classification (Jain and Kapoor 2009). This type of algorithm is usually associated with a particular classification algorithm. A classifier is trained on a seed set consisting of a small number of randomly selected data. Data points in the active pool which are most likely to be misclassified by the classifier are regarded as the most informative ones. For example, support vector machine (SVM) active learning (Tong and Chang 2001) selects the data points which are closest to the classification boundary of the SVM classifier as the training data. In Wang et al. (2003), the transductive SVM classifier is used for active learning. Hoi et al. have proposed to integrate semi-supervised learning and support vector machines for active learning and have achieved promising results on image retrieval (Hoi and Lyu 2005). In Brinker (2003), a diversity constraint is combined with SVM for active learning. The uncertainty sampling strategy has also been combined with other classifiers, such as the Gaussian process (Kapoor et al. 2010), the K-nearest neighbor classifier (Lindenbaum et al. 2004) and the probabilistic K-nearest neighbor classifier (Jain and Kapoor 2009).

Semi-supervised learning has been widely applied to many applications due to its appealing feature that it can use both labelled and unlabelled data (Yang et al. 2012; Zhu 2008). For instance, Zhu et al. (2003) proposed to utilize a Gaussian random field model with a weighted graph representing labelled and unlabelled data for semi-supervised learning. Han et al. (2014) proposed to use spline regression for semi-supervised feature selection. In Hoi et al. (2008), the researchers formulated semi-supervised active learning as a min-max optimization problem for image retrieval. A semi-supervised learning based relevance feedback algorithm is proposed in Yang et al. (2012) for multimedia retrieval. The benefit of utilizing semi-supervised learning is that it can save human labelling cost for a large amount of data, because it exploits unlabelled data to learn the data structure. Thus, both the human labelling cost and the accuracy are considered, which gives semi-supervised learning great potential to boost the learning performance when properly designed.

The rest of this paper is organized as follows. In Sect. 3, we give the objective function of the proposed USDM algorithm. An efficient algorithm to optimize the objective function is described in Sect. 4, followed by detailed experiments in Sect. 5. Lastly, we conclude this paper in Sect. 6.

3 Uncertainty Sampling with Diversity Maximization

In this section, we present the proposed USDM active learning algorithm. We start by discussing the approach for evaluating the uncertainty of each sample. Let $n$ be the total number of data in the seed set and the active pool. Suppose there are $n_s$ data in the seed set and $n_p$ data in the active pool, so that $n_s + n_p = n$. We are going to select $m$ ($m < n_p$) data for supervision. Denote by $x_i \in \mathbb{R}^d$ a sample which is either in the active pool or the seed set, where $d$ is the dimension of the sample. To better utilize the distribution of the pool data and the seed set, we propose to evaluate the uncertainty via random walks on a graph (Zhu 2008). To begin with, we first construct a graph $G$ consisting of $n$ nodes, one for each sample in the active pool or the seed set. The edge weight between two nodes $x_i$ and $x_j$ is defined as follows:

$$W_{ij} = \begin{cases} \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right) & \text{if } x_i \text{ and } x_j \text{ are } k\text{-nearest neighbors;} \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

Note that one can also define an unweighted edge between $x_i$ and $x_j$ as:

$$W_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are } k\text{-nearest neighbors;} \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
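As an illustration of (1), the sketch below builds the RBF-weighted k-nearest-neighbor affinity matrix $W$ for a toy point set; the data, $k$ and $\sigma$ are arbitrary demonstration choices, not values from the paper.

```python
import math

def knn_affinity(X, k=2, sigma=1.0):
    """Build the RBF-weighted k-NN affinity matrix W of Eq. (1).

    X: list of feature vectors. W[i][j] > 0 only if x_j is among the
    k nearest neighbors of x_i (or vice versa), i.e. the usual
    symmetric k-NN graph construction.
    """
    n = len(X)
    dist2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
              for j in range(n)] for i in range(n)]
    # indices of the k nearest neighbors of each point (excluding itself)
    nbrs = [set(sorted(range(n), key=lambda j: dist2[i][j])[1:k + 1])
            for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in nbrs[i] or i in nbrs[j]):
                W[i][j] = math.exp(-dist2[i][j] / sigma ** 2)
    return W

# three nearby points and one outlier
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
W = knn_affinity(X, k=2, sigma=1.0)
```

Nearby points receive weights close to 1, the matrix is symmetric, and points outside each other's neighbor lists get weight 0.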

We take each vertex in the graph as a state in a Markov chain, i.e., each state corresponds to one sample in the seed set or the active pool. For ease of representation, we define a diagonal matrix $D \in \mathbb{R}^{n \times n}$ whose elements are $D_{ii} = \sum_j W_{ij}$. Denote

$$Q = D^{-1} W, \qquad (3)$$

which is partitioned into $2 \times 2$ blocks:

$$Q = \begin{pmatrix} Q_{ss} & Q_{sp} \\ Q_{sp}^T & Q_{pp} \end{pmatrix}, \qquad (4)$$

where $Q_{ss} \in \mathbb{R}^{n_s \times n_s}$ contains the normalized weights between the data in the seed set, $Q_{sp} \in \mathbb{R}^{n_s \times n_p}$ the normalized weights between the data from the seed set and the active pool, and $Q_{pp} \in \mathbb{R}^{n_p \times n_p}$ the normalized weights between the data in the active pool. For the data in the seed set, we set the corresponding states as absorbing states, which transit only to themselves with probability 1. If a sample $x_i$ is in the active pool, it is a non-absorbing state; the one-step transition probability from $x_i$ to $x_j$ is $T_{ij} = W_{ij} / \sum_j W_{ij}$. The transition matrix $T$ of the Markov random walk with absorbing states is then defined as

$$T = \begin{pmatrix} I_{n_s} & 0_{n_p} \\ Q_{sp}^T & Q_{pp} \end{pmatrix}, \qquad (5)$$

in which $I_{n_s} \in \mathbb{R}^{n_s \times n_s}$ is an identity matrix and $0_{n_p} \in \mathbb{R}^{n_s \times n_p}$ is a matrix of all zeros. We use calligraphic upper-case letters to denote sets, with $\mathcal{P}$ the active pool and $\mathcal{S}$ the seed set. As demonstrated in Doyle and Snell (1984), the probabilities that the pool data are absorbed by the seed set data in equilibrium with transition matrix $T$ are

$$P(\mathcal{S}|\mathcal{P}) = (I_{n_p} - Q_{pp})^{-1} Q_{sp}^T, \qquad (6)$$

where $I_{n_p} \in \mathbb{R}^{n_p \times n_p}$ is an identity matrix. Define $Y_j = [Y_{1j}, Y_{2j}, \ldots, Y_{n_s j}]^T \in \{0,1\}^{n_s \times 1}$ as the label indicator vector of the seed set for the $j$-th class: if $x_i \in \mathcal{S}$ belongs to the $j$-th class $C_j$, then $Y_{ij} = 1$; otherwise $Y_{ij} = 0$. Given a pool sample $x_t \in \mathcal{P}$, we define the probability that $x_t$ is absorbed by the $j$-th class as the sum of the probabilities that it is absorbed by the seed set data from $C_j$. The probabilities that the pool data are absorbed by the seed set data belonging to the $j$-th class can thus be formulated as

$$P(C_j|\mathcal{P}) = (I_{n_p} - Q_{pp})^{-1} Q_{sp}^T Y_j. \qquad (7)$$

The above procedure can be interpreted as a Dirichlet problem whose solution is a harmonic function (Zhu 2008; Zhu et al. 2003). We define $F \in \mathbb{R}^{n \times c}$, where $c$ is the number of classes, as follows:

$$F_{ij} = \begin{cases} \left[ (I_{n_p} - Q_{pp})^{-1} Q_{sp}^T Y_j \right]_i & \text{if } x_i \in \mathcal{P}; \\ Y_{ij} & \text{if } x_i \in \mathcal{S}. \end{cases} \qquad (8)$$

It can be verified that $\sum_{j=1}^{c} F_{ij} = 1$, and $F_{ij}$ is regarded as the probability that $x_i$ belongs to the $j$-th class, i.e., $P(C_j|x_i) = F_{ij}$. For a sample $x_i \in \mathcal{P}$, we assume that its label can be modeled as a random variable. As Shannon entropy is a natural choice for measuring the uncertainty of random variables, we adopt the entropy $H(i)$ to evaluate the uncertainty of $x_i$, estimated by

$$H(i) = -\sum_{j=1}^{c} P(C_j|x_i) \log P(C_j|x_i), \qquad (9)$$

where $\log(\cdot)$ is the natural logarithm. A larger $H(i)$ indicates that $x_i$ is more uncertain. Denote by $f_i$ the ranking score of $x_i$; the pool data with higher ranking scores are selected before the others for supervision. According to the uncertainty principle, we have the following objective function:

$$\max_{\sum_i f_i = 1,\; f_i \ge 0} \; \sum_{x_i \in \mathcal{P}} \left( -f_i \times \sum_{j=1}^{c} P(C_j|x_i) \log P(C_j|x_i) \right) - \Omega(f_i), \qquad (10)$$

which can be rewritten as

$$\max_{\sum_i f_i = 1,\; f_i \ge 0} \; \sum_{x_i \in \mathcal{P}} \left( -f_i \times \sum_{j=1}^{c} F_{ij} \log(F_{ij}) \right) - \Omega(f_i). \qquad (11)$$

In (10), the term $\Omega(f_i)$ is a function of $f$ encoding the data distribution information, in other words, the diversity criterion in decision making. The constraint $\sum_{i=1}^{n} f_i = 1$ in the above function is imposed to avoid arbitrary scaling of $f_i$. Denote

$$b_i = \begin{cases} \sum_{j=1}^{c} F_{ij} \log(F_{ij}) & \text{if } x_i \in \mathcal{P}; \\ 0 & \text{if } x_i \in \mathcal{S}. \end{cases} \qquad (12)$$

Then we can rewrite (11) as

$$\min_f \; \sum_{i=1}^{n} \frac{1}{|\log(1/c)|} (f_i \times b_i) + \Omega(f_i), \quad \text{s.t. } \sum_{i=1}^{n} f_i = 1, \; f_i \ge 0. \qquad (13)$$

In (13), the first term $\sum_{i=1}^{n} \frac{1}{|\log(1/c)|}(f_i \times b_i)$ is used to evaluate the uncertainty of the pool data. $b_i$ depends on $F_{ij}$, $j \in \{1, \ldots, c\}$. Recall that $\sum_{j=1}^{c} F_{ij} = 1$, i.e., $F_{ij}$ is the probability that $x_i$ is in the $j$-th class. Thus the algorithm is able to estimate the uncertainty across multiple classes. If the uncertainty were measured by a binary classifier, the algorithm would turn into a binary-class active learning algorithm. In this sense, one main difference between our algorithm and S-SVM active learning (Hoi et al. 2008) is that our algorithm is more capable of estimating the uncertainty of the data in an active pool, where the manifold structure is uncovered.
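To make the pipeline of (3)-(9) concrete, the sketch below computes the absorption probabilities $F$ and the per-sample entropy scores on a tiny invented graph. Instead of inverting $(I_{n_p} - Q_{pp})$, it iterates the fixed point $F_P \leftarrow Q_{pp} F_P + Q_{sp}^T Y$, which converges to the same solution of (7) when the pool is connected to the seed set; the graph, labels and iteration count are demonstration choices only.

```python
import math

def absorption_probs(W, seed_labels, c, iters=500):
    """Absorption probabilities of Eqs. (7)/(8) on a graph with
    affinity matrix W. The first len(seed_labels) nodes form the
    seed set S; the remaining nodes form the active pool P.
    """
    n, ns = len(W), len(seed_labels)
    # row-normalize: Q = D^{-1} W  (Eq. 3)
    Q = [[W[i][j] / sum(W[i]) for j in range(n)] for i in range(n)]
    # one-hot label indicator vectors Y for the seed set
    Y = [[1.0 if seed_labels[i] == j else 0.0 for j in range(c)]
         for i in range(ns)]
    # F for the pool nodes, initialized to zero, then iterated
    Fp = [[0.0] * c for _ in range(n - ns)]
    for _ in range(iters):
        Fp = [[sum(Q[ns + i][ns + t] * Fp[t][j] for t in range(n - ns))
               + sum(Q[ns + i][s] * Y[s][j] for s in range(ns))
               for j in range(c)] for i in range(n - ns)]
    return Fp

def entropy(row):
    """Shannon entropy of Eq. (9); larger means more uncertain."""
    return -sum(p * math.log(p) for p in row if p > 0)

# toy graph: nodes 0,1 are labelled seeds (classes 0 and 1);
# node 2 sits between both seeds, node 3 sits close to seed 0
W = [[0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 0.1],
     [1.0, 1.0, 0.0, 0.0],
     [1.0, 0.1, 0.0, 0.0]]
F = absorption_probs(W, seed_labels=[0, 1], c=2)
scores = [entropy(row) for row in F]
```

The node halfway between the two seeds gets $F \approx [0.5, 0.5]$ and the highest entropy, so it would be ranked as the most uncertain pool sample.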

Based on (13), the problem is then how to define $\Omega(f_i)$ so as to incorporate the diversity maximization criterion. We propose a simple yet effective way based on computing a kernel matrix $K \in \mathbb{R}^{n \times n}$. For example, if we use the well-known RBF kernel, the $(i,j)$-th element of $K$ is computed as $K_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right)$, where $\sigma$ is a parameter. Given two data points $x_i$ and $x_j$, if they are similar to each other, $K_{ij}$ will have a large value. In this case, we should not have the two data points labelled simultaneously; in other words, if $x_i$ is selected as a training sample, $x_j$ should be excluded in some sense. Therefore, given that $K_{ij}$ has a large value, at least one of $f_i$ and $f_j$ should have a small value. We then propose to minimize the following objective function to make the selected data as diverse as possible:

$$\min_{f} \Omega(f) = \min_{f} \sum_{i=1}^{n} \sum_{j=1}^{n} f_i f_j K_{ij}. \qquad (14)$$

We can see that if $K_{ij}$, $f_i$ and $f_j$ are all large, a heavy penalty is incurred in (14). Minimizing (14) thus makes the selected training data different from each other. Combining the uncertainty criterion and the diversity criterion, we have the following objective function for active learning:

$$\min_f \; \sum_{i=1}^{n} \frac{r}{|\log(1/c)|} (f_i \times b_i) + \sum_{i=1}^{n} \sum_{j=1}^{n} f_i f_j K_{ij} \quad \text{s.t. } \sum_{i=1}^{n} f_i = 1, \; f_i \ge 0, \qquad (15)$$

where $r$ is a parameter. The objective function in (15) can also be viewed as a regularization framework for active learning, in which the diversity constraint is a regularizer added to traditional uncertainty sampling. The diversity regularization term is crucial because some of the uncertain data are potentially similar to each other. It is worth mentioning that the batch mode active learning task is a combinatorial optimization problem, as discussed in Hoi et al. (2009). The solution to (15) does not necessarily give the exact optimal solution to the batch mode active learning problem, where the goal is to exactly select an optimal set of the most informative examples. However, (15) approximates the optimal solution of batch mode active learning in an efficient and effective way (Hoi et al. 2009). Taking a closer look at (15), this objective function is inspired by the algorithm proposed in Hoi et al. (2008, 2009); the major difference is the way the algorithm performs uncertainty estimation.

4 Efficient Optimization

In this section, we optimize the objective function of USDM. Let $f = [f_1, f_2, \ldots, f_n]^T$. For ease of representation, we define $a = [a_1, a_2, \ldots, a_n]^T$, where $a_i = \frac{r \times b_i}{|\log(1/c)|}$ and $r$ is a parameter. (15) can then be rewritten as

$$\min_f \; f^T a + \frac{1}{2} f^T K f \quad \text{s.t. } \sum_{i=1}^{n} f_i = 1, \; f_i \ge 0. \qquad (16)$$

The objective function in (16) is a standard quadratic programming (QP) problem, which can be readily solved by existing convex optimization packages. However, a typical QP solver has a high computational complexity of $O(n^3)$, so it is more practical to make the optimization faster. In this section, we propose a faster algorithm to optimize the objective function (16), based on the augmented Lagrange multiplier (ALM) framework (Bertsekas 1999).

4.1 Brief Review of ALM

The ALM algorithm in Bertsekas (1999) is introduced to solve the following constrained minimization problem:

$$\min \; g(Z), \quad \text{s.t. } h(Z) = 0, \qquad (17)$$

where $g: \mathbb{R}^d \to \mathbb{R}$ and $h: \mathbb{R}^d \to \mathbb{R}^d$. A typical way to define the augmented Lagrangian function of (17) is

$$L(Z, U, \mu) = g(Z) + \langle U, h(Z) \rangle + \frac{\mu}{2} \|h(Z)\|_F^2, \qquad (18)$$

where $Z$ is the optimization variable, $U$ is the Lagrangian coefficient and $\mu$ is a scalar. The following procedure can be applied to optimize the problem in (17).

Algorithm 1: General ALM method (Bertsekas 1999).
1. Set $\rho > 1$, $t = 1$, $U_1 = 0$, $\mu_1 > 0$;
2. repeat
3.   $\hat{Z} = \arg\min_Z L(Z, U_t, \mu_t)$;
4.   $U_{t+1} = U_t + \mu_t h(\hat{Z})$;
5.   $\mu_{t+1} = \rho \mu_t$;
6.   $t = t + 1$;
7. until convergence;
8. Output $\hat{Z}$.
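As a minimal illustration of Algorithm 1 (not the USDM problem itself), the sketch below applies ALM to the toy problem $\min (z-3)^2$ s.t. $z - 1 = 0$, whose constrained optimum is $z = 1$. The inner minimization of step 3 is available in closed form here; all parameter values are arbitrary demonstration choices.

```python
def alm_toy(rho=2.0, mu=1.0, iters=30):
    """Algorithm 1 on: min (z - 3)^2  s.t.  h(z) = z - 1 = 0.

    The augmented Lagrangian is
        L(z, u, mu) = (z - 3)^2 + u * (z - 1) + (mu / 2) * (z - 1)^2,
    and setting dL/dz = 0 gives the closed-form step 3:
        z = (6 - u + mu) / (2 + mu).
    """
    u = 0.0  # Lagrange multiplier, U_1 = 0
    z = 0.0
    for _ in range(iters):
        z = (6.0 - u + mu) / (2.0 + mu)   # step 3: argmin_z L
        u = u + mu * (z - 1.0)            # step 4: U <- U + mu * h(z)
        mu = rho * mu                     # step 5: grow the penalty
    return z

z = alm_toy()
```

As $\mu$ grows, the multiplier converges to the true value $u = 4$ and $z$ converges to the constrained optimum $z = 1$.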

4.2 Efficient Optimization of USDM

In this subsection, we introduce a fast optimization approach for our algorithm under the ALM framework (Bertsekas 1999; Delbos and Gilbert 2005). First we rewrite (16) as follows:

$$\min_{f, v} \; f^T a + \frac{1}{2} f^T K f \quad \text{s.t. } f^T \mathbf{1}_n = 1, \; v \ge 0, \; f = v, \qquad (19)$$

where $\mathbf{1}_n \in \mathbb{R}^n$ is a vector of all ones. The augmented Lagrangian function of (19) is defined as

$$L(f, v, \mu, \lambda_1, \lambda_2) = \frac{\mu}{2}\left(f^T \mathbf{1}_n - 1 + \frac{1}{\mu}\lambda_1\right)^2 + \frac{\mu}{2}\left\|f - v + \frac{1}{\mu}\lambda_2\right\|_F^2 + f^T a + \frac{1}{2} f^T K f \quad \text{s.t. } v \ge 0. \qquad (20)$$

Note that

$$\min_f L(f, v, \mu, \lambda_1, \lambda_2) \;\Leftrightarrow\; \min_f \frac{1}{2} f^T A f - f^T e, \qquad (21)$$

Algorithm 2: USDM active learning algorithm.
1. Initialization: set $\rho > 1$, $f_i = \frac{1}{n}$ $(1 \le i \le n)$, $v = f$, $\lambda_1 = 0$, $\lambda_2 \in \mathbb{R}^n$ a vector of all zeros, and $\mu > 0$;
2. repeat
3.   Update $A$ by $A = K + \mu I_n + \mu \mathbf{1}_n \mathbf{1}_n^T$;
4.   Update $e$ by $e = \mu v + \mu \mathbf{1}_n - \lambda_1 \mathbf{1}_n - \lambda_2 - a$;
5.   Compute $\hat{f}$ by solving the linear system $A \hat{f} = e$;
6.   Compute $v$ by $v = \mathrm{pos}(\hat{f} + \frac{1}{\mu}\lambda_2)$;
7.   Update $\lambda_1$ by $\lambda_1 = \lambda_1 + \mu \times (\sum_{i=1}^{n} \hat{f}_i - 1)$;
8.   Update $\lambda_2$ by $\lambda_2 = \lambda_2 + \mu \times (\hat{f} - v)$;
9.   $\mu = \rho \mu$;
10. until convergence;
11. Output $\hat{f}$.

where

$$A = K + \mu I_n + \mu \mathbf{1}_n \mathbf{1}_n^T \qquad (22)$$

and

$$e = \mu v + \mu \mathbf{1}_n - \lambda_1 \mathbf{1}_n - \lambda_2 - a. \qquad (23)$$

The objective function in (21) can be easily optimized by solving a linear system, and we have

$$\hat{f} = \arg\min_f L(f, v, \mu, \lambda_1, \lambda_2) = A^{-1} e. \qquad (24)$$

Meanwhile,

$$\min_{v \ge 0} L(f, v, \mu, \lambda_1, \lambda_2) \;\Leftrightarrow\; \min_{v \ge 0} \left\| v - \left(f + \frac{1}{\mu}\lambda_2\right) \right\|^2. \qquad (25)$$

By solving the optimization problem shown above, we have

$$\hat{v} = \arg\min_{v \ge 0} L(f, v, \mu, \lambda_1, \lambda_2) = \mathrm{pos}(q), \qquad (26)$$

where $q = f + \frac{1}{\mu}\lambda_2$ and $\mathrm{pos}(q)$ assigns 0 to each negative element of $q$, i.e., for any element $q_i \in q$, $\mathrm{pos}(q_i) = \max(q_i, 0)$.

In summary, the proposed USDM active learning algorithm is listed in Algorithm 2 as shown above. It can be verified that Algorithm 2 converges to the global optimum. Except for step 5, all the steps in Algorithm 2 are very fast to compute. It is worth noting that when computing $\hat{f}$ in step 5, we only need to solve a linear system; no matrix inversion is involved. Several efficient linear system solvers can be readily used, and one may also apply a faster algorithm to solve or approximate the linear system, e.g., Spielman and Teng (2004). Because this is out of the scope of this paper, we omit the detailed discussion here.
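A compact sketch of Algorithm 2 on a small problem follows. The linear system in step 5 is solved by plain Gaussian elimination for clarity; the kernel $K$, uncertainty vector $a$ and all parameter values are invented demonstration choices, not the paper's experimental settings.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def usdm(a, K, rho=1.5, mu=1.0, iters=60):
    """Algorithm 2: min f^T a + 0.5 f^T K f  s.t. sum(f) = 1, f >= 0."""
    n = len(a)
    f = [1.0 / n] * n
    v = f[:]
    lam1, lam2 = 0.0, [0.0] * n
    for _ in range(iters):
        A = [[K[i][j] + mu * (1.0 if i == j else 0.0) + mu
              for j in range(n)] for i in range(n)]            # Eq. (22)
        e = [mu * v[i] + mu - lam1 - lam2[i] - a[i]
             for i in range(n)]                                # Eq. (23)
        f = solve(A, e)                                        # step 5
        v = [max(f[i] + lam2[i] / mu, 0.0) for i in range(n)]  # Eq. (26)
        lam1 += mu * (sum(f) - 1.0)                            # step 7
        lam2 = [lam2[i] + mu * (f[i] - v[i]) for i in range(n)]  # step 8
        mu *= rho                                              # step 9
    return f

# toy input: samples 0 and 1 are the most uncertain (most negative a_i,
# since a_i = r*b_i/|log(1/c)| <= 0) but highly similar (K[0][1] = 0.9);
# sample 2 is slightly less uncertain yet dissimilar to both
a = [-1.0, -1.0, -0.8]
K = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
f = usdm(a, K)
```

The diversity term lifts the score of the dissimilar sample above the two near-duplicates, even though the latter are individually more uncertain.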

Table 1 Dataset description

Name      Size   # of classes  Application             Data type
KTH       2,387  6             Action recognition      Video
Youtube   1,596  11            Action recognition      Video
Coil      1,440  20            Object classification   Image
Scene15   4,485  15            Scene recognition       Image
MED       2,874  18            Video event detection   Video

5 Experiment

In this section, we test the proposed active learning algorithm by applying it to a variety of visual concept recognition applications, including action recognition, object classification, scene recognition, and video event detection.

5.1 Experiment Setup

Five public datasets are used in the experiments: KTH (Schüldt et al. 2004), Youtube (Liu et al. 2009), Coil (Nene et al. 1996), Scene15 (Lazebnik et al. 2006) and the MED dataset collected by the National Institute of Standards and Technology (NIST).¹ Table 1 summarizes the detailed information of the datasets used in the experiments.

We compare our algorithm to both representativeness sampling and uncertainty sampling active learning algorithms. The comparison algorithms include SVM active learning (SVMactive) proposed in Tong and Chang (2001), semi-supervised SVM active learning (S-SVM) proposed in Hoi et al. (2008), Laplacian regularized optimal experiment design (LOED) proposed in He et al. (2007), and the multi-class active learning algorithm pKNN proposed in Jain and Kapoor (2009). The $k$ used in our algorithm for graph construction is empirically set to 5. For the parameters involved in the different active learning algorithms, we similarly tune them from $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}\}$ and report the best results.

Each dataset is randomly split into two non-overlapping subsets, one as the training candidate set and the other as the testing set. In our experiments we fix the size of the training candidate set at 1,000 for all the datasets. Denote by $c$ the number of classes. First, we randomly select 3 positive samples per class from the training candidate set, i.e., the size of the seed set is $3 \times c$. The remaining data in the training candidate set are regarded as the active pool. Then we run each active learning algorithm to select training data, with the batch size set to $1 \times c$, $2 \times c$, ..., $7 \times c$, respectively. During training we use both the data selected by active learning and the data in the seed set as labelled samples; there are therefore $4 \times c$, $5 \times c$, ..., $10 \times c$ labelled data for training. After the training data are selected by the different active learning algorithms, a classifier is trained for visual concept recognition. The SVM classifier is used for SVMactive, S-SVM and LOED. For pKNN, we use the classifier embedded in the algorithm (Jain and Kapoor 2009). For our USDM algorithm, we use the random walk classifier illustrated in Sect. 3 to generate the soft label representation for classification. As the random walk classifier is a transductive algorithm, after the pool data have been selected, we re-construct a larger graph including all data for testing data classification. Note that the class labels of the selected data could be imbalanced. Based on the soft label representation derived by the random walk, binary class labels for each class are then determined by an SVM.

¹ http://www.nist.gov/itl/iad/mig/

5.2 Action Recognition

We use the KTH action dataset (Schüldt et al. 2004) and the Youtube action dataset (Liu et al. 2009) to compare the performance of the different active learning algorithms on action recognition. In this experiment, each video sequence is represented by a 1,000-dimensional BoW STIP feature (Laptev et al. 2008). The KTH action dataset contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes, and indoors (Schüldt et al. 2004). There are 2,387 action sequences in this dataset. The Youtube action dataset is a real-world dataset collected from Youtube. It contains intended camera motion and variations in object scale, viewpoint and illumination, as well as cluttered backgrounds. There are 11 actions in this dataset: basketball shooting, biking/cycling, diving, golf swinging, horseback riding, soccer juggling, swinging, tennis swinging, trampoline jumping, volleyball spiking, and walking with a dog (Liu et al. 2009). Four videos in the Youtube action dataset are too short to be captured by the feature extraction code shared by Laptev et al. (2008), so we use a dataset of 1,596 sequences.

Figure 1 compares the performance of the different active learning algorithms for action recognition on the KTH dataset. We observe that LOED outperforms SVMactive. Meanwhile, S-SVM and pKNN significantly improve the accuracy compared with SVMactive and LOED. A possible explanation is that SVMactive is vulnerable to over-fitting, because it does not consider the data distribution of the active pool during the learning process. S-SVM uses the unlabelled data

[Figure 1: line plot of batch size (6-42) vs. average accuracy (60-80 %) for USDM, SVMactive, S-SVM, LOED and pKNN.]

Fig. 1 A comparison of different active learning algorithms on action recognition using the KTH dataset. There are 6 different actions in this dataset

[Figure 2: line plot of batch size (11-77) vs. average accuracy (25-45 %) for USDM, SVMactive, S-SVM, LOED and pKNN.]

Fig. 2 A comparison of different active learning algorithms on action recognition using the Youtube dataset. There are 11 different actions in this dataset

and pKNN evaluates the uncertainty across multiple classes, so more information is used in both of them. Our algorithm dramatically outperforms all the competitors at all batch sizes. Figure 2 shows the experimental results on action recognition using the Youtube dataset. The video sequences in the Youtube dataset were downloaded from Youtube, making it much noisier than the lab-generated KTH dataset. Yet, we observe that our algorithm consistently outperforms the other active learning algorithms. The experimental results demonstrate the advantages of our algorithm.

[Fig. 3 A comparison of different active learning algorithms on object classification. There are 20 different objects in this dataset.]

5.3 Object Classiﬁcation

Figure 3 shows the experimental results of object recognition on the Coil dataset, which consists of 1,440 grey scale images (Nene et al. 1996). There are 20 different objects in total. Each image was resized to 32 × 32. We use the grey values as the features of the images, with a dimension of 1024. Both pKNN and SVMactive train classifiers only on the seed set for uncertainty evaluation. We can see from Fig. 3 that pKNN generally outperforms SVMactive, indicating that it is beneficial to evaluate the uncertainty of data across multiple classes. S-SVM attains the second best performance due to its exploration of the data distribution of the active pool. Our algorithm achieves the best performance. Compared with the second best algorithm, S-SVM, our algorithm has two main advantages. First, the random walk algorithm has a better capability of uncovering the manifold structure (Tenenbaum et al. 2000) of the entire active pool when evaluating the uncertainty of the pool data. Although S-SVM also takes the distribution of the pool data into consideration, the manifold structure is missed when training the SVM classifier for uncertainty evaluation. Thus, it suffers from the small size of the training data, especially when the data has a manifold distribution. Second, our algorithm is a multi-class active learning algorithm, which is able to evaluate the "informativeness" of the pool data globally.
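As a rough, self-contained illustration of how a graph/random-walk view yields multi-class uncertainty scores, the harmonic-function propagation of Zhu et al. (2003) can be sketched as below. This is not the USDM objective (which is defined earlier in the paper); `propagation_uncertainty` is an illustrative helper, and the entropy scoring is one common multi-class uncertainty measure:

```python
import numpy as np

def propagation_uncertainty(W, labels, n_classes):
    """Graph-based uncertainty: propagate seed labels over the graph
    (harmonic-function solution of Zhu et al. 2003) and score each
    unlabelled node by the entropy of its class distribution."""
    labelled = np.array([l is not None for l in labels])
    u = ~labelled
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    Y = np.zeros((labelled.sum(), n_classes))    # one-hot seed labels
    for i, idx in enumerate(np.where(labelled)[0]):
        Y[i, labels[idx]] = 1.0
    # harmonic solution: F_u = -L_uu^{-1} L_ul Y_l
    F = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labelled)] @ Y)
    P = F / F.sum(axis=1, keepdims=True)         # normalise to probabilities
    entropy = -(P * np.log(P + 1e-12)).sum(axis=1)
    return np.where(u)[0], entropy               # pool indices + uncertainty

# toy chain graph 0-1-2-3: nodes 0 and 3 labelled with classes 0 and 1
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
idx, ent = propagation_uncertainty(W, [0, None, None, 1], 2)
```

On this toy chain, the two unlabelled middle nodes receive equal, high entropy, i.e., they are exactly the candidates an uncertainty sampler would query.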

5.4 Scene Recognition

To test the performance of USDM in scene recognition, we use the Scene15 dataset, which contains 4,485 images from 15 different scenes (Lazebnik et al. 2006). In this experiment, we extract the HOG feature to represent the images; the dimension of the feature vectors is 6,300.

[Fig. 4 A comparison of different active learning algorithms on scene recognition. There are 15 different scenes in this dataset.]

Figure 4 shows the experimental results. Both pKNN and SVMactive only use the seed set to train the classifiers for uncertainty evaluation. pKNN generally outperforms SVMactive, which indicates that multi-class active learning (e.g., pKNN) is a more powerful approach. If we take the pool data into consideration for training data selection, the performance is further improved. As shown in Fig. 4, S-SVM outperforms SVMactive at all batch sizes, although not as significantly as in the other applications. We observe that our algorithm outperforms all the other algorithms. Given that S-SVM attains good performance on this dataset and our USDM still outperforms S-SVM, we conclude that it is better to leverage the manifold structure of the active pool and the seed set to evaluate uncertainty for active learning.

5.5 Complex Event Detection

In this subsection, we compare the different active learning algorithms on complex event detection (Ma et al. 2014; Yang et al. 2013). We merge the MED10 dataset and the MED11 dataset into one, which is referred to as MED in this paper. In the experiment, we use the MoSIFT descriptor (Chen and Hauptmann 2009), based on which a 32,768-dimensional spatial BoW feature is computed to represent each video sequence (Yang et al. 2013). Principal component analysis is performed to remove the null space.
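The null-space removal step can be realised with an SVD of the centred feature matrix, keeping only the directions with non-zero variance. A minimal numpy sketch (not the authors' code; the rank tolerance is an illustrative choice):

```python
import numpy as np

def remove_null_space(X):
    """Project features onto the principal subspace with non-zero
    variance, i.e., drop the null space of the centred data matrix.

    X: (n_samples, n_features) array; returns (n_samples, rank) array.
    """
    Xc = X - X.mean(axis=0)                      # centre the data
    # economy-size SVD; right singular vectors with (numerically) zero
    # singular values span the null space of the centred data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s.max()))      # numerical rank
    return Xc @ Vt[:rank].T                      # keep non-null directions

# toy example: 5 samples in 10-D span at most a 4-D centred subspace
X = np.random.rand(5, 10)
Z = remove_null_space(X)
```

This is useful here because the number of videos is far smaller than the 32,768-dimensional BoW feature, so the data matrix necessarily has a large null space.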

Figure 5 shows keyframes from a video of the "changing a vehicle tire" event. We can see that the MED dataset is rather "wild", and the problem is more difficult compared to the other datasets. In the experiment, we have used all the videos which are labelled as one of the 18 events. The number of positive samples for each event varies from 80 to 170, and there are 2,874 positive samples for the 18 events in total.

[Fig. 5 An example video sequence of the "changing a vehicle tire" event from the MED dataset.]

[Fig. 6 A comparison of different active learning algorithms on complex event detection. There are 18 different events in this dataset.]

Figure 6 shows the experimental results on complex event detection. We can see that our algorithm consistently outperforms all the competitors; as shown in the figure, its advantage over the other algorithms is quite visible. As the batch size grows, the performance of pKNN and S-SVM improves but remains worse than that of our algorithm. This experiment demonstrates that the algorithm proposed in this paper is more robust in dealing with "wild" data, compared to the state of the art.

5.6 Performance Comparison Using Different Seed Size

In this subsection, we examine the impact of the initial training set size, given that it usually plays a key role in semi-supervised learning tasks. We perform this experiment by varying the seed size from 1 × c to 5 × c, where c is the number of classes. As LOED is an unsupervised method that is irrelevant to the labelled seed set, we leave it out of this experiment. Figures 7, 8, 9, 10 and 11 show the experimental results on the different datasets.

The experimental results in these figures (i.e., Figs. 7, 8, 9, 10 and 11), together with the results when the seed size is 3 × c, demonstrate that for different seed sizes our method consistently yields compelling performance, validating its efficacy in selecting the most informative data for a variety of vision tasks. Meanwhile, we notice that S-SVM also obtains good performance, which further indicates that leveraging the unlabelled pool data does help improve active learning performance.
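Seed sizes of 1 × c to 5 × c suggest a class-balanced seed set with a fixed number of labelled examples per class; assuming that protocol (it is not spelled out here), the seeding step can be sketched as:

```python
import random
from collections import defaultdict

def sample_seed_set(labels, per_class, seed=0):
    """Draw a class-balanced seed set: `per_class` labelled examples
    from every class, giving a seed size of per_class * c for c classes.
    (Illustrative helper; the paper's exact protocol is assumed.)"""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    selected = []
    for y, idxs in by_class.items():
        selected.extend(rng.sample(idxs, per_class))
    return sorted(selected)

labels = [0] * 10 + [1] * 10 + [2] * 10        # toy 3-class pool
seed_set = sample_seed_set(labels, per_class=5)  # seed size 5 x c = 15
```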

5.7 Performance Comparison on the Pool Data

From this subsection on, taking the Youtube action dataset as an example, we report experimental results that test more characteristics of the proposed algorithm. These experiments include (1) the classification accuracy on the active pool data; (2) the classification accuracy when different classifiers are used; (3) the classification accuracy when a different feature is used; and (4) the classification accuracy when the unweighted graph is used.

First, we evaluate the classification accuracy of the different active learning algorithms on the pool data. To this end, we exclude the seed data and the selected batch data, and treat the remaining pool data as the testing data. Since LOED is unsupervised, we leave it out of the comparison. Figure 12 displays the experimental results. We can see that our method consistently attains the top performance, whereas S-SVM and pKNN obtain good performance as well. This observation is consistent with the results when the testing data are outside the active pool, again demonstrating the effectiveness of the proposed USDM algorithm.
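The pool-evaluation protocol just described (exclude the seed and the selected batch, test on the rest) amounts to simple index masking; a minimal sketch with hypothetical index arrays:

```python
import numpy as np

def pool_test_indices(n_pool, seed_idx, batch_idx):
    """Indices of the remaining pool data used as the test set:
    everything except the seed data and the selected batch."""
    mask = np.ones(n_pool, dtype=bool)
    mask[np.asarray(seed_idx)] = False       # drop the labelled seed set
    mask[np.asarray(batch_idx)] = False      # drop the selected batch
    return np.where(mask)[0]

# 10 pool items, 2 seeds, and a batch of 3 selected for labelling
test_idx = pool_test_indices(10, seed_idx=[0, 5], batch_idx=[1, 2, 8])
```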

5.8 Performance Comparison Using a Different Feature

In this subsection, we compare the performance of the active

learning algorithms when a different feature is used. In the previous experiments, we have used the STIP feature for action recognition; in this experiment, we use the MoSIFT feature (Chen and Hauptmann 2009) instead. Figure 13 shows the experimental results, where a 1,000-dimensional BoW MoSIFT feature is used to represent the videos in the Youtube dataset.

[Fig. 7 Performance comparison on the KTH dataset w.r.t. different seed sizes: (a) seed size c; (b) seed size 5 × c. Our method is consistently competitive.]

[Fig. 8 Performance comparison on the Youtube dataset w.r.t. different seed sizes: (a) seed size c; (b) seed size 5 × c. Our method is consistently competitive.]

[Fig. 9 Performance comparison on the Coil dataset w.r.t. different seed sizes: (a) seed size c; (b) seed size 5 × c. Our method is consistently competitive.]

[Fig. 10 Performance comparison on the Scene15 dataset w.r.t. different seed sizes: (a) seed size c; (b) seed size 5 × c. Our method is consistently competitive.]

[Fig. 11 Performance comparison on the MED dataset w.r.t. different seed sizes: (a) seed size c; (b) seed size 5 × c. Our method is consistently competitive.]

Comparing Figs. 2 and 13, we can see that the MoSIFT feature performs better than the STIP feature on the Youtube dataset. As before, our algorithm dramatically outperforms the other compared algorithms. In particular, when 11, 22, and 33 data are selected, our algorithm outperforms the second best algorithm by about 10 % in relative terms. This experiment demonstrates that when a better feature is used, the performance of an active learning algorithm usually improves. Nevertheless, our algorithm outperforms the other competitors consistently when a different feature is used.

5.9 Performance Comparison Using Different Classiﬁers

The function of an active learning algorithm is to select the most informative data for supervision and then use these labelled data as the input of a specific classification algorithm to train a classifier for recognition. It is therefore an interesting question how the active learning algorithms perform if we use a different classifier. In this subsection, we again use the Youtube dataset as a showcase to compare the different active learning algorithms using other classifiers.

[Fig. 12 A comparison of different active learning algorithms on classifying the pool data. The results are based on the Youtube dataset.]

[Fig. 13 A comparison of different active learning algorithms on action recognition using the MoSIFT feature. The results are based on the Youtube dataset.]

We first use Least Squares Regression (LSR) as the classifier for action recognition. This time, LSR is used for all the active learning algorithms, including our USDM, SVMactive, S-SVM, LOED and pKNN. Each active learning algorithm is first performed to select the training data, based on which an LSR classifier is trained for action recognition. Figures 14 and 15 show the experimental results when the STIP feature and the MoSIFT feature are used, respectively. We can see from the two figures that the proposed algorithm USDM outperforms all the other algorithms at all batch sizes when LSR is used as the classifier. This experiment further demonstrates that our algorithm is more effective than the other active learning algorithms when a different classifier is used.
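A common way to realise an LSR classifier for multi-class recognition is to regress one-hot class indicators on the features and predict by the largest response. The following is a minimal sketch under that assumption (the paper does not spell out its LSR variant; `reg` is a small ridge term added here for numerical stability):

```python
import numpy as np

def lsr_fit(X, y, n_classes, reg=1e-6):
    """Least squares regression classifier: regress one-hot labels
    on the features (with a small ridge term for stability)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])    # append bias column
    Y = np.eye(n_classes)[y]                         # one-hot targets
    W = np.linalg.solve(X1.T @ X1 + reg * np.eye(X1.shape[1]), X1.T @ Y)
    return W

def lsr_predict(W, X):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return (X1 @ W).argmax(axis=1)                   # largest response wins

# two linearly separable toy blobs
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
W = lsr_fit(X, y, n_classes=2)
```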

[Fig. 14 A comparison of different active learning algorithms on the Youtube dataset using the STIP feature. In this experiment, least squares regression (LSR) is used as the classifier for action recognition.]

[Fig. 15 A comparison of different active learning algorithms on the Youtube dataset using the MoSIFT feature. In this experiment, LSR is used as the classifier for action recognition.]

Next, we additionally use KNN as the classifier to compare the performance of the different active learning algorithms on the Youtube dataset. In this experiment, we use the same setting as for LSR. Figures 16 and 17 show the experimental results when the STIP feature and the MoSIFT feature are used, respectively. For both features, when using KNN as the classifier, our algorithm dramatically outperforms the competitors. Visual concept recognition accuracy generally relies on three factors: the feature, the classifier, and the data selected for supervision. We observe in this experiment that our USDM algorithm consistently outperforms the other methods when a different feature and/or a different classifier is used.
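The KNN classifier used in this comparison is the standard majority-vote rule; a minimal sketch (k is a free parameter, chosen here for illustration only):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbour classifier: Euclidean distance and
    a majority vote among the k closest training points."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d, axis=1)[:, :k]           # k nearest per query
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

# toy 1-D data: class 0 near 0, class 1 near 1
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.05], [1.05]]))
```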

[Fig. 16 A comparison of different active learning algorithms on the Youtube dataset using the STIP feature. In this experiment, KNN is used as the classifier for action recognition.]

[Fig. 17 A comparison of different active learning algorithms on the Youtube dataset using the MoSIFT feature. In this experiment, KNN is used as the classifier for action recognition.]

The experimental results of adopting different classifiers reported in this section also show that the performance of our algorithm remains better than that of the other compared algorithms when a different classifier is used. We observe similar performance on all datasets if an SVM classifier is directly trained instead of reconstructing a larger graph for the random walk. Thus, in real-world applications one may directly train an inductive classifier, e.g., an SVM, on the data selected by the USDM algorithm, to reduce the computation cost.

5.10 Performance Variation Using an Unweighted Graph

In the previous experiments, we used the weighted graph defined in (1) for the random walks. In the following experiment, we compare the weighted graph with different values of the parameter σ against the unweighted graph defined in (2). Again, we use the Youtube dataset as a showcase, with a randomly generated seed set of 33 videos. Figure 18 shows the experimental results.

[Fig. 18 Performance comparison between the unweighted graph and the weighted graph with different σ on the Youtube dataset. σ is varied over {1e−6, 1e−4, 0.01, 1, 100, 1e4, 1e6}.]

It can be seen that if σ is appropriately chosen, the weighted graph usually gives better performance. In other words, better performance can be expected when σ is near its optimum. In this experiment, the performance is usually better when σ is small. However, the optimal σ is data dependent, and can be determined by cross validation or preliminary experiments.
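The two graph constructions compared here can be sketched as follows. This is a minimal numpy sketch under common assumptions (a symmetric k-NN graph; Gaussian weights exp(−d²/2σ²) for the weighted case); the exact definitions are Eqs. (1) and (2) earlier in the paper and may differ in detail:

```python
import numpy as np

def build_graph(X, k=5, sigma=None):
    """k-NN graph over the data. With `sigma` set, edges carry Gaussian
    weights exp(-||xi - xj||^2 / (2 sigma^2)) (a common weighted choice);
    with sigma=None the graph has 0/1 edges (unweighted)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours, excluding the point itself
        nbrs = np.argsort(d2[i])[1:k + 1]
        for j in nbrs:
            W[i, j] = 1.0 if sigma is None else np.exp(-d2[i, j] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                        # symmetrise

X = np.random.rand(20, 3)
W_unweighted = build_graph(X, k=3)                   # 0/1 edges
W_weighted = build_graph(X, k=3, sigma=0.01)         # Gaussian, small sigma
```

With a very small σ, the Gaussian weights decay sharply with distance, which is consistent with the observation above that small σ tends to perform well on this dataset.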

5.11 Computational Efﬁciency Comparison

Finally, taking the MED dataset as an example, we compare the computational efficiency of the supervised and semi-supervised active learning algorithms. The computation time of the semi-supervised active learning algorithms mainly depends on the size of the active pool. In this experiment, the active pool size varies from 250 to 1,250, with an interval of 250. All experiments are implemented in Matlab R2011a on a machine with 24 Intel(R) Xeon cores and 64.0 GB of RAM.

Figure 19 shows the average time elapsed to select 18 data points for labelling, i.e., the batch size is 1 × c. Note that only S-SVM and our USDM exploit the data distribution, while SVMactive and pKNN merely utilize the seed set for active learning; thus the size of the active pool does not affect the speed of pKNN and SVMactive much. Although SVMactive and pKNN are faster, their performance in the previous experiments is worse than that of the semi-supervised active learning algorithms S-SVM and USDM. In our experiments, as a semi-supervised active learning algorithm, S-SVM generally achieves the second best accuracy. As shown in Fig. 19, our algorithm dramatically outperforms S-SVM in efficiency. If we increase the pool size, the efficiency advantage of our algorithm over S-SVM becomes even more visible.

[Fig. 19 Running time of different active learning algorithms w.r.t. different pool sizes. The result shown in this figure is the elapsed time (seconds) of selecting pool data for training.]

6 Conclusion

Generally speaking, there are three important factors in visual concept recognition: the features, the classifiers, and the data selected for supervision. In this paper, we have proposed a new active learning algorithm, USDM, for visual concept recognition. To address the problem of small seed set size in uncertainty sampling, we proposed to exploit the distribution of all the data in the active pool and the seed set. Considering that the uncertain data in the active pool are potentially similar to each other, we proposed to make the selection as diverse as possible. USDM is able to evaluate the "informativeness" of a sample across multiple classes, making the selection more accurate. An efficient algorithm was used to optimize the objective function of USDM. Extensive experiments on a variety of applications with different classifiers and features demonstrate that USDM dramatically outperforms the state of the art. We have observed that even if the sizes of the seed and pool sets are the same, the classification performance can differ when the seed and pool sets themselves are different. In our future research, we will study how to optimally initialize the seed set and the active pool.

Acknowledgments This paper was partially supported by the US Department of Defense, U.S. Army Research Office (W911NF-13-1-0277), partially supported by the ARC DECRA project DE130101311, and partially supported by the Tianjin Key Laboratory of Cognitive Computing and Application.

References

Bertsekas, D. (1999). Nonlinear programming (2nd ed.). Belmont, MA: Athena Scientific.

Brinker, K. (2003). Incorporating diversity in active learning with support vector machines. In ICML.

Campbell, C., Cristianini, N., & Smola, A. J. (2000). Query learning with large margin classifiers. In ICML.

Chattopadhyay, R., Wang, Z., Fan, W., Davidson, I., Panchanathan, S., & Ye, J. (2012). Batch mode active sampling based on marginal probability distribution matching. In KDD (pp. 741–749).

Chen, M., & Hauptmann, A. (2009). MoSIFT: Recognizing human actions in surveillance videos. Technical Report CMU-CS-09-161.

Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996). Active learning with statistical models. Journal of Artificial Intelligence Research (JAIR), 4, 129–145.

Delbos, F., & Gilbert, J. (2005). Global linear convergence of an augmented Lagrangian algorithm to solve convex quadratic optimization problems. Journal of Convex Analysis, 12(1), 45–69.

Doyle, P. G., & Snell, J. L. (1984). Random walks and electric networks. Washington, DC: Mathematical Association of America.

Gong, B., Grauman, K., & Sha, F. (2014). Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109(1–2), 3–27.

Han, Y., Yang, Y., Yan, Y., Ma, Z., Sebe, N., & Zhou, X. (2014). Semi-supervised feature selection via spline regression for video semantic recognition. IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2014.2314123.

He, X., Min, W., Cai, D., & Zhou, K. (2007). Laplacian optimal design for image retrieval. In SIGIR.

Hoi, S., Jin, R., Zhu, J., & Lyu, M. (2008). Semi-supervised SVM batch mode active learning for image retrieval. In CVPR.

Hoi, S., Jin, R., Zhu, J., & Lyu, M. (2009). Semi-supervised SVM batch mode active learning with applications to image retrieval. ACM Transactions on Information Systems, 27(3), 16:1–16:29.

Hoi, S., & Lyu, M. (2005). A semi-supervised active learning framework for image retrieval. In CVPR (Vol. 2, pp. 302–309).

Jain, P., & Kapoor, A. (2009). Active learning for large multi-class problems. In CVPR.

Jegelka, S., Kapoor, A., & Horvitz, E. (2014). An interactive approach to solving correspondence problems. International Journal of Computer Vision, 108(1–2), 49–58.

Joshi, A., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In CVPR.

Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2010). Gaussian processes for object categorization. International Journal of Computer Vision, 88(2), 169–188.

Kowdle, A., Chang, Y., Gallagher, A., & Chen, T. (2011). Active learning for piecewise planar 3D reconstruction. In CVPR.

Laptev, I., Marszalek, M., Schmid, C., & Rozenfeld, B. (2008). Learning realistic human actions from movies. In CVPR.

Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.

Li, H., Shi, Y., Chen, M., Hauptmann, A., & Xiong, Z. (2010). Hybrid active learning for cross-domain video concept detection. In ACM Multimedia.

Li, M., & Sethi, I. K. (2006). Confidence-based active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8), 1251–1261.

Li, X., Wang, L., & Sung, E. (2004). Multilabel SVM active learning for image classification. In ICIP.

Lindenbaum, M., Markovitch, S., & Rusakov, D. (2004). Selective sampling for nearest neighbor classifiers. Machine Learning, 54(2), 125–152.

Liu, J., Luo, J., & Shah, M. (2009). Recognizing realistic actions from videos "in the wild". In CVPR.

Ma, Z., Yang, Y., Nie, F., Sebe, N., Yan, S., & Hauptmann, A. (2014). Harnessing lab knowledge for real-world action recognition. International Journal of Computer Vision, 109(1–2), 60–73.

Ma, Z., Yang, Y., Sebe, N., & Hauptmann, A. (2014). Knowledge adaptation with partially shared features for event detection using few exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(9), 1789–1802.

Nene, S., Nayar, S., & Murase, H. (1996). Columbia object image library (COIL-20). Technical Report CUCS-005-96.

Schüldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: A local SVM approach. In ICPR.

Shen, H., Yu, S.-I., Yang, Y., Meng, D., & Hauptmann, A. (2014). Unsupervised video adaptation for parsing human motion. In ECCV.

Spielman, D., & Teng, S.-H. (2004). Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC.

Tenenbaum, J., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.

Tong, S., & Chang, E. (2001). Support vector machine active learning for image retrieval. In ACM Multimedia.

Vondrick, C., & Ramanan, D. (2011). Video annotation and tracking with active learning. In NIPS.

Wang, L., Chan, K. L., & Zhang, Z. (2003). Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval. In CVPR (pp. 629–634).

Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically labeling video data using multi-class active learning. In ICCV.

Yang, Y., Ma, Z., Hauptmann, A., & Sebe, N. (2013). Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Transactions on Multimedia, 15(3), 661–669.

Yang, Y., Ma, Z., Xu, Z., Yan, S., & Hauptmann, A. (2013). How related exemplars help complex event detection in web videos. In ICCV.

Yang, Y., Nie, F., Xu, D., Luo, J., Zhuang, Y., & Pan, Y. (2012). A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 723–742.

Yu, K., Bi, J., & Tresp, V. (2006). Active learning via transductive experimental design. In ICML (pp. 1081–1088).

Zhu, X. (2008). Semi-supervised learning literature survey. Technical Report, University of Wisconsin-Madison.

Zhu, X., Ghahramani, Z., & Lafferty, J. D. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML (pp. 912–919).
