
Multi-class Boosting for Early Classification of Sequences

Katsuhiko Ishiguro

ishiguro@cslab.kecl.ntt.co.jp

Hiroshi Sawada

sawada@cslab.kecl.ntt.co.jp

Hitoshi Sakano

keen@cslab.kecl.ntt.co.jp

NTT Communication Science Laboratories

NTT Corporation

Kyoto, 610-0237, Japan

Consider the problem of driver behavior recognition from images

captured by a camera installed in a vehicle [4]. Recognition of driver

behavior is crucial for driver assistance systems that make driving comfortable and safe. One notable requirement for real applications is that

we would like to predict and classify a behavior as quickly as possi-

ble: if we detect a sign of dangerous movements such as mobile phone

use while driving, we would like to warn the driver quickly before the

behavior causes any accidents. This kind of classification task is called

“early classification (recognition),” and is important for many practical

problems including on-line handwritten character recognition, and speech

recognition systems.

In this paper, we focus on one of the most famous discriminative models, Adaboost [1, 2], and extend it for early classification of sequences. While existing studies (e.g. [5, 6]) have addressed only the binary classification problem, we present a multi-class extension of Adaboost for early classification, called Earlyboost.MH (Fig. 1). We obtain an efficient multi-class Adaboost for early classification by combining the multi-class Adaboost.MH [3] with the early classification boosting approach (Earlyboost [6]).

[Figure 1 schematic: an input observation (feature) sequence runs from start (t = 1) to end (t = T); at each frame a weak classifier $h_{t,k}(x_t)$ with weight $\alpha_t$ is learnt from misclassification information and added via weight propagation, yielding the strong classifier $H_k(x) = \sum_{t=1}^{L} \alpha_t h_{t,k}(x_t)$ for early classification of (sub)sequences.]

Figure 1: Conceptual overview of multi-class early classification boosting. The final strong classifiers consist of time frame-wise weak classifiers, which are learnt through a weight propagation technique to achieve early classification of (sub)sequences.

The training data consist of the $i$-th sequence $\mathbf{x}_i = \{x_{i,t} \in \mathbb{R}^d\}$ and its class label $y_i \in \{1, 2, \dots, K\}$. The number of sequences is $N$, so $i \in \{1, 2, \dots, N\}$; $T$ is the length of the time sequences, and $t \in \{1, 2, \dots, T\}$ is the time index. A weak classifier $h_{t,k}(x) : \mathbb{R}^d \to \{1, -1\}$ accepts only samples at the $t$-th time frame, and returns 1 if $x$ belongs to class $k$ and $-1$ otherwise. We also define $g_k(y) : \{1, 2, \dots, K\} \to \{1, -1\}$, which returns 1 if $y = k$ and $-1$ otherwise. $H^t_k$ denotes the one-vs.-all strong classifier, consisting of $t$ weak classifiers, which computes the likelihood of the sequence being a member of class $k$.

The loss to minimize in Earlyboost.MH is:

$$J(H^t) = \sum_{i=1}^{N} \sum_{k=1}^{K} \exp\left(-g_k(y_i)\, H^t_k(\mathbf{x}_i)\right) = \sum_{i=1}^{N} \sum_{k=1}^{K} \exp\left(-g_k(y_i) \sum_{s=1}^{t} \alpha_s h_{s,k}(x_{i,s})\right). \quad (1)$$
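As a concrete illustration, the loss of Eq. (1) can be evaluated in a few lines of NumPy. This is a minimal sketch, not the authors' code; the array names `H` (strong-classifier scores) and `y` (labels, 0-indexed here for convenience) are our own assumptions.

```python
import numpy as np

def multiclass_exp_loss(H, y):
    """Eq. (1): J(H^t) = sum_i sum_k exp(-g_k(y_i) * H^t_k(x_i)).

    H[i, k] holds the strong-classifier score H^t_k(x_i);
    y[i] is the class label, 0-indexed here for convenience.
    """
    N, K = H.shape
    g = -np.ones((N, K))
    g[np.arange(N), y] = 1.0   # g_k(y_i): +1 on the true class, -1 otherwise
    return np.exp(-g * H).sum()
```

With all scores at zero, every one of the $NK$ terms contributes $\exp(0) = 1$, so the loss is simply $NK$; training drives the correct-class scores up and the rest down.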

We seek $\{h_{t,k}, \alpha_t\}$ such that:

$$\{\alpha_t, h_{t,k}\} = \operatorname*{argmin}_{\alpha, h} \sum_{i=1}^{N} \sum_{k=1}^{K} \exp\left[-g_k(y_i)\left(H^{t-1}_k(\mathbf{x}_i) + \alpha_t h_{t,k}(x_{i,t})\right)\right]. \quad (2)$$

The optimal $h_{t,k}$ and $\alpha_t$ are computed as follows:

$$h_{t,k} = \operatorname*{argmax}_{h}\ r_{t,k}, \qquad \alpha_t = \frac{1}{2}\log\left(\frac{1 + \sum_k r_{t,k}}{1 - \sum_k r_{t,k}}\right), \quad (3)$$

$$r_{t,k} = \sum_{i=1}^{N} g_k(y_i)\, h_{t,k}(x_{i,t})\, D_t(i,k). \quad (4)$$

$r_{t,k}$ is a class- and frame-wise classification score that depends only on the observation at the $t$-th frame; it becomes large when the label estimated by the weak classifier and $g_k$ match correctly.
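Eqs. (3)–(4) can be sketched directly in NumPy. This is an illustrative sketch under our own assumptions: `preds` holds candidate weak-classifier outputs $h_{t,k}(x_{i,t})$, `D` the current weights, and `y` 0-indexed labels; none of these names come from the paper.

```python
import numpy as np

def class_scores_and_alpha(preds, D, y):
    """Eqs. (3)-(4): per-class scores r_{t,k} and the step size alpha_t.

    preds[i, k] = h_{t,k}(x_{i,t}) in {+1, -1},
    D[i, k]     = current weight D_t(i, k), normalized to sum to 1,
    y[i]        = 0-indexed class label.
    """
    N, K = preds.shape
    g = -np.ones((N, K))
    g[np.arange(N), y] = 1.0
    r = (g * preds * D).sum(axis=0)   # r_{t,k}, Eq. (4): one score per class
    r_total = r.sum()
    alpha = 0.5 * np.log((1.0 + r_total) / (1.0 - r_total))   # Eq. (3)
    return r, alpha
```

Note that $\alpha_t$ depends on the sum of the $K$ class scores, which is exactly the multi-class coupling discussed below.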

These equations highlight the benefits of the Earlyboost.MH model. First, the optimal $\alpha_t$ is computed via a sum of $K$ class scores, which implies that the resulting weak classifiers are optimized for the multi-class problem, not for $K$ independent binary classifications. Second, $D_t(i,k) \in \mathbb{R}$ is the weight of sequence $\mathbf{x}_i$ for the $k$-th classifier at the $t$-th frame, computed as follows:

$$D_t(i,k) \propto D_{t-1}(i,k)\exp\left(-\alpha_{t-1}\, g_k(y_i)\, h_{t-1,k}(x_{i,t-1})\right). \quad (5)$$

Eq. (5) implies that each weak classifier $h_{t,k}$ learns the classification boundary at time $t$ so as to minimize the classification error induced by the information up to time $t-1$. This interpretation of the weight $D_t$ was first devised in Earlyboost [6], using frame-wise weak classifiers $h$. Thanks to this "weight propagation" update rule, the Earlyboost.MH classifier is well suited to early classification of sequences.

We validate Earlyboost.MH on two datasets: the online handwritten digit trajectory data and the driver behavior data (Fig. 2). Experimental results show the effectiveness of the proposed model for multi-class early classification of sequence data.
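The weight propagation rule of Eq. (5) has a compact NumPy form. Again a minimal sketch under our own naming assumptions: `D` holds the previous-frame weights, `preds` the previous frame's weak-classifier outputs, and we renormalize so the weights form a distribution.

```python
import numpy as np

def propagate_weights(D, alpha, preds, y):
    """Eq. (5): D_t(i,k) prop. to D_{t-1}(i,k) * exp(-alpha_{t-1} g_k(y_i) h_{t-1,k}(x_{i,t-1})).

    D[i, k]     = previous-frame weights D_{t-1}(i, k),
    preds[i, k] = previous-frame weak-classifier outputs in {+1, -1},
    y[i]        = 0-indexed class label; alpha is the scalar alpha_{t-1}.
    """
    N, K = D.shape
    g = -np.ones((N, K))
    g[np.arange(N), y] = 1.0
    D_new = D * np.exp(-alpha * g * preds)
    return D_new / D_new.sum()        # renormalize so the weights sum to 1
```

Entries misclassified at frame $t-1$ gain weight, so the frame-$t$ weak classifiers concentrate on the errors left by the earlier frames.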

[Figure 2 panels: (A) pressure-sensitive tablet and stylus pen, with sampling points (w, h); (B) three joints tracked by optical flow, no markers.]

Figure 2: Datasets used in the experiments. (A) Online handwritten digits data: trajectories are collected through a pressure-sensitive tablet and a wireless stylus pen. (B) Driver behavior data: seven subjects were recorded during driving simulations by a consumer video camera; three joints are tracked by optical flow, without any markers or special attachments.

[1] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[2] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regres-

sion: A statistical view of boosting (with discussion). Annals of

Statistics, 28(2):337–407, 2000.

[3] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999.

[4] Y. A. Sheikh, A. Datta, and T. Kanade. On the sustained tracking of

human motion. In Proc. FG, 2008.

[5] J. Sochman and J. Matas. WaldBoost - learning for time constrained sequential detection. In Proc. CVPR, volume 2, pages 150–156, 2005.

[6] S. Uchida and K. Amamoto. Early recognition of sequential patterns

by classifier combination. In Proc. ICPR, 2008.
