
PEDESTRIAN DETECTION VIA A LEG-DRIVEN PHYSIOLOGY FRAMEWORK

Gongbo Liang, Qi Li

Western Kentucky University

Department of Computer Science

Ogden College of Science & Engineering

Bowling Green, KY 42101, USA

Xiangui Kang

Guangdong Key Lab of Information Security

Sun Yat-Sen University

School of Data and Computer Science

Guangzhou 510006, China

ABSTRACT

In this paper, we propose a leg-driven physiology framework for pedestrian detection. The framework is introduced to reduce the search space of candidate regions of pedestrians. Given a set of vertical line segments, we can generate a space of rectangular candidate regions based on a model of body proportions. The proposed framework can be used either with or without a learning-based pedestrian detection method to validate the candidate regions. A symmetry constraint is then applied to validate each candidate region and decrease the false positive rate. Experiments demonstrate the promising results of the proposed method in comparison with the Dalal & Triggs method. For example, rectangular regions detected by the proposed method have areas much closer to the ground truth than regions detected by the Dalal & Triggs method.

Index Terms— Pedestrian detection, leg, line segment, bounding box

1. INTRODUCTION

Pedestrian detection has been an active topic in computer vision and pattern recognition [6, 4] due to its wide range of applications, e.g., video surveillance [20] and driving assistance [17]. Most pedestrian detection methods follow a machine learning strategy that usually involves two aspects: i) designing a distinct representation that is robust to the varied appearance of pedestrians, and ii) designing or selecting an effective classifier. In the context of pedestrian detection, examples of well-known representations include Haar wavelet coefficients [15], grids of Histograms of Oriented Gradients (HOG) [3], Local Binary Patterns [23], and edgelet part representations [21]; examples of well-known classifiers include SVM [15, 3], neural networks [22, 5], boosting [14, 9], and Bayesian methods [21]. Recently, deep neural networks (also called deep learning) have received extensive study for pedestrian detection [16, 13, 24, 18]. An appealing advantage of deep neural networks is that they can be applied directly to the raw representation of a candidate region without an explicit feature extraction procedure, such as HOG [3].

Fig. 1. A theory of body proportions of a pedestrian (in a standing pose), where the head height is used as the basic unit to measure the length of other body parts [2]. In practice, it is not easy to estimate the head height, and thus we propose to estimate the height and the width of a pedestrian in terms of his/her legs (either lower legs or entire legs) under a standing pose or a walking pose.

In this paper, we propose a leg-driven physiology framework for pedestrian detection. The basic idea of the proposed framework is that we can construct a small set of rectangular candidate regions based on the theory of body proportions of a pedestrian [2] (as illustrated in Fig. 1), together with recent developments in line segment detection (or 2-piece polylines) [10, 19, 11]. The proposed framework is driven by legs, which is motivated by the following facts: 1) a leg is more salient than an arm in terms of its length and width; 2) a leg can be modeled by simpler geometric primitives (line segments or 2-piece polylines) than a head. More specifically, we first detect a number of "vertical" line segments (whose cross angles with the ground are larger than 45°). For each line segment, we generate a number of bounding boxes whose locations, heights, and widths are estimated by a developed theory of body proportions of a pedestrian in a standing or a walking pose. Finally, a symmetry constraint is applied to remove non-pedestrian bounding boxes. It is worth noting that the proposed framework can be integrated with an existing machine learning method by adding an additional step of verifying bounding boxes via that method. In the experiment, we compare the Dalal & Triggs method (i.e., the HOG+SVM method) and the proposed framework, and the results demonstrate its effectiveness.

The paper is structured as follows: Section 2 reviews line segment detection. A pedestrian detection algorithm is proposed in Section 3. Experiments are presented in Section 4. Conclusions and future work are presented in Section 5.

978-1-4673-9961-6/16/$31.00 ©2016 IEEE — ICIP 2016

2. LINE SEGMENT DETECTION

Recently, several interesting methods were proposed to detect line segments [10, 19, 11]. Here, we review two of them, both of which use connected components of edge pixels: i) Kosecka and Zhang [10], and ii) Li et al. [11]. Kosecka and Zhang [10] proposed a fitting-based method to detect line segments in the context of vanishing point estimation. Specifically, their method first uses quantized gradient directions to label edge pixels, and then applies a connected components algorithm to group edge pixels with the same label. A fitting algorithm is finally applied to each connected component to estimate its line parameters. Li et al. [11] proposed a method that can detect not only line segments (called 1-piece polylines) but also two joined line segments (called 2-piece polylines) in the context of stop sign detection. Note that 2-piece polylines can be used to model the bent leg of a walking pedestrian seen from the side. Due to space constraints, we focus on line segments only.
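The Kosecka and Zhang pipeline described above can be sketched as follows. The bin count, 4-connectivity, and the PCA-based fit are our assumptions for illustration, not details taken from [10]:

```python
import numpy as np
from collections import deque

def quantize_direction(gx, gy, n_bins=8):
    """Quantize gradient direction into n_bins labels over [0, pi)
    (orientation only, so angles are taken modulo pi)."""
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    return (ang / (np.pi / n_bins)).astype(int) % n_bins

def connected_components(mask):
    """4-connected components of a boolean mask (e.g., the edge pixels
    sharing one direction label); returns a list of point arrays."""
    labels = -np.ones(mask.shape, dtype=int)
    comps = []
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j] >= 0:
            continue
        q = deque([(i, j)])
        labels[i, j] = len(comps)
        pts = []
        while q:
            a, b = q.popleft()
            pts.append((a, b))
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                na, nb = a + da, b + db
                if (0 <= na < mask.shape[0] and 0 <= nb < mask.shape[1]
                        and mask[na, nb] and labels[na, nb] < 0):
                    labels[na, nb] = len(comps)
                    q.append((na, nb))
        comps.append(np.array(pts))
    return comps

def fit_line(pts):
    """Total-least-squares line fit: returns (centroid, unit direction),
    the principal direction of the component's point cloud."""
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)  # vt[0] = dominant direction
    return c, vt[0]
```

A fitted component whose direction makes a cross angle above 45° with the ground would then count as a "vertical" segment in this paper's sense.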

Given a connected component C of edge pixels, the Li et al. method performs three steps for line segment detection [11]. The first step extracts three dominant points, where the first two points (v1 and v2) maximize the distance over all pairs of points in C, and the third point v3 maximizes the sum of distances between a point p ∈ C and the vi, i = 1, 2, i.e.,

    max_{p ∈ C} (‖p − v1‖ + ‖p − v2‖).

The second step verifies the piecewise linearity of C. The third step partitions C into two subsets if C does not form a 1-piece polyline. This idea is then applied to the two subsets recursively.
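A minimal sketch of the three steps, assuming a linearity tolerance `tol` (in pixels) and a split rule (each point goes to the nearer of v1 and v2) that [11] does not spell out:

```python
import numpy as np

def dominant_points(C):
    """Step 1 of Li et al. [11]: v1, v2 maximize the pairwise distance
    in C; v3 maximizes ||p - v1|| + ||p - v2|| over p in C."""
    C = np.asarray(C, dtype=float)
    d = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    v1, v2 = C[i], C[j]
    s = np.linalg.norm(C - v1, axis=1) + np.linalg.norm(C - v2, axis=1)
    return v1, v2, C[np.argmax(s)]

def is_1piece(C, v1, v2, tol=1.5):
    """Step 2: C forms a 1-piece polyline if every point lies within
    tol pixels of the line through v1 and v2 (tol is our assumption)."""
    C = np.asarray(C, dtype=float)
    d = v2 - v1
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal
    return bool(np.all(np.abs((C - v1) @ n) <= tol))

def detect_segments(C, tol=1.5, min_size=4):
    """Step 3: recursively partition C until each subset is a 1-piece
    polyline; returns a list of (v1, v2) endpoint pairs."""
    C = np.asarray(C, dtype=float)
    if len(C) < min_size:
        return []
    v1, v2, v3 = dominant_points(C)
    if is_1piece(C, v1, v2, tol):
        return [(tuple(v1), tuple(v2))]
    # Assign each point to the nearer of v1, v2 and recurse on both halves.
    closer1 = np.linalg.norm(C - v1, axis=1) <= np.linalg.norm(C - v2, axis=1)
    return (detect_segments(C[closer1], tol, min_size)
            + detect_segments(C[~closer1], tol, min_size))
```

On a straight run of edge pixels this returns a single segment; on an L-shaped component (a crude 2-piece polyline) it returns the two constituent segments.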

Fig. 2 shows "vertical" line segments detected in a pedestrian image by the above two methods: i) Kosecka and Zhang [10], and ii) Li et al. [11]. Recall that, in this paper, a vertical line segment refers to a line segment whose cross angle with the ground is larger than 45°. Note that some line segments detected by Li's method may partially overlap each other due to the use of a scale space. Both methods detected a sufficient number of line segments consistent with the length of the lower legs or entire legs of a pedestrian, which helps to generate precise bounding boxes.

(a) Kosecka & Zhang [10]  (b) Li et al. [11]

Fig. 2. "Vertical" line segments detected by the Kosecka & Zhang method [10] and the Li et al. method [11]. Both methods detected a sufficient number of line segments consistent with the length of the lower legs or entire legs of a pedestrian, which helps to generate precise bounding boxes.

3. A LEG-DRIVEN PEDESTRIAN DETECTION ALGORITHM

In this section, we propose a leg-driven framework for pedestrian detection. The framework contains two key components: i) how to generate hypothesis boxes given vertical line segments; and ii) how to remove false-positive boxes. For convenience, we use the notations listed in Table 1.

notation   meaning
h          height of a hypothesis box
w          width of a hypothesis box
l          length of a line segment
θ          cross angle between a line and the ground

Table 1. Notations

3.1. Generation of hypothesis boxes

Pedestrians can take many different poses under different (camera) viewing directions. Poses and viewing directions have a significant impact on the width of a hypothesis box, and a relatively small impact on its height. Exhaustively modeling a large number of poses may not be realistic, since it would generate a large number of hypothesis boxes, increasing the computation cost and, more seriously, the number of false-positive instances. Thus, we propose two standard configurations for hypothesis boxes: i) narrow, and ii) wide. A narrow box has a width equal to 2 times the height of a head (refer to Fig. 1); a wide box has a width equal to 4 times the height of a head. The height of both types of boxes is equal to 8 times the height of a head.

Given a "vertical" line segment with length l, it is not difficult to decide the height and width of a box under different combinations of two factors: i) a lower or entire leg, and ii) a narrow or wide type. Table 2 shows the formulas to estimate the size of a box under the different scenarios.

The two standard types of boxes are exclusive, i.e., only one type of box is generated for a given vertical line segment. The selection of a narrow or wide box depends on θ, in addition to the leg configuration (lower or entire). When a line segment is an edge of an entire leg, it is easy to see that θ = 60° can be used to decide the type of a box. (Note that cos 60° = 0.5.) Otherwise, the analysis is difficult without introducing a 2-piece polyline, which can model a walking leg very well. We leave this to future work.

         narrow type              wide type
         lower leg   entire leg   lower leg   entire leg
h        4l          2l           4l          2l
w        l           0.5l         2l          l

Table 2. The length of a "vertical" line segment is used to decide the height and the width of a hypothesis box.
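Table 2 can be encoded directly. The θ = 60° rule below applies only to entire-leg edges, and the direction of the inequality (a near-vertical segment implies a narrow box) is our reading of the text, not something the paper states explicitly:

```python
# Table 2 as code: (height multiplier, width multiplier) of a hypothesis
# box for each (leg, type) combination, applied to the segment length l.
SIZE_TABLE = {
    ("lower", "narrow"): (4.0, 1.0),   # h = 4l, w = l
    ("entire", "narrow"): (2.0, 0.5),  # h = 2l, w = 0.5l
    ("lower", "wide"): (4.0, 2.0),     # h = 4l, w = 2l
    ("entire", "wide"): (2.0, 1.0),    # h = 2l, w = l
}

def box_size(l, leg, box_type):
    """Height and width of a hypothesis box from segment length l."""
    hk, wk = SIZE_TABLE[(leg, box_type)]
    return hk * l, wk * l

def box_type_from_angle(theta_deg):
    """For an entire-leg edge, theta = 60 degrees separates the narrow
    and wide configurations (cos 60 = 0.5); the inequality direction is
    an assumption."""
    return "narrow" if theta_deg > 60.0 else "wide"
```

With l measured in head heights (a lower leg spans about 2, an entire leg about 4), all four cases reproduce the 8-head box height and the 2- or 4-head box widths of Sec. 3.1.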

Since a line segment can be one of the four edges of the two lower legs, or one of the four edges of the two entire legs, we need to decide the location of a hypothesis box for each possible case. One key factor that affects the location estimation is the width of a leg. Based on measurements of images in the INRIA pedestrian dataset, we set the width of a lower leg to 0.25 × l. Under the narrow size configuration, we set the gap between the two legs to 0.1 × l. Thus, we can derive the relative distance between each of the four possible edges of a leg and the boundary of a bounding box, as illustrated in Fig. 3. Note that the numbers displayed in Fig. 3 represent ratios only. The analysis under the wide size configuration is similar to Fig. 3, but only 4 boxes are generated for a fairly skewed line segment.
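With a 0.25 l leg width and a 0.1 l gap centered in the box, the four possible leg edges sit at fractions 0.2, 0.45, 0.55, and 0.8 of the narrow box width (the ratios shown in Fig. 3). A sketch of the placement, where the anchor convention (x is the segment's horizontal position, y_bottom its lower end, y growing downward) is an illustrative assumption:

```python
# Fractions of the box width at which the four possible leg edges sit
# (left edge of left leg, its right edge, then the right leg's edges).
EDGE_RATIOS = (0.2, 0.45, 0.55, 0.8)

def narrow_boxes(x, y_bottom, l, leg="lower"):
    """Four (x0, y0, w, h) hypothesis boxes for a vertical segment of
    length l, one per hypothesis about which leg edge the segment is."""
    if leg == "lower":
        h, w = 4.0 * l, 1.0 * l   # Table 2, narrow type / lower leg
    else:
        h, w = 2.0 * l, 0.5 * l   # Table 2, narrow type / entire leg
    y0 = y_bottom - h             # box top (image y grows downward)
    # Shift the box so the segment lands on each candidate edge position.
    return [(x - r * w, y0, w, h) for r in EDGE_RATIOS]
```

Running both the "lower" and "entire" hypotheses yields the eight boxes of Fig. 3.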

Fig. 3. Under the narrow size configuration, eight hypothesis boxes are generated for a given "vertical" line segment. (a) Four boxes are generated under the hypothesis that the line segment is one of the four edges of the two lower legs; (b) four boxes are generated under the hypothesis that the line segment is one of the four edges of the two entire legs.

3.2. Symmetry constraint

Left/right symmetry has been shown to be an effective way to reduce the false positive rate of a pedestrian detection method [1, 7, 8]. Bertozzi et al. [1] measured the symmetry of an image region (enclosed by a bounding box) by computing the similarity of the normalized histograms of the gray values of its left and right sub-regions (divided by the central vertical line of the given bounding box). Specifically, assume that h_i, i = 1, 2, are the histograms of the gray values of the left and right sub-regions, respectively. The symmetry of the region is measured by the dot product (h1/‖h1‖) · (h2/‖h2‖), where ‖·‖ denotes the 2-norm of a vector. Following this idea, symmetry measurement is translated into similarity measurement, and thus many existing feature representations, such as HOG, SIFT [12], and LBP [23], can be used as alternatives, especially in scenarios where computational time is not critical for the pedestrian/human detection application.
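The measure of Bertozzi et al. [1] can be sketched as follows; the bin count and the handling of odd region widths are our assumptions:

```python
import numpy as np

def symmetry_score(region, n_bins=32):
    """Left/right symmetry of a grayscale region (2-D array) as in
    Bertozzi et al. [1]: the dot product of the L2-normalized gray-value
    histograms of the left and right halves. Returns a value in [0, 1],
    with 1 for identical histograms."""
    region = np.asarray(region, dtype=float)
    mid = region.shape[1] // 2
    # Split at the central vertical line (middle column dropped if odd).
    left, right = region[:, :mid], region[:, region.shape[1] - mid:]
    h1, _ = np.histogram(left, bins=n_bins, range=(0, 256))
    h2, _ = np.histogram(right, bins=n_bins, range=(0, 256))
    h1 = h1 / np.linalg.norm(h1)
    h2 = h2 / np.linalg.norm(h2)
    return float(h1 @ h2)
```

Replacing the gray-value histograms with HOG, SIFT, or LBP descriptors of the two halves gives the alternative similarity measures mentioned above.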

3.3. Merging boxes with significant overlaps

Multiple boxes may partially overlap each other. One reason is that multiple hypothesis boxes may be generated from the same vertical line. In the context of a scale space, similar line segments may be detected, which leads to similar bounding boxes. We merge them in order to obtain a clearer view of the output results. Given hypothesis boxes b1 and b2, their overlapping region is quantified by the overlapping ratio between the two boxes:

    overlapping = area(b1 ∩ b2) / area(b1 ∪ b2).    (1)

If the overlapping ratio is significant, i.e., larger than a threshold (set to 50% in this paper), b2 is added to the cluster containing b1. For each cluster of boxes, we return the bounding box with the highest symmetry. Note that if two boxes differ significantly in size, they are not merged even if one box encloses the other.
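Eq. (1) and the clustering rule can be sketched as follows; the greedy first-fit cluster assignment is a simplifying assumption, and `scores` stands in for the symmetry measure of Sec. 3.2:

```python
def iou(b1, b2):
    """Overlapping ratio of Eq. (1): area(b1 ∩ b2) / area(b1 ∪ b2),
    with boxes given as (x0, y0, w, h). A small box enclosed by a much
    larger one gets a low ratio, so such pairs are not merged."""
    ax0, ay0, aw, ah = b1
    bx0, by0, bw, bh = b2
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(boxes, scores, thresh=0.5):
    """Greedy clustering sketch: a box joins the first cluster whose
    representative it overlaps by more than thresh; each cluster then
    returns its highest-scoring (most symmetric) box."""
    clusters = []  # each cluster is a list of box indices
    for i, b in enumerate(boxes):
        for cl in clusters:
            if iou(boxes[cl[0]], b) > thresh:
                cl.append(i)
                break
        else:
            clusters.append([i])
    return [boxes[max(cl, key=lambda i: scores[i])] for cl in clusters]
```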

4. EXPERIMENT

In this section, we test the performance of the proposed framework, along with a comparison to the Dalal & Triggs method, i.e., the HOG+SVM method [3]. (The pedestrian detector in the vision package of Matlab 2016a is used as the implementation of the Dalal & Triggs method.) The Li et al. method [11] is used to detect line segments. The left/right symmetry of a region is measured by the similarity of the histograms of gray values in its left and right sub-regions. We first give a visual comparison, and then a quantitative comparison.

method                 true positives   false positives
Dalal and Triggs [3]         86               367
Proposed                    127               215

Table 3. The test set contains 250 images and 287 pedestrians. The proposed method increases the number of true positives by 48% and decreases the number of false positives by 41%.

Fig. 4 shows a visual comparison between the Dalal & Triggs method and the proposed method. It is clear that the bounding boxes detected by the Dalal & Triggs method are commonly much larger than those detected by the proposed method, while the boxes detected by the proposed method are much more accurate. Moreover, many bounding boxes detected by the Dalal & Triggs method are false positives. In the first two images, each of which contains two pedestrians, the Dalal & Triggs method detects only one.

Next, we present a quantitative comparison between the two methods. Our test set includes 250 INRIA images that contain 287 pedestrians in total. A bounding box output by a method is considered a true positive if its overlap with the ground truth is over 50%.

Table 3 shows the numbers of true positives and false positives obtained by the two methods. The Dalal & Triggs method detected 86 pedestrians correctly, while the proposed one detected 127. The Dalal & Triggs method output 367 false-positive bounding boxes, while the proposed one output 215. That is, the proposed method increases the number of true positives by 48% and decreases the number of false positives by 41%.
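The true-positive criterion can be sketched as a hypothetical evaluation helper (not the authors' code); one-to-one greedy matching is our assumption, since the paper does not specify how multiple detections of the same pedestrian are counted:

```python
def iou(b1, b2):
    """Eq. (1) overlap for (x0, y0, w, h) boxes, repeated here so the
    sketch is self-contained."""
    ax0, ay0, aw, ah = b1
    bx0, by0, bw, bh = b2
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def count_detections(detections, ground_truth, thresh=0.5):
    """A detection is a true positive if its overlap with some unmatched
    ground-truth box exceeds thresh (50% in the paper); every remaining
    detection is a false positive."""
    matched = [False] * len(ground_truth)
    tp = 0
    for d in detections:
        for k, g in enumerate(ground_truth):
            if not matched[k] and iou(d, g) > thresh:
                matched[k] = True
                tp += 1
                break
    return tp, len(detections) - tp
```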

5. CONCLUSION AND FUTURE WORK

In this paper, we proposed a leg-driven framework for pedestrian detection. Experiments show that the proposed framework achieves more precise localization of pedestrian regions than the Dalal & Triggs method. In the future, we will explore features of 2-piece polylines, such as orientation, to reduce the number of hypothesis boxes, which is in turn equivalent to reducing the false positive rate.

Acknowledgements: The work of Xiangui Kang was supported by NSFC (Grant nos. 61379155, U1536204) and NSF of Guangdong province (Grant no. s2013020012788).

(a) Dalal & Triggs [3]  (b) Proposed

Fig. 4. A comparison between Dalal & Triggs [3] and the proposed method. The bounding boxes detected by the Dalal & Triggs method are commonly much larger than those detected by the proposed method.


6. REFERENCES

[1] M. Bertozzi, A. Broggi, R. Chapuis, F. Chausse, A. Fascioli, and A. Tibaldi. Shape-based pedestrian detection and localization. In IEEE Trans. on Intelligent Transportation Systems, volume 1, pages 328–333, 2003.

[2] B. Bogin and M. Varela-Silva. Leg length, body proportion, and health: a review with a note on beauty. International Journal of Environmental Research and Public Health, 7(3):1047–1075, 2010.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893, 2005.

[4] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell., 34(4):743–761, 2012.

[5] D. Gavrila and J. Giebel. Shape-based pedestrian detection and tracking. In IEEE Intelligent Vehicle Symposium, volume 1, pages 8–14, 2002.

[6] D. Geronimo, A. Lopez, A. Sappa, and T. Graf. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1239–1258, 2010.

[7] I. Havasi, Z. Szlavik, and T. Sziranyi. Pedestrian detection using derived third-order symmetry of legs: A novel method of motion-based information extraction from video image-sequences. In K. Wojciechowski, B. Smolka, H. Palus, R. Kozera, W. Skarbek, and L. Noakes, editors, Computer Vision and Graphics, volume 32 of Computational Imaging and Vision, pages 733–739. 2006.

[8] L. Havasi, Z. Szlávik, and T. Szirányi. Detection of gait characteristics for scene registration in video surveillance system. IEEE Trans. Image Processing, 16(2):503–510, 2007.

[9] V.-D. Hoang, M.-H. Le, and K.-H. Jo. Hybrid cascade boosting machine using variant scale blocks based HOG features for pedestrian detection. Neurocomputing, 135:357–366, 2014.

[10] J. Kosecká and W. Zhang. Video compass. In European Conference on Computer Vision (4), pages 476–490, 2002.

[11] Q. Li, G. Liang, and Y. Gong. A geometric framework for stop sign detection. In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015, pages 258–262, 2015.

[12] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal on Computer Vision, 60(2):91–110, 2004.

[13] W. Ouyang and X. Wang. Joint deep learning for pedestrian detection. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 2056–2063, 2013.

[14] S. Paisitkriangkrai, C. Shen, and J. Zhang. Fast pedestrian detection using a cascade of boosted covariance features. IEEE Trans. Circuits Syst. Video Techn., 18(8):1140–1151, 2008.

[15] C. Papageorgiou, T. Evgeniou, and T. Poggio. A trainable pedestrian detection system. In Intelligent Vehicles, pages 241–246, 1998.

[16] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun. Pedestrian detection with unsupervised multi-stage feature learning. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013, pages 3626–3633, 2013.

[17] A. Shashua, Y. Gdalyahu, and G. Hayun. Pedestrian detection for driving assistance systems: single-frame classification and system level performance. In IEEE Intelligent Vehicles Symposium, pages 1–6, 2004.

[18] Y. Tian, P. Luo, X. Wang, and X. Tang. Pedestrian detection aided by deep learning semantic tasks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 5079–5087, 2015.

[19] R. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD: A fast line segment detector with a false detection control. IEEE Trans. Pattern Anal. Mach. Intell., 32(4):722–732, 2010.

[20] X. Wang, M. Wang, and W. Li. Scene-specific pedestrian detection for static video surveillance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2):361–374, 2014.

[21] B. Wu and R. Nevatia. Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In International Conference on Computer Vision, volume 1, pages 90–97, 2005.

[22] L. Zhao and C. Thorpe. Stereo- and neural network-based pedestrian detection. IEEE Transactions on Intelligent Transportation Systems, 1(3):148–154, 2000.

[23] Y. Zheng, C. Shen, and X. Huang. Pedestrian detection using center-symmetric local binary patterns. In Proceedings of the International Conference on Image Processing, ICIP 2010, September 26-29, Hong Kong, China, pages 3497–3500, 2010.

[24] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao. Orientation robust object detection in aerial images using deep convolutional neural network. In 2015 IEEE International Conference on Image Processing, ICIP 2015, Quebec City, QC, Canada, September 27-30, 2015, pages 3735–3739, 2015.