PEDESTRIAN DETECTION VIA A LEG-DRIVEN PHYSIOLOGY
FRAMEWORK
Gongbo Liang, Qi Li
Western Kentucky University
Department of Computer Science
Ogden College of Science & Engineering
Bowling Green, KY 42101, USA
Xiangui Kang
Guangdong Key Lab of Information Security
Sun Yat-Sen University
School of Data and Computer Science
Guangzhou 510006, China
ABSTRACT
In this paper, we propose a leg-driven physiology framework
for pedestrian detection. The framework is introduced to reduce
the search space of candidate regions of pedestrians. Given a
set of vertical line segments, we generate a space of rectangular
candidate regions based on a model of body proportions. The
proposed framework can be used on its own or integrated with a
learning-based pedestrian detection method that validates the
candidate regions. A symmetry constraint is then applied to each
candidate region to decrease the false positive rate. Experiments
show promising results in a comparison with the Dalal & Triggs
method. For example, the rectangular regions detected by the
proposed method are much closer in area to the ground truth than
the regions detected by the Dalal & Triggs method.
Index Terms—Pedestrian detection, leg, line segment,
bounding box
1. INTRODUCTION
Pedestrian detection has been an active topic in computer vision
and pattern recognition [6, 4] due to its wide range of
applications, e.g., video surveillance [20] and driving
assistance [17]. Most pedestrian detection methods follow a
machine learning strategy that usually has two aspects: i)
designing a distinctive representation that is robust to the
varied appearance of pedestrians, and ii) designing or selecting
an effective classifier. In the context of pedestrian detection,
well-known representations include Haar wavelet coefficients
[15], grids of Histograms of Oriented Gradients (HOG) [3], Local
Binary Patterns [23], and edgelet part representations [21];
well-known classifiers include SVMs [15, 3], neural networks
[22, 5], boosting [14, 9], and Bayesian classifiers [21].
Recently, deep neural networks (also called deep learning) have
been studied extensively for pedestrian detection [16, 13, 24,
18]. An appealing advantage of deep neural networks is that they
can be applied directly to the raw representation of a candidate
region without an explicit feature extraction procedure, such as
HOG [3].
Fig. 1. A theory of body proportions of a pedestrian (in a
standing pose), where the head height is used as the basic unit
to measure the length of other body parts [2]. In practice, it is
not easy to estimate the head height, and thus we propose to
estimate the height and the width of a pedestrian in terms of
his/her legs (either lower legs or entire legs) under a standing
pose or a walking pose.
In this paper, we propose a leg-driven physiology framework for
pedestrian detection. The basic idea of the proposed framework
is that we can construct a small set of rectangular candidate
regions based on the theory of body proportions of a pedestrian
[2] (as illustrated in Fig. 1), together with recent developments
in the detection of line segments (or 2-piece polylines) [10, 19,
11]. The proposed framework is driven by legs, which is motivated
by the following facts: 1) a leg is more salient than an arm in
terms of its length and width; 2) a leg can be modeled by simpler
geometric primitives (line segments or 2-piece polylines) than a
head. More specifically, we first detect a number of "vertical"
line segments (whose cross angle with the ground is larger than
45°). For each line segment, we generate a number of bounding
boxes whose locations, heights, and widths are estimated by a
theory of body proportions developed for a pedestrian in a
standing or a walking pose. Finally, a symmetry constraint is
applied to remove non-pedestrian bounding boxes. It is worth
noting that the proposed framework can be integrated with an
existing machine learning method by adding a step that verifies
the bounding boxes with that method. In the experiment, we
compare the Dalal & Triggs method (i.e., the HOG+SVM method) with
the proposed framework, and the results demonstrate the
effectiveness of the framework.
978-1-4673-9961-6/16/$31.00 ©2016 IEEE ICIP 2016
The paper is structured as follows: Section 2 reviews line
segment detection. A pedestrian detection algorithm is proposed
in Section 3. Experiments are presented in Section 4. Conclusions
and future work are presented in Section 5.
2. LINE SEGMENT DETECTION
Recently, several interesting methods were proposed to detect
line segments [10, 19, 11]. Here, we review two of them, both of
which operate on connected components of edge pixels: i) Kosecka
and Zhang [10], and ii) Li et al. [11]. Kosecka and Zhang [10]
proposed a fitting-based method to detect line segments in the
context of vanishing point estimation. Specifically, their method
first uses quantized gradient directions to label edge pixels,
and then applies the connected components algorithm to group
edge pixels with the same label. Finally, a fitting algorithm is
applied to each connected component to estimate its line
parameters. Li et al. [11] proposed a method that can detect not
only line segments (called 1-piece polylines) but also two joined
line segments (called 2-piece polylines) in the context of stop
sign detection. Note that 2-piece polylines can be used to model
the bent leg of a walking pedestrian under a side view. Due to
the space constraint, we focus here on line segments only.
Given a connected component C of edge pixels, the method of Li
et al. performs three steps for line segment detection [11]. The
first step extracts three dominant points: the first two points,
v1 and v2, maximize the distance over all pairs of points in C,
and the third point, v3, maximizes the sum of the distances
between a point p ∈ C and vi, i = 1, 2, i.e.,
$$v_3 = \arg\max_{p \in C} \left( \|p - v_1\| + \|p - v_2\| \right).$$
The second step verifies the piecewise linearity of C. The third
step partitions C into two subsets if C does not form a 1-piece
polyline. This idea is then applied to the two subsets
recursively.
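The first step above can be sketched in a few lines of Python. This is only an illustration of the stated definitions of v1, v2, and v3, not the implementation of [11]; the function name dominant_points and the brute-force search over all pairs are assumptions made here for clarity.

```python
from itertools import combinations
from math import dist


def dominant_points(component):
    """Return (v1, v2, v3) for a connected component given as (x, y) points.

    Hypothetical helper: a brute-force reading of the definitions in the
    text, quadratic in the number of points.
    """
    if len(component) < 2:
        raise ValueError("need at least two points")
    # v1, v2: the pair of points in C that are farthest apart.
    v1, v2 = max(combinations(component, 2), key=lambda pair: dist(*pair))
    # v3: the point of C maximizing ||p - v1|| + ||p - v2||.
    v3 = max(component, key=lambda p: dist(p, v1) + dist(p, v2))
    return v1, v2, v3


# An L-shaped component: a horizontal run followed by a vertical run.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
v1, v2, v3 = dominant_points(points)
print(v1, v2, v3)
```

By the triangle inequality the sum is never smaller than ‖v1 − v2‖, with equality for points lying on the segment between v1 and v2, so v3 tends to fall on the corner of a bent component, as it does for the L-shaped example.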
Fig. 2 shows "vertical" line segments detected in a pedestrian
image by the above two methods: i) Kosecka and Zhang [10], and
ii) Li et al. [11]. Recall that, in this paper, a vertical line
segment refers to a line segment whose cross angle with the
ground is larger than 45°. Note that some line segments detected
by the method of Li et al. may partially overlap each other due
to the use of a scale space. Both methods detected a sufficient
number of line segments consistent with the length of the lower
legs or entire legs of a pedestrian, which helps to generate
precise bounding boxes.
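The "vertical" criterion can be illustrated with a short filter. The sketch below assumes that the ground is parallel to the image x-axis and that a segment is given by its two endpoints; the names cross_angle and vertical_segments are hypothetical.

```python
from math import atan2, degrees


def cross_angle(segment):
    """Angle in degrees (0 to 90) between a segment and the horizontal axis.

    Assumption: the ground is taken to be horizontal in the image.
    """
    (x1, y1), (x2, y2) = segment
    return degrees(atan2(abs(y2 - y1), abs(x2 - x1)))


def vertical_segments(segments, min_angle=45.0):
    """Keep the segments whose cross angle is larger than min_angle."""
    return [s for s in segments if cross_angle(s) > min_angle]


segments = [((0, 0), (1, 10)), ((0, 0), (10, 1)), ((5, 5), (5, 30))]
print(vertical_segments(segments))
```

In this example the first and third segments are kept and the nearly horizontal second one is discarded.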
(a) Kosecka & Zhang [10] (b) Li et al. [11]
Fig. 2. “Vertical” line segments detected by Kosecka & Zhang
method [10] and Li et al. method [11]. Both methods detected
a sufficient number of line segments consistent with the length
of lower legs or entire legs of a pedestrian, which helps to
generate precise bounding boxes.
3. A LEG-DRIVEN PEDESTRIAN DETECTION
ALGORITHM
In this section, we propose a leg-driven framework for pedestrian
detection. The framework contains two key components: i) how to
generate hypothesis boxes, given vertical line segments; and ii)
how to remove false-positive boxes.
For convenience, we use the notations listed in Table 1.
notation    meaning
h           height of a hypothesis box
w           width of a hypothesis box
l           length of a line segment
θ           cross angle between a line and the ground
Table 1. Notations
3.1. Generation of hypothesis boxes
Pedestrians can have many different poses under different
(camera) viewing directions. Poses and viewing directions have a
significant impact on the width of a hypothesis box, and a
relatively small impact on its height. Exhaustively modeling a
large number of poses is not realistic, since it would generate
a large number of hypothesis boxes, increasing the computation
cost and, more seriously, the number of false positives. Thus,
we propose two standard configurations of hypothesis boxes: i)
narrow, and ii) wide. A narrow box has a width equal to 2 times
the height of a head (refer to Fig. 1); a wide box has a width
equal to 4 times the height of a head. The height of both types
of boxes is equal to 8 times the height of a head.
Given a "vertical" line segment with length l, it is not
difficult to decide the height and width of a box under
different combinations of two factors: i) a lower or entire leg,
and ii) a narrow or wide type. Table 2 gives the formulas used
to estimate the size of a box under the different scenarios.
The two standard types of boxes are exclusive, i.e., only one
type of box can be generated for a given vertical line segment.
The selection of a narrow or wide box depends on θ, in addition
to the leg configuration (lower or entire). When a line segment
is an edge of an entire leg, it is easy to see that θ = 60° can
be used to decide the type of a box. (Note that cos 60° = 0.5.)
Otherwise, the analysis seems difficult without introducing a
2-piece polyline, which can model a walking leg very well. We
leave this to future work.
              narrow type                    wide type
        lower leg    entire leg       lower leg    entire leg
         h     w      h     w          h     w      h     w
         4l    l      2l    0.5l       4l    2l     2l    l
Table 2. The length of a "vertical" line segment is used to
decide the height and the width of a hypothesis box.
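The entries of Table 2 can be written as a small lookup, sketched below. The factors are copied from Table 2; the names SIZE_FACTORS, box_size, and box_type_for_entire_leg are hypothetical. The direction of the 60° test (steeper segments give a narrow box) is an assumption made for this sketch: the text only states that θ = 60° separates the two types for an entire leg.

```python
# (height, width) as multiples of the segment length l, copied from Table 2.
SIZE_FACTORS = {
    ("narrow", "lower"): (4.0, 1.0),
    ("narrow", "entire"): (2.0, 0.5),
    ("wide", "lower"): (4.0, 2.0),
    ("wide", "entire"): (2.0, 1.0),
}


def box_size(l, box_type, leg):
    """Return (h, w) of a hypothesis box for a segment of length l."""
    factor_h, factor_w = SIZE_FACTORS[(box_type, leg)]
    return factor_h * l, factor_w * l


def box_type_for_entire_leg(theta_deg, threshold=60.0):
    """Choose the box type for an entire-leg segment.

    Assumption (not stated explicitly in the text): a segment at least as
    steep as the threshold yields a narrow box, a more skewed one a wide box.
    """
    return "narrow" if theta_deg >= threshold else "wide"


print(box_size(50, "narrow", "lower"))
print(box_size(50, box_type_for_entire_leg(50.0), "entire"))
```

The factors agree with the head-unit model of Section 3.1: every box has a height-to-width ratio of 8:2 (narrow) or 8:4 (wide).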
Since a line segment can be one of the four edges of two lower
legs, or one of the four edges of two entire legs, we need to
decide the location of a hypothesis box for each possible case.
One key factor that affects the location estimate is the width
of a leg. Based on measurements of images in the INRIA
pedestrian dataset, we set the width of a lower leg to 0.25 × l.
Under a narrow size configuration, we set the gap between the
two legs to 0.1 × l. Thus, we can derive the relative distance
between each of the four possible edges of a leg and the
boundary of a bounding box, as illustrated in Fig. 3. Note that
the numbers displayed in Fig. 3 represent ratios only. The
analysis under a wide size configuration is similar to Fig. 3,
except that only four boxes are generated for a fairly skewed
line segment.
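As a worked example of the location estimate, the sketch below covers only the narrow, lower-leg case. With a box width of l, a leg width of 0.25 × l, and a gap of 0.1 × l, centering the two legs leaves a margin of 0.2 × l on each side, so the four leg edges sit at 0.2, 0.45, 0.55, and 0.8 of the box width. Several details are assumptions of this sketch rather than statements of the paper: the legs are centered in the box, the bottom of the segment lies on the bottom of the box, boxes are (left, top, width, height) tuples in image coordinates with y growing downward, and the names leg_edge_offsets and lower_leg_boxes are hypothetical.

```python
def leg_edge_offsets(box_width, leg_width, gap):
    """Distances from the left side of the box to the four leg edges.

    Assumption: the two legs are centered horizontally in the box.
    """
    margin = (box_width - 2 * leg_width - gap) / 2
    return [
        margin,                        # outer edge of the left leg
        margin + leg_width,            # inner edge of the left leg
        margin + leg_width + gap,      # inner edge of the right leg
        margin + 2 * leg_width + gap,  # outer edge of the right leg
    ]


def lower_leg_boxes(x, y_bottom, l):
    """Four narrow boxes for a lower-leg segment of length l.

    x is the horizontal position of the segment and y_bottom its lowest
    point (assumed here to coincide with the bottom of the box).
    """
    h, w = 4 * l, l  # narrow type, lower leg (Table 2)
    offsets = leg_edge_offsets(w, 0.25 * l, 0.1 * l)
    return [(x - off, y_bottom - h, w, h) for off in offsets]


for box in lower_leg_boxes(x=200, y_bottom=400, l=100):
    print(box)
```

Each returned box corresponds to one hypothesis about which of the four leg edges the segment represents.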
Fig. 3. Under a narrow size configuration, eight hypothesis
boxes are generated for a given "vertical" line segment. (a)
Four boxes are generated with the hypothesis that the line
segment is one of the four edges of two lower legs; (b) Four
boxes are generated with the hypothesis that the line segment is
one of the four edges of two entire legs. (Ratio labels shown in
the figure: 0.2, 0.45, 0.55, 0.8; 1, 4, 2, 1.)
3.2. Symmetry constraint
Left/right symmetry has been shown to be an effective way to
reduce the false positive rate of a pedestrian detection method
[1, 7, 8]. Bertozzi et al. [1] measured the symmetry of an image
region (enclosed by a bounding box) by computing the similarity
of the normalized histograms of gray values of its left and
right sub-regions (which are divided by the central vertical
line of the bounding box). Specifically, assume that hi, i = 1,
2, are the histograms of the gray values of the left and right
sub-regions, respectively. The symmetry of the region is
measured by the dot product
$$\frac{h_1}{\|h_1\|} \cdot \frac{h_2}{\|h_2\|},$$
where $\|\cdot\|$ denotes the 2-norm of a vector. Following this
idea, symmetry measurement is translated into similarity
measurement, and thus many existing feature representations,
such as HOG, SIFT [12], and LBP [23], can be used as
alternatives, especially in scenarios where computation time is
not critical in an application of pedestrian/human detection.
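The histogram-based symmetry measure can be sketched as follows. This is a minimal pure-Python illustration, not the implementation of [1]: the region is assumed to be a list of rows of integer gray values in 0..255, the number of bins (32) is an arbitrary choice, the central column is skipped when the width is odd, and the names gray_histogram and symmetry are hypothetical.

```python
from math import sqrt


def gray_histogram(pixels, bins=32, max_value=256):
    """Histogram of integer gray values in [0, max_value)."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // max_value] += 1
    return hist


def symmetry(region, bins=32):
    """Dot product of the 2-norm-normalized left and right histograms."""
    width = len(region[0])
    half = width // 2  # the central column is ignored for odd widths
    left = [v for row in region for v in row[:half]]
    right = [v for row in region for v in row[width - half:]]
    h1 = gray_histogram(left, bins)
    h2 = gray_histogram(right, bins)
    n1 = sqrt(sum(a * a for a in h1))
    n2 = sqrt(sum(b * b for b in h2))
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate region, e.g. a single column
    return sum(a * b for a, b in zip(h1, h2)) / (n1 * n2)


mirrored = [[10, 200, 200, 10]] * 4
split = [[0, 0, 255, 255]] * 4
print(symmetry(mirrored), symmetry(split))
```

Because the histograms are non-negative, the score lies between 0 and 1, with values near 1 indicating that the two halves have similar gray-value distributions. Note that this measure compares distributions only and does not check that the halves are spatial mirror images.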
3.3. Merging boxes with significant overlaps
Multiple boxes may partially overlap each other. One reason is
that multiple hypothesis boxes may be generated from the same
vertical line. In addition, in the context of scale space,
similar line segments may be detected, which leads to similar
bounding boxes. We merge such boxes to give a clearer view of
the output. Given hypothesis boxes b1 and b2, their overlap is
quantified by the overlapping ratio between the two boxes:
$$\mathrm{overlapping} = \frac{\mathrm{area}(b_1 \cap b_2)}{\mathrm{area}(b_1 \cup b_2)}. \tag{1}$$
If the overlapping ratio is significant, i.e., larger than a
threshold (set to 50% in this paper), b2 is added to the cluster
containing b1. For each cluster of boxes, we return the bounding
box with the highest symmetry. Note that if two boxes differ
significantly in size, they will not be merged even if one box
is enclosed by the other.
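A minimal sketch of the merging step is given below. The overlapping ratio follows Eq. (1); the greedy clustering, in which a box joins the first cluster that contains a box it overlaps by more than the threshold, is one possible reading of the description and is an assumption of this sketch. Boxes are assumed to be (left, top, width, height) tuples, scores stands for the symmetry values, and the names overlap_ratio and merge_boxes are hypothetical.

```python
def overlap_ratio(b1, b2):
    """Area of intersection over area of union of two boxes, as in Eq. (1)."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    inter_w = min(x1 + w1, x2 + w2) - max(x1, x2)
    inter_h = min(y1 + h1, y2 + h2) - max(y1, y2)
    inter = max(0, inter_w) * max(0, inter_h)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes, scores, threshold=0.5):
    """Cluster overlapping boxes and keep the best-scoring box per cluster."""
    clusters = []  # each cluster is a list of indices into boxes
    for i, box in enumerate(boxes):
        for cluster in clusters:
            if any(overlap_ratio(box, boxes[j]) > threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no sufficiently overlapping cluster found
    return [boxes[max(c, key=lambda j: scores[j])] for c in clusters]


boxes = [(0, 0, 10, 20), (1, 0, 10, 20), (100, 0, 10, 20)]
scores = [0.6, 0.9, 0.5]
print(merge_boxes(boxes, scores))
```

The ratio also explains the remark about boxes of very different sizes: a box fully enclosed by another with four times its area has a ratio of only 0.25, which is below the 50% threshold.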
4. EXPERIMENT
In this section, we test the performance of the proposed
framework and compare it with the Dalal & Triggs method, i.e.,
the HOG+SVM method [3]. (The pedestrian detector in the vision
package in MATLAB 2016a is used as the implementation of the
Dalal & Triggs method.) The method of Li et al. [11] is used to
detect line segments. The left/right symmetry of a region is
measured by the similarity of the histograms of gray values in
its left and right sub-regions. We first give a visual
comparison, and then a quantitative comparison.
method                  true positive #    false positive #
Dalal and Triggs [3]    86                 367
Proposed                127                215
Table 3. The test set contains 250 images and 287 pedestrians.
The proposed method increases the number of true positive
detections by 48% and decreases the number of false positive
detections by 41%.
Fig. 4 shows a visual comparison between the Dalal & Triggs
method and the proposed method. The bounding boxes detected by
the Dalal & Triggs method are commonly much larger than those of
the proposed method, while the bounding boxes detected by the
proposed method are much more accurate. Moreover, many bounding
boxes detected by the Dalal & Triggs method are false positives.
In the first two images, each of which contains two pedestrians,
the Dalal & Triggs method detects only one.
Next, we present a quantitative comparison between the two
methods. Our test set includes 250 INRIA images that contain 287
pedestrians in total. A bounding box output by a method is
considered a true positive if the overlap between the bounding
box and the ground truth is over 50%.
Table 3 shows the number of true positives and false positives
obtained by the two methods. The Dalal & Triggs method detected
86 pedestrians correctly, and the proposed one detected 127. The
Dalal & Triggs method outputs 367 false-positive bounding boxes,
and the proposed one outputs 215. That is, the proposed method
increases the number of true positives by 48% and decreases the
number of false positives by 41%.
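The percentages follow directly from the counts in Table 3, as the short calculation below shows. The recall and precision figures are not reported in the paper; they are derived here from the same counts under the assumption that each true positive corresponds to one distinct pedestrian, and the names relative_change, COUNTS, and NUM_PEDESTRIANS are hypothetical.

```python
def relative_change(new, old):
    """Signed change of new relative to old, as a fraction."""
    return (new - old) / old


NUM_PEDESTRIANS = 287
# (true positives, false positives) from Table 3.
COUNTS = {"Dalal & Triggs": (86, 367), "Proposed": (127, 215)}

tp_gain = relative_change(COUNTS["Proposed"][0], COUNTS["Dalal & Triggs"][0])
fp_drop = -relative_change(COUNTS["Proposed"][1], COUNTS["Dalal & Triggs"][1])
print(f"true positives: +{tp_gain:.1%}, false positives: -{fp_drop:.1%}")

for name, (tp, fp) in COUNTS.items():
    recall = tp / NUM_PEDESTRIANS   # derived, not reported in the paper
    precision = tp / (tp + fp)      # derived, not reported in the paper
    print(f"{name}: recall {recall:.1%}, precision {precision:.1%}")
```

The two changes evaluate to about 47.7% and 41.4%, which round to the reported 48% and 41%.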
5. CONCLUSION AND FUTURE WORK
In this paper, we proposed a leg-driven framework for pedestrian
detection. Experiments show that the proposed framework achieves
more precise localization of a pedestrian region than the Dalal
& Triggs method. In the future, we will explore features of
2-piece polylines, such as their orientation, to reduce the
number of hypothesis boxes, which in turn should reduce the
false positive rate.
Acknowledgements: The work of Xiangui Kang was sup-
ported by NSFC (Grant nos. 61379155, U1536204) and NSF
of Guangdong province (Grant no. s2013020012788).
(a) Dalal & Triggs [3] (b) Proposed
Fig. 4. A comparison between Dalal & Triggs [3] and the
proposed method. The bounding boxes detected by Dalal &
Triggs method are commonly much larger than the proposed
method.
6. REFERENCES
[1] M. Bertozzi, A. Broggi, R. Chapuis, F. Chausse, A. Fascioli,
and A. Tibaldi. Shape-based pedestrian detection and local-
ization. In IEEE Trans. on Intelligent Transportation Systems,
volume 1, pages 328–333, 2003.
[2] B. Bogin and M. Varela-Silva. Leg length, body proportion,
and health: a review with a note on beauty. International Jour-
nal of Environmental Research and Public Health, 7(3):1047–
1075, 2010.
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for
human detection. In IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’05), volume 1,
pages 886–893, 2005.
[4] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian
detection: An evaluation of the state of the art. IEEE Trans.
Pattern Anal. Mach. Intell., 34(4):743–761, 2012.
[5] D. Gavrila and J. Giebel. Shape-based pedestrian detection
and tracking. In IEEE Intelligent Vehicle Symposium, vol-
ume 1, pages 8–14, 2002.
[6] D. Geronimo, A. Lopez, A. Sappa, and T. Graf. Survey of
pedestrian detection for advanced driver assistance systems.
IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 32(7):1239–1258, 2010.
[7] L. Havasi, Z. Szlávik, and T. Szirányi. Pedestrian detection
using derived third-order symmetry of legs: A novel method
of motion-based information extraction from video image-
sequences. In K. Wojciechowski, B. Smolka, H. Palus, R. Koz-
era, W. Skarbek, and L. Noakes, editors, Computer Vision and
Graphics, volume 32 of Computational Imaging and Vision,
pages 733–739. 2006.
[8] L. Havasi, Z. Szlávik, and T. Szirányi. Detection of gait
characteristics for scene registration in video surveillance
system. IEEE Trans. Image Processing, 16(2):503–510, 2007.
[9] V.-D. Hoang, M.-H. Le, and K.-H. Jo. Hybrid cascade boost-
ing machine using variant scale blocks based HOG features for
pedestrian detection. Neurocomputing, 135:357–366, 2014.
[10] J. Kosecká and W. Zhang. Video compass. In European
Conference on Computer Vision (4), pages 476–490, 2002.
[11] Q. Li, G. Liang, and Y. Gong. A geometric framework for
stop sign detection. In IEEE China Summit and International
Conference on Signal and Information Processing, ChinaSIP
2015, Chengdu, China, July 12-15, 2015, pages 258–262,
2015.
[12] D. Lowe. Distinctive image features from scale-invariant
keypoints. International Journal of Computer Vision, 60(2):91–110,
2004.
[13] W. Ouyang and X. Wang. Joint deep learning for pedestrian
detection. In Computer Vision (ICCV), 2013 IEEE Interna-
tional Conference on, pages 2056–2063, 2013.
[14] S. Paisitkriangkrai, C. Shen, and J. Zhang. Fast pedestrian
detection using a cascade of boosted covariance features. IEEE
Trans. Circuits Syst. Video Techn., 18(8):1140–1151, 2008.
[15] C. Papageorgiou, T. Evgeniou, and T. Poggio. A trainable
pedestrian detection system. In Intelligent Vehicles, pages
241–246, 1998.
[16] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun.
Pedestrian detection with unsupervised multi-stage feature
learning. In 2013 IEEE Conference on Computer Vision and
Pattern Recognition, Portland, OR, USA, June 23-28, 2013,
pages 3626–3633, 2013.
[17] A. Shashua, Y. Gdalyahu, and G. Hayun. Pedestrian detection
for driving assistance systems: single-frame classification and
system level performance. In IEEE Intelligent Vehicles Sym-
posium, pages 1–6, 2004.
[18] Y. Tian, P. Luo, X. Wang, and X. Tang. Pedestrian detection
aided by deep learning semantic tasks. In IEEE Conference
on Computer Vision and Pattern Recognition, CVPR 2015,
Boston, MA, USA, June 7-12, 2015, pages 5079–5087, 2015.
[19] R. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD:
A fast line segment detector with a false detection control.
IEEE Trans. Pattern Anal. Mach. Intell., 32(4):722–732, 2010.
[20] X. Wang, M. Wang, and W. Li. Scene-specific pedestrian de-
tection for static video surveillance. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 36(2):361–374,
2014.
[21] B. Wu and R. Nevatia. Detection of multiple, partially
occluded humans in a single image by Bayesian combination of
edgelet part detectors. In International Conference on
Computer Vision, volume 1, pages 90–97, 2005.
[22] L. Zhao and C. Thorpe. Stereo- and neural network-based
pedestrian detection. IEEE Transactions on Intelligent Trans-
portation Systems, 1(3):148–154, 2000.
[23] Y. Zheng, C. Shen, and X. Huang. Pedestrian detection us-
ing center-symmetric local binary patterns. In Proceedings of
the International Conference on Image Processing, ICIP 2010,
September 26-29, Hong Kong, China, pages 3497–3500, 2010.
[24] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao. Orientation
robust object detection in aerial images using deep convolu-
tional neural network. In 2015 IEEE International Conference
on Image Processing, ICIP 2015, Quebec City, QC, Canada,
September 27-30, 2015, pages 3735–3739, 2015.