FPGA-Based Pedestrian Detection Using Array of Covariance Features
Samuele Martelli∗, Diego Tosato∗, Marco Cristani∗†, Vittorio Murino∗†
∗Department of Computer Science, University of Verona, Italy
name.surname@univr.it
†Istituto Italiano di Tecnologia (IIT), Genova, Italy
name.surname@iit.it
Abstract—In this paper we propose a pedestrian detection algorithm and its implementation on a Xilinx Virtex-4 FPGA. The algorithm is a sliding window-based classifier that exploits a recently designed descriptor, the covariance of features, for characterizing pedestrians in a robust way. We show how this descriptor, originally designed to maximize accuracy without regard for timing, can be computed quickly, in an elegant, parallel way, on the FPGA board. A grid of overlapped covariances extracts information from the sliding window and feeds a linear Support Vector Machine that performs the detection. Experiments are performed on the INRIA pedestrian benchmark; the performance of the FPGA-based detector is discussed in terms of required computational effort and accuracy, showing state-of-the-art detection performance with excellent timing and economical memory usage.
I. INTRODUCTION
Pedestrian detection is undoubtedly one of the most important tasks in Computer Vision [1], [2]; it is pervasive, being a key operation in video surveillance, automotive, robotics, and many other application fields. Person detection is a hard problem due to the large variability of the target of interest, which is subject to occlusions and strong variations in pose, shape, and appearance. For all these reasons, a large spectrum of solutions has been proposed so far: roughly speaking, the typical approaches are characterized by a classifier that operates by means of a sliding window over the image. The classifier may be fed with heterogeneous features, e.g., Haar-like features [3], [4], Histograms of Oriented Gradients (HOG) [5], or binary descriptors [6], to cite a few.
In many cases, the focus of attention in the design of a detector is merely the achieved accuracy. Nevertheless, in most scenarios speed is also a primary requirement; in the literature, however, this latter aspect has received less attention than accuracy.
This motivates and encourages research on hardware acceleration. Few embedded pedestrian detection frameworks are present in the literature [7], [8], mostly based on Field Programmable Gate Array (FPGA) architectures and HOG-like descriptors. Many proposed algorithms start from a software algorithm with state-of-the-art performance, suitably revised until real-time performance is achieved. It is worth noting that the accuracy of hardware implementations is usually not reported in the literature, even when the simplifications made w.r.t. the original software version are substantial.
In this paper, we present a pedestrian detection algorithm inspired by a detection technique using feature covariance matrices as descriptors [9], [10], and then we detail its hardware implementation on an FPGA-based embedded architecture. The covariance of image features is a compact descriptor that has recently received the attention of many researchers because of its interesting properties: it highlights different aspects of the objects of interest through the use of heterogeneous features, and it encodes how they are statistically related, in a very robust manner. This makes the descriptor applicable in noisy situations and at different resolutions.
The software algorithm is composed of five modules, as is its hardware counterpart. The first notable contribution is a brand new object model, an ensemble of local covariances, that allows an elegant and natural hardware implementation. The second contribution lies in the hardware implementation, that is, in the management of covariance data without relying on complex projection operations (covariances in fact live on a Riemannian manifold). We treat them as vectors in a Euclidean space, a choice motivated by a geometrical reasoning.
Experimental results are provided both for the software version and for its hardware translation. For the latter, accuracy rates, usage of hardware resources, and timing are presented. In particular, the system has been tested on the INRIA pedestrian dataset, where state-of-the-art detection performance is reached by both versions, with the additional advantage of real-time performance for the hardware one.
The rest of the paper is organized as follows. In Section II, we describe the software version of our algorithm. The hardware implementation design is reported in Section III. Experiments are presented in Section IV, and, finally, Section V concludes the paper, emphasizing future perspectives.
II. OUR SOFTWARE SOLUTION
Our classification algorithm operates under a sliding-window philosophy, where a window is a rectangular region of size 128 × 64 pixels that scans the entire image with a horizontal and vertical stride of 8 pixels. The input image, at VGA resolution, is downsampled several times with a scale factor of 1.2 in order to detect objects at different scales.
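To make the scan order concrete, the multi-scale sliding-window enumeration can be sketched in Python as follows. This is a minimal software model of the scan (function and variable names are ours, not the paper's), not a description of the hardware.

```python
def pyramid_windows(h, w, win_h=128, win_w=64, stride=8, scale=1.2):
    """Enumerate (level, y, x) positions of every 128x64 detection
    window over an image pyramid downsampled by a 1.2 scale factor."""
    level = 0
    while h >= win_h and w >= win_w:
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield level, y, x
        h, w = int(h / scale), int(w / scale)   # next pyramid level
        level += 1

# VGA input: windows across all scales, in raster order per level
windows = list(pyramid_windows(480, 640))
```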
We encode the human appearance employing a composite object model.

[Fig. 1. Block diagram of the proposed object detection system, implemented on the FPGA Virtex-4 LX80T: the input image (W×H) enters Features Extraction; the Integral Representation stage computes first-order tensors p (6 ACCs, 6 registers) and second-order tensors Q (11 MACs, 21 registers); the Covariance Matrix stage (4 units) feeds the Covariance Array (a buffer of h × W covariance lines, scanned by a detection window of h × w covariance matrices); the object descriptor of h × w covariance matrices is classified by a linear SVM, which outputs position, confidence, and label.]

Given an image region selected by the sliding window, a set of feature covariances is calculated on 16×16
image patches, called blocks, overlapped by half their sizes in
both the x and y directions (see Fig. 2). The idea is that each
localized covariance will encode a particular portion of the
human appearance.
Our classifier is composed of five modules (see Fig. 1): four of them build the object model as fast as possible, and the last one performs the classification. The first module extracts features from the raw image in a row-wise raster-scan fashion. The second module exploits the fact that covariances may be computed by adopting integral representations in the form of tensors, calculating them on small portions of the image, called cells (see Fig. 2). The third module exploits the tensors of the cells to calculate the overlapped covariance matrices; each covariance focuses on a block. The fourth builds the entire object model, collecting all the covariance matrices into a feature vector, which is fed into the last module, which performs the classification. In the following, each module is explained in detail.
A. Features Extraction
On the analysis window we perform feature extraction, sequentially for each pixel, in a row-wise fashion. For each pixel we extract

Φ = [ Ix  Iy  Ixx  Iyy  √(Ix² + Iy²)  arctan(Ix/Iy) ]^T,   (1)

where Ix, Ixx, etc. are first- and second-order derivatives of the image intensity, and the last term represents the edge orientation. The choice of the features and their number, d = 6, is somewhat limited by the available FPGA resources, as each feature {Φi}i=1,...,d requires separate buffers, registers, combinatorial logic, and other resources, as we will see in the next section. Each pixel processed by this module becomes a feature vector which is sent to the next module.
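A minimal software sketch of this per-pixel feature map is given below. Note two assumptions of ours: np.gradient (central differences) stands in for the paper's 3×3 Sobel filters, and np.arctan2 is used to avoid division by zero in the orientation term.

```python
import numpy as np

def pixel_features(img):
    """Compute the d = 6 feature vector of Eq. (1) for every pixel.
    Central differences approximate the paper's Sobel derivatives."""
    I = img.astype(np.float64)
    Iy, Ix = np.gradient(I)        # first-order derivatives (rows, cols)
    _, Ixx = np.gradient(Ix)       # second-order derivative in x
    Iyy, _ = np.gradient(Iy)       # second-order derivative in y
    mag = np.sqrt(Ix**2 + Iy**2)   # gradient magnitude
    ang = np.arctan2(Ix, Iy)       # edge orientation, arctan(Ix/Iy)
    return np.stack([Ix, Iy, Ixx, Iyy, mag, ang], axis=-1)  # H x W x 6
```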
[Fig. 2. Object model: cell, block, and covariance matrix concepts. A cell is an 8×8 pixel region; a block is a 16×16 pixel region (four cells), each with a covariance matrix as its block descriptor; the 128×64 detection window contains 15 × 7 blocks on a grid with 8-pixel spacing.]
B. Integral Representation
An important brick of our algorithm is the block B, a 16×16 image region over which the covariance of features is calculated (see Fig. 2). Since the image is scanned in a row-wise direction, 16 lines of feature vectors should be buffered. This buffering is too expensive in hardware. The solution comes from the integral representation [11]. For each block, we have to compute the first-order integral tensor p, a d-dimensional vector encoding the sum of each feature Φj over all pixels in the block:

p_j = Σ_{x,y ∈ B} Φ_j(x, y),   j = 1, ..., d,   (2)

and then Q, the d×d second-order integral tensor defined by

Q_jk = Σ_{x,y ∈ B} Φ_j(x, y) Φ_k(x, y),   j, k = 1, ..., d.   (3)

The tensors allow covariances to be computed very quickly, as shown in the following. In addition, they naturally serve
to avoid redundancies in the covariance calculations: roughly speaking, the areas of overlap among blocks, called cells, are processed only once and composed together for computing the covariance of each block. A cell is an 8 × 8 region (see Fig. 2). Using the integral representation, the tensors of the blocks are computed in constant time by combining the tensors of four adjacent cells as follows:

p_Block = p_Cell1 + p_Cell2 + p_Cell3 + p_Cell4,
Q_Block = Q_Cell1 + Q_Cell2 + Q_Cell3 + Q_Cell4.   (4)

Block tensors are then passed to the next module as soon as they are available, and no feature buffering is required.
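The cell-to-block composition of Eqs. (2)-(4) can be sketched in software as follows (a reference model under our naming, operating on a precomputed H × W × d feature map rather than on a pixel stream):

```python
import numpy as np

def cell_tensors(F, cell=8):
    """First-order (p) and second-order (Q) integral tensors, Eqs. (2)
    and (3), for each non-overlapping 8x8 cell of an H x W x d
    feature map F."""
    H, W, d = F.shape
    ch, cw = H // cell, W // cell
    p = np.zeros((ch, cw, d))
    Q = np.zeros((ch, cw, d, d))
    for i in range(ch):
        for j in range(cw):
            patch = F[i*cell:(i+1)*cell, j*cell:(j+1)*cell].reshape(-1, d)
            p[i, j] = patch.sum(axis=0)   # Eq. (2)
            Q[i, j] = patch.T @ patch     # Eq. (3)
    return p, Q

def block_tensors(p, Q, i, j):
    """Eq. (4): the tensors of a 16x16 block whose top-left cell is
    (i, j) are the sums of the tensors of its four 8x8 cells."""
    return p[i:i+2, j:j+2].sum((0, 1)), Q[i:i+2, j:j+2].sum((0, 1))
```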
C. Covariance Computation
The Covariance Computation module calculates the covariance matrix C for each block (see Fig. 2) using the integral representation:

C = (1 / (S − 1)) ( Q − (1/S) p p^T ),   (5)

where S = 16×16 is the number of pixels inside the block B. In our software implementation, since covariance matrices live on a Riemannian manifold, a logarithmic mapping is applied to project the covariance matrix onto the local tangent space [12], in order to employ the Euclidean norm as the basic brick of the classification. The projection simplifies the topology of the original data: the more complex the source manifold, the stronger the simplification. However, an exhaustive experimental analysis [13] showed that this manifold is essentially flat for pedestrians.
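Eq. (5) is a one-liner once p and Q are available; the sketch below (our naming) shows that it matches the usual sample covariance of the block's feature vectors:

```python
import numpy as np

def block_covariance(pB, QB, S=256):
    """Eq. (5): covariance of a block (S = 16x16 = 256 pixels) from
    its integral tensors: C = (Q - p p^T / S) / (S - 1)."""
    return (QB - np.outer(pB, pB) / S) / (S - 1)
```

Because Q − p pᵀ/S equals the centered scatter matrix Σ(Φ − Φ̄)(Φ − Φ̄)ᵀ, the result agrees with the unbiased sample covariance.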
D. Covariance Array
In this module, all the projected covariances are concatenated and organized as a single vector, the object descriptor x, which is fed into the classifier.
E. Classifier
In the last stage of the pipeline the descriptor x is processed by a binary classifier. For each region analyzed, a confidence value and a binary label are assigned. In the current implementation a linear SVM classifier is employed, applying the following equation:

confidence = w^T x + b,   (6)

where w contains the weights calculated in the training phase. The linear kernel is particularly suitable for hardware implementations since it requires only scalar products and an accumulator. Since there are no calculation dependencies among the elements of x, all operations can ideally be performed in parallel.
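A minimal software model of Eq. (6) (our naming; the threshold at zero is the usual SVM decision rule, not stated explicitly in the text):

```python
import numpy as np

def svm_classify(x, w, b):
    """Eq. (6): linear SVM decision. Every product w_i * x_i is
    independent, so in hardware all of them can run in parallel
    before a single final accumulation."""
    confidence = float(w @ x) + b
    label = 1 if confidence >= 0 else -1
    return confidence, label
```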
III. HARDWARE IMPLEMENTATION
The previous algorithm has been revised for the hardware implementation. In general, the proposed architecture has been designed to minimize the logic area, so that it can be integrated in a smart camera with extended functionalities. We implemented and tested the system on a Field Programmable Gate Array (FPGA), specifically a Xilinx Virtex-4 LX80. The FPGA device is part of an Alpha Data development board which includes 24 MB of fast internal memory, distributed over six SSRAM blocks, two video inputs, and several other interfaces and modules. The SRAM modules store the input video or still-image data that is sourced to the pedestrian detector. Data may be live video sourced through one of the available video inputs or archived content transferred from the host PC. In our experimental results we evaluated the performance of the pedestrian detection system assuming data are always available in the SRAM memory of the board. This section describes the architecture implementation, the parallelism-extraction techniques, and the optimizations achieved.
A. Memory Management and Parallelism Extraction
1) Image Buffers: To increase the throughput of the circuit while avoiding latencies, two SRAMs are allocated and involved in a “ping-pong” buffer strategy. The system operates on frame N, stored in one SRAM block, while the other memory is filled with the incoming pixels of frame N + 1. A controller reads the image pixels by alternately scanning one of the two image frame buffers and sending pixels and control signals to the feature extraction module. It is worth noticing that the possibility to store at least one frame in fast onboard memories, connected to the FPGA programmable logic through high-speed buses, has a major impact on the implementation of computer vision algorithms.
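The ping-pong strategy amounts to a classic double buffer; a behavioral sketch (our class name, illustrative only) is:

```python
class PingPongBuffer:
    """Model of the two-SRAM 'ping-pong' scheme: one buffer is read
    by the detector while the other is filled with the next frame;
    at end of frame the roles are swapped."""
    def __init__(self):
        self.buffers = [None, None]
        self.write_idx = 0          # SRAM being filled with frame N+1

    def write_frame(self, frame):
        self.buffers[self.write_idx] = frame

    def swap(self):
        # end of frame: exchange the roles of the two SRAMs
        self.write_idx ^= 1

    def read_frame(self):
        # the detector operates on the buffer *not* being written
        return self.buffers[self.write_idx ^ 1]
```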
2) Features Extraction: In our implementation vertical and horizontal gradients are computed in parallel at high speed with the 3×3 Sobel operator, using FPGA logic. Once the first-order image derivatives are computed, an equivalent circuit is used to extract the second-order derivatives, in parallel with the intensity and phase calculation (Eq. 1). The first-order derivatives are delayed in order to be synchronized with the second-order ones.
The Sobel filters are designed to produce one output pixel per clock cycle, as long as consecutive local areas (i.e., 3×3 neighborhood areas every clock cycle) are continuously provided in input. A specific controller fetches every pixel and the related neighborhood to be processed from a 3-line memory buffer, and is responsible for maintaining this
continuous flow of sequential neighborhood areas. The 3-line buffers are implemented using fast dual-port block RAMs (BRAMs). These memories have two independent ports that enable shared access to a single memory space, so the first and second line buffers can be read simultaneously; the third line is the one directly fetched from the frame buffer. The FPGA performs this processing-intensive pixel-level analysis of each frame at different scales, and transforms the data from 8-bit grayscale image pixels into feature vectors of size d for the downstream processing. The accuracy of this step is fundamental, since it dramatically influences the accuracy of the resulting covariance matrices.
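The 3-line buffering that feeds the Sobel filters can be modeled as follows (our class name; a behavioral sketch of raster-order pixel arrival, not of the BRAM ports):

```python
from collections import deque

class LineBuffer3:
    """Pixels arrive in raster order; once three full lines are held,
    every interior column yields a 3x3 neighborhood, as required by
    the Sobel filters."""
    def __init__(self, width):
        self.width = width
        self.lines = deque(maxlen=3)   # the three most recent lines
        self.cur = []

    def push(self, pixel):
        self.cur.append(pixel)
        if len(self.cur) == self.width:   # line complete
            self.lines.append(self.cur)
            self.cur = []

    def window(self, x):
        # 3x3 neighborhood around column x of the buffered lines
        if len(self.lines) < 3 or not (1 <= x < self.width - 1):
            return None
        return [row[x-1:x+2] for row in self.lines]
```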
3) Integral Representation: In our system the feature extraction task is the most suitable for implementation on highly parallelizable architectures. However, the management of the feature data is critical. For each incoming image pixel, six feature values are generated, so it is not feasible to buffer the features to compute the covariance matrix. This problem has been solved by using the integral representation and applying parallelization. The cell tensor calculation process depends on the feature extraction module, since it cannot start until the first feature vector is available. Through careful synchronization of the different tasks to allow hazard-free concurrency, the cell tensor calculation process has been slightly delayed and partially parallelized with the feature extraction module. Feature vectors are processed as soon as they are available, first compressed into cell tensors and then into block tensors by applying Eq. (4).
In order to meet the real-time constraint, a major latency-reduction optimization can also be realized by extracting coarse-grained parallelism at the task level, computing Q and p in parallel rather than in sequence, as done in the software implementation. Moreover, a further speedup is achieved by computing each element of Q in parallel, using only MACs¹ and registers. This has been possible by exploiting the fine-grained parallelism at the operation level that can also be reached in FPGAs. The same applies to p.
In terms of hardware resources, taking advantage of the symmetry of Q, only 21 MAC units and 21 registers would be needed, plus 6 ACCs and 6 registers in parallel for p. However, Time Division Multiplexing (TDM) and high-performance XtremeDSP™ DSP48 slices have been used to further reduce the number of MAC units. XtremeDSP™ DSP48 slices, available on Xilinx FPGA boards, work at up to 500 MHz and allow designers to implement multiple slower operations using time-multiplexing methods. We have successfully exploited these capabilities by implementing a module which calculates two MAC operations in one pixel-clock cycle, using a single XtremeDSP™ slice instead of two multipliers, without decreasing the overall circuit performance. Since our architecture is demanding in terms of multipliers, which are a limited resource even on the latest high-end FPGA boards, this optimization has a major impact on the whole design.
Summarizing, for the tensor computation task a speedup of 36× is achieved by exploiting parallelism, and 11 MACs instead of 21 are used thanks to the careful management of hardware resources.
¹MAC: Multiply/Accumulate unit; computes the product of two numbers and adds that product to an accumulator (ACC). Adopted for the implementation of Eq. (3).
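The per-pixel work of the 21 MACs (before time-multiplexing) can be modeled as one update of the upper triangle of Q, exploiting its symmetry; a sketch with our naming:

```python
import numpy as np

def update_Q_upper(acc, phi):
    """One pixel-clock update: 21 independent multiply-accumulates,
    one per unique entry of the symmetric 6x6 tensor Q (Eq. 3),
    stored in row-major upper-triangular order."""
    d = len(phi)
    k = 0
    for j in range(d):
        for l in range(j, d):          # upper triangle only: 21 MACs
            acc[k] += phi[j] * phi[l]
            k += 1
    return acc
```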
4) Covariance Computation: The most important note about the hardware implementation of this module is that, due to the flatness of the manifold in which the covariances lie, we treat the upper triangular part of the covariance as a vector in the Euclidean space R^(d·(d+1)/2), where d · (d + 1)/2 = 21.
We have defined blocks as regions of size 16 × 16 with an 8-pixel overlap in both the vertical and horizontal directions. Since the image is scanned in a row-wise direction, a covariance matrix of 21 elements has to be computed every 8 clock cycles. With no upsampling, only one covariance value per clock cycle could be computed, but 21 values are required. We met this timing constraint by using registers as memories and exploiting parallelism at the operation level. Since the block tensor calculation module (see Sec. II-B) stores the tensor values in registers, the tensor buffer can be accessed simultaneously by multiple units without any memory access conflicts. At least three units have to be implemented, processing in parallel and computing up to 24 values in 8 clock cycles. However, during the design of this module, future improvements and enhancements of the system have already been taken into account: the circuit has been oversized with four units, computing up to 32 values. With this architecture it is possible to upgrade the system to use seven features instead of six without changing any element of the actual circuit.
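The vectorization of a covariance into R^21 is simply the row-major flattening of its upper triangle; a one-line sketch (our naming):

```python
import numpy as np

def vectorize_covariance(C):
    """Flatten the upper triangle of a d x d covariance into a vector
    of d(d+1)/2 elements (21 for d = 6), treating it as a point of a
    Euclidean space since the manifold is assumed flat."""
    return C[np.triu_indices(C.shape[0])]
```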
5) Covariance Array: At instant N the Covariance Array module stores the incoming vector c_N ∈ R^21 in memory, and the related controller generates the control signals for moving and storing the values next to the previous ones, c_{N−1} (the descriptor of the overlapping block on the left). Our object model is composed of h × w blocks in the vertical and horizontal directions, respectively. Defining W as the number of block columns in the frame, according to our object model at least h × W vectors need to be buffered before the detection window can start processing. The controller generates the control signals for reading the vectors according to the current position of the detection window. The basic procedures of image filtering are applied here to this particular data structure, where the mask is replaced by the detection window and each pixel by a vector c. The buffer is scanned by the detection window with a stride period of 64 pixel clocks. In the same time interval the Covariance Array module reads h × w vectors, corresponding to a 2205-element object descriptor vector x, and sends it to the classifier.
6) Classifier: A major latency-reduction optimization in the classification task is the execution of the scalar products in parallel, rather than in sequence as done in the software implementation. The complexity, hardware resources, and parallelism effort required to meet the real-time constraints are proportional to the number of features of the descriptor vector, which defines the number of scalar products to be performed. Our object model is an array of fifteen rows and seven columns of blocks, centered on a regular grid with 8-pixel spacing in both directions. Processing all blocks in parallel would require a huge number of hardware resources, especially in terms of multipliers. Thus we designed the classifier as fifteen units processing in parallel, each one specialized on seven blocks. For each unit, 147 scalar products and accumulations are performed. The confidence value of a candidate window is the sum of the results of the 15 parallel classifiers. For each region analyzed, a confidence value and a binary label are assigned.
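Since 15 rows × 7 blocks × 21 values = 2205 = 15 × 147, the partition into fifteen units can be modeled as chunked partial dot products whose sums are then accumulated (our naming, a behavioral sketch):

```python
import numpy as np

def partitioned_confidence(x, w, b, units=15):
    """Split the 2205-element descriptor into 15 chunks of 147 values;
    each 'unit' computes its partial dot product (147 MACs) and the
    15 partial sums are added, mirroring the parallel classifier."""
    partial = [wi @ xi
               for wi, xi in zip(np.split(w, units), np.split(x, units))]
    return float(sum(partial)) + b
```

By associativity of the sum, the result is identical to the single sequential dot product of Eq. (6).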
IV. EXPERIMENTAL RESULTS
Experiments on the INRIA Data Set: We evaluated the detection performance of our algorithm, both in its software version and in its hardware implementation, on the INRIA pedestrian dataset [5]. The dataset is composed of a training and a test set. The training set contains 1212 human images of size 96 × 160 pixels (with a margin of 16 pixels on each side), and 1218 high-resolution images of different sizes for the background. The positive examples have been doubled by left-right reflection, since the images are centered on the person. Each example has then been shifted by two pixels in the vertical and horizontal directions, collecting a total of 21816 positive examples. The height of the pedestrian samples is approximately 96 pixels, most of them standing in an upright position. However, we pick a bigger region of 128 × 64 pixels around each pedestrian, since this border provides a significant amount of context that improves the detection performance [5]. Patches of the same size, extracted from background images, have been used as negative examples.
The test set contains 566 pedestrian samples (1132 with reflections) extracted from 288 positive images, and 453 negative images.
The linear SVM classifier has been trained using the LIBLINEAR tool [14], running on an off-the-shelf Intel Xeon CPU at 2.33 GHz with 8 GB of RAM. To avoid overflow issues due to the large pool of positive and negative training samples, bootstrapping is employed: an initial SVM classifier is trained with the positive images and 12000 background patches randomly selected from the image database. Afterwards, it is used to classify non-pedestrian patches extracted from the 1200 background images. A set of 30000 false positives is collected and added to the initial negative training set. The process has been repeated three times, until no significant improvement of the classifier performance was noted.
Few pedestrian detection systems have been fully implemented on FPGAs so far, and none of them provides accuracy figures that can be directly compared with our system.
In Fig. 3, we compare our software implementation with the best software implementations on the INRIA dataset, whose plots are extracted from [1] and [11]. The performance is evaluated using the detection error tradeoff (DET) curve, which is estimated by varying the SVM confidence threshold in the range [−5, 5]. The y-axis corresponds to the miss rate and the x-axis to the false positives per window.
Lower values are better.

[Fig. 3. INRIA per-window results: DET curves (miss rate vs. false positives per window, FPPW) for Tuzel-Cov, Ker.R-HOG, Lin.R-HOG, Our CPU-Cov-SVM, and VJ. See text for details.]

[Fig. 4. Comparison between the hardware and software implementations: DET curves for Our CPU-Cov-SVM, Our HW-Cov-SVM, and Bauer et al.]

Our classifier defines state-of-the-art performance, especially in terms of miss rate. The covariance-based approach proposed in [9] outperforms the others, but
this method is not suitable for a hardware implementation, especially because the number of covariance matrices considered is huge and is not known a priori (see the original paper for further details). It is also worth noting that our SVM classifier took less than two hours to train, compared to days in [9].
In Fig. 4 our hardware implementation is compared with the software version, including the performance of [15], a pedestrian detection system partially implemented in FPGA and GPU. As in the previous case, the DET curves are used. If we consider 10^-4 an acceptable FPPW, our hardware implementation has a 9% higher miss rate than the software one. The gap in performance is easily justified, since the geometry of the Riemannian manifold is ignored in the hardware implementation. As a result, the hardware resources and the system latency are significantly reduced, with a minor loss in accuracy. We are currently working on alternative solutions to the eigenvalue decomposition required to compute the logarithmic projection of the covariance matrix, trying to close the gap between the two implementations without a major impact on the hardware resources and performance. Bauer's implementation is based on HOG descriptors fed to a Gaussian-kernel SVM classifier. Its miss rate is worse than that of our hardware implementation, even considering an FPPW of 10^-3.
TABLE I
FPGA RESOURCE OCCUPATION

Resource Type      | QVGA                  | VGA
-------------------|-----------------------|----------------------
Number of BRAMs    | 54 of 200 (27%)       | 73 of 200 (37%)
Number of DSP48s   | 47 of 80 (58%)        | 47 of 80 (58%)
Number of Slices   | 21329 of 35840 (59%)  | 25745 of 35840 (72%)
Hardware Resources and Performance: Table I shows the device utilization of our complete pedestrian recognition system. Two implementations are reported: one for 320 × 240 (QVGA) resolution images and the other for 640 × 480 (VGA) resolution images. Both pedestrian classification systems fit in a Virtex-4 LX80 FPGA. It is evident how the image resolution affects the number of memories and slices required; this does not hold for the multipliers, which only depend on the covariance matrix size. According to the implementation results, the maximum frequency is 213 MHz.
V. CONCLUSIONS AND FUTURE PERSPECTIVES
In this paper, we presented a pedestrian detection algorithm and its embedded version. The most notable contributions are the hardware implementation of the covariance matrices of features as object descriptors, and their approximated usage, motivated by geometrical considerations. The results are encouraging, especially for the hardware implementation, which reaches the absolute best among hardware competitors on the INRIA dataset. Many future developments are possible. In particular, we performed a preliminary study on the use of alternative classifiers in substitution of the SVM, such as random forests [16], which have performance comparable to that of SVMs but are faster and particularly suited to a hardware design.
REFERENCES
[1] P. Dollar, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: A
benchmark,” in Proc. CVPR. IEEE, 2009, pp. 304–311.
[2] M. Enzweiler and D. Gavrila, “Monocular pedestrian detection: Survey
and experiments,” IEEE Trans. PAMI, vol. 31, no. 12, pp. 2179–2195,
2009.
[3] A. Mohan, C. Papageorgiou, and T. Poggio, “Example-based object detection in images by components,” IEEE Trans. PAMI, vol. 23, no. 4, pp. 349–361, 2001.
[4] P. Viola, M. Jones, and D. Snow, “Detecting pedestrians using patterns
of motion and appearance,” International Journal of Computer Vision,
vol. 63, no. 2, pp. 153–161, 2005.
[5] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. CVPR, vol. 1. IEEE, 2005, pp. 886–893.
[6] Y. Zheng, C. Shen, and X. Huang, “Pedestrian detection using center-symmetric local binary patterns,” in Proc. ICIP. Hong Kong: IEEE Press, October 2010.
[7] M. Hiromoto and R. Miyamoto, “Hardware architecture for high-accuracy real-time pedestrian detection with CoHOG features,” in Proc. ICCV Workshops. IEEE, 2009, pp. 894–899.
[8] R. Kadota, H. Sugano, M. Hiromoto, H. Ochi, R. Miyamoto, and
Y. Nakamura, “Hardware Architecture for HOG Feature Extraction,”
in Proc. IIHMSP. IEEE, 2009, pp. 1330–1333.
[9] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” in Proc. ECCV. Springer, 2006, pp. 589–600.
[10] S. Paisitkriangkrai, C. Shen, and J. Zhang, “Performance evaluation of local features in human classification and detection,” IET Computer Vision, vol. 2, pp. 236–246, 2008.
[11] O. Tuzel, F. Porikli, and P. Meer, “Pedestrian detection via classification on Riemannian manifolds,” IEEE Trans. PAMI, vol. 30, no. 10, pp. 1713–1727, 2008.
[12] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Geometric means in
a novel vector space structure on symmetric positivedefinite matrices,”
SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 1, p.
328, 2008.
[13] D. Tosato, M. Farenzena, M. Cristani, M. Spera, and V. Murino, “Multi-class classification on Riemannian manifolds for video surveillance,” in Proc. ECCV, 2010.
[14] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[15] S. Bauer, S. Kohler, K. Doll, and U. Brunsmann, “FPGA-GPU architecture for kernel SVM pedestrian detection,” in Proc. CVPR Workshops. IEEE, pp. 61–68.
[16] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp.
5–32, 2001.