Research Article
System Architecture for Real-Time Face Detection on
Analog Video Camera
Mooseop Kim,1 Deokgyu Lee,2 and Ki-Young Kim1
1 Creative Future Research Laboratory, Electronics and Telecommunications Research Institute, 138 Gajeongno, Yuseong-gu, Daejeon 305-700, Republic of Korea
2 Department of Information Security, Seowon University, 377-3 Musimseo-ro, Seowon-gu, Cheongju-si, Chungbuk 361-742, Republic of Korea
Correspondence should be addressed to Deokgyu Lee; deokgyulee@gmail.com
Received October ; Accepted March
Academic Editor: Neil Y. Yen
Copyright © Mooseop Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes a novel hardware architecture for real-time face detection that is efficient and suitable for embedded systems. The proposed architecture is based on the AdaBoost learning algorithm with Haar-like features, and it aims to bring face detection to a low-cost FPGA that can be attached to a legacy analog video camera as a target platform. We propose an efficient method to calculate the integral image using the cumulative line sum. We also suggest an alternative method to avoid division, which otherwise requires many operations when calculating the standard deviation. A detailed structure of the system elements for the image scaler, the integral image generator, and the pipelined classifier, designed to optimize the trade-off between processing speed and hardware resources, is presented. The performance of the proposed architecture is described in comparison with the detection results of OpenCV on the same input images. To verify actual face detection on analog cameras, we designed an emulation platform using a low-cost Spartan- FPGA and experimented with the proposed architecture. The experimental results show that the processing speed for face detection on an analog video camera reaches frames per second, which is about times faster than previous works for low-cost face detection.
1. Introduction
Face detection is the process of finding the locations and sizes of all possible faces in a given image or video stream. It is an essential step in developing many advanced computer vision and multimedia applications such as object detection and tracking [–], object recognition [–], privacy masking [], and video surveillance [, ]. The object detection scheme proposed by Viola and Jones [] is one of the most efficient and widely used techniques for face detection, owing to its high detection rate and fast processing.
The recent trend in video surveillance is a shift toward IP network cameras. However, analog video cameras are still widely used in many surveillance services because of their availability, cost, and effectiveness. The face detection now available in IP cameras is not available in analog video cameras. We therefore propose an alternative method that can provide face detection for analog video cameras.
Recently, real-time face detection has been required in embedded systems such as security systems [], surveillance cameras [], and portable devices. The challenges of face detection in embedded environments include an efficient pipelined design, the bandwidth constraints set by low-cost memory, and efficient utilization of the available hardware resources. In addition, consumer applications require reliability to guarantee processing deadlines. Among these limitations, the main design concerns for face detection on an embedded system are circuit area and computing speed for real-time processing. Face detection, however, essentially requires a considerable computational load because many Haar-like feature classifiers check all pixels in the images. Therefore, design methods that achieve the best trade-off between several conflicting design issues are required.

Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2015, Article ID 251386, 11 pages. http://dx.doi.org/10.1155/2015/251386
Currently, the Viola-Jones face detection scheme is available for personal computer systems in the form of the Open Computer Vision Library (OpenCV) []. However, implementing OpenCV's face detection on an embedded system is not a suitable solution, because the computing power of the processor used in an embedded system is not as high as that of a PC. This disparity between real-time processing requirements and limited computing power clearly shows the necessity of coprocessor acceleration for image processing on embedded systems.
In general, a hardware system is implemented on an application-specific integrated circuit (ASIC) or on field-programmable gate arrays (FPGAs). Although slower than ASIC devices, FPGAs have the advantages of fast prototyping and ease of design changes. Recently, improvements in the performance and density of FPGAs, such as embedded memory and DSP cores, have made these devices a viable and highly attractive solution for computer vision [–].
The high-speed vision systems developed so far accelerate computing speed by using massively parallel processors [, ] or by implementing dedicated circuits on a reconfigurable hardware platform [–]. However, previous research focused on enhancing execution speed rather than on implementation within a feasible area, which is the real concern of embedded systems. Only a few attempts have been made to realize the Viola-Jones face detection scheme in embedded systems [–]. Although these approaches have tried to develop an effective detection system for embedded systems, their performance still seems insufficient for real-time detection. Therefore, a spatially optimized architecture and a design method that enhance the detection speed and can be implemented in a small area are required.
In this paper, we present an efficient, low-cost FPGA-based system architecture for real-time Viola-Jones face detection applicable to legacy analog video cameras. The proposed design strives to minimize the architectural complexity of both the integral image generator and the classifier. The main contributions of this work are summarized as follows. Firstly, we propose an efficient method to calculate the integral image using the cumulative line sum. Secondly, we suggest an alternative method to avoid division, which otherwise requires many operations to calculate the standard deviation. The integral image window is generated by combining the piped register array, which allows fast computation of the integral image as introduced in [], with the proposed compact integral image generator. The classifier then detects a face candidate using the generated integral image window. Finally, the proposed architecture uses only one classifier module, which consists of a seven-stage pipeline based on the training data of OpenCV. As a result of applying the proposed architecture, we can design a physically feasible hardware system that accelerates the operations required for real-time face detection in analog video cameras.
2. Related Works

In this section, we first give a brief introduction to face detection based on the AdaBoost learning algorithm using Haar-like features. Then we review the most relevant previous works in the literature.
2.1. AdaBoost Face Detection. Viola and Jones proposed a robust, real-time object detection method that uses the AdaBoost algorithm to select Haar-like features and to train the classifier. They selected a small number of weak classifiers and combined them in a cascade to construct strong classifiers. A Haar-like feature consists of several black and white rectangles, and each feature has a predefined number and size of rectangles. Figure (a) shows some examples of Haar-like features, and Figure (b) shows how Haar-like features are applied to a subwindow for face detection.
The computation of a Haar-like feature involves subtracting the sum of the pixel values in the black rectangles from the sum of the pixel values in the white rectangles of the feature. To speed up the feature computation, Viola and Jones introduced the integral image. The integral image is a simple transformation of the original input image into an alternative image in which each pixel location holds the sum of all the pixels to the left of and above that location, as shown in Figure (c). This operation is described as follows, where ii(x, y) is the integral image at location (x, y) and i(x, y) is a pixel value in the image:

ii(x, y) = Σ_{i≤x, j≤y} i(i, j).  (1)
The sum of the pixels within rectangle D in Figure (d) can be calculated with two subtractions and one addition using the integral image values of its four corner points: D = ii(x1, y1) − ii(x2, y1) − ii(x1, y2) + ii(x2, y2). Therefore, only four values are required to compute the rectangle area of each feature, regardless of the feature size.
To achieve fast detection, Viola and Jones also proposed a cascade structure of classifiers. Each stage of the cascade consists of a group of Haar-like features selected by the AdaBoost learning algorithm. The classifiers of the first several stages are trained to reject most of the negative subwindows while detecting almost all face-like candidates. This architecture can speed up the detection process dramatically, because most negative images can be discarded during the first two or three stages; the computational effort is therefore focused on face-like subwindows. Subwindows are sequentially evaluated by the stage classifiers, and the result of each Haar-like feature in a stage is accumulated. When all the features in a stage have been computed, the accumulated value is compared with a stage threshold to determine whether the current subwindow is a face-like candidate. The subsequent stage is activated only if the previous stage produces a positive result. If a candidate passes all stages in the cascade, the current subwindow is determined to contain a face.
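The cascade control flow described above can be sketched as follows. This is a hedged illustration: the stage and feature data layout is hypothetical, and the `eval` callback stands in for the weighted rectangle computation of one Haar-like feature.

```python
def evaluate_window(stages, window):
    """Return True only if the subwindow passes every stage of the cascade."""
    for stage in stages:
        stage_sum = 0.0
        for feature in stage["features"]:
            value = feature["eval"](window)  # weighted rectangle sum
            # each weak classifier contributes its trained left or right value
            stage_sum += (feature["left"] if value < feature["threshold"]
                          else feature["right"])
        if stage_sum < stage["threshold"]:
            return False  # rejected early: later stages never run
    return True  # face-like candidate
```

The early `return False` is what makes the cascade fast: most negative subwindows never reach the later, more expensive stages.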
Figure : Haar-like features: (a) examples of Haar-like features, (b) Haar-like features applied to a subwindow, (c) integral image of pixel (x, y), and (d) integral image computation for rectangle D = P1 − P2 − P3 + P4, where P1, P2, P3, and P4 are the integral image values at coordinates (x1, y1), (x2, y1), (x1, y2), and (x2, y2), respectively.

2.2. Related Works. Since Viola and Jones introduced their novel face detection scheme, considerable research effort has already been expended on its efficient implementation. Most of this literature focuses mainly on the optimization of the feature calculation and the cascade structure of classifiers, because these are the most time-consuming parts of the detection system.
Lienhart and Maydt [] were the first to introduce the face detection algorithm into the Intel Integrated Performance Primitives, which was later included in the OpenCV library []. The optimized code for the x86 architecture can accurately detect faces in an image of × pixels on a GHz Pentium processor in real time. However, on embedded platforms this performance is much poorer than on the desktop platform. A MHz ARM processor can only detect faces in the same resolution image at a speed of fps, which is far from real-time execution. This means that face detection is still a time-consuming process on embedded platforms.
In order to detect a face in an image, a massive number of subwindows within each image must be evaluated. Therefore, a hardware design of the AdaBoost algorithm can be an alternative solution for embedded systems. Theocharides et al. [] proposed a parallel architecture using a structure called a CDTU (Collection and Data Transfer Unit) array on an ASIC platform to accelerate the processing speed. The simulation results in their paper report a rough estimate of frames per second targeting a MHz clock. However, the CDTU architecture consumes massive hardware resources, which makes it difficult to adopt in embedded systems. Moreover, VLSI technology requires a large amount of development time and cost, and the design is difficult to change. Recently, much attention has been paid to implementing face detection systems on FPGA platforms. FPGAs provide a low-cost platform on which the face detection algorithm can be realized in a short design time, with the flexibility to fine-tune the design for more parallel operations as needed. In recent years, new generations of FPGAs with embedded DSP resources have provided an attractive solution for image and video processing applications. In the work presented by Shi et al. [], some optimization methods are suggested to speed up the detection procedure considering systolic AdaBoost implementations. The proposed work introduces two pipelines in the integral image array to increase the detection speed: a vertical pipeline that computes the integral image and a horizontal pipeline that can compute a rectangle feature in one cycle. However, their results do not come from a hardware implementation but from cycle-accurate simulation. Cho et al. [] proposed a parallelized architecture of multiple classifiers for a face detection system using pipelined and parallel processing. They adopted a cell array architecture for the main classification modules. The integral image is generated for each subwindow and is then used for classification through a cell array. Recently, Hiromoto et al. [] proposed a hybrid face detection architecture consisting of parallel and sequential modules. To achieve high-speed detection, the parallel modules are assigned to the early stages of the algorithm, which are executed frequently, whereas the later stages are mapped onto sequential modules because they are rarely executed. However, separating the parallel and sequential stages requires additional hardware resources to hold the integral image values of the current subwindow processed in the sequential stages while the parallel stages compute a new subwindow. Moreover, detailed experimental results and analysis of the implemented system are not discussed.
Lai et al. [] presented a hardware architecture very similar to the ones presented in [, ]. They used a piped register module, organized in columns and rows, for the integral image calculation. According to their report, it can achieve a theoretical fps detection rate for × -pixel images. However, they used only a small number of classifiers in a single stage. Because of their small number of cascade stages and classifiers, their results show a lower detection rate and a higher false alarm rate than OpenCV's implementation. Gao and Lu [] presented an approach that uses an FPGA to accelerate Haar-like feature based face detection. They retrained the Haar-like classifier with classifiers per stage. However, only the classifiers are implemented in the FPGA; the integral image computation is processed on a host microprocessor.
The aforementioned approaches achieve fast detection, but they still require too many hardware resources to be realized in embedded systems. Bigdeli et al. [] studied the effects of replacing certain software bottleneck operations with custom instructions on an embedded processor (the Altera Nios II), especially the image resizing and floating-point operations, but did not fully implement the entire algorithm in hardware. A simple version of the algorithm was proposed in [], using techniques such as scaling input images and fixed-point arithmetic to achieve fast processing with a smaller circuit area. The architecture presented in [] was reported to achieve fps at MHz. However, the image size is too small ( × pixels) to be practical, and only three stages of classifiers are actually implemented. Another low-cost architecture, implemented on an inexpensive ALTERA Cyclone II FPGA, was reported by Yang et al. []. They used a complex control scheme to meet hard real-time deadlines at the cost of detection rate. The frame rate of this system is fps, with a low detection rate of about %. On the other hand, Nair et al. [] proposed an embedded system for human detection on an FPGA platform, which operates on an input image of about pixels. However, the reported frame rate was only . fps. Although these works target embedded systems, the disadvantage of such designs is low-resolution images or low performance, which is far from real-time detection in embedded systems.
3. Proposed Hardware Architecture

The structure of the proposed face detection system for analog video cameras is shown in Figure . In the proposed design, we strove to minimize the complexity of the architecture when implementing each block. We also considered the trade-off between performance and consumed hardware resources. The proposed method is able to detect faces with conventional analog video cameras without a large-scale change to the system. Our method only reads the analog input video signal through physical wiring and detects faces in each input frame. The detection result is then transmitted to a camera or system over a general-purpose serial interface. Therefore, there is almost no effect on the operation of conventional systems.
The proposed architecture consists of three major blocks: an image scale block (ISB), an integral image processing block (IPB), and a feature processing block (FPB). For video image acquisition, we used a commercial device that supports the analog-to-digital conversion and the decoding of the composite signal into an NTSC signal. The A/D converter in the image acquisition module converts the analog image signals into digital image data. The video sync module converts the digital image data to the BT.656 video protocol.
The basic flow of the proposed architecture is as follows. The ISB receives the input video frame and scales down the frame image. After a frame image is stored, the operating state changes to calculating the integral images for each subwindow of the current frame. The IPB is responsible for generating and updating the integral image for the integral image array. The operation state then moves to the face detection state, which is processed in the FPB. The classifier in the FPB evaluates the integral image transferred from the IPB and outputs the detection results.
3.1. Image Scale Block (ISB). The ISB receives the input video image frames from the image acquisition module and scales down each frame image. The image scaling is repeated until the downscaled image approaches the size of the subwindow. The video interface module of the ISB receives the image frame in the row-wise direction and saves the image pixels into the frame image buffer.

The detailed structure of the video interface module is shown in Figure . The data path receives BT.656 video sequences and selects the active pixel data regions in the input video packet. The sync control generates the control signals to synchronize the video frame using the embedded sync signals of the BT.656 video sequences, such as the EAV (end of active video) and SAV (start of active video) preamble codes. The generated control signals are used to generate the address and control signals for managing the frame image buffer.
The frame image buffer stores a frame image at the start of operation. In general, the management of data transmission becomes an issue in an embedded video system. The clock frequency of the inner modules of the face detection system is higher than the clock rate of the input video sequence. Using a dual-port memory has proven to be an effective interface strategy for bridging different clock domains, so the frame image buffer consists of a dual-port RAM. Physically, this memory has two completely independent write and read ports. Therefore, input video streams are stored through one port, and the inner modules read out the saved image through the other.
The image scale module scales down the frame image stored in the frame image buffer by a constant scale factor. To make the downscaled images, we used the nearest-neighbor interpolation algorithm, which is the simplest interpolation algorithm and requires the lowest computing cost. This module keeps scaling down until the height of the scaled frame image is similar to the subwindow size (24 × 24 pixels); the image scale module therefore applies the scale factors from the 0th to the 10th power of the base factor for the × -pixel image. This module contains two memory blocks (mem1 and mem2). Once the frame image data is saved in the frame image buffer, the image scale module starts to generate the first scaled image, which is saved in mem1. At the same time, the original frame image is transferred to the IPB to start the computation of the integral image. After all the IPB and FPB computations over the original image are finished, the data stored in mem1 are transferred to the IPB, and the second scaled image is written back to mem2. This process continues until the scaled image is similar in size to the subwindow. The scale controller generates the control signals for moving and storing pixel values to generate the scaled images. It also checks the state of the video interface and manages the memory modules.
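The scaling scheme above can be sketched in Python as follows. This is an illustration, not the authors' implementation: the exact scale factor is lost in this copy of the paper, so 1.2 is used purely as an assumed value, and the 24-pixel stopping size follows the 24-line subwindow used elsewhere in the text.

```python
def downscale(img, factor=1.2):
    """Nearest-neighbour downscale: each output pixel copies the nearest
    source pixel, so no arithmetic on pixel values is needed."""
    h, w = len(img), len(img[0])
    nh, nw = int(h / factor), int(w / factor)
    return [[img[int(y * factor)][int(x * factor)] for x in range(nw)]
            for y in range(nh)]

def pyramid(img, min_side=24, factor=1.2):
    """Yield the original image and successive downscales (the two scale
    memories ping-pong in hardware) until the height nears the subwindow."""
    while len(img) >= min_side:
        yield img
        img = downscale(img, factor)
```

Because nearest-neighbour selection only indexes the source image, the hardware version reduces to address generation over the frame buffer, which is why it is the cheapest interpolation choice here.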
3.2. Integral Image Processing Block (IPB). The IPB is in charge of calculating the integral images used in the classification process. The integral image is an image representation in which each pixel location holds the sum of all the pixels to the left of and above that location in the original image. We calculate the integral image for each subwindow separately, and the subwindow is scanned from top to bottom and then from left to right over the frame image and the scaled images. In the context of this work, the subwindow is defined as the image region examined for the target objects. The proposed architecture uses a subwindow of 24 × 24 pixels.
The integral image generator of the IPB performs the precalculation needed to generate the integral image window. It computes the cumulative line sum (CLS) for a single row (y) of the currently processed subwindow, as shown in (2), where i(x, y) is the pixel value:

CLS(x, y) = Σ_{i=1}^{x} i(i, y).  (2)
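Behaviourally, the cumulative line sum is just a running prefix sum along one image row (the hardware computes it with a carry-save tree adder, as described below). A minimal sketch:

```python
def cumulative_line_sum(row):
    """CLS for one image row: out[x] = row[0] + ... + row[x]."""
    out, acc = [], 0
    for px in row:
        acc += px  # accumulate pixels from the left edge of the row
        out.append(acc)
    return out
```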
The detailed structure of the integral image generator is shown in Figure . The line register reads one row of a subwindow, from the top line to the bottom line; to store one row of image data, its width is the subwindow width times the pixel bit width. Each pixel value of a horizontal line is accumulated to generate the line integral image.

Computing the line integral image requires summing 24 pixel values, which increases the delay time. To solve this problem, we used a tree adder based on carry-save adders. The tree adder receives eight pixel values and a carry-in as its inputs and outputs the eight integral image values corresponding to the input pixels. Therefore, three clock cycles are required to compute one line integral image. This process is performed in parallel with the operation of the pipelined classifier module, so, considering the overall behavior of the system, computing the integral image effectively takes one clock cycle. The output of the tree adder is fed into the corresponding position of the integral image line buffer. Each pixel in the line integral image buffer has a resolution of 18 bits, which is sufficient to represent the integral image values of a 24 × 24-pixel subwindow. When all the pixel values in the line integral image buffer have been updated, the data are transferred to the bottom line of the integral image window.
The integral image window calculates the integral image of the current subwindow. The structure of the integral image window is shown in Figure (a). The size of the integral image window is identical to the size of the subwindow, except for the difference in the data width of the pixel values.

The integral image window consists of 24 × 24 window elements (WE) representing the integral image of the current subwindow. As shown in Figure (b), each WE includes a data selector, a subtractor to compute the updated integral image, and a register to hold the integral image value of each pixel. According to the control signal, the register takes either the pixel value from the lower line or the updated pixel value, obtained by subtracting the first line's value from the current pixel value. The control signals for data selection and data storing are provided by the integral image controller shown in Figure .
The data loading of the integral image window can be divided into initial loading and normal loading, according to the y coordinate of the top-left corner of the currently processed subwindow in the frame image. The initial loading is executed when the y coordinate of the first line of a subwindow is zero (y = 0). This loading is performed for the first 24 rows during the vertical scan of the window over the original and scaled images. In the initial loading, the CLS values are fed continuously to the bottom line of the integral image array and summed with the previous value, as shown in (3), where ii_t(x, y) denotes the integral image value held at time step t:

ii_t(x, y) = ii_{t−1}(x, y) + CLS(x, y),  if y = 24,
ii_t(x, y) = ii_{t−1}(x, y + 1),  otherwise.  (3)

The WEs in the bottom line of the integral image array accumulate these line integral images and output the accumulated value to the next upper line. The other lines of the integral image array shift their current values one pixel upward. After the required clock cycles, each WE holds its appropriate integral image value.
The normal loading is executed during the rest of the vertical scan (y > 0). As the search window moves down one line, the corresponding integral image array must be updated. The update of the integral image array for normal loading is performed by shifting and subtracting, as shown in (4), where ii_{t−1} represents the current integral image, ii_t the updated integral image, and y the line number of the integral image window:

ii_t(x, y) = ii_{t−1}(x, y) − ii_{t−1}(x, 1) + CLS(x, y),  if y = 24,
ii_t(x, y) = ii_{t−1}(x, y + 1) − ii_{t−1}(x, 1),  otherwise.  (4)

Figure : Cascade structure for Haar classifiers.

Figure : Structural overview of the proposed face detection system.

Figure : Block diagram of the video interface module.

Figure : Block diagram of the integral image generator.

Figure : Structure of the integral image window (a) and internal implementation of a window element (b).
During the normal loading, the first lines of the updated integral image array overlap with the values of lines 2 to 24 of the current integral image. Therefore, these values are calculated by subtracting the values of the first line (line 1) from lines 2 to 24 of the current integral image and then shifting the array one line up. The last line of the updated integral image is calculated by adding the CLS of the new row to the last line of the current integral image array and then subtracting the first line of the current integral image array.
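The two loading modes above can be modelled behaviourally as follows. This is a sketch under our own naming, with the window held as a list of rows (the last entry is the bottom line) and `cls` the cumulative line sum of the incoming image row.

```python
def initial_load(win, cls):
    """Initial loading: shift all rows up one line and accumulate the new
    row's CLS into the bottom line."""
    new_bottom = [b + c for b, c in zip(win[-1], cls)]
    return win[1:] + [new_bottom]

def normal_load(win, cls):
    """Normal loading: as above, but also subtract the outgoing first line
    from every value, since that row has left the subwindow."""
    first = win[0]
    shifted = [[v - f for v, f in zip(row, first)] for row in win[1:]]
    new_bottom = [b - f + c for b, f, c in zip(win[-1], first, cls)]
    return shifted + [new_bottom]
```

Repeated initial loads build the subwindow's integral image from scratch; each normal load then slides the window down one row at the cost of only shifts, subtractions, and one row of additions.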
The AdaBoost framework uses variance normalization to compensate for the effect of different lighting conditions. Therefore, the normalization should also be considered in the design of the IPB. This process requires computing the standard deviation (σ) for the corresponding subwindow. The standard deviation (σ) and the compensated threshold (t_c) are defined as (5), where VAR represents the variance, x is a pixel value of the subwindow, and N is the area of the subwindow:

σ = √VAR = √( (1/N) · Σ x² − ((1/N) · Σ x)² ),
t_c = t_0 · σ.  (5)
The AdaBoost framework multiplies the standard deviation (σ) by the original feature threshold (t_0) given in the training set to obtain the compensated threshold (t_c). The computation of the compensated threshold is required only once for each subwindow. However, computing the standard deviation requires a square root operation. In general, square root computation consumes considerable hardware resources and computing time because of its computational complexity. Therefore, as an alternative, we squared both sides of the second line of (5); the computing load is then reduced to multiplying the variance by the squared value of the original threshold. We expand this result once again, multiplying both sides by the squared value of the subwindow area to avoid the costly division operation, as shown in

t_c² = t_0² · VAR,
N² · t_c² = (N · Σ x² − (Σ x)²) · t_0².  (6)
Therefore, we can efficiently compute the lighting correction using only multiplication and subtraction, since the subwindow size is already fixed and the squared values of the original thresholds can be precomputed and stored in the training set.
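The division- and square-root-free comparison above can be sketched as follows. This is our illustration of the algebra, not the RTL: the threshold test v_f < t_0 · σ is replaced by comparing both sides after squaring and scaling by N², so only integer multiplies and subtracts remain (the sketch assumes non-negative operands; sign handling is left to the surrounding control logic, as in hardware).

```python
def variance_times_n2(sum_x, sum_x2, n):
    """N^2 * VAR = N * sum(x^2) - (sum x)^2, with no division."""
    return n * sum_x2 - sum_x * sum_x

def passes_threshold(v_f, t0, sum_x, sum_x2, n):
    """Equivalent (for non-negative operands) to: v_f < t0 * sigma."""
    lhs = (n * v_f) ** 2                      # N^2 * v_f^2
    rhs = (t0 * t0) * variance_times_n2(sum_x, sum_x2, n)
    return lhs < rhs
```

For example, with the four pixels {1, 3, 5, 7} we have N = 4, Σx = 16, Σx² = 84, so VAR = 5 and σ = √5 ≈ 2.236; with t_0 = 2, the true threshold is about 4.47, and the integer-only test agrees on both sides of it.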
The functional block diagram mapping the method of (6) to hardware is shown in Figure ; it is responsible for handling the right-hand side of the second line of (6). The squared image generator module, in the left part of Figure , calculates the squared integral image values for the current subwindow. The architecture of the squared image generator is identical to that of the integral image generator, except that it uses squared pixel values and stores the last column's values. Therefore, computing the squared integral image requires the pixel values in the last vertical line of a subwindow. The output of
Figure : Structure of the squared image generator and the variance module.
the squared image generator is the summation of the integral image of the squared pixel values (Σ x²).

The other parts of (6) are computed in the variance module, located in the right part of Figure . The upper input of the variance module is the integral image value at coordinate (w, h) of the integral image window. The output of the variance module is transferred to the FPB, to be checked against the feature sum of all weighted feature rectangles.
3.3. Feature Processing Block (FPB). The FPB computes the cascaded classifier and outputs the detected faces' coordinates and scale factors. As shown in Figure , the FPB consists of three major blocks: the pipelined classifier, the feature memory, and the classifier controller.

The pipelined classifier is implemented based on a seven-stage pipeline scheme, as shown in Figure . In each clock cycle, the integral image values are fed from the integral image window, and the trained parameters of the Haar classifier are input from the external feature memory. The classifier controller generates the appropriate control signals for memory access and data selection in each pipeline stage.
To compute the rectangle value for each feature of the cascaded classifier, the FPB interfaces to the external memory that holds the training data for the features' coordinates and the weights associated with each rectangle. We use one off-chip flash memory for the trained coordinate and weight data of the features. The other trained data are stored in the Block RAMs of the target device.
The first phase of the classifier operation is loading the feature data from the integral image window. In each clock cycle, eight or twelve pixel values of the integral image for a feature are selected from the integral image array. To select the pixel values, the coordinate data corresponding to each feature are required. We used a multiplexer to select the 8 or 12 pieces of data from the integral image array according to the feature data stored in the external memory.
The rectangle computation takes place in the next pipeline stage. As shown in the figure, two additions and a subtraction are required to compute one rectangle. Each rectangle value is multiplied by a predefined weight, also obtained from the training set. The sum of all weighted feature rectangles represents the result of one weak classifier (V_f).
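The rectangle stage above can be modeled in software as follows. This is an illustrative sketch (function names and the tuple layout of `rects` are ours, not the RTL interface): each rectangle sum uses four integral-image corners combined with two additions and one subtraction, and a feature with 2 or 3 weighted rectangles therefore reads 8 or 12 corner values.

```python
# Illustrative software model of one weak classifier's rectangle stage.
# `ii` is an integral image with a zero border row/column, as produced
# by a cumulative line-sum generator.

def rect_sum(ii, x, y, w, h):
    """Pixel sum over a w x h rectangle at (x, y) from 4 corners:
    two additions and one subtraction, matching the pipeline stage."""
    return (ii[y][x] + ii[y + h][x + w]) - (ii[y][x + w] + ii[y + h][x])

def weak_classifier_sum(ii, rects):
    """Weak classifier value V_f: the sum of weighted rectangle values.
    `rects` is a list of (x, y, w, h, weight) tuples, 2 or 3 per Haar
    feature (i.e. 8 or 12 integral-image corner reads)."""
    return sum(wt * rect_sum(ii, x, y, w, h) for x, y, w, h, wt in rects)
```

For a uniform image, a two-rectangle edge feature with weights +1 and -1 sums to zero, which is a convenient sanity check for the corner arithmetic.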
International Journal of Distributed Sensor Networks
Figure: Block diagram for the FPB, showing the pipelined classifier (stages S1 to S7 with multiplexers, rectangle computing, and stage-threshold comparison against V_L, V_R, and t_s), the feature memory, and the classifier controller.
The value of the single classifier is squared and then multiplied by the square of the subwindow's area to compensate for the variance, as described above. This result is compared with the square of the compensated feature threshold (t_c). If the result is smaller than the square of the compensated feature threshold, the Haar classifier selects the left value (V_L), a predetermined value obtained from the training set. Otherwise, it selects the right value (V_R), another predetermined value also obtained from the training set. This left or right value is accumulated during a stage to compute the stage sum. The multipliers and adders, which require high-performance computing modules, are implemented with Xilinx's dedicated DSPE on-chip cores.
At the end of each stage, the accumulated stage sum is compared to a predetermined stage threshold (t_s). If the stage sum is larger, the current subwindow is a successful candidate region for containing a face. In this case, the subwindow proceeds to the next stage, and so on, to decide whether it can pass all cascaded stages. Otherwise, the subwindow is discarded and omitted from the rest of the computation.
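The decision logic of the two paragraphs above can be sketched in software as follows. This is our own model, not the RTL: the exact sign handling of the squared comparison in the hardware may differ, so here we use the sign-preserving square x·|x|, which is strictly increasing and therefore keeps the comparison V_f < t_f·σ equivalent while needing no division or square root.

```python
# Illustrative model of the cascade decision logic with division-free
# variance compensation. var_n2 = N^2 * sigma^2 = N*sum(p^2) - (sum(p))^2
# is always non-negative, so multiplying both sides by N^2 is safe.

def sgn_sq(x):
    """Sign-preserving square: strictly increasing, so it preserves
    the outcome of a '<' comparison."""
    return x * abs(x)

def weak_decision(v_f, t_f, n, var_n2, v_left, v_right):
    """Select the left value if V_f < t_f * sigma, else the right value,
    using only multiplications (no division, no square root)."""
    if sgn_sq(v_f) * n * n < sgn_sq(t_f) * var_n2:
        return v_left
    return v_right

def run_stage(weak_results, stage_threshold):
    """A stage passes when the accumulated left/right values exceed t_s."""
    return sum(weak_results) > stage_threshold

def cascade(stages):
    """`stages` is a list of (weak_results, t_s) pairs; the subwindow is
    rejected at the first stage whose sum does not exceed its threshold."""
    return all(run_stage(ws, ts) for ws, ts in stages)
```

Short-circuiting in `all` mirrors the hardware behavior of discarding a subwindow at the first failed stage rather than evaluating all features.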
4. Experimental Results
The proposed architecture was described in VHDL and verified with the ModelSim simulator. After successful simulation, we used the Xilinx ISE tool for synthesis. We selected reconfigurable hardware as the target platform because it offers several advantages over fixed-logic systems in terms of production cost for small volumes and, more critically, because it can be reprogrammed in response to changing operational requirements. As the target device, we selected a Spartan- FPGA, because the Spartan series is low priced, with less capability than Xilinx's Virtex series.
4.1. System Setup. In order to evaluate our design experimentally, we designed an evaluation system. The logical architecture of the evaluation system is depicted in the figure below.

Figure: System architecture and photo view of the evaluation platform: a camera feeds the TVP5146 video decoder, whose BT.656 output enters the Xilinx Spartan FPGA (image scale, integral image, and classifier blocks plus feature memory); detection results go over SPI to the TI DM6446, whose ARM and DSP processors use the VPFE for capture and the VPBE for display output.

We used a commercial chip for video image capture. The TVP5146 module at the bottom of the architecture figure receives the analog video sequence from the video camera and converts it to BT.656 video sequences. In all experiments, the FPGA was configured to contain the face detection core, the interface block, and the feature memory, except for the coordinate data of the features. The input stream, converted to the BT.656 format, is loaded into the FPGA, and each frame is processed in the face detection core. The detection results, the -bit data representing the coordinates and scale factors of each candidate face, are transferred to the DM6446 processor through the serial peripheral interface (SPI) bus. Regarding the overhead of this transfer, the SPI provides at least Mbps of bandwidth, which is sufficient to send the detection results. The Video Processing Front-End (VPFE) module of the
Figure: Captured results for functional verification of the face detection between OpenCV (a) and the proposed hardware (b) on the same input images.
DM6446 processor captures the input image from the FPGA. Then the ARM processor calculates the position and size of the candidate faces and marks the face regions on the frame image based on the transferred detection results. It also performs postprocessing, which merges multiple detections into a single face. Finally, the Video Processing Back-End (VPBE) module outputs the result of the detector to a VGA monitor for visual verification, along with markings where the candidate faces were detected.
4.2. Verification and Implementation. We started with the functional verification of the proposed structure. For the functional test, we generated binary files containing the test frame images using MATLAB. We used a sample of test images containing several faces, obtained from the Internet and the MIT + CMU test images, sized and formatted to the design requirements. The generated binary files were stored as text files and then loaded into the frame image buffer as the input sequence during the detection process. We next ran the functional RTL simulation of the face detection system using the binary files generated from the test images. The detection results, which contain the coordinates and scale factors of the face candidates, were stored as text files. We used a MATLAB program to overlay the output of the face detection system on the tested input image for visual verification, along with markings where the candidate faces were detected. For a fair verification of the functionality, we also ran the same input images through the OpenCV program to compare the
Table: Resource utilization of the Xilinx Spartan- FPGA (used, available, and utilization figures for slices, slice flip-flops, -input LUTs, Block RAMs, and DSPs).
results of the face detection system, as shown in the figure of captured results. In this way, we could visually verify the detection results.
We also tried to achieve real-time detection while consuming a minimal amount of hardware resources. The resource utilization table shows the synthesis results and resource utilization of our face detection system in terms of the logic elements of the target FPGA. Considering essential elements such as the memory for saving the trained data, the table shows that resource sharing of the logic blocks is fully exploited in the proposed face detection system.
We could estimate the average detection frame rate using the clock frequency obtained from the synthesis results. The performance evaluation presented in this section is based on the combination of the circuit area and the timing estimation obtained from Xilinx's evaluation tools and Mentor Graphics design tools, respectively. The test results table summarizes the design features obtained from our experiments.
To estimate the performance of the proposed face detection system, we processed all input images continuously
Table: Test results for the target FPGA platform.
Target platform: Xilinx xcsda
Clock frequency: MHz
Input image resolution: ×
Detection rate: %
Detection speed: fps
to measure the total number of clock cycles. The system, operating at the maximum frequency of MHz, processed the test images within . sec, which means the processing rate is estimated to be fps. This result includes the actual overhead for switching frames and subwindows. Additionally, the FPGA implementation achieved % accuracy in detecting the faces in the images when compared to the OpenCV software running on the same test images. This discrepancy is mainly due to the fact that the OpenCV implementation scales the features up, which does not result in data loss, whereas our implementation scales down the frame image instead of the feature size during the detection process. Another reason for the difference is that the FPGA implementation uses fixed-point arithmetic to compute floating-point numbers, which inherently incurs a precision loss.
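The fixed-point precision loss mentioned above can be illustrated with a small Python sketch. The fractional width used here is our own choice for the example, not the paper's actual word length: quantizing operands and truncating products, as cheap hardware multipliers do, shifts a threshold product by up to a few quantization steps relative to the floating-point reference, which is enough to flip borderline subwindow decisions.

```python
# Illustration (our own word lengths, not the paper's) of fixed-point
# quantization error in a threshold product such as t * sigma.

FRAC_BITS = 8          # assumed fractional width for this example
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize a real value to a Q(FRAC_BITS) fixed-point integer."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    """Fixed-point multiply with truncation of the low bits, as a
    minimal hardware multiplier would do."""
    return (a * b) >> FRAC_BITS

# Floating-point reference vs fixed-point result for t * sigma:
t, sigma = 0.1234, 3.21
ref = t * sigma
fixed = fixed_mul(to_fixed(t), to_fixed(sigma)) / SCALE

# The results agree only to within a few quantization steps, which is
# why the hardware detector can disagree with OpenCV near a threshold.
err = abs(ref - fixed)
```

Widening the fractional field shrinks this error at the cost of wider multipliers, which is exactly the area/accuracy trade-off a low-cost FPGA design must balance.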
A direct quantitative comparison with previous implementations is not practical because previous works differ in their target devices and in their need for additional hardware or software. Moreover, some conventional works did not disclose detailed data on their implementations. Although the target platforms differ in design purpose, we believe a rough comparison with previous designs is still informative.
The comparison table presents the proposed design against previous works focused on low-cost face detection. The implementation of Wei et al. [] follows the same design strategy as the proposed design in terms of compact circuit design. In general, the major factors in evaluating the performance of a face detection system are the input image size, the number of stages, and the number of features of the classifier. The image size has a direct effect on the face detection time: the bigger the image, the longer the processing time. Although we use an input image four times larger than that of the implementation of Wei et al. [], our architecture is about times faster. Due to the nature of the cascade structure, the speed of a face detection system is directly related to the number of stages and the number of features used in the cascaded classifier. A smaller number of stages or features yields a faster detection time; however, in inverse proportion to the speed, the face detection accuracy decreases. From this point of view, the experimental results showed the outstanding performance of the proposed architecture for face detection compared to that of [, ]. As a result, we can say that the uniqueness of our face detection system is that it supports real-time face detection on embedded systems, which demand a high-performance and small-sized solution, using a low-priced commercial FPGA chip.
Table: Comparison between the proposed design and previous works for low-cost face detection, in terms of image size, stage number, feature number, target device (Xilinx Virtex- for Wei et al. [], Altera Cyclone II for Yang et al. [], and Xilinx Spartan- for this work), maximum frequency (MHz), and performance (fps).
5. Conclusion
This paper proposed a low-cost and efficient FPGA-based hardware architecture for a real-time face detection system applicable to analog video cameras. We made an effort to minimize the complexity of the architecture of the integral image generator and the classifier. The experimental results showed that the detection rate of the proposed architecture is around % of that of OpenCV's detection results. Moreover, the proposed architecture was implemented on a Spartan- FPGA as an example of a practical implementation. The results of this practical implementation showed that our architecture can detect faces in a ×-pixel image from an analog camera at fps with a maximum operating frequency of MHz. Considering the balance between hardware performance and design cost, the proposed architecture could be a feasible solution for low-cost reconfigurable devices, supporting real-time face detection on conventional analog video cameras.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[] X. Yang, G. Peng, Z. Cai, and K. Zeng, "Occluded and low resolution face detection with hierarchical deformable model," Journal of Convergence, vol. , no. , pp. –, .
[] H. Cho and M. Choi, "Personal mobile album/diary application development," Journal of Convergence, vol. , no. , pp. –, .
[] K. Salim, B. Hada, and R. S. Ahmed, "Probabilistic models for local patterns analysis," Journal of Information Processing Systems, vol. , no. , pp. –, .
[] H. Kim, S.-H. Lee, M.-K. Sohn, and D.-J. Kim, "Illumination invariant head pose estimation using random forests classifier and binary pattern run length matrix," Human-Centric Computing and Information Sciences, vol. , article , .
[] K. Goswami, G. S. Hong, and B. G. Kim, "A novel mesh-based moving object detection technique in video sequence," Journal of Convergence, vol. , no. , pp. –, .
[] R. Raghavendra, B. Yang, K. B. Raja, and C. Busch, "A new perspective—face recognition with light-field camera," in Proceedings of the 6th IAPR International Conference on Biometrics (ICB '13), pp. –, June .
[] S. Choi, J.-W. Han, and H. Cho, "Privacy-preserving H. video encryption scheme," ETRI Journal, vol. , no. , pp. –, .
[] D. Bhattacharjee, "Adaptive polar transform and fusion for human face image processing and evaluation," Human-Centric Computing and Information Sciences, vol. , no. , pp. –, .
[] S.-M. Chang, H.-H. Chang, S.-H. Yen, and T. K. Shih, "Panoramic human structure maintenance based on invariant features of video frames," Human-Centric Computing and Information Sciences, vol. , no. , pp. –, .
[] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. , no. , pp. –, .
[] C. Shahabi, S. H. Kim, L. Nocera et al., "Janus—multi source event detection and collection system for effective surveillance of criminal activity," Journal of Information Processing Systems, vol. , no. , pp. –, .
[] D. Ghimire and J. Lee, "A robust face detection method based on skin color and edges," Journal of Information Processing Systems, vol. , no. , pp. –, .
[] Intel Corp., "Intel OpenCV Library," Santa Clara, Calif, USA, http://sourceforge.net/projects/opencvlibrary/files/.
[] W. J. MacLean, "An evaluation of the suitability of FPGAs for embedded vision systems," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05) Workshops, pp. –, San Diego, Calif, USA, June .
[] M. Sen, I. Corretjer, F. Haim et al., "Computer vision on FPGAs: design methodology and its application to gesture recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops '05), pp. –, San Diego, Calif, USA, June .
[] J. Xiao, J. Zhang, M. Zhu, J. Yang, and L. Shi, "Fast AdaBoost-based face detection system on a dynamically coarse grain reconfigurable architecture," IEICE Transactions on Information and Systems, vol. , no. , pp. –, .
[] R. Meng, Z. Shengbing, L. Yi, and Z. Meng, "CUDA-based real-time face recognition system," in Proceedings of the 4th International Conference on Digital Information and Communication Technology and its Applications (DICTAP '14), pp. –, Bangkok, Thailand, May .
[] T. Theocharides, N. Vijaykrishnan, and M. J. Irwin, "A parallel architecture for hardware face detection," in Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures, pp. –, March .
[] Y. Shi, F. Zhao, and Z. Zhang, "Hardware implementation of AdaBoost algorithm and verification," in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops (AINA '08), pp. –, March .
[] J. Cho, S. Mirzaei, J. Oberg, and R. Kastner, "FPGA-based face detection system using Haar classifiers," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. –, February .
[] M. Hiromoto, H. Sugano, and R. Miyamoto, "Partially parallel architecture for AdaBoost-based detection with Haar-like features," IEEE Transactions on Circuits and Systems for Video Technology, vol. , no. , pp. –, .
[] H.-C. Lai, M. Savvides, and T. Chen, "Proposed FPGA hardware architecture for high frame rate (> fps) face detection using feature cascade classifiers," in Proceedings of the 1st IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS '07), pp. –, September .
[] C. Gao and S.-L. Lu, "Novel FPGA based Haar classifier face detection algorithm acceleration," in Proceedings of the International Conference on Field Programmable Logic and Applications, pp. –, September .
[] C. Kyrkou and T. Theocharides, "A flexible parallel hardware architecture for AdaBoost-based real-time object detection," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. , no. , pp. –, .
[] R. C. Luo and H. H. Liu, "Design and implementation of efficient hardware solution based sub-window architecture of Haar classifiers for real-time detection of face biometrics," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '10), pp. –, August .
[] A. Bigdeli, C. Sim, M. Biglari-Abhari, and B. C. Lovell, "Face detection on embedded systems," in Embedded Software and Systems, pp. –, Springer, Berlin, Germany, .
[] Y. Wei, X. Bing, and C. Chareonsak, "FPGA implementation of AdaBoost algorithm for detection of face biometrics," in Proceedings of the IEEE International Workshop on Biomedical Circuits and Systems, pp. S/–-, December .
[] M. Yang, Y. Wu, J. Crenshaw, B. Augustine, and R. Mareachen, "Face detection for automatic exposure control in handheld camera," in Proceedings of the 4th IEEE International Conference on Computer Vision Systems (ICVS '06), p. , IEEE, January .
[] V. Nair, P.-O. Laprise, and J. J. Clark, "An FPGA-based people detection system," EURASIP Journal on Advances in Signal Processing, vol. , no. , pp. –, .
[] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," in Proceedings of the International Conference on Image Processing (ICIP '02), vol. , pp. I-–I-, IEEE, September .