Research Article
System Architecture for Real-Time Face Detection on
Analog Video Camera
Mooseop Kim,1 Deokgyu Lee,2 and Ki-Young Kim1

1 Creative Future Research Laboratory, Electronics and Telecommunications Research Institute, 138 Gajeongno, Yuseong-gu, Daejeon 305-700, Republic of Korea
2 Department of Information Security, Seowon University, 377-3 Musimseo-ro, Seowon-gu, Cheongju-si, Chungbuk 361-742, Republic of Korea
Correspondence should be addressed to Deokgyu Lee; deokgyulee@gmail.com
Received  October ; Accepted March 
Academic Editor: Neil Y. Yen
Copyright © 2015 Mooseop Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper proposes a novel hardware architecture for real-time face detection, which is efficient and suitable for embedded systems. The proposed architecture is based on the AdaBoost learning algorithm with Haar-like features, and it aims to apply face detection to a low-cost FPGA that can be applied to a legacy analog video camera as a target platform. We propose an efficient method to calculate the integral image using the cumulative line sum. We also suggest an alternative method to avoid division, which requires many operations, in calculating the standard deviation. A detailed structure of the system elements for the image scaler, integral image generator, and pipelined classifier, designed to optimize the trade-off between processing speed and hardware resources, is presented. The performance of the proposed architecture is described in comparison with the detection results of OpenCV using the same input images. For verification of actual face detection on analog cameras, we designed an emulation platform using a low-cost Spartan-3 FPGA and then experimented with the proposed architecture. The experimental results show that the processing speed for face detection on an analog video camera is 42 frames per second, which is about 3 times faster than previous works for low-cost face detection.
1. Introduction
Face detection is the process of finding the locations and sizes of all possible faces in a given image or in video streams. It is the essential step for developing many advanced computer vision and multimedia applications such as object detection and tracking [], object recognition [], privacy masking [], and video surveillance [, ]. The object detection scheme proposed by Viola and Jones [] is one of the most efficient and widely used techniques for face detection because of its high detection rate and fast processing.
The recent trend in video surveillance is a shift to IP network cameras. However, analog video cameras are still widely used in many surveillance services because of their availability, cost, and effectiveness. The face detection now available in IP cameras is not available in analog video cameras. We propose an alternative method that can provide face detection for analog video cameras.
Recently, real-time face detection has been required for embedded systems such as security systems [], surveillance cameras [], and portable devices. The challenges of face detection in embedded environments include an efficient pipelined design, the bandwidth constraints set by low-cost memory, and an efficient utilization of the available hardware resources. In addition, consumer applications require reliability to guarantee processing deadlines. Among these limitations, the main design concerns for face detection on an embedded system are circuit area and computing speed for real-time processing. Face detection, however, essentially requires a considerable computation load because many Haar-like feature classifiers check all pixels in the images. Therefore, design methods
Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2015, Article ID 251386, 11 pages
http://dx.doi.org/10.1155/2015/251386
to achieve the best trade-off between several conflicting design issues are required.
Currently, the Viola-Jones face detection scheme is used in personal computer systems in the form of the Open Computer Vision Library (OpenCV) []. However, the implementation of OpenCV's face detection on an embedded system is not a suitable solution because the computing power of the processor used in an embedded system is not as powerful as that of a PC. This disparity between real-time processing and the limited computing power clearly shows the necessity of coprocessor acceleration for image processing on the embedded system.
In general, a hardware system is implemented on an application-specific integrated circuit (ASIC) or on field-programmable gate arrays (FPGAs). Although slower than ASIC devices, the FPGA has the advantages of fast prototyping and ease of design changes. Recently, improvements in performance and increases in the density of FPGA resources such as embedded memory and DSP cores have made this device a viable and highly attractive solution for computer vision [].
The high-speed vision systems developed so far accelerate computing speed by using massively parallel processors [, ] or by implementing dedicated circuits on a reconfigurable hardware platform []. However, previous research focused on enhancing execution speed rather than on implementation in a feasible area, which is the real concern of embedded systems. Only a few attempts have been made to realize the Viola-Jones face detection scheme in embedded systems []. Although these approaches have tried to develop an effective detection system for embedded systems, their performance still seems insufficient for real-time detection. Therefore, a spatially optimized architecture and a design method that enhances the detection speed and can be implemented in a small area are required.
In this paper, we present an efficient and low-cost FPGA-based system architecture for real-time Viola-Jones face detection applicable to legacy analog video cameras. The proposed design strives to minimize the complexity of the architecture for implementing both the integral image generator and the classifier. The main contributions of this work are summarized as follows. Firstly, we propose an efficient method to calculate the integral image using the cumulative line sum. Secondly, we suggest an alternative method to avoid division, which requires many operations, in calculating the standard deviation. The integral image window is generated by the combination of the piped register array, which allows fast computation of the integral image as introduced in [], and the proposed compact integral image generator. The classifier then detects a face candidate using the generated integral image window. Finally, the proposed architecture uses only one classifier module, which consists of a seven-stage pipeline based on the training data of OpenCV with  stages and , features. As a result of applying the proposed architecture, we can design a physically feasible hardware system to accelerate the processing speed of the operations required for real-time face detection in analog video cameras.
2. Related Works
In this section, we first give a brief introduction to face detection based on the AdaBoost learning algorithm using Haar-like features. Then we review the most relevant previous works in the literature.
2.1. AdaBoost Face Detection. Viola and Jones proposed a robust and real-time object detection method using the AdaBoost algorithm to select the Haar-like features and to train the classifier. They selected a small number of weak classifiers and then combined these classifiers in a cascade to construct strong classifiers. A Haar-like feature consists of several black and white rectangles, and each feature has a predefined number and size of rectangles. Figure 1(a) shows some examples of Haar-like features, and Figure 1(b) shows how Haar-like features are applied to a subwindow for face detection.
The computation of a Haar-like feature involves the subtraction of the sum of the pixel values in the black rectangle from the sum of the pixel values in the white rectangle of the feature. To speed up the feature computation, Viola and Jones introduced the integral image. The integral image is a simple transformation of the original input image to an alternative image where each pixel location represents the sum of all the pixels to the left of and above that pixel location, as shown in Figure 1(c). This operation is described in (1), where ii(x, y) is the integral image at the location (x, y) and i(x, y) is a pixel value in the image:

$$ii(x, y) = \sum_{i \le x,\ j \le y} i(i, j). \tag{1}$$
The sum of the pixels within rectangle D in Figure 1(d) can be calculated by two additions and two subtractions using the integral image values of the four corner points: $ii(x_1, y_1) - ii(x_2, y_1) - ii(x_1, y_2) + ii(x_2, y_2)$. Therefore, only four values are required to compute the rectangle area of each feature regardless of the feature size.
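As a concrete illustration, the integral-image construction and the four-corner rectangle sum can be sketched in C. The array sizes and function names here are illustrative assumptions, not taken from the paper's hardware:

```c
#define W 8   /* illustrative image width  */
#define H 8   /* illustrative image height */

/* Build an (H+1) x (W+1) integral image with a zero border so that
   ii[y][x] holds the sum of img[j][k] for all j < y and k < x. */
static void integral_image(unsigned char img[H][W],
                           unsigned ii[H + 1][W + 1]) {
    for (int x = 0; x <= W; x++) ii[0][x] = 0;
    for (int y = 1; y <= H; y++) {
        ii[y][0] = 0;
        for (int x = 1; x <= W; x++)
            ii[y][x] = img[y - 1][x - 1]
                     + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1];
    }
}

/* Sum over the rectangle spanned by corners (x1, y1) and (x2, y2),
   using only four lookups:
   ii(x1,y1) - ii(x2,y1) - ii(x1,y2) + ii(x2,y2). */
static unsigned rect_sum(unsigned ii[H + 1][W + 1],
                         int x1, int y1, int x2, int y2) {
    return ii[y1][x1] - ii[y1][x2] - ii[y2][x1] + ii[y2][x2];
}
```

Note that the rectangle sum costs the same four memory accesses whatever the rectangle size, which is exactly why the integral image makes Haar-like features cheap to evaluate.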
To achieve fast detection, Viola and Jones also proposed a cascade structure of classifiers. Each stage of the cascade consists of a group of Haar-like features selected by the AdaBoost learning algorithm. In the first several stages, the classifiers are trained to reject most of the negative subwindows while detecting almost all face-like candidates. This architecture can speed up the detection process dramatically because most negative subwindows can be discarded during the first two or three stages. Therefore, the computation effort can be focused on face-like subwindows. Subwindows are sequentially evaluated by the stage classifiers, and the result of each Haar-like feature in a stage is accumulated. When all the features in a stage are computed, the accumulated value is compared with a stage threshold to determine whether the current subwindow is a face-like candidate. The subsequent stage is activated only if the previous stage produces a positive result. If a candidate passes all stages in the cascade, the current subwindow is determined to contain a face.
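The stage-by-stage evaluation described above can be modeled in software as follows. The struct layout, thresholds, and leaf values below are illustrative assumptions, not the trained OpenCV parameters; `feature_vals[]` stands in for the weighted rectangle sums computed from the integral image:

```c
#include <stddef.h>

/* One weak classifier: compare a feature value against its threshold
   and emit the left or right leaf value (names are illustrative). */
typedef struct {
    int threshold;   /* compensated threshold for this feature  */
    int left_val;    /* contribution when feature < threshold   */
    int right_val;   /* contribution when feature >= threshold  */
} WeakClassifier;

typedef struct {
    const WeakClassifier *features;
    size_t num_features;
    int stage_threshold;
} Stage;

/* Evaluate the cascade on one subwindow: accumulate each stage's
   feature results and reject as soon as a stage sum falls below
   the stage threshold. */
static int cascade_detect(const Stage *stages, size_t num_stages,
                          const int *feature_vals) {
    size_t f = 0;
    for (size_t s = 0; s < num_stages; s++) {
        int sum = 0;
        for (size_t i = 0; i < stages[s].num_features; i++, f++) {
            const WeakClassifier *w = &stages[s].features[i];
            sum += (feature_vals[f] < w->threshold) ? w->left_val
                                                    : w->right_val;
        }
        if (sum < stages[s].stage_threshold)
            return 0;           /* rejected: not a face */
    }
    return 1;                   /* passed all stages: face candidate */
}
```

The early-exit `return 0` is what makes the cascade fast: most subwindows never reach the later, larger stages.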
2.2. Related Works. Since Viola and Jones introduced their novel face detection scheme, considerable research
Figure 1: Haar-like features: (a) examples of Haar-like features, (b) Haar-like features applied to a subwindow, (c) integral image of pixel (x, y), and (d) integral image computation for rectangle D = P1 − P2 − P3 + P4, where P1, P2, P3, and P4 are the integral image values at coordinates (x1, y1), (x2, y1), (x1, y2), and (x2, y2), respectively.
eorts have already expended on the ecient implementation
of their scheme. Most of these literatures mainly focused on
the optimization of feature calculation and cascade structure
of classiers because they are the most time consuming part
in the detection system.
Lienhart and Maydt [] were the first to introduce the face detection algorithm into the Intel Integrated Performance Primitives, which was later included in the OpenCV library []. The code optimized for the x86 architecture can detect faces accurately on an image of  ×  with a  GHz Pentium- processor in real time. On embedded platforms, however, this performance is much poorer than on the desktop platform. A  MHz ARM processor can only process the same resolution image at a speed of  fps, which is far from real-time execution. This means that face detection is still a time-consuming process on embedded platforms. In order to detect a face in an image, a massive number of subwindows within each image must be evaluated. Therefore, a hardware design of the AdaBoost algorithm could be an alternative solution for embedded systems. Theocharides et al. [] proposed a parallel architecture using a structure called a CDTU (Collection and Data Transfer Unit) array on an ASIC platform to accelerate the processing speed. The simulation results in their paper report a rough estimate of  frames per second targeting a  MHz clock. However, the CDTU architecture consumes massive hardware resources, which makes it difficult to adopt in embedded systems. Moreover, VLSI technology requires a large amount of development time and cost, and it is difficult to change the design. Recently, much attention has been paid to implementing face detection systems on FPGA platforms. FPGAs can provide a low-cost platform to realize the face detection algorithm in a short design time, with the flexibility of fine-tuning the design for more parallel operations as needed. In recent years, new generations of FPGAs with embedded DSP resources have provided an attractive solution for image and video processing applications. In the work presented by Shi et al. [], some optimization methods are suggested to speed up the detection procedure considering systolic AdaBoost implementations. The proposed work introduces
two pipelines in the integral image array to speed up the detection process: a vertical pipeline that computes the integral image and a horizontal pipeline that can compute a rectangle feature in one cycle. However, their results come not from a hardware implementation but from cycle-accurate simulation. Cho et al. [] proposed a parallelized architecture of multiple classifiers for a face detection system using pipelined and parallel processing. They adopted a cell array architecture for the main classification modules. The integral image is generated for each subwindow and is then used for classification through a cell array. Recently, Hiromoto et al. [] proposed a hybrid model of face detection architecture consisting of parallel and sequential modules. To achieve high-speed detection, the parallel modules are assigned to the early stages of the algorithm, which are frequently executed, whereas the later stages are mapped onto sequential modules as they are rarely executed. However, the separation of the parallel and sequential stages requires additional hardware resources to hold the integral image values of the current subwindow processed in the sequential stages while the parallel stages compute a new subwindow. Moreover, detailed experimental results and analysis of the implemented system are not discussed. Lai et al. [] presented a hardware architecture very similar to the ones presented in [, ]. They used a piped register module for integral image calculation with  columns and  rows. According to their report, it can achieve a theoretical  fps detection rate for  × -pixel images. However, they used only  classifiers in a single stage. Because of their small number of cascade stages and classifiers, their results show a lower detection rate and a higher false alarm rate than OpenCV's implementation. Gao and Lu [] presented an approach that uses an FPGA to accelerate Haar-like feature based face detection. They retrained the Haar-like classifier with  classifiers per stage. However, only  classifiers are implemented in the FPGA; the integral image computation is processed in a host microprocessor.
The aforementioned approaches achieve fast detection, but they still require too many hardware resources to be realized in embedded systems. Bigdeli et al. [] studied the effects of replacing certain software bottleneck operations with custom instructions on an embedded processor (the Altera Nios II), especially image resizing and floating-point operations, but did not fully implement the entire algorithm in hardware. A simple version of the algorithm was proposed in [], where techniques such as scaling input images and fixed-point expressions were used to achieve fast processing with a smaller circuit area. The architecture presented in [] was reported to achieve  fps at  MHz. However, the image size is too small ( ×  pixels) to be practical, and only three stages of classifiers were actually implemented. Another low-cost architecture implemented on an inexpensive Altera Cyclone II FPGA was reported by Yang et al. []. They used a complex control scheme to meet hard real-time deadlines by sacrificing detection rate. The frame rate of this system is  fps with a low detection rate of about %. On the other hand, Nair et al. [] proposed an embedded system for human detection on an FPGA platform, which operated on an input image of about  pixels. However, the reported frame rate was only . fps. Although these works target embedded systems, the disadvantage of such designs is low-resolution images or low performance, which is far from real-time detection in embedded systems.
3. Proposed Hardware Architecture
The structure of the proposed face detection system for analog video cameras is shown in Figure 3. In the proposed design, we made an effort to minimize the complexity of the architecture for implementing each block. We also considered the trade-off between performance and consumed hardware resources. The proposed method is able to detect a face with conventional analog video cameras without a large-scale change to the system. Our method only reads the analog input video signal through the physical wiring and detects faces in each input frame. The detection result is then transmitted to a camera or system using a general-purpose serial interface. Therefore, there is almost no effect on the operation of conventional systems.
The proposed architecture consists of three major blocks: an image scale block (ISB), an integral image processing block (IPB), and a feature processing block (FPB). For the video image acquisition, we used a commercial device, which supports the analog-to-digital conversion and the decoding of a composite signal into an NTSC signal. The A/D converter in the image acquisition module converts the analog image signals into digital image data. The video sync module converts the digital image data to the BT.656 video protocol.

The basic flow of the proposed architecture is as follows. The ISB receives the input video frame and scales down the frame image. After a frame image is stored, the operating state changes to calculating the integral images for each subwindow of the current frame. The IPB is responsible for generating and updating the integral image for the integral image array. The operation state then moves to the face detection state, which is processed in the FPB. The classifier in the FPB evaluates the integral image transferred from the IPB and provides the output of the detection results.
3.1. Image Scale Block (ISB). The ISB receives the input video image frames from the image acquisition module and scales down each frame image. The image scaling is repeated until the downscaled image is similar in size to the subwindow. The video interface module of the ISB receives the image frame in the row-wise direction and saves the image pixels into the frame image buffer.

The detailed structure of the video interface module is shown in Figure 4. The data path receives BT.656 video sequences and selects the active pixel data regions in the input video packet. The sync control generates the control signals to synchronize the video frame using the embedded sync signals of the BT.656 video sequences, such as the EAV (end of active video) and SAV (start of active video) preamble codes. The generated control signals are used to generate the address and control signals for managing the frame image buffer.
The frame image buffer stores a frame image at the start of operation. In general, the management of data transmission becomes an issue in embedded video systems. The clock frequency of the inner modules of the face detection system is higher than the clock rate of the input video sequence. Using a dual-port memory has proven to be an effective interface strategy for bridging different clock domain logics. The frame image buffer therefore consists of a dual-port RAM. Physically, this memory has two completely independent write and read ports. Input video streams are stored using one port, and the inner modules read out the saved image using the other.
The image scale module scales down the frame image stored in the frame image buffer by a fixed scale factor. To make downscaled images, we used the nearest-neighbor interpolation algorithm, which is the simplest interpolation algorithm and requires the lowest computing cost. This module keeps scaling down until the height of the scaled frame image is similar to the subwindow size (24 × 24 pixels). Therefore, the image scale module has 11 scale factors (the 0th through 10th powers of the scale factor) for the input frame image. This module consists of two memory blocks. Once the frame image data is saved in the frame image buffer, the image scale module starts to generate the first scaled image, which is saved in the first memory block. At the same time, the original frame image is transferred to the IPB to start the computing of the integral image. After all the computing of the IPB and FPB over the original image is finished, the data stored in the first block are transferred to the IPB, and the second scaled image is written back to the second block. This process continues until the scaled image is similar to the subwindow size. The scale controller generates the control signals for moving and storing pixel values to generate the scaled images. It also checks the state of the video interface and manages the memory modules.
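The nearest-neighbor downscaling step can be sketched in C. The flat buffer layout and the integer coordinate mapping are illustrative assumptions; the paper's module ping-pongs between dedicated memory blocks rather than arbitrary buffers:

```c
/* Nearest-neighbor downscaling of an 8-bit grayscale frame:
   each destination pixel copies the nearest source pixel. */
static void scale_nearest(const unsigned char *src, int sw, int sh,
                          unsigned char *dst, int dw, int dh) {
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;            /* nearest source row    */
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;        /* nearest source column */
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
}
```

Because the mapping is a pure copy with integer index arithmetic, it needs no multipliers beyond the address computation, which is what makes it attractive for a low-cost FPGA.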
3.2. Integral Image Processing Block (IPB). The IPB is in charge of calculating the integral images, which are used for the classification process. The integral image is an image representation method where each pixel location holds the sum of all the pixels to the left of and above that location in the original image. We calculate the integral image for each subwindow separately, and the subwindow is scanned from top to bottom and then from left to right in the frame image and the scaled images. In the context of this work, the subwindow is defined as the image region examined for the target objects. The proposed architecture uses 24 × 24 pixels as the subwindow size.
The integral image generator of the IPB performs the precalculation to generate the integral image window. It computes the cumulative line sum (CLS) along the single row y of the current processing subwindow, as shown in (2), where i(x, y) is the pixel value:

$$\mathrm{CLS}(x, y) = \sum_{i=1}^{x} i(i, y). \tag{2}$$
The detailed structure of the integral image generator is shown in Figure 5. The line register reads one row of a subwindow at a time, from the top line to the bottom line. Therefore, to store the 24 pixels of image data, the width of the line register requires 24 × 8 bits. Each pixel value in a horizontal line is accumulated to generate the line integral image.

The computation of the line integral image requires the sum of 24 pixel values, which results in an increased delay time. To solve this problem, we used a tree adder based on carry-save adders. The tree adder receives eight pixel values and a carry-in as input and outputs the eight integral image values corresponding to the input pixels. Therefore, three clock cycles are required to compute a line integral image. This process is performed in parallel and at the same time as the operation of the pipelined classifier module. Thus, considering the overall behavior of the system, computing the integral image effectively takes one clock cycle. The output of the tree adder is fed into the corresponding position of the integral image line buffer. Each pixel in the line integral image buffer has a resolution of 18 bits to represent the bit width of the integral image pixels for a 24 × 24 subwindow. When all the pixel values in the line integral image buffer are updated, the data are transferred to the bottom line of the integral image window.
The integral image window calculates the integral image of the current subwindow. The structure of the integral image window is shown in Figure 6(a). The size of the integral image window is identical to the size of the subwindow except for the difference in the data width of the pixel values.

The integral image window consists of 24 × 24 window elements (WE) representing the integral image of the current subwindow. As shown in Figure 6(b), each WE includes a data selector, a subtractor to compute the updated integral image, and a register to hold the integral image value of each pixel. According to the control signal, the register takes either the pixel value of the lower line or the updated pixel value, obtained by subtracting the first line's value from the current pixel value. The control signals for data selection and data storing are provided by the integral image controller shown in Figure 3.
The data loading of the integral image window can be divided into initial loading and normal loading according to the y coordinate of the upper-left corner of the current processing subwindow in the frame image. The initial loading is executed when the y coordinate of the first line of a subwindow is zero (y = 0). This loading is performed for the first 24 rows during the vertical scan of the window in the original and scaled images. In the initial loading, 24 CLS values are fed continuously to the bottom line of the integral image array and summed with the previous value, as shown in (3), where ii(x, y) is the current content of the window and ii′(x, y) is the updated content:

$$ii'(x, y) = ii(x, y) + \mathrm{CLS}(x, y) \quad \text{if } y = 24,$$
$$ii'(x, y) = ii(x, y + 1) \quad \text{otherwise}. \tag{3}$$

The WEs at the bottom line of the integral image array accumulate these line integral images and output the accumulated value to the next upper line. The other lines of the integral image array shift their current values by one pixel upward. After 24 clock cycles, each WE holds its appropriate integral image value.
The normal loading is executed during the rest of the vertical scan (y > 0). As the search window moves down one line, the corresponding integral image array must be updated. The updating of the integral image array for normal loading is performed by shifting and subtracting, as shown in (4), where ii(x, y) represents the current integral image and ii′(x, y)
Figure 2: Cascade structure for Haar classifiers.
Figure 3: Structural overview of the proposed face detection system.
Figure 4: The block diagram of the video interface module.
Figure 5: The block diagram of the integral image generator.
Figure 6: Structure of the integral image window (a) and internal implementation of the window element (b).
denotes the updated integral image, and y denotes the line number of the integral image window. Consider the following:

$$ii'(x, y) = ii(x, y) - ii(x, 1) + \mathrm{CLS}(x, y) \quad \text{if } y = 24,$$
$$ii'(x, y) = ii(x, y + 1) - ii(x, 1) \quad \text{otherwise}. \tag{4}$$
During the normal loading, the first 23 lines of the updated integral image array overlap with the values of lines 2 to 24 of the current integral image. Therefore, these values are calculated by subtracting the values of the first line (line 1) from lines 2 to 24 of the current integral image and then shifting the array one line up. The last line of the updated integral image is calculated by adding the CLS of the new row to the last line of the current integral image array and then subtracting the first line of the current integral image array.
The AdaBoost framework uses variance normalization to compensate for the effect of different lighting conditions. Therefore, the normalization should be considered in the design of the IPB as well. This process requires the computation of the standard deviation (σ) for the corresponding subwindow. The standard deviation (σ) and the compensated threshold ($t_c$) are defined in (5), where VAR represents the variance, x is the pixel value of the subwindow, and N is the area of the subwindow:

$$\sigma = \sqrt{\mathrm{VAR}} = \sqrt{\frac{1}{N}\sum x^2 - \left(\frac{1}{N}\sum x\right)^2}, \qquad t_c = t_0 \cdot \sigma. \tag{5}$$
The AdaBoost framework multiplies the standard deviation (σ) by the original feature threshold ($t_0$) given in the training set to obtain the compensated threshold ($t_c$). The computation of the compensated threshold is required only once for each subwindow. However, the standard deviation computation needs a square root operation. In general, square root computing takes considerable hardware resources and computing time because of its computational complexity. Therefore, as an alternative, we squared both sides of the second line of (5). Now the computing load is converted to multiplying the variance by the squared value of the original threshold. We expand this result once again by multiplying both sides by the squared value of the subwindow area ($N^2$) to avoid the costly division operation, as shown in

$$t_c^2 = t_0^2 \cdot \mathrm{VAR}, \qquad N^2 \cdot t_c^2 = \left(N\sum x^2 - \left(\sum x\right)^2\right) \cdot t_0^2. \tag{6}$$

Therefore, we can efficiently compute the lighting correction using only multiplication and subtraction, since the subwindow size is already fixed and the squared values of the original thresholds can be precomputed and stored in the training set.
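The identity in (6) can be checked numerically. The helper below is an illustrative software check (names and values are assumptions) that the scaled squared threshold needs only multiplies and subtracts:

```c
#define N 576LL  /* area of a 24 x 24 subwindow */

/* N^2 * t_c^2 computed without division or square root:
   (N * sum_sq - sum * sum) * t0^2, per eq. (6). */
static long long thresh_sq_scaled(long long sum, long long sum_sq,
                                  long long t0) {
    return (N * sum_sq - sum * sum) * t0 * t0;
}
```

Since N is a design constant and $t_0^2$ comes from the precomputed training set, the whole lighting correction reduces to integer multiply-subtract datapath operations.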
The functional block diagram that maps the method of (6) into hardware is shown in Figure 7; it is responsible for computing the right-hand part of the second line of (6). The squared image generator module, in the left part of Figure 7, calculates the squared integral image values for the current subwindow. The architecture of the squared image generator is identical to that of the integral image generator, except that it uses squared pixel values and stores only the last column's values. Therefore, the computing of the squared integral image requires the 24 pixel values in the last vertical line of a subwindow. The output of
Figure 7: Structure for the squared image generator and variance module.
the squared image generator is the summation of the integral image for squared pixel values ($\sum x_i^2$).
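As a software sketch of the cumulative line sum scheme behind both generators (function names are illustrative, not the paper's module names), each new row adds its running line sum to the integral value directly above:

```python
def integral_image(frame):
    """Integral image via cumulative line sums:
    ii(x, y) = ii(x, y-1) + s(x, y), where s(x, y) is the running
    sum of row y up to column x."""
    h, w = len(frame), len(frame[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        line_sum = 0                       # cumulative line sum, reset per row
        for x in range(w):
            line_sum += frame[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = above + line_sum
    return ii

def squared_integral_image(frame):
    """Same recurrence applied to squared pixel values (for the variance)."""
    return integral_image([[p * p for p in row] for row in frame])
```

Only the previous row and the running line sum are needed at any moment, which is what allows the hardware generator to avoid buffering a full integral image.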
The other parts of the equation are computed in the variance module, on the right of the figure. The upper input of the variance module is the integral image value at the corresponding coordinate of the integral image window. The output of the variance module is transferred to the FPB, where it is checked against the feature sum of all weighted feature rectangles.
3.3. Feature Processing Block (FPB). The FPB computes the cascaded classifier and outputs the coordinates and scale factors of detected faces. As shown in the figure below, the FPB consists of three major blocks: the pipelined classifier, the feature memory, and the classifier controller.

The pipelined classifier is implemented with a seven-stage pipeline scheme. In each clock cycle, the integral image values are fed from the integral image window and the trained parameters of the Haar classifier are input from the external feature memory. The classifier controller generates the control signals for memory access and data selection in each pipeline stage.

To compute the rectangle value for each feature of the cascaded classifier, the FPB interfaces to the external memory that holds the training data for the features' coordinates and the weights associated with each rectangle. We use one off-chip flash memory for the trained coordinate and weight data of the features. For the other trained data, we used the Block RAMs in the target device.

The first phase of the classifier operation is loading the feature data from the integral image window. In each clock cycle, eight or twelve pixel values of the integral image for a feature are selected from the integral image array. To select the pixel values, the coordinate data corresponding to each feature is required. We used a multiplexer to select 8 or 12 pieces of data from the integral image array according to the feature data stored in the external memory.

The rectangle computation takes place in the next pipeline stage. Two additions and a subtraction are required to compute one rectangle. Each rectangle value is multiplied by a predefined weight, also obtained from the training set. The sum of all weighted feature rectangles represents the result of one weak classifier ($V_f$).
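The rectangle and weak-classifier computation just described can be sketched in software as follows. This is a behavioral model, not the pipelined hardware, and it assumes the usual integral image convention with a top-left origin:

```python
def rect_sum(ii, x, y, w, h):
    """Pixel sum of a w-by-h rectangle at (x, y), read from the integral
    image `ii` with two additions and one subtraction: (A + D) - (B + C),
    where A, B, C, D are the four corner values of the rectangle."""
    A = ii[y - 1][x - 1] if x > 0 and y > 0 else 0
    B = ii[y - 1][x + w - 1] if y > 0 else 0
    C = ii[y + h - 1][x - 1] if x > 0 else 0
    D = ii[y + h - 1][x + w - 1]
    return (A + D) - (B + C)

def weak_classifier_value(ii, rects, weights):
    """Result of one weak classifier: the sum of its weighted rectangles.
    `rects` holds (x, y, w, h) tuples and `weights` the trained weights."""
    return sum(wt * rect_sum(ii, *r) for r, wt in zip(rects, weights))
```

Grouping the corners as (A + D) and (B + C) before the single subtraction matches the two-adders-plus-subtractor arrangement described above.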
International Journal of Distributed Sensor Networks
Figure: Block diagram for FPB.
The value of the single classifier is squared and then multiplied by the squared value of the subwindow's area ($N^2$) to compensate for the variance, as in the equation above. This result is compared with the square of the compensated feature threshold ($t_c$). If the result is smaller than the square of the compensated feature threshold, the Haar classifier selects the left value ($V_L$), a predetermined value obtained from the training set. Otherwise, it selects the right value ($V_R$), another predetermined value also obtained from the training set. This left or right value is accumulated during a stage to compute the stage sum. The multipliers and adders, which require high-performance computing modules, are implemented with Xilinx's dedicated on-chip DSP cores.

At the end of each stage, the accumulated stage sum is compared to a predetermined stage threshold ($t_s$). If the stage sum is larger, the current subwindow is a successful candidate region to contain a face. In this case, the subwindow proceeds to the next stage, and so on, to decide whether the current subwindow can pass all cascaded stages. Otherwise, the subwindow is discarded and omitted from the rest of the computation.
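The stage logic above can be summarized in a short software model (a sketch under assumed data structures, not the pipelined VHDL):

```python
def run_cascade(stages, weak_value):
    """Evaluate the cascaded stages for one subwindow.

    `stages` is a list of (weak_classifiers, stage_threshold) pairs;
    `weak_value` maps one weak classifier to its selected v_L or v_R value.
    Returns True only if every stage sum exceeds its stage threshold."""
    for weak_classifiers, stage_threshold in stages:
        stage_sum = sum(weak_value(wc) for wc in weak_classifiers)
        if stage_sum <= stage_threshold:
            return False      # reject: the remaining stages are skipped
    return True               # passed every stage: face candidate
```

The early `return False` is what makes the cascade fast in practice: most non-face subwindows fail the first few stages and never reach the expensive later ones.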
4. Experiment Results

The proposed architecture was described in VHDL and verified with the ModelSim simulator. After successful simulation, we used the Xilinx ISE tool for synthesis. We selected reconfigurable hardware as the target platform because it offers several advantages over fixed-logic systems in terms of production cost for small volumes and, more critically, because it can be reprogrammed in response to changing operational requirements. As the target device, we selected a Spartan-3 FPGA, because the Spartan series is low priced, with less capability than Xilinx's Virtex series.

4.1. System Setup. To evaluate the experimental testing of our design, we built an evaluation system. Its logical architecture is depicted in the figure below.

We used a commercial chip for video image capture. The TVP5146 module at the bottom of the system architecture figure receives the analog video sequence from the video camera and converts it to BT.656 video sequences. In all experiments, the FPGA was configured to contain the face detection core,
Figure: System architecture and photo view of the evaluation platform.
the interface block, and the feature memory except the coordinate data of the features. The input stream, converted to BT.656 format, is loaded into the FPGA, and each frame is processed in the face detection core. The detection results, the data representing the coordinates and scale factors of each candidate face, are transferred to the DM6446 processor through the serial peripheral interface (SPI) bus. As for the overhead of transferring the detection data from the FPGA to the DM6446 processor, the SPI provides sufficient bandwidth to send the detection results.
Figure: Captured results for functional verification of the face detection between OpenCV (a) and the proposed hardware (b) on the same input images.

The Video Processing Front-End (VPFE) module of the DM6446 processor captures the input image from the FPGA.
Then the ARM processor calculates the position and size of the candidate faces and marks the face regions on the frame image based on the transferred detection results. It also performs postprocessing, which merges multiple detections into a single face. Finally, the Video Processing Back-End (VPBE) module outputs the result of the detector to a VGA monitor for visual verification, along with markings on where the candidate faces were detected.
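The merging step on the ARM side is not specified in detail; a common approach, sketched here as an assumption, greedily groups rectangles whose overlap ratio exceeds a threshold and averages each group into one detection:

```python
def merge_detections(boxes, min_overlap=0.5):
    """Merge overlapping (x, y, w, h) detections into single faces.
    Illustrative sketch, not the paper's actual postprocessing code."""
    def overlap(a, b):
        # Intersection area relative to the smaller of the two boxes.
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        return inter / min(aw * ah, bw * bh) if inter else 0.0

    groups = []
    for box in boxes:
        for g in groups:
            if overlap(box, g[0]) >= min_overlap:
                g.append(box)
                break
        else:
            groups.append([box])
    # Average each group coordinate-wise into a single box.
    return [tuple(sum(v) // len(g) for v in zip(*g)) for g in groups]
```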
4.2. Verification and Implementation. We started with functional verification of the proposed structure. For the functional test, we generated binary files containing the test frame images using MATLAB. We used a sample of test images containing several faces, obtained from the internet and the MIT + CMU test images, sized and formatted to the design requirements. The generated binary files were stored as text files and then loaded into the frame image buffer as an input sequence during the detection process. We next ran the functional RTL simulation of the face detection system using the binary files generated from the test images. The detection results, which contain the coordinates and scale factors of the face candidates, were stored as text files. We used a MATLAB program to overlay the results of the face detection system on the tested input images for visual verification, along with markings on where the candidate faces were detected. For a fair verification of the functionality, we also tested the same input images on the OpenCV program to compare the
T : Resource utilization of the Xilinx Spartan- FPGA.
Logic utilization Used Available Utilization
Slices , , %
Slice ip op , , %
-input LUT , , %
Block RAMs   %
DSPs   %
results of the face detection system, as shown in the figure comparing OpenCV and the proposed hardware. In this way, we could visually verify the detection results.
We also aimed to achieve real-time detection while consuming a minimum amount of hardware resources. The table above shows the synthesis results and resource utilization of our face detection system in terms of the logic elements of the target FPGA. Considering essential elements such as the memory for saving the trained data, the table shows that resource sharing of the logic blocks is fully exploited in the proposed face detection system.

We could estimate the average detection frame rate using the clock frequency obtained from the synthesis results. The performance evaluation presented in this section is based on the combination of the circuit area and the timing estimation obtained from Xilinx's evaluation tools and Mentor Graphics design tools, respectively. The table below summarizes the design features obtained from our experiments.

To estimate the performance of the proposed face detection system, we processed all input images continuously
T : Test results for target FPGA platform.
Features Results
Target platform Xilinx xcsda
Clock frequency  MHz
Input image resolution  ×
Detection rate %
Detection speed  fps
to measure the total number of clock cycles. The system, operating at its maximum frequency, processed the test images at an estimated rate of 42 fps. This result includes the actual overhead of switching frames and subwindows. Additionally, the FPGA implementation achieved slightly lower accuracy in detecting the faces on the images when compared to the OpenCV software running on the same test images. This discrepancy is mainly due to the fact that the OpenCV implementation scales the features up, which does not result in data loss, whereas our implementation scales down the frame image instead of the feature size during the detection process. Another reason for the difference is that the FPGA implementation uses fixed-point arithmetic to compute floating-point numbers, which inherently loses precision.
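The frame-rate estimate follows directly from the cycle count: fps = number of frames divided by (total cycles / clock frequency). A small sketch with hypothetical numbers (the paper's exact cycle counts and clock frequency are not reproduced here):

```python
def estimated_fps(clock_hz, total_cycles, num_frames):
    """Average detection frame rate from a measured cycle count."""
    seconds = total_cycles / clock_hz
    return num_frames / seconds

# Hypothetical example: 100 frames consuming 150e6 cycles on a 63 MHz
# clock give 100 / (150e6 / 63e6) = 42 fps.
```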
A direct quantitative comparison with previous implementations is not practical because previous works differ in their target devices and in their need for additional hardware or software. Moreover, some conventional works did not publish detailed data for their implementations. Although the target platforms differ in design purpose, we believe a rough comparison with previous designs is still informative.

The table below compares the proposed design with previous works focused on low-cost face detection. The implementation of Wei et al. has the same design strategy as the proposed design in terms of compact circuit design. In general, the major factors for evaluating the performance of a face detection system are the input image size, the number of stages, and the number of features in the classifier. The image size has a direct effect on the face detection time: a bigger image requires longer processing. Although we use a four times larger input image than the implementation of Wei et al., our architecture is about 3 times faster. Due to the nature of the cascade structure, the speed of a face detection system is directly related to the number of stages and the number of features used in the cascaded classifier. A smaller number of stages or features gives a faster detection time; however, in inverse proportion to the speed, the accuracy of face detection decreases. From this point of view, the experimental results showed outstanding performance of the proposed architecture compared to the previous works. As a result, the uniqueness of our face detection system is that it supports real-time face detection on embedded systems, which demand a high-performance and small-sized solution, using a low-priced commercial FPGA chip.
T : Comparisons between the proposed design and the
previous works for low-cost face detection.
Wei et al. [ ]
Yang et al .
[]
is work
Image size
 ×  ×
Stage number

Feature number
,  ,
Target device
Xilinx
Virtex-
Altera
Cyclone II
Xilinx
Spartan-
Max. frequency
 MHz  MHz  MHz
Performance
fps fps fps
5. Conclusion

This paper proposed a low-cost and efficient FPGA-based hardware architecture for a real-time face detection system applicable to an analog video camera. We made an effort to minimize the complexity of the architecture for the integral image generator and the classifier. The experimental results showed that the detection rate of the proposed architecture is close to that of OpenCV's detection results. Moreover, the proposed architecture was implemented on a Spartan-3 FPGA as an example of a practical implementation. The results of the practical implementation showed that our architecture can detect faces in images from an analog camera at 42 fps. Considering the balance between hardware performance and design cost, the proposed architecture can be a feasible solution for low-cost reconfigurable devices, supporting real-time face detection on conventional analog video cameras.
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] X. Yang, G. Peng, Z. Cai, and K. Zeng, "Occluded and low resolution face detection with hierarchical deformable model," Journal of Convergence.
[2] H. Cho and M. Choi, "Personal mobile album/diary application development," Journal of Convergence.
[3] K. Salim, B. Hada, and R. S. Ahmed, "Probabilistic models for local patterns analysis," Journal of Information Processing Systems.
[4] H. Kim, S.-H. Lee, M.-K. Sohn, and D.-J. Kim, "Illumination invariant head pose estimation using random forests classifier and binary pattern run length matrix," Human-Centric Computing and Information Sciences.
[5] K. Goswami, G. S. Hong, and B. G. Kim, "A novel mesh-based moving object detection technique in video sequence," Journal of Convergence.
[6] R. Raghavendra, B. Yang, K. B. Raja, and C. Busch, "A new perspective—face recognition with light-field camera," in Proceedings of the 6th IAPR International Conference on Biometrics (ICB '13).
[7] S. Choi, J.-W. Han, and H. Cho, "Privacy-preserving H.264 video encryption scheme," ETRI Journal.
[8] D. Bhattacharjee, "Adaptive polar transform and fusion for human face image processing and evaluation," Human-Centric Computing and Information Sciences.
[9] S.-M. Chang, H.-H. Chang, S.-H. Yen, and T. K. Shih, "Panoramic human structure maintenance based on invariant features of video frames," Human-Centric Computing and Information Sciences.
[10] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision.
[11] C. Shahabi, S. H. Kim, L. Nocera et al., "Janus—multi source event detection and collection system for effective surveillance of criminal activity," Journal of Information Processing Systems.
[12] D. Ghimire and J. Lee, "A robust face detection method based on skin color and edges," Journal of Information Processing Systems.
[13] Intel Corp., "Intel OpenCV Library," Santa Clara, Calif, USA, http://sourceforge.net/projects/opencvlibrary/files/.
[14] W. J. MacLean, "An evaluation of the suitability of FPGAs for embedded vision systems," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05) Workshops, San Diego, Calif, USA.
[15] M. Sen, I. Corretjer, F. Haim et al., "Computer vision on FPGAs: design methodology and its application to gesture recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops '05), San Diego, Calif, USA.
[16] J. Xiao, J. Zhang, M. Zhu, J. Yang, and L. Shi, "Fast AdaBoost-based face detection system on a dynamically coarse grain reconfigurable architecture," IEICE Transactions on Information and Systems.
[17] R. Meng, Z. Shengbing, L. Yi, and Z. Meng, "CUDA-based real-time face recognition system," in Proceedings of the 4th International Conference on Digital Information and Communication Technology and Its Applications (DICTAP '14), Bangkok, Thailand.
[18] T. Theocharides, N. Vijaykrishnan, and M. J. Irwin, "A parallel architecture for hardware face detection," in Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures.
[19] Y. Shi, F. Zhao, and Z. Zhang, "Hardware implementation of AdaBoost algorithm and verification," in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications Workshops (AINA '08).
[20] J. Cho, S. Mirzaei, J. Oberg, and R. Kastner, "FPGA-based face detection system using Haar classifiers," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays.
[21] M. Hiromoto, H. Sugano, and R. Miyamoto, "Partially parallel architecture for AdaBoost-based detection with Haar-like features," IEEE Transactions on Circuits and Systems for Video Technology.
[22] H.-C. Lai, M. Savvides, and T. Chen, "Proposed FPGA hardware architecture for high frame rate face detection using feature cascade classifiers," in Proceedings of the 1st IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS '07).
[23] C. Gao and S.-L. Lu, "Novel FPGA based Haar classifier face detection algorithm acceleration," in Proceedings of the International Conference on Field Programmable Logic and Applications.
[24] C. Kyrkou and T. Theocharides, "A flexible parallel hardware architecture for AdaBoost-based real-time object detection," IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[25] R. C. Luo and H. H. Liu, "Design and implementation of efficient hardware solution based sub-window architecture of Haar classifiers for real-time detection of face biometrics," in Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA '10).
[26] A. Bigdeli, C. Sim, M. Biglari-Abhari, and B. C. Lovell, "Face detection on embedded systems," in Embedded Software and Systems, Springer, Berlin, Germany.
[27] Y. Wei, X. Bing, and C. Chareonsak, "FPGA implementation of AdaBoost algorithm for detection of face biometrics," in Proceedings of the IEEE International Workshop on Biomedical Circuits and Systems.
[28] M. Yang, Y. Wu, J. Crenshaw, B. Augustine, and R. Mareachen, "Face detection for automatic exposure control in handheld camera," in Proceedings of the 4th IEEE International Conference on Computer Vision Systems (ICVS '06).
[29] V. Nair, P.-O. Laprise, and J. J. Clark, "An FPGA-based people detection system," EURASIP Journal on Advances in Signal Processing.
[30] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," in Proceedings of the International Conference on Image Processing (ICIP '02).
... It is particularly effective for detecting objects with well-defined features, such as faces. Fig. 1 illustrates the workings of the Haar cascade algorithm [14]. Fig. 1 Working of Haar-cascade algorithm [14] The classifier is trained using positive and negative samples to learn the distinguishing characteristics of the target object and can then be used to detect and localise instances of the object in new images or frames. ...
... Fig. 1 illustrates the workings of the Haar cascade algorithm [14]. Fig. 1 Working of Haar-cascade algorithm [14] The classifier is trained using positive and negative samples to learn the distinguishing characteristics of the target object and can then be used to detect and localise instances of the object in new images or frames. ...
... Software solutions use optimized implementations on OpenCV (Open Source Computer Vision Library), the library for image processing originally developed by Intel; or have translated the algorithms to GPUs (Fredj et al., 2020;Jain & Patel, 2016). However, the optimized code does not provide good results on embedded platforms (Kim et al., 2015). An alternative to exploit parallelization is to use a hardware approach that speeds up the computations using an application-specific design. ...
... The synthesized detector divides the cascade into three sequential stages containing 9, 16, and 200 Haar-like features respectively. The parallelization of the computations within the stages that compose the single cascade ( Fig. 2a) has been repeated by other subsequent work (Kim et al., 2015;Lai et al., 2007). Taking into consideration that only former weak classifiers on the cascade are almost always executed, Hiromoto et al. (2007) proposed a partially parallel approach. ...
Article
The detection of a person’s eyes is a basic task in applications as important as iris recognition in biometric identification or fatigue detection in driving assistance systems. Current commercial and research systems use software frameworks that require a dedicated computer, whose power consumption, size and price are significantly large. This paper presents a hardware-based embedded solution for eye detection in real-time. From an algorithmic point-of-view, the popular Viola–Jones approach has been redesigned to enable highly parallel, single-pass image-processing implementation. Synthesized and implemented in an All-Programmable System-on-Chip (AP SoC), this proposal allows us to process more than 88 frames per second (fps), taking the classifier less than 2 ms per image. Experimental validation has been successfully addressed in an iris recognition system that works with walking subjects. In this case, the prototype module includes a CMOS digital imaging sensor providing 16 Mpixels images, and it outputs a stream of detected eyes as 640 × 480 images. Experiments for determining the accuracy of the proposed system in terms of eye detection are performed in the CASIA-Iris-distance V4 database. Significantly, they show that the accuracy in terms of eye detection is 100%.
... Schematic depiction of the detection cascade[20]. T: True; F: False. ...
Article
Full-text available
In this paper, a mechatronics system was designed and implemented to include the subjects of artificial intelligence, control algorithms, robot servo motor control, and human-machine interface (HMI). The goal was to create an inexpensive, multi-functional robotics lab kit to promote students’ interest in STEM fields including computing and mechtronics. Industrial robotic systems have become vastly popular in manufacturing and other industries, and the demand for individuals with related skills is rapidly increasing. Robots can complete jobs that are dangerous, dull, or dirty for humans to perform. Recently, more and more collaborative robotic systems have been developed and implemented in the industry. Collaborative robots utilize artificial intelligence to become aware of and capable of interacting with a human operator in progressively natural ways. The work created a computer vision-based collaborative robotic system that can be controlled via several different methods including a touch screen HMI, hand gestures, and hard coding via the microcontroller integrated development environment (IDE). The flexibility provided in the framework resulted in an educational lab kit with varying levels of difficulty across several topics such as C and Python programming, machine learning, HMI design, and robotics. The hardware being used in this project includes a Raspberry Pi 4, an Arduino Due, a Braccio Robotics Kit, a Raspberry Pi 4 compatible vision module, and a 5-inch touchscreen display. We anticipate this education lab kit will improve the effectiveness of student learning in the field of mechatronics.
... Then, Viola and Jones did a development that led to the creation of the Haar-Like feature. The Haar-Like feature processes images in the form of boxes, which in a box consists of a number of pixels, each square which is then processed and looks for differentiating values that indicate dark areas and bright areas [21]. These values will then be the basis for image processing [22]. ...
Article
Coronavirus Disease (COVID-19) is a new virus variant that emerged in 2019. The World Health Organization (WHO) states that 394,381,395 people have been infected with COVID-19, and 5,735,178 have died. This epidemic has been found in Indonesia since March 2020. New cases in Indonesia are still increasing every day as a whole. The Government as a policy has imposed a policy on anyone who will be required to wear a mask and also carry out physical distancing so that they can work without the maker being exposed to the virus. In the midst of a pandemic, the use of masks has increased to prevent transmission. Various types of masks are easy to find, but not all masks are recommended to avoid transmission. Among them are the N-95 masks, which are recommended to prevent transmission. This application uses the haar cascade and naive bayes methods. The pycharm edition 2021.2 tools and python 3.8 are the detection systems used in this mask. The haar cascade method is also used in detecting objects with masks or not and naive Bayes, which is used as an accuracy calculation. This study uses a dataset of 1092, which is divided into 192 positive images and 900 negative images. Accuracy results using the haar cascade method are 100% more accurate, while the nave Bayes method is 76.6% less accurate.
... Schematic depiction of the detection cascade[20]. T: True; F: False. ...
Conference Paper
Full-text available
In this paper, a mechatronics system was designed and implemented to include the subjects of artificial intelligence, control algorithms, robot servo motor control, and human-machine interface (HMI). The goal was to create an inexpensive, multi-functional robotics lab kit to promote students' interest in STEM fields including computing and mechtronics. Industrial robotic systems have become vastly popular in manufacturing and other industries, and the demand for individuals with related skills is rapidly increasing. Robots can complete jobs that are dangerous, dull, or dirty for humans to perform. Recently, more and more collaborative robotic systems have been developed and implemented in the industry. Collaborative robots utilize artificial intelligence to become aware of and capable of interacting with a human operator in progressively natural ways. The work created a computer vision-based collaborative robotic system that can be controlled via several different methods including a touch screen HMI, hand gestures, and hard coding via the microcontroller integrated development environment (IDE). The flexibility provided in the framework resulted in an educational lab kit with varying levels of difficulty across several topics such as C and Python programming, machine learning, HMI design, and robotics. The hardware being used in this project includes a Raspberry Pi 4, an Arduino Due, a Braccio Robotics Kit, a Raspberry Pi 4 compatible vision module, and a 5-inch touchscreen display. We anticipate this education lab kit will improve the effectiveness of student learning in the field of mechatronics.
Article
Full-text available
The stage before the conversion of agricultural products into post-harvest consumer products is the process of separating the raw products into appropriate classes. Today, this difficult manual separating process is a process in which a large number of workers work at an intense pace on the product line and the workforce is intensively spent. Disruptions in separating as a result of carelessness cause product loss, loss of time and cost increases. In this study, as an alternative to manual separating processes, a real-time separating system, which detects the products in the factory band with object recognition methods and enables fast positioning of the separating tool on the products, works simultaneously with object recognition and traveling salesman problem algorithms has been created. In this way, a low-budget separating system is recommended for large selecting processes with a time- and cost-effective selecting model. In the study, the creation of a real-time fast separating system with the support of the traveling salesman algorithm, performance evaluation and research and findings on the fast separating model are presented.
Chapter
Every place, be it a household or an organization, big or small, such as a bank, has something that needs to be secured to ensure efficient operations and management. Security is always a concern because what is being protected is valuable. Security systems based on one or more biometrics such as face, voice, iris, fingerprint, and palm, along with items carried in person such as an RFID card or security key, are used alongside or instead of existing PIN- or password-based lock systems because of the uniqueness and added layer of security these features provide. But implementing these features alone is not sufficient to thwart malicious attempts to gain access to a secure location, owing to the rise of technology capable of beating or bypassing such security systems. Thus, this paper proposes a robust security system that takes care of the security requirements of any location containing something valuable and that addresses the problems prevailing in present systems. The proposed system is capable of detecting and recognizing a person's face, their emotion based on facial expression, and the liveliness of their face to determine physical presence; identifying the speaker along with a word or phrase in their speech; and detecting factors in the surrounding environment that may threaten a user. The system is designed so that anyone who wants to enter or access a secure location has to pass through all of these layers of security working in unison (password, facial recognition, facial emotion recognition, facial liveliness recognition, speaker recognition, speaker phrase detection, and environmental threat detection), none of which can be bypassed easily.
All the sensors for detecting, identifying, and recognizing these biometric features are securely connected to a single security device to ensure the success of this goal. Keywords: Security; Face; Facial expression and facial liveliness detection; Recognition; Speaker and speaker safety phrase detection; Environmental threat detection; Smart device
Article
Full-text available
The COVID-19 pandemic has caused thousands of people to become infected with the COVID-19 virus and die. Regulations are currently in force requiring everyone to wear a mask and practice physical distancing when leaving home. Mask wearing and physical distancing are already in effect in company environments, where employees must wear a mask before entering the office. So that the discipline of wearing masks at the office can be maintained, a mask detection system using the Haar cascade method was built for the COVID-19 new-normal era. The mask detection system was developed with PyCharm Community Edition 2020 and Python 3.8. The Haar cascade method is used to detect masked and unmasked subjects. The result of this research is a system that can detect whether people are wearing masks, with an alarm sounding if any employee is not wearing a mask while inside the office.
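The Haar cascade method used above, like the architecture in the main paper, rests on Haar-like features evaluated in constant time via an integral image. As a minimal NumPy sketch (function names and the two-rectangle feature choice are illustrative):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[0:y, 0:x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over the w*h rectangle at top-left (x, y), via four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle (edge) Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A cascade classifier thresholds many such features per scan window; the constant-cost `rect_sum` is what makes evaluating thousands of windows per frame feasible.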
Article
Full-text available
The emergence of the COVID-19 pandemic has had a lasting impact on countries around the world since 2019. Face mask detection has been a significant advance in the fields of image processing and deep learning. Many face detection models have been designed using different algorithms and techniques. The approach proposed here is developed to prevent people without masks from entering public places (e.g., malls, universities, offices) by detecting face masks using deep learning with TensorFlow, Keras, and OpenCV, and sending a signal to an Arduino device connected to the door so that it can be opened. The system detects a person's face in real time and identifies whether the person is wearing a mask. It uses datasets collected from various sources.
Article
Full-text available
Human face processing and evaluation is a difficult problem due to variations in orientation, size, illumination, expression, and disguise. The goal of this work is threefold. First, we show that a variant of the polar transformation can be used to register face images against changes in pose and size. Second, we implement fusion of thermal and visual face images in the wavelet domain to handle illumination and disguise. Third, principal component analysis (PCA) is applied to tackle changes due to expression up to a certain degree. Finally, a multilayer perceptron (MLP) is used to classify the face image. Several techniques are implemented to illustrate the improvement in results, ranging from the simplest design (no registration; only PCA for dimensionality reduction and an MLP for classification) to adaptive polar registration, fusion in the wavelet transform domain, and final classification with an MLP. A consistent increase in recognition performance was observed. Experiments were conducted on two separate databases, and the results obtained with adaptive polar registration and wavelet-domain fusion of thermal and visual images are very satisfactory.
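The PCA dimensionality-reduction step described above can be sketched with an SVD on mean-centered data (a generic eigenfaces-style reduction, not the paper's exact pipeline; the data and function name are illustrative):

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components,
    computed via SVD of the mean-centered data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k], mean

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))          # 20 hypothetical 8x8 face vectors
Z, components, mean = pca_reduce(faces, k=5)
```

The low-dimensional projections `Z` would then be fed to a classifier such as an MLP, as in the abstract.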
Article
Full-text available
In this paper, a novel approach for head pose estimation in gray-level images is presented. The proposed algorithm employs two techniques. To deal with the large set of training data, Random Forests, a state-of-the-art classification algorithm in computer vision, are used. To make the system robust to illumination, a Binary Pattern Run Length matrix is employed; this matrix combines a binary pattern with a run length matrix, where the binary pattern is calculated by a randomly selected operator. To extract features from each training patch, statistical texture features are computed from the Binary Pattern Run Length matrix. Techniques for real-time operation, such as controlling the number of binary tests, are also applied. Experimental results show that the algorithm is efficient and robust against illumination change.
Article
Full-text available
Recently, many large organizations have multiple data sources (MDSs) distributed over the different branches of an interstate company. Local pattern analysis has become an effective strategy for MDS mining in national and international organizations. It consists of mining the different datasets to obtain frequent patterns, which are forwarded to a centralized site for global pattern analysis. Various synthesizing models [2,3,4,5,6,7,8,26] have been proposed to build global patterns from the forwarded patterns. It is desired that the rules synthesized from such forwarded patterns closely match the mono-mining results (i.e., the results that would be obtained if all of the databases were put together and mined as one). When a pattern is present at a site but fails to satisfy the minimum support threshold, it is not allowed to take part in the pattern synthesizing process; this process can therefore lose some interesting patterns that could help a decision maker make the right decision. For such situations we propose applying a probabilistic model in the synthesizing process, since an adequate choice of probabilistic model can improve the quality of the discovered patterns. In this paper, we perform a comprehensive study of various probabilistic models that can be applied in the synthesizing process, and we choose and improve one of them to ameliorate the synthesizing results. Finally, experiments on public databases demonstrate the efficiency of the proposed synthesizing method.
Article
Full-text available
Recent technological advances provide the opportunity to use large amounts of multimedia data from a multitude of sensors with different modalities (e.g., video, text) for the detection and characterization of criminal activity. Their integration can compensate for sensor and modality deficiencies by using data from other available sensors and modalities. However, building such an integrated system at the scale of neighborhood and cities is challenging due to the large amount of data to be considered and the need to ensure a short response time to potential criminal activity. In this paper, we present a system that enables multi-modal data collection at scale and automates the detection of events of interest for the surveillance and reconnaissance of criminal activity. The proposed system showcases novel analytical tools that fuse multimedia data streams to automatically detect and identify specific criminal events and activities. More specifically, the system detects and analyzes series of incidents (an incident is an occurrence or artifact relevant to a criminal activity extracted from a single media stream) in the spatiotemporal domain to extract events (actual instances of criminal events) while cross-referencing multimodal media streams and incidents in time and space to provide a comprehensive view to a human operator while avoiding information overload. We present several case studies that demonstrate how the proposed system can provide law enforcement personnel with forensic and real time tools to identify and track potential criminal activity.
Conference Paper
Full-text available
Face recognition has received substantial attention from industry as well as academia. Improvements in image sensors have further boosted the performance of face recognition algorithms in real-world scenarios. In this paper, we evaluate the strength of the light-field camera for face recognition applications. The main advantage of a light-field camera is that it can provide images at different focus depths in a single capture, which is not possible with a conventional 2D camera. We first collected a new face dataset using both a light-field and a conventional camera by simulating real-world scenarios. We then propose a new scheme to select the best-focused face image from the set of focus images rendered by the light-field camera. Extensive experiments are carried out on our new face dataset to bring out the merits and demerits of employing the light-field camera for face recognition applications.
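Selecting the best-focused image from a focus stack, as proposed above, typically relies on a sharpness measure. A common choice (not necessarily the scheme the paper uses) is the variance of a discrete Laplacian; the sketch below assumes grayscale images as NumPy arrays:

```python
import numpy as np

def focus_measure(img):
    """Sharpness score: variance of a 5-point discrete Laplacian.
    Higher values indicate a better-focused image."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

def best_focused(stack):
    """Index of the sharpest image in a focus stack."""
    return max(range(len(stack)), key=lambda i: focus_measure(stack[i]))
```

Out-of-focus blur suppresses high-frequency content, so the defocused renderings of a light-field capture score lower than the in-focus one.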
Article
As a growing number of individuals are exposed to surveillance cameras, the need to prevent captured videos from being used inappropriately has increased. Privacy-related information can be protected through video encryption during transmission or storage, and several algorithms have been proposed for such purposes. However, the simple way of evaluating the security by counting the number of brute-force trials is not proper for measuring the security of video encryption algorithms, considering that attackers can devise specially crafted attacks for specific purposes by exploiting the characteristics of the target video codec. In this paper, we introduce a new attack for recovering contour information from encrypted H.264 video. The attack can thus be used to extract face outlines for the purpose of personal identification. We analyze the security of previous video encryption schemes against the proposed attack and show that the security of these schemes is lower than expected in terms of privacy protection. To enhance security, an advanced block shuffling method is proposed, an analysis of which shows that it is more secure than the previous method and can be an improvement against the proposed attack.
Article
This paper presents a hierarchical deformable model for robust human face detection, especially with occlusions and under low resolution. By parsing, we mean inferring the parse tree (a configuration of the proposed hierarchical model) for each face instance. In modeling, a three-layer hierarchical model is built consisting of six nodes. For each node, an active basis model is trained, and their spatial relations such as relative locations and scales are modeled using Gaussian distributions. In computing, we run the learned active basis models on testing images to obtain bottom-up hypotheses, followed by explicitly testing the compatible relations among those hypotheses to do verification and construct the parse tree in a top-down manner. In experiments, we test our approach on CMU+MIT face test set with improved performance obtained.
Article
Panoramic photography is becoming a very popular and commonly available feature in mobile handheld devices. In traditional panoramic photography, the human structure often becomes messy if the person changes position in the scene or during the step that combines the human figure with the natural background. In this paper, we present an effective panorama creation method that maintains the main structure of the human in the panorama. The proposed method uses automatic feature matching, and the energy map of seam carving is used to avoid overlapping the human with the natural background. The contributions of this work include an automated panorama creation method and a solution to the human ghost problem in panoramas, preserving the human structure by means of the energy map. Experimental results show that the proposed system can effectively compose panoramic photographs while maintaining the human structure.
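The seam-carving energy map mentioned above assigns high cost to edge-rich regions (such as people) so that seams route around them. A minimal gradient-magnitude energy map can be sketched as follows (a generic formulation, not necessarily the paper's exact energy function):

```python
import numpy as np

def energy_map(img):
    """Seam-carving energy: sum of absolute horizontal and vertical
    gradients. High-energy pixels (edges, people) resist seam removal."""
    # prepend duplicates the first row/column so output matches input shape
    gx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    gy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    return gx + gy
```

Seams are then chosen as minimum-cost paths through this map, keeping high-energy (human) regions intact.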
Conference Paper
This paper proposes a real-time face recognition system based on the Compute Unified Device Architecture (CUDA) platform, which effectively completed the face detection and recognition tasks. In the face detection phase with Viola-Jones cascade classifier, we implemented and improved novel parallel methodologies of image integral, calculation scan window processing and the amplification and correction of classifiers. In the face recognition phase, we explored the parallelizing of the algorithm and parallelized some part of the testing phase. Through the optimization of the two important part of face recognition system, the system we proposed make a big difference. The experimental results demonstrate that, in comparison with traditional CPU program, the proposed approach running on an NVidia GTX 570 graphics card could respectively achieve 22.42 times speedup in detection phase and 1668.56 times speedup in recognition phase when only training 2000 images and testing 40 images compared with the CPU program running on an Intel core i7 processor. The recognition speed will increase until it reaches the hardware resource limit. It shows that the system we proposed achieves a good real-time performance.