All content in this area was uploaded by Won-Du Chang on Apr 26, 2022
Object Detection using Histogram of Oriented Gradients
Masahiro Terayama, Jungpil Shin, Won-Du Chang
Graduate School of Computer Science and Engineering, The University of Aizu,
Tsuruga, Ikki-machi Aizuwakamatsu City, Fukushima, 965-8580, Japan
jpshin@u-aizu.ac.jp
Abstract
In this paper, we present object detection using the
Histogram of Oriented Gradients (HOG). Persons are
detected by matching a template image against a source
image. In step one, the brightness gradient is calculated
from the image; 5x5 pixels form a cell, and 3x3 cells form
a block. In step two, the source image and the template
image are compared according to each matching method.
The last step encloses the region recognized as a person.
To speed up detection, we propose three methods. First,
pixel matching omits the construction of cells and blocks.
Next, cell matching omits the construction of blocks and
moves the template cell by cell instead of pixel by pixel.
Finally, block matching keeps the blocks but moves the
template cell by cell instead of pixel by pixel. As a
result, pixel matching and cell matching improved speed,
but their detection rates were low. Block matching
improved speed while keeping a good detection rate.
1. Introduction
Object detection by image processing allows surveillance
cameras to detect persons in camera images. Human
detection is difficult because of changes in illumination,
shadows, and backgrounds, so precise detection is required
for recognition. Background subtraction is widely used as
a technique for detecting moving objects [1, 2]. First,
an image of the scene, for example a road, is taken as the
background. The background image contains no moving
objects. Next, the background image is subtracted from
an image taken at the same place. However, this
technique is sensitive to changes in shadow and light,
and cannot be used at every time of day or in every place.
In this research, object detection uses the Histogram of
Oriented Gradients (HOG), which Dalal and Triggs
proposed in 2005 [3]. HOG feature vectors are histograms
of gradients computed from image brightness. HOG is
robust to changes in illumination and shadows.
However, HOG is weak against changes in scale.
Background subtraction can be used only with stationary
cameras because it requires a background image. HOG does
not use a background image and can therefore be used with
moving cameras. HOG was proposed recently and has been
applied by various researchers [4]. The goal of this
work is to improve HOG-based detection and make the
algorithm faster. Section 2 explains HOG. Section 3
describes the matching methods used to detect humans.
Section 4 presents the results of the experiments.
Section 5 discusses the results, and Section 6 concludes.
2. HOG
The Histogram of Oriented Gradients is a feature vector
obtained from image brightness. HOG captures the shape
of objects by collecting the gradients of adjacent pixels
into a histogram for each local region, which provides
robustness to changes in illumination and shadows.
2.1. Brightness Gradient Calculation
The gradient strength (m) and the gradient direction (θ)
are calculated from the brightness L of each pixel.
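\[ m(x, y) = \sqrt{f_x(x, y)^2 + f_y(x, y)^2} \tag{1} \]

\[ \theta(x, y) = \tan^{-1} \frac{f_y(x, y)}{f_x(x, y)} \tag{2} \]

\[ \begin{cases}
f_x(x, y) = L(x+1, y) - L(x-1, y) \\
f_y(x, y) = L(x, y+1) - L(x, y-1)
\end{cases} \tag{3} \]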
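The brightness gradient equations above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function name `brightness_gradient`, the `[y, x]` array layout, the zero-valued border pixels, and the use of `arctan2` (a quadrant-aware form of the inverse tangent) are all assumptions.

```python
import numpy as np

def brightness_gradient(L):
    """Return gradient strength m and direction theta (radians) per pixel."""
    L = np.asarray(L, dtype=float)      # brightness image, indexed [y, x]
    fx = np.zeros_like(L)
    fy = np.zeros_like(L)
    # Central differences; border pixels are left at zero (an assumption).
    fx[:, 1:-1] = L[:, 2:] - L[:, :-2]  # L(x+1, y) - L(x-1, y)
    fy[1:-1, :] = L[2:, :] - L[:-2, :]  # L(x, y+1) - L(x, y-1)
    m = np.sqrt(fx ** 2 + fy ** 2)      # gradient strength
    theta = np.arctan2(fy, fx)          # gradient direction
    return m, theta

# A horizontal brightness ramp: the gradient points along +x.
ramp = np.tile(np.arange(5.0), (5, 1))
m, theta = brightness_gradient(ramp)
print(m[2, 2], theta[2, 2])
```

For the ramp, every interior pixel has a strength of 2 and a direction of 0 radians, since the brightness changes only along x.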
2.2. Making to Cell
We build a histogram for each 5x5 pixel region, called a
cell (see Fig. 1b). The histogram is formed by summing
the gradient strengths of the pixels that fall into the
same direction bin. We divide the gradient directions
into 9 bins (Fig. 2).
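A minimal sketch of the cell step, assuming the strength and direction arrays are already available as NumPy arrays. The name `cell_histograms` is hypothetical, and folding the directions into the range [0, π) before binning is an assumption; the text only states that 9 bins are used.

```python
import numpy as np

def cell_histograms(m, theta, cell=5, bins=9):
    """Sum gradient strength per direction bin for each cell x cell region."""
    h, w = m.shape
    ny, nx = h // cell, w // cell
    # Fold directions into [0, pi): unsigned orientation (an assumption).
    ang = np.mod(theta, np.pi)
    idx = np.minimum((ang / (np.pi / bins)).astype(int), bins - 1)
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            ys, xs = cy * cell, cx * cell
            b = idx[ys:ys + cell, xs:xs + cell].ravel()
            s = m[ys:ys + cell, xs:xs + cell].ravel()
            # Strengths of pixels sharing a bin are added together.
            hist[cy, cx] = np.bincount(b, weights=s, minlength=bins)
    return hist

# 10x10 image with unit strength and direction 0: four cells, all mass in bin 0.
hist = cell_histograms(np.ones((10, 10)), np.zeros((10, 10)))
print(hist.shape, hist[0, 0, 0])
```

Pixels that do not fill a complete cell at the right or bottom edge are ignored in this sketch.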
2.3. Making to Block
A block is defined as a group of 3x3 cells. In the
normalization step, the cell having the maximum gradient
strength represents the block.
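The block step is described only briefly, so the following is one possible reading rather than the authors' implementation. It assumes that blocks are 3x3 windows of cells that overlap with a stride of one cell, and that "the cell having the maximum gradient strength" means the cell whose histogram has the largest total. The name `block_features` is hypothetical.

```python
import numpy as np

def block_features(hist, block=3):
    """Represent each block x block window of cells by its strongest cell."""
    ny, nx, bins = hist.shape
    out = np.zeros((ny - block + 1, nx - block + 1, bins))
    for by in range(ny - block + 1):
        for bx in range(nx - block + 1):
            cells = hist[by:by + block, bx:bx + block].reshape(-1, bins)
            # Pick the cell with the largest summed gradient strength.
            strongest = np.argmax(cells.sum(axis=1))
            out[by, bx] = cells[strongest]
    return out

# 4x4 cells with a single strong cell at (1, 1), which lies in every block.
hist = np.zeros((4, 4, 9))
hist[1, 1, 3] = 5.0
blocks = block_features(hist)
print(blocks.shape)
```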
3. Detection
Detection is the main subject of this paper. We
performed four kinds of HOG-based matching to
examine processing time and precision.
3.1. Base Matching
Base matching is the foundation of HOG-based detection.
We matched the template image against the source image
at every pixel position. The matching procedure is as
follows:
1. Calculate the brightness gradient of the template
image.
2. Place the template image at the first position.
3. Calculate the brightness gradient of the source
image in the template region.
4. Match each pixel.
5. Move the template image by one pixel.
6. Repeat the same processing until the template has
scanned the entire source image.
The difference between two oriented gradients (A
and B) is calculated with Eq. (4).
\[ \begin{aligned}
\mathit{diff} &= m(A)^2 + m(B)^2 - C \\
C &= 2\, m(A)\, m(B) \cos(\theta(A) - \theta(B))
\end{aligned} \tag{4} \]
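Reading Eq. (4) as the law of cosines, the difference is the squared length of the vector difference between the two gradients, each treated as a vector with strength m and direction θ. The sketch below follows that reading; the function name `gradient_diff` is hypothetical.

```python
import math

def gradient_diff(m_a, theta_a, m_b, theta_b):
    """Difference between two oriented gradients A and B (angles in radians)."""
    c = 2.0 * m_a * m_b * math.cos(theta_a - theta_b)
    return m_a ** 2 + m_b ** 2 - c

print(gradient_diff(1.0, 0.0, 1.0, 0.0))          # identical gradients
print(gradient_diff(1.0, 0.0, 1.0, math.pi))      # opposite directions
print(gradient_diff(3.0, 0.0, 4.0, math.pi / 2))  # perpendicular directions
```

Identical gradients give a difference of 0. Opposite unit gradients give about 4, and perpendicular gradients of strengths 3 and 4 give about 25, so the value grows with both the strength mismatch and the angle between the gradients.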
3.2. Pixel Matching
After the gradient direction is calculated with Eq. (2),
we matched the template image against every pixel
position of the source image. Speed-up is attempted by
omitting the construction of cells and blocks.
The matching procedure is as follows.
1. Calculate the brightness gradient of the source
image and the template image.
2. Place the template image at the first position.
3. Match every pixel.
4. Move the template image by one pixel.
5. Repeat the same processing until the template has
scanned the entire source image.
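The pixel matching steps above can be sketched as a sliding-window search. This is an illustrative sketch under assumptions: the per-pixel differences are summed over the window using the difference of Eq. (4), and the position with the lowest total is reported, whereas the text does not state how a detection is decided. The name `pixel_match` and its `step` parameter are hypothetical.

```python
import numpy as np

def pixel_match(src_m, src_t, tpl_m, tpl_t, step=1):
    """Slide the template over the source; return (score, y, x) of the best fit."""
    th, tw = tpl_m.shape
    sh, sw = src_m.shape
    best = (np.inf, 0, 0)
    for y in range(0, sh - th + 1, step):
        for x in range(0, sw - tw + 1, step):
            wm = src_m[y:y + th, x:x + tw]   # strengths inside the window
            wt = src_t[y:y + th, x:x + tw]   # directions inside the window
            # Per-pixel difference of Eq. (4), summed over the window.
            d = wm ** 2 + tpl_m ** 2 - 2.0 * wm * tpl_m * np.cos(wt - tpl_t)
            score = d.sum()
            if score < best[0]:
                best = (score, y, x)
    return best

# Cut a template out of a random gradient field and find it again.
rng = np.random.default_rng(0)
src_m = rng.random((20, 20)) + 0.1
src_t = rng.uniform(-np.pi, np.pi, (20, 20))
score, y, x = pixel_match(src_m, src_t, src_m[7:13, 4:9], src_t[7:13, 4:9])
print(y, x)
```

Setting `step` to the cell size moves the window in cell units instead of pixel units, which is the change in movement used by cell matching and block matching.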
3.3. Cell Matching
After the Histogram of Oriented Gradients is calculated,
we matched every cell of the template image against the
cells of the source image. Improvement is attempted by
omitting the construction of blocks and by moving the
template cell by cell instead of pixel by pixel. The
matching procedure is as follows.
1. Calculate the Histogram of Oriented Gradients
of the source image and the template image.
2. Place the template image at the first position.
3. Match every cell.
4. Move the template image by one cell.
5. Repeat the same processing until the template has
scanned the entire source image.
3.4. Block Matching
Fig. 1: (a) source image, (b) cell image
Fig. 2: Sample of a histogram of oriented gradients
Fig. 3: A block consists of 3x3 cells
Using the calculated blocks, we matched the template
image against the source image at every cell position.
Speed-up is attempted by moving the template cell by
cell instead of pixel by pixel. The matching procedure
is as follows.
1. Calculate the blocks of the source image and
the template image.
2. Place the template image at the first position.
3. Match every block.
4. Move the template image by one cell.
5. Repeat the same processing until the template has
scanned the entire source image.
4. Experiment
The resolution of the source image was 450x340
(Fig. 4a), and the resolution of the template image was
35x85 (Fig. 4b). The matching results are shown in
Fig. 5: (a) base matching (3.1), (b) pixel matching (3.2),
(c) cell matching (3.3), and (d) block matching (3.4).
The calculation time and detection rate are shown in
Table 1. Next, a second source image with a resolution
of 450x340 was used (Fig. 6), with the same template as
in Fig. 4. The matching results are shown in the same
manner as for Fig. 4 (Fig. 5), and the calculation time
and detection rate are shown in Table 2. Table 3 shows
the average magnification of the calculation time,
relative to base matching, and the average detection
rate over ten tests. The last experiment is eye
detection. Fig. 7 shows the result for one person, and
Fig. 8 shows the result for many people. Table 4 shows
the average detection rate over ten tests.
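To give a feel for why moving the template in cell units reduces the matching work, the number of template positions can be counted for the image sizes stated above. The sketch assumes the sizes are given as width x height, that the window always stays fully inside the source image, and that one cell is 5 pixels wide; `window_positions` is a hypothetical helper.

```python
def window_positions(src_w, src_h, tpl_w, tpl_h, step):
    """Count template positions when the window moves `step` pixels at a time."""
    nx = len(range(0, src_w - tpl_w + 1, step))
    ny = len(range(0, src_h - tpl_h + 1, step))
    return nx * ny

pixel_moves = window_positions(450, 340, 35, 85, step=1)  # pixel movement
cell_moves = window_positions(450, 340, 35, 85, step=5)   # cell movement
print(pixel_moves, cell_moves)
```

Under these assumptions there are 106496 positions with pixel movement and 4368 with cell movement, roughly 24 times fewer. This counts positions only; the work done at each position also differs between the methods.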
Table 1: Calculation Time of Fig. 4
                 Image processing time   Matching time   Detection rate
Base matching    0.25                    28.54           100%
Pixel matching   0.25                    13.23           60.00%
Cell matching    0.27                    0.13            22.20%
Block matching   0.27                    0.032           100%
Table 2: Calculation Time of Fig. 5
                 Image processing time   Matching time   Detection rate
Base matching    0.25                    24.13           42.9%
Pixel matching   0.24                    12.75           66.7%
Cell matching    0.25                    0.11            50%
Block matching   0.25                    0.063           42.9%
Fig. 4: (a) source image, (b) template image
Fig. 5: (a) base matching, (b) pixel matching, (c) cell
matching, (d) block matching
Fig. 6: (a) base matching, (b) pixel matching, (c) cell
matching, (d) block matching
Table 3: Average Calculation Time of ten tests
                 Image processing time   Matching time   Detection rate
Base matching    1.00                    1.00            89.3%
Pixel matching   0.99                    1.89            41.0%
Cell matching    0.97                    193.73          19.7%
Block matching   0.93                    684.61          84.3%
Table 4: Average Detection Rate of Eye Detection over
ten tests
                 Average detection rate
Base matching    83.4%
Pixel matching   52.7%
Cell matching    45.6%
Block matching   82.1%
5. Discussion
5.1. False Negative
Fig. 9 (a) and (c) are clipped from Fig. 5, and Fig. 9
(b) is clipped from Fig. 4. In (a) and (c), small people
were not detected because HOG is weak against scale
change. In addition, in (c), two people were detected as
one person. In (b), the person on the left side is a
false negative.
5.2. False Positive
Fig. 10 (a) and (b) are clipped from Fig. 4, and Fig. 10
(c) is clipped from Fig. 5. These images show false
positives, in which noise is recognized as a person.
5.3. Calculation time
In Table 3, there is hardly any difference in image
processing time, but the matching time differs greatly.
In particular, cell matching is more than 100 times
faster than pixel matching. In addition, block matching
is about 4 times faster than cell matching.
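These ratios can be checked directly from the matching-time column of Table 3. The short sketch below assumes that column holds speed-up factors relative to base matching, which is a reading of the table rather than something the text states explicitly.

```python
# Matching-time column of Table 3, read as speed-up relative to base matching.
speedup = {"base": 1.00, "pixel": 1.89, "cell": 193.73, "block": 684.61}

cell_vs_pixel = speedup["cell"] / speedup["pixel"]
block_vs_cell = speedup["block"] / speedup["cell"]
print(round(cell_vs_pixel, 1), round(block_vs_cell, 1))
```

The first ratio is about 102.5, matching "more than 100 times", and the second is about 3.5.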
5.4. Detection rate
The detection rate was calculated for each method. In
Table 3, pixel matching and cell matching had poor
detection rates, while block matching showed a value
close to that of base matching. In Table 4, pixel
matching and cell matching had better rates than in
Table 3, but the rates were still low. Block matching
had a high rate, as in Table 3.
Fig. 7: (a) base matching, (b) pixel matching, (c) cell
matching, (d) block matching
Fig. 8: (a) base matching, (b) pixel matching, (c) cell
matching, (d) block matching
Fig. 9: (a) pixel matching, (b) cell matching, (c) block
matching
Fig. 10: (a) pixel matching, (b) cell matching, (c) block
matching
6. Conclusion
In this paper, we presented three methods to speed up
human detection using the Histogram of Oriented
Gradients. Pixel matching and cell matching were
faster, but their detection rates were poor. Block
matching succeeded in shortening the matching time
while keeping a good detection rate. Pixel matching
produced many false positives. Cell matching produced
many false negatives and false positives. In future
work, it is necessary to increase the number of template
images. Because this program is weak against scale
change, small objects are not detected; increasing the
number of template images can improve robustness to
scale change. The Scale-Invariant Feature Transform
(SIFT) is robust to scale change. If HOG and SIFT were
combined, the detection rate would be expected to
improve. Block matching gave good results; therefore,
block matching can be applied to video processing.
7. References
[1] A. Elgammal, R. Duraiswami, D. Harwood and L.
S. Davis, “Background and Foreground Modeling
Using Nonparametric Kernel Density Estimation for
Visual Surveillance,” Proceedings of the IEEE, vol. 90,
no. 7, July 2002.
[2] W.-D. Chang and J. Shin, “Paused Object
Detection with Background Updating,” Proceedings of
the Ninth International Conference on Humans and
Computers, pp. 13-18, 2006.
[3] N. Dalal and B. Triggs, “Histograms of Oriented
Gradients for Human Detection,” IEEE Computer
Vision and Pattern Recognition, pp. 886-893, 2005.
[4] Y. Yamauchi, H. Fujiyoshi, B. Hwang and T.
Kanade, “People Detection Based on Cooccurrence of
Appearance and Spatiotemporal Features,” MIRU2007,
pp. 1-4, 2007.