A visual shape descriptor using sectors and shape context of contour lines
Shao-Hu Peng a, Deok-Hwan Kim a,*, Seok-Lyong Lee b, Chin-Wan Chung c
a Department of Electronic Engineering, Inha University, South Korea
b School of Industrial and Management Engineering, Hankuk University of Foreign Studies, South Korea
c Division of Computer Science, Department of EECS, KAIST, South Korea
Article info
Received 15 May 2008
Received in revised form 20 January 2010
Accepted 26 April 2010
Abstract
This paper describes a visual shape descriptor based on the sectors and shape context of
contour lines to represent the image local features used for image matching. The proposed
descriptor consists of two-component feature vectors. First, the local region is separated
into sectors and their gradient magnitude and orientation values are extracted; a feature
vector is then constructed from these values. Second, local shape features are obtained
using the shape context of contour lines. Another feature vector is then constructed from
these contour lines. The proposed approach calculates the local shape feature without
needing to consider the edges. This can overcome the difficulty associated with textured
images and images with ill-defined edges. The combination of two-component feature vec-
tors makes the proposed descriptor more robust to image scale changes, illumination vari-
ations and noise. The proposed visual shape descriptor outperformed other descriptors in
terms of the matching accuracy: 14.525% better than SIFT, 21% better than PCA-SIFT,
11.86% better than GLOH, and 25.66% better than the shape context.
© 2010 Elsevier Inc. All rights reserved.
Image matching plays an important role in intelligent systems, such as human face recognition [8,19,21], intelligent cars [1,9] and unmanned air vehicles [2,17]. Some of the most important considerations in image matching are robustness with respect to image scale, illumination and noise. Interest region identification and descriptor computation have proven successful in overcoming these problems, and they are used widely in object recognition [4,6,13,18] and image retrieval.
For image matching, the interest regions are first identified through the use of effective detectors, such as Harris–Laplace and difference-of-Gaussian (DoG). Local descriptors are then calculated for the interest regions to represent the image features. Finding effective local descriptors is very important not only for the matching accuracy but also for the matching efficiency. Several effective descriptors, such as the scale-invariant feature transform (SIFT) and the shape context, have been developed in recent years.
The SIFT descriptor proposed by Lowe is designed for robust image feature detection, and has been proven to be invariant with respect to scale, noise and affine transformations. Several new descriptors based on SIFT, such as PCA-SIFT and GLOH, have been proposed to improve the matching performance. These were evaluated by Mikolajczyk and Schmid, who demonstrated that they perform well under scale transformations, illumination variation and noise.
* Corresponding author. Address: #814 High-Tech, Inha University, Yonghyun-dong, Nam-gu, Incheon 402-751, South Korea. Tel.: +82 32 860 7424; fax: +82 32 868 3654.
Information Sciences 180 (2010) 2925–2939
This paper presents a visual shape descriptor based on the sectors and shape context of contour lines. Fig. 1 shows a schematic of the process used to compute this descriptor.
The scale-space extreme points are first detected, and the keypoints are located by removing the unstable ones. Each keypoint is then assigned an orientation. For feature calculation, a circular region is set around each keypoint. Finally, a sector-based feature vector and a feature vector based on the shape context of contour lines are calculated, and both vectors are combined to form a new descriptor.
For the sector-based feature vector, the region is separated into sectors of equal radii and angles, similar to log-polar coordinates. The gradient magnitude and orientation of the pixels in each sector are calculated and mapped to a gradient histogram. A feature vector is then generated using the histograms from all sectors around the keypoint. For the feature vector based on the shape context of contour lines, the gray value of the keypoint is taken as the sea level of a geographical map. The altitude of each point in the circular region is then obtained by comparing the gray value of the point to that of the keypoint. Contour lines are formed by grouping the points of similar altitude, and the shape context of each contour line in the region is obtained. The feature vector is then generated using the shape context histograms of all contour lines. Finally, both feature vectors are combined to form a new descriptor.
The main contributions of the proposed method are as follows:
• The dimensionality of the descriptor is reduced by using sectors to calculate the gradient magnitude and orientation.
• The shape features of the region are obtained using the shape context of the contour lines, without edge detection.
A combination of the above feature vectors makes the proposed descriptor more robust with respect to image scale, illumination and noise. This new descriptor achieves better matching accuracy and lower dimensionality than SIFT.
The remainder of this paper is organized as follows: Section 2 presents the background of the paper. Section 3 introduces
the new approach. The experiments and conclusions are discussed in Sections 4 and 5, respectively.
The SIFT descriptor proposed by Lowe is used extensively for image matching. It includes four stages: scale-space extrema detection, keypoint localization, orientation assignment, and descriptor formation.
In the first stage, the image is smoothed by Gaussian kernels of different scales and then down-sampled to construct a Gaussian pyramid (Fig. 2). A DoG pyramid is then built by subtracting adjacent smoothed images in the same octave. Finally, the value of each point in the DoG image is compared with those of the points around it. A point whose value is either the maximum or the minimum among its neighbors is determined to be a keypoint.
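A minimal sketch of this first stage, assuming a fixed list of smoothing scales within one octave and omitting down-sampling, sub-pixel refinement and the keypoint pruning of the later stages:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Build a stack of Gaussian-smoothed images, take adjacent
    differences (DoG), and report points that are strict extrema
    against their 26 neighbours in scale-space. The sigma values
    are illustrative, not the paper's."""
    smoothed = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dog = np.stack([b - a for a, b in zip(smoothed, smoothed[1:])])
    keypoints = []
    # interior DoG levels only: each needs a level above and below
    for k in range(1, dog.shape[0] - 1):
        for i in range(1, dog.shape[1] - 1):
            for j in range(1, dog.shape[2] - 1):
                cube = dog[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2]
                v = dog[k, i, j]
                if (v == cube.max() or v == cube.min()) \
                        and (cube == v).sum() == 1:
                    keypoints.append((k, i, j))
    return keypoints
```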
In the second stage, keypoints with low contrast and keypoints located along edges are pruned to retain stable keypoints.
In the third stage, one or more consistent orientations are assigned to each keypoint to achieve invariance with respect to image rotation.
Fig. 1. The process of generating the new descriptor. (After circular region setting, two branches follow: the sector-based branch — calculation of gradient magnitude and orientation, then gradient histogram generation; and the contour-line branch — sea level mapping for the keypoint, altitude formation for the region, contour line grouping, then shape context histogram generation. The two feature vectors are combined into the new shape descriptor.)
In the fourth stage, the region around the keypoint is separated into 16 blocks and the descriptor is calculated over these blocks. The gradient magnitude and orientation of each pixel in each block are calculated, and the position and orientation of each point are rotated to the orientation of the keypoint, making the descriptor robust with respect to image rotation. Finally, the gradient magnitudes are projected onto an 8-bin orientation histogram (Fig. 3). Since there are 16 blocks in the region and 8 orientation bins for each block, the dimensionality of the SIFT descriptor is 16 × 8 = 128.
Mortensen et al. introduced a new SIFT descriptor with a global context, which combined the SIFT descriptor with curvilinear shape information to improve the matching accuracy. The curvilinear shape information is collected by computing the maximum curvature at each pixel in a region larger than the one used in SIFT. Since the curvilinear information is calculated from a larger region, the level of mismatch is reduced when there are multiple similar descriptors in the same image. However, its dimensionality is 188, compared to 128 for the SIFT descriptor.
The PCA-SIFT descriptor was introduced by Ke and Sukthankar to reduce the dimensionality and improve the matching accuracy. However, experiments showed that its matching accuracy was low in some situations. Mikolajczyk and Schmid proposed a new descriptor based on the gradient location-orientation histogram (GLOH). They calculated the descriptor in log-polar coordinates with three bins in the radial direction (the radii were set to 6, 11, and 15) and 8 bins in the angular direction, for a total of 17 location bins. The gradients were separated into 16 bins, which led to a descriptor dimensionality of 272. Finally, PCA was used to reduce the dimensionality to 128.
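The 17-bin GLOH spatial layout described above can be sketched as a small bin-index function. Assigning the central disc (inside radius 6) to a single undivided bin, which makes 1 + 8 + 8 = 17 location bins, and the particular bin ordering are assumptions made here for illustration:

```python
import math

def gloh_location_bin(dx, dy, radii=(6.0, 11.0, 15.0)):
    """Map an offset (dx, dy) from the keypoint to one of 17
    location bins: one undivided central disc, and two outer
    rings each split into 8 angular sectors. Returns None for
    points outside the largest radius."""
    r = math.hypot(dx, dy)
    if r > radii[2]:
        return None
    if r <= radii[0]:
        return 0                       # undivided central bin
    ring = 1 if r <= radii[1] else 2   # which outer ring
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi / 8)) % 8
    return 1 + (ring - 1) * 8 + sector

# 17 location bins x 16 gradient bins = 272 dimensions before PCA
assert 17 * 16 == 272
```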
3. The visual shape descriptor
The proposed local descriptor algorithm uses scale-space DoG extrema detection steps similar to those of the SIFT detection algorithm. After detecting all keypoints, a visual shape descriptor is constructed. The descriptor consists of a two-component feature vector, which includes the gradient magnitudes and orientations of the sectors as well as the shape context of contour lines around the keypoint. Therefore, the proposed descriptor can be defined as follows:
F = w · F_ContourLines + (1 − w) · F_Sectors,    (1)

Fig. 2. The process of building the Gaussian pyramid (octaves 1 and 2; S is the smooth scale of octave 1).
Fig. 3. The process of the descriptor calculation.
where F_Sectors is a 64-dimensional feature vector based on sectors, F_ContourLines is a 48-dimensional feature vector based on the local shape context, and w is the weight balancing the two components according to the user's preference. Hence, the proposed descriptor algorithm consists of three parts:
1. Find the feature vector using the gradient magnitude and orientation of pixels in the sectors around the keypoint.
2. Find the feature vector using the shape context of the contour lines in a circular region centered at the keypoint.
3. Combine both feature vectors to construct the visual shape descriptor.
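Part 3 might look as follows. Since the two components have different dimensionalities (64 and 48), the weighted combination of Eq. (1) is interpreted here as a weighted concatenation; this is an assumption for illustration, not the paper's stated implementation:

```python
import numpy as np

def combine_descriptor(f_sectors, f_contour, w=0.5):
    """Weighted concatenation in the spirit of Eq. (1): scale the
    sector vector by (1 - w) and the contour-line vector by w,
    then join them into one descriptor. The expected sizes (64
    and 48) follow the paper; w is a user-chosen weight."""
    f_sectors = np.asarray(f_sectors, dtype=float)
    f_contour = np.asarray(f_contour, dtype=float)
    return np.concatenate([(1.0 - w) * f_sectors, w * f_contour])
```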
First, a circular region is defined from which a two-component feature vector is extracted. Since the keypoint is detected by the DoG detector within different scales, the size of the circular region is defined relative to the scale of the keypoint. The keypoint is also set as the center of the circular region because it is the most stable point in that region. Therefore, the radius of the circular region is set by the following equation:

R = D · S,    (2)

where D is an experimentally determined multiplier, and S is the smoothing scale of the octave that the keypoint belongs to. Fig. 4 shows the circular region centered at the keypoint.
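Eq. (2) and the resulting region can be sketched directly; the multiplier value used below is illustrative, not the paper's experimentally determined one:

```python
import numpy as np

def circular_region(image, kx, ky, scale, D=5.0):
    """Boolean mask of the pixels inside the circular region of
    Eq. (2), R = D * S, centred on keypoint (kx, ky). D is an
    experimentally chosen multiplier (5.0 here is only a
    placeholder); `scale` is the keypoint's smoothing scale S."""
    R = D * scale
    yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    return (xx - kx) ** 2 + (yy - ky) ** 2 <= R ** 2
```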
3.1. Feature vector based on sectors
In this section, the gradient magnitude and orientation are extracted in Cartesian coordinates. However, instead of using Cartesian coordinates to separate the local region into 4-by-4 sub-regions, the local region is separated into sectors to reduce the dimensionality of the descriptor. Since the points close to the keypoint are more important than those farther away, the former can be viewed within small grids and the latter within large grids by using polar coordinates to separate the circular region. Furthermore, a Gaussian weighting function with σ equal to R is assigned to each point to emphasize the points close to the keypoint. Therefore, the size of the grid can be enlarged to reduce the dimensionality of the descriptor without degrading its performance.
The circular region is separated into N sectors. A sector is a sub-region divided by the same angle and radius (Fig. 5). The
gradient magnitudes and orientations of the points in the sector are then calculated to form a histogram with M bins. The
gradient magnitude and orientation can be computed as follows:
m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))^2 + (L(x, y+1) − L(x, y−1))^2 ),    (3)
θ(x, y) = tan^(−1)( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) ),    (4)
where x and y are the Cartesian coordinates of a point, m(x, y) is its gradient magnitude, θ(x, y) is its orientation, and L(x, y) is its gray value. For each point in the sector, its gradient magnitude is mapped to the histogram bin corresponding to its orientation. Note that the position and orientation of the point are aligned to the orientation of the keypoint, achieving invariance with respect to image rotation. Finally, the histograms of all N sectors around the keypoint are concatenated to form the N × M-dimensional feature vector F_Sectors.
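A simplified sketch of the sector-based vector, using angular wedges only (the paper also divides by radius) and omitting the Gaussian weighting and the rotation to the keypoint orientation; N = M = 8 gives the 64-dimensional F_Sectors:

```python
import numpy as np

def sector_feature_vector(region, keypoint, N=8, M=8):
    """Compute gradient magnitude and orientation per pixel with
    the finite differences of Eqs. (3)-(4), assign each pixel to
    one of N equal-angle sectors around the keypoint, and
    accumulate magnitudes into an M-bin orientation histogram
    per sector. Returns the concatenated N*M vector."""
    region = np.asarray(region, dtype=float)
    ky, kx = keypoint
    H, W = region.shape
    hist = np.zeros((N, M))
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            gx = region[y, x + 1] - region[y, x - 1]
            gy = region[y + 1, x] - region[y - 1, x]
            mag = np.hypot(gx, gy)
            theta = np.arctan2(gy, gx) % (2 * np.pi)        # orientation
            phi = np.arctan2(y - ky, x - kx) % (2 * np.pi)  # sector angle
            s = min(int(phi / (2 * np.pi / N)), N - 1)
            b = min(int(theta / (2 * np.pi / M)), M - 1)
            hist[s, b] += mag
    return hist.ravel()  # N*M-dimensional feature vector
```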
3.2. The shape context of the contour lines
Mikolajczyk and Schmid showed that the shape context achieves high performance for images with clear edges. However, its performance suffers when applied to textured scenes or images with ill-defined edges. Our approach utilizes the shape context in the circular region without edge detection to overcome the performance problems associated with such problematic edges.
Fig. 4. The circular region.
As discussed in Section 2, the keypoint is detected from the DoG pyramid by comparing its value with those of the points around it; the keypoint is thus the most stable point in that region. Consider the region near the keypoint mapped to a terrain: all pixels in the region are projected into a geographical map, and the gray value of each pixel represents its altitude in the map. The altitude of a point increases with its gray value, and vice versa.
Contour lines are useful for representing terrain characteristics, as in geographical maps. Therefore, the contour lines are found using the altitudes of the pixels in the circular region, and the shape context of the contour lines is then calculated to represent the shape feature of the circular region.
An approach similar to the contour lines of geographical maps is adopted: a contour line is defined as the distribution of points of similar altitude in a given region. Under this definition, contour lines are allowed to intersect, the points are not required to be connected to each other, and the line formed by the points does not need to be closed. Fig. 6 shows three contour lines of a local image region, similar to those of geographical maps.
Fig. 7 highlights the similarity between the gray values of the pixels and the altitudes of the terrain model; the image is well modeled by the "terrain". The terrain model of Fig. 7a is shown in Fig. 7c, where the x- and y-axes denote the pixel position and the z-axis denotes the gray value of each pixel. Similarly, the terrain model of the local region highlighted in Fig. 7b is shown in Fig. 7d.
The process of calculating the shape context of the contour lines includes four stages:
3.2.1. Sea level mapping for the keypoint
As mentioned earlier, the keypoint is the most stable point in a circular region. Therefore, the gray value of the keypoint is
set to sea level in the circular region for the terrain model.
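The sea-level mapping, altitude formation and contour-line grouping steps can be sketched together; the altitude band width used for grouping is an illustrative parameter, not the paper's:

```python
import numpy as np

def contour_line_groups(region, keypoint, band=16):
    """Take the keypoint's gray value as sea level, compute each
    pixel's altitude as its gray value minus sea level, and group
    pixels into contour lines by altitude band of width `band`
    gray levels. Returns {band index: list of (y, x) points}."""
    region = np.asarray(region, dtype=float)
    ky, kx = keypoint
    sea_level = region[ky, kx]
    altitude = region - sea_level
    groups = {}
    for (y, x), a in np.ndenumerate(altitude):
        level = int(np.floor(a / band))  # altitude band index
        groups.setdefault(level, []).append((y, x))
    return groups
```

The shape context histogram of each resulting group would then be computed as the fourth stage.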
Fig. 5. The feature vector based on the sectors.
Fig. 6. Contour lines of the geographical map vs. image terrain.