A Survey on Ear Biometrics
AYMAN ABAZA, WVHTC Foundation
ARUN ROSS, West Virginia University
CHRISTINA HEBERT, and MARY ANN F. HARRISON, WVHTC Foundation
MARK S. NIXON, University of Southampton
Recognizing people by their ear has recently received significant attention in the literature. Several reasons
account for this trend: first, ear recognition does not suffer from some problems associated with other non-
contact biometrics, such as face recognition; second, it is the most promising candidate for combination with
the face in the context of multi-pose face recognition; and third, the ear can be used for human recognition in
surveillance videos where the face may be occluded completely or in part. Further, the ear appears to degrade
little with age. Even though current ear detection and recognition systems have reached a certain level of
maturity, their success is limited to controlled indoor conditions. In addition to variation in illumination,
other open research problems include hair occlusion, earprint forensics, ear symmetry, ear classification, and
ear individuality.
This article provides a detailed survey of research conducted in ear detection and recognition. It provides
an up-to-date review of the existing literature, revealing the current state-of-the-art not only for those who are
working in this area but also for those who might exploit this new approach. Furthermore, it offers insights
into some unsolved ear recognition problems as well as ear databases available for researchers.
Categories and Subject Descriptors: I.5.4 [Pattern Recognition]: Applications
General Terms: Design, Algorithms, Performance
Additional Key Words and Phrases: Biometrics, ear recognition/detection, earprints, person verification/identification
ACM Reference Format:
Abaza, A., Ross, A., Herbert, C., Harrison, M. A. F., and Nixon, M. S. 2013. A survey on ear biometrics. ACM
Comput. Surv. 45, 2, Article 22 (February 2013), 35 pages.
DOI =10.1145/2431211.2431221 http://doi.acm.org/10.1145/2431211.2431221
1. INTRODUCTION
The science of establishing human identity based on the physical (e.g., fingerprints
and iris) or behavioral (e.g., gait) attributes of an individual is referred to as biometrics
[Jain et al. 2004]. Humans have used body characteristics such as face and voice
for thousands of years to recognize each other. In contemporary society, there is a
pronounced interest in developing machine recognition systems that can be used for
A. Abaza is also affiliated with the Biomedical Engineering Department at Cairo University.
A. Abaza, C. Herbert, and M. A. F. Harrison were supported by ONR under contract to WVHTF no. N00014-09-C-0388. A. Ross was supported by NSF CAREER grant no. IIS-0642554.
Authors’ addresses: A. Abaza (corresponding author), Advanced Technologies Group, WVHTC Foundation,
Fairmont, WV 26554; email: aabaza@wvhtf.org; A. Ross, Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506; C. Herbert, M. A. F. Harrison, Advanced Technologies Group, WVHTC Foundation, Fairmont, WV 26554; M. S. Nixon, Department of Electronics
and Computer Science and Engineering, University of Southampton, Southampton SO17 1BJ, UK.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this
work in other works requires prior specific permission and/or a fee. Permissions may be requested from
Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212)
869-0481, or permissions@acm.org.
© 2013 ACM 0360-0300/2013/02-ART22 $15.00
DOI 10.1145/2431211.2431221 http://doi.acm.org/10.1145/2431211.2431221
Fig. 1. The ear biometric has tremendous potential when the side profile of a face image is available. In this
example, the ear is much more easily observable than the frontal part of the face.
automated human recognition. With applications ranging from forensics to national
security, biometrics is slowly becoming an integral part of modern society. The most
common biometric systems are those based on characteristics that have traditionally been used by humans for identification, such as fingerprint and face images, which have
the largest market share. More recently, the iris biometric has been used in large-scale
identity management systems such as border control applications. However, many
other human characteristics are also being studied as possible biometric cues for
human recognition. The ear structure is one such biometric cue, since the geometry
and shape of the ear have been observed to exhibit significant variation among individuals
[Jain et al. 2004]. It is a prominent visible feature when the face is viewed in profile
and, consequently, is readily collectable from video recording or photography. Figure 1
shows the side profiles of some individuals where the ear is much more clearly visible than the frontal part of the face.
In this article, we survey the current literature and outline the scientific work
conducted in ear biometrics as well as clarify some of the terminology that has
been introduced. There are existing surveys on ear biometrics including the ones by
Lammi [2004], Pun and Moon [2004], and Islam et al. [2007]. Choras [2007] described
feature extraction methods for ear biometric systems. Our goal for this survey is to
expand on previous surveys by:
(1) including more than fifty ear publications from 2007–2010 that were not discussed
in the previous surveys,
(2) adding references to available databases that are suitable for ear recognition stud-
ies,
(3) highlighting ear performance in multibiometric systems, and
(4) listing the open research problems in ear biometrics.
This article is organized as follows: Section 2 presents background information about
ear anatomy, history of ear recognition in forensics, a brief description of a typical ear
biometric system, and an overview of preliminary attempts that were made to create
a working system; Section 3 presents most of the ear databases available for research;
Fig. 2. The human ear develops from auricular hillocks (center) that arise in the 5th week of embryonic
development. Between the 6th (left) and 9th (right) weeks of development, the hillocks enlarge, differentiate,
and fuse to form the structures of the outer ear. Additionally, the ear translocates from the side of the
neck to a more cranial and lateral site. This figure is taken from https://syllabus.med.unc.edu/courseware/embryo_images/unit-ear/ear_htms/ear014.htm with permission of the School of Medicine, University of North
Carolina.
Section 4 presents a survey of the various ear detection methods; the various feature
extraction methods discussed in the literature are presented in Section 5; Section 6
discusses the role of the ear in multibiometric systems; and, finally, Section 7 highlights
some of the open research areas in the field.
2. BACKGROUND
2.1. Ear Anatomy and Development
The ear starts to appear between the fifth and seventh weeks of pregnancy. At this
stage, the embryo’s face takes on more definition as a mouth perforation, nostrils,
and ear indentations become visible. Though there is still disagreement as to the
precise embryology of the external ear [ArbabZavar and Nixon 2011], the overall ear
development during pregnancy is as follows.1
(1) The embryo develops initial clusters of embryonic cells that serve as the foundation
from which a body part or organ develops. Two of these clusters, termed the first
and second pharyngeal arches, form six tissue elevations called auricular hillocks
during the fifth week of development. Figure 2 (center) shows a sketch of the embryo
with the six auricular hillocks, labeled 1 through 6. Figure 2 (left) shows the growth
and development of the hillocks after the sixth week of embryonic development.
(2) In the seventh week, the auricular hillocks begin to enlarge, differentiate, and fuse,
producing the final shape of the ear, which is gradually translocated from the side
of the neck to a more cranial and lateral site. By the ninth week, shown in Figure 2
(right), the morphology of the hillocks is recognizable as a human ear. Hillocks 1–3
form the first arch of the ear (tragus, helix, and cymba concha), while hillocks 4–6
form the second arch of the ear (antitragus, antihelix, and concha).
The external anatomy of the ear2 is illustrated in Figure 3. The forensic science literature reports that ear growth after the first four months of life is highly linear
[Iannarelli 1989]. The rate of stretching is approximately five times greater than nor-
mal during the period from four months to the age of eight, after which it is constant
until around the age of seventy when it again increases.
2.2. Ear Biometric Systems
An ear biometric system may be viewed as a typical pattern recognition system where
the input image is reduced to a set of features that is subsequently used to compare
1https://syllabus.med.unc.edu/courseware/embryo_images/unit-ear/ear_htms/ear014.htm.
2http://www.plasticsurgery4u.com/procedure_folder/otoplasty_anatomy.html.
(1) Helix Rim
(2) Lobule
(3) Antihelix
(4) Concha
(5) Tragus
(6) Antitragus
(7) Crus of Helix
(8) Triangular Fossa
(9) Incisure Intertragica
Fig. 3. External anatomy of the ear. The visible flap is often referred to as the pinna. The intricate structure
of the pinna coupled with its morphology is believed to be unique to an individual although large-scale
evaluation of automated ear recognition has not been conducted.
Fig. 4. The block diagram of a typical ear recognition system.
against the feature sets of other images in order to determine its identity. Ear recog-
nition can be accomplished using 2D images of the ear or 3D point clouds that capture
the three-dimensional details of the ear surface. The ear biometric system has two pos-
sible modes of operation. In the verification mode, where the subject claims an identity,
the input image is compared against that of the claimed identity via their respective
feature sets in order to validate the claim. In the identification mode, where the subject
does not claim an identity, the input ear image is compared against a set of labeled3 ear
images in a database in order to determine the best match and, therefore, its identity.
The salient stages of a classical ear recognition system are illustrated in Figure 4.
(1) Ear detection (segmentation). The first and foremost stage involves localizing the
position of the ear in an image. Here, a rectangular boundary is typically used to
indicate the spatial extent of the ear in the given image. Ear detection is a critical
component since the errors in this stage can undermine the utility of the biometric
system.
(2) Ear normalization and enhancement. In this stage, the detected (segmented) ear
is subjected to an enhancement routine that improves the fidelity of the image.
Further, the ear image may be subjected to certain geometric or photometric cor-
rections in order to facilitate feature extraction and matching. In some cases, a
curve that tightly fits the external contour of the ear may be extracted.
(3) Feature extraction. While the segmented ear can be directly used during the match-
ing stage, most systems extract a salient set of features to represent the ear. Fea-
ture extraction refers to the process in which the segmented ear is reduced to a
mathematical model (e.g., a feature vector) that summarizes the discriminatory
information.
3The term labeled is used to indicate that the identity of the images in the database is known.
Fig. 5. Iannarelli’s measurements.
(4) Matching. The features extracted in the previous stage have to be compared against
those stored in the database in order to establish the identity of the input ear. In its
simplest form, matching involves the generation of a match score by comparing the
feature sets pertaining to two ear images. The match score indicates the similarity
between two ear images.
(5) Decision. In the decision stage, the match score(s) generated in the matching mod-
ule are used to render a final decision. In the verification mode of operation, the
output is a “yes” or a “no”, with the former indicating a genuine match and the
latter indicating an impostor. In the identification mode of operation, the output is
a list of potential matching identities sorted in terms of their match score.
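To make the matching and decision stages concrete, the following is a minimal sketch in Python, assuming feature vectors have already been produced by an upstream ear detector and feature extractor (both hypothetical here); the cosine similarity and the threshold value are illustrative choices, not a method prescribed by any particular surveyed system.

```python
# A minimal sketch of the matching and decision stages described above.
import numpy as np

def match_score(probe_feat, gallery_feat):
    """Cosine similarity between two feature vectors (illustrative choice)."""
    a = probe_feat / (np.linalg.norm(probe_feat) + 1e-12)
    b = gallery_feat / (np.linalg.norm(gallery_feat) + 1e-12)
    return float(np.dot(a, b))

def verify(probe_feat, claimed_feat, threshold=0.8):
    """Verification mode: accept the claimed identity if the score exceeds a threshold."""
    return match_score(probe_feat, claimed_feat) >= threshold

def identify(probe_feat, gallery):
    """Identification mode: rank all labeled gallery entries by match score."""
    scores = {label: match_score(probe_feat, feat) for label, feat in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```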
2.3. Ear Recognition History
The potential of the human ear for personal identification was recognized and advocated
as early as 1890 by the French criminologist Alphonse Bertillon [1896], who wrote4:
“The ear, thanks to these multiple small valleys and hills which furrow across it, is
the most significant factor from the point of view of identification. Immutable in its
form since birth, resistant to the influences of environment and education, this organ
remains, during the entire life, like the intangible legacy of heredity and of the intra-
uterine life". Bertillon made use of the description and some measurements of the ear
as part of the Bertillonage system that was used to identify recidivists.
One of the first ear recognition systems is Iannarelli’s system which was originally
developed in 1949 [Iannarelli 1989]. This is a manual system based upon 12 measure-
ments as illustrated in Figure 5. Each photograph of the ear is aligned such that the
lower tip of a standardized vertical guide on the development easel touches the upper
flesh line of the concha area, while the upper tip touches the outline of the antitragus.
Then the crus of helix is detected and used as a center point. Vertical, horizontal, diag-
onal, and anti-diagonal lines are drawn from that center point to intersect the internal
and external curves on the surface of the pinna. The 12 measurements are derived
from these intersections and used to represent the ear.
Fields et al. [1960] made an attempt to identify newborn babies in hospitals. They
visually assessed 206 sets of ear photographs, and concluded that the morphological
constancy of the ear can be used to establish the identity of the newborn.
Currently, there exists no commercial biometric system to automatically identify or
verify the identity of individuals by way of their ear biometric. Burge and Burger [2000]
presented one of the most cited ear biometric methods in the literature. They located
the ear by using deformable contours on a Gaussian pyramid representation of the
image gradient [Burge and Burger 1997]. Then they constructed a graph model from
4This statement is taken from Hurley et al. [2007].
the edges and curves within the ear, and invoked a graph-based matching algorithm for
authentication. They did not report any performance measures for the proposed system.
Moreno et al. [1999] were the first to describe a fully automated system for ear recog-
nition. They used multiple features and combined the results of several neural classi-
fiers. Their feature vector included outer ear points, ear shape, and wrinkles, as well
as macro-features extracted by a compression network. To test that system, two sets of
images were acquired. The first set consisted of 168 images pertaining to 28 subjects
with 6 photos per subject. The second set was composed of 20 images, corresponding
to 20 different individuals. Later, Mu et al. [2004] extended this method. They repre-
sented the ear feature vector as a combination of the outer ear shape and inner ear
structure. Then they employed a neural network for classification. This method can be
considered as a simplified automation of Iannarelli’s system [Iannarelli 1989].
Yuizono et al. [2002] treated ear image recognition as a search optimization problem, where they applied a Genetic Algorithm (GA) to minimize the
mean square error between the probe and gallery images. They assembled a database of
660 images corresponding to 110 persons. They demonstrated an accuracy of 99-100%.
Like other biometric traits, research in ear recognition is directed by the databases
that are available for algorithm evaluation and performance analysis. Therefore, we
first discuss the various databases that have been assembled by multiple research
groups for assessing the potential of ear biometrics.
3. DATABASES
Testing and development of robust ear recognition algorithms require databases of suf-
ficient size (many subjects, multiple samples per subject, etc.) that include care-
fully controlled variations of factors such as lighting and pose. In the literature, the
Carreira-Perpinan [1995] database has been widely used; however, it is a very small
database of 19 subjects. In this section, we review several databases that have been
used in the literature of ear recognition (and detection). Most of these databases are
either publicly available or can be obtained under license.
3.1. WVU Database
The West Virginia University (WVU) ear database was collected using the system
[Fahmy et al. 2006] shown in Figure 6. This system had undergone various design, as-
sembly, and implementation changes. The main hardware components for this system
include the following.
—PC. It provides complete control of the moving parts and acquires video from the camera.
—Camera. It captures video. It is attached to the camera arm, which is controlled by a stepper motor.
—Linear Actuator. This is a unique custom-made device (by Netmotion, Inc.) that has
a 4-ft span and allows smooth vertical (upward or downward) translation. This device is used to adjust the height of the camera according to the subject's height.
—Light. For this database, the light was fixed to the camera arm.
—Structural Framework. This consists of tinted wall, rotating arms, and other struc-
tural supports. A blackboard was added behind the chair to serve as a uniform
background during video capture.
Various software packages were used: (i) Posteus IPE (stepper system control), to adjust the camera height and to rotate the camera; (ii) EVI Series Demonstration Software (camera control), to adjust the zoom, tilt, and focus of the camera; and (iii) IC Video Capture, to record the subject's images during the camera rotation.
The WVU ear database consists of 460 video sequences from 402 different subjects, with multiple sequences available for 54 of these subjects [Abaza 2008]. Each video begins at the left profile
Fig. 6. The data acquisition system designed at West Virginia University for collecting ear images at
multiple angles.
of a subject (0 degrees) and terminates at the right profile (180 degrees) in about 2
minutes. This database has 55 subjects with eyeglasses, 42 subjects with earrings, 38 subjects with partially occluded ears, and 2 subjects with fully occluded ears. The WVU database is
currently not available for public use.
3.2. USTB Databases
The University of Science and Technology Beijing (USTB) databases5 are available for
academic research [USTB 2005].
—IMAGE DATABASE I. 180 images of 60 volunteers. For each subject the follow-
ing three images were acquired: (a) normal ear image; (b) image with small angle
rotation; and (c) image under a different lighting condition.
—IMAGE DATABASE II. 308 images of 77 volunteers. By defining the angle when the
CCD camera is perpendicular to the ear as being the profile view (0◦), for each subject
the following four images were acquired: (a) profile image; (b) two images with 30◦ and −30◦ angle variations; and (c) one with illumination variation.
—IMAGE DATABASE III. 79 volunteers. For each subject the following ear images
were acquired:
—Regular ear images. The subject rotates his head from 0 degrees to 60 degrees
toward the right side. Images of the head were acquired at the following angles:
0◦, 5◦, 10◦, 15◦, 20◦, 25◦, 30◦, 40◦, 45◦, 50◦, 60◦. Two images were recorded at each
angle resulting in a total of 22 images per subject. Similarly, as the subject rotates
his head from 0 degrees to 45 degrees toward the left side, images of the head were
acquired at the following angles: 0◦, 5◦, 10◦, 15◦, 20◦, 25◦, 30◦, 40◦, 45◦. Two images
were recorded at each angle resulting in a total of 18 images per subject.
—Ear images with partial occlusion. The total number of ear images with partial
occlusion is 144 pertaining to 24 subjects with 6 images per subject. Occlusion
is due to three conditions: partial occlusion (disturbance from some hair), trivial
occlusion (little hair), and regular occlusion (natural occlusion due to hair).
5http://www1.ustb.edu.cn/resb/en/index.htm
Fig. 7. Image samples from USTB database III [with permission of USTB].
Fig. 8. Image samples (color) from the UCR data set, © [Chen and Bhanu 2007] (IEEE, reprinted with permission).
—IMAGE DATABASE IV. A camera system consisting of 17 CCD cameras, distributed
around the subject at an interval of 15◦ between them, was used to acquire ear
and face images of 500 volunteers at multiple poses/angles. Samples from USTB
databases are shown in Figure 7.
3.3. UCR Database
The University of California Riverside (UCR) database was assembled from images
captured by the Minolta Vivid 300 camera [Chen and Bhanu 2007]. This camera uses
the light-stripe method to project a horizontal stripe of light onto the object, and the reflected light is then converted by triangulation into distance information. The camera outputs
a range image and its registered color image in less than one second. The range image
contains 200 × 200 grid points and each grid point has a 3D coordinate (x, y, z) and
a set of color (r, g, b) values. The database contains 902 shots for 155 subjects. Each
subject has at least four shots. There are 17 females; six subjects have earrings and 12
subjects have their ears partially occluded by hair (with less than 10% occlusion). The
UCR database is currently not available to the public. Samples from the UCR database are shown in Figure 8.
3.4. UND Databases
The University of Notre Dame (UND) databases6 are available to the public (free of
charge). There are several collections for various modalities. The following are the
collections that can be used for ear biometrics.
—Collection E. 464 visible-light face side profile (ear) images from 114 human subjects
captured in 2002.
6http://www3.nd.edu/∼cvrl/CVRL/Data Sets.html.
Fig. 9. Image samples from the UND databases [with permission of UND].
Fig. 10. Image samples from XM2VTS face database (with permission of XM2VTS).
—Collection F. 942 3D (+ corresponding 2D) profile (ear) images from 302 human
subjects captured in 2003 and 2004.
—Collection G. 738 3D (+ corresponding 2D) profile (ear) images from 235 human
subjects captured between 2003 and 2005.
—Collection J2. 1800 3D (+ corresponding 2D) profile (ear) images from 415 human
subjects captured between 2003 and 2005.
Figure 9 shows examples from the aforementioned collections of the UND databases.
3.5. XM2VTS Database
The XM2VTS database7 was collected for research and development of identity veri-
fication systems using multimodal (face and voice) input data. The database contains
295 subjects, each recorded at four sessions over a period of 4 months. At each ses-
sion two head rotation shots (as shown in Figure 10) and six speech shots (subjects
reading three sentences twice) were recorded. Sets of data taken from this database
include high-quality color images, 32 KHz 16-bit sound files, video sequences, and a 3D
model [Messer et al. 1999]. The XM2VTS database is available for public use for a cost
[XM2VTSDB 1999].
3.6. UMIST Database
The UMIST Face database8 contains 564 images of 20 subjects slowly rotating their
head from profile to frontal view. UMIST is a small database that is available to the
public, free of charge [Graham and Allison 1998; UMIST 1998]. In the literature, the
UMIST database was only used for ear detection experiments.
3.7. NIST Mugshot Identification Database (MID)
The NIST Mugshot Identification Special Database 18 contains both front and side
(profile) views when available (as shown in Figure 11). Separating front views and
profiles, there are 131 cases with two or more front views and 1418 with only one front
7http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/.
8http://www.sheffield.ac.uk/eee/research/iel/research/face.
9http://www.nist.gov/srd/nistsd18.cfm.
Fig. 11. Image samples from MID profile face database (with permission of NIST).
Fig. 12. Image samples from FERET profile database (with permission of FERET).
view. Profiles have 89 cases with two or more profiles and 1268 with only one profile.
Cases with both fronts and profiles have 89 cases with two or more of both fronts and
profiles, 27 with two or more fronts and one profile, and 1217 with only one front and
one profile. The MID database is available for public use for a cost [MID 1994]. In the
literature, MID was used only for ear detection experiments.
3.8. FERET Database
The FERET10 program set out to establish a large database of facial images that was gathered independently from the algorithm developers [Phillips 1998; Phillips et al. 2000]. The images were collected in a semi-controlled environment. To
maintain a degree of consistency throughout the database, the same physical setup
was used in each photography session. Because the equipment had to be reassembled
for each session, there were some minor variations in images collected on different
dates. The FERET database was collected in 15 sessions between August 1993 and
July 1996. The database contains 1564 sets of images for a total of 14,126 images that
include 1199 individuals and 365 duplicate sets of images. For some individuals, im-
ages were collected at right and left profile (labeled pr and pl), as shown in Figure 12,
and are suitable for 2D ear recognition. The FERET database is available for public
use [FERET 2003].
3.9. CAS-PEAL Database
The CAS-PEAL11 face database [Gao et al. 2004] was constructed by the Joint Research
and Development Laboratory for Advanced Computer and Communication Technolo-
gies (JDL) of Chinese Academy of Sciences (CAS), under the support of the Chinese
National Hi-Tech (863) Program and the ISVISION Tech. Co. Ltd. The CAS-PEAL
database includes face images with various Poses, Expressions, Accessories, and Light-
ing (PEAL).
The CAS-PEAL face database contains 99,594 images of 1040 individuals (595 males
and 445 females).
10 http://www.itl.nist.gov/iad/humanid/feret/feret_master.html.
11 http://www.jdl.ac.cn/peal/home.htm
Fig. 13. Image samples from CAS-PEAL database (with permission of CASPEAL).
For each subject, nine cameras spaced equally along a horizontal semicircular shelf are set up to simultaneously capture images across different poses in one shot (as shown in Figure 13). Each subject is also asked to look up and down to capture
18 images in another two shots. The CAS-PEAL database also includes 5 kinds of ex-
pressions, 6 kinds of accessories (3 glasses and 3 caps), and 15 lighting directions, as well
as varying backgrounds, distance from cameras, and aging variation. The CAS-PEAL
database is available for public use [Gao et al. 2008]. In the literature, the CAS-PEAL
database was only used for ear detection experiments.
4. EAR DETECTION
Ear detection (segmentation) is an essential step for automated ear recognition sys-
tems, though many of the published recognition approaches perform this step manually.
However, there have been several approaches aimed at providing fully automated ear
detection. This section describes some of the semi-automated (computer-assisted) and
automated techniques proposed in the literature. Table I summarizes these ear detec-
tion techniques.
4.1. Computer-Assisted Ear Segmentation
These semi-automated methods require user-defined landmarks specified on an
image, and then ear segmentation is automated from that point onward.
Yan and Bowyer [2005a] used a two-line landmark, with one line along the border
between the ear and the face, and the other from the top of the ear to the bottom, in or-
der to detect the ear region. Alvarez et al. [2005] proposed a modified snake algorithm
and ovoid model technique. This technique requires the user to manually draw an ap-
proximated ear contour which is then used for estimating the ovoid model parameters
for matching.
4.2. Template Matching Techniques
Burge and Burger [2000] located the ear using deformable contours on a Gaussian
pyramid representation of the image gradient. Then edges are computed using the
Canny operator, and edge relaxation is used to form larger curve segments, after which
the remaining small curve segments are removed.
Ansari and Gupta [2007] used outer helix curves of ears moving parallel to each other
as features for localizing the ear in an image. Using the Canny edge detector, edges are
extracted from the whole image. These edges are segmented into convex and concave
edges. From these segmented edges, expected outer helix edges are determined12. They
assembled a database of 700 side faces, and reported an accuracy of ∼93%.
AbdelMottaleb and Zhou [2006] segmented the ear from a face profile based on tem-
plate matching, where they modeled the ear by its external curve. Yuizono et al. [2002]
also used a template matching technique for detection. They used both a hierarchical
12They observed that the ear resembles an ellipse in shape; hence they assumed that the shape of helix curve
is convex.
Table I. Accuracy of Various Ear Detection Techniques
Technique Database Used Accuracy
Template matching
Burge and Burger[2000] N/A N/A
Ansari and Gupta[2007] 700 side faces ∼93%
AbdelMottaleb and Zhou[2006] 103 subjects N/A
Yuizono et al.[2002] 110 (×3) subjects N/A
Chen and Bhanu[2004] UCR, 30 subjects ∼91.5%
Chen and Bhanu[2005b] UCR, 52 (×6) subjects ∼92.6%
Shape based
ArbabZavar and Nixon[2007] XM2VTS, 252 images 100%
UND, F 91%
Zhou et al.[2010] UND, F-142 images 100%
Morphological Operators
HajSaid et al.[2008] WVU, 376 (×10) subjects 90%
Hybrid Techniques
Skin color and template based
Prakash et al.[2009] 150 side faces 94%
Jet space similarity
Watabe et al.[2008] XM2VTS, 181 (×2) subjects N/A
Shape of low-level features
Cummings et al.[2010] XM2VTS, 252 images 99.6%
2D Skin color and
3D template based
Chen and Bhanu[2007] UCR, 902 subjects 99.3%
UND, 700 subjects 87.71%
Ear contour extraction
Yan and Bowyer[2007]
UND, 415 subjects (color) 79%
UND, 415 subjects (depth) 85%
Haar-based
Islam et al.[2008.b] UND, 203 images 100%
XM2VTS, 104 occluded 52%
Yuan and Zhang[2009] CASPEAL, 166 images FRR=3.0%, FAR=3.6%
UMIST, 48 images FRR=2.1%, FAR=0%
USTB, 220 images FRR=0.5%, FAR=2.3%
Abaza et al.[2010] UMIST, 225 images FRR=0, FAR=1.33
UND, 940 images FRR=5.63, FAR=5.85
WVHTF, 228 images FRR=6.14, FAR=1.32
USTB, 720 images FRR=6.25, FAR=1.81
FERET, 100 occluded FRR=35, FAR=3
pyramid and sequential similarity computation to speed up the detection of the ear
from 2D images.
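As an illustration of the general idea behind these 2D template-matching detectors, the following is a minimal sketch using normalized cross-correlation in OpenCV; it is not the hierarchical or sequential-similarity scheme of the cited papers, and the file names and threshold are hypothetical.

```python
# A minimal 2D template-matching sketch in the spirit of the detectors above.
import cv2

profile = cv2.imread("profile.png", cv2.IMREAD_GRAYSCALE)       # hypothetical file names
template = cv2.imread("ear_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the profile image and record the correlation at each offset.
response = cv2.matchTemplate(profile, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

h, w = template.shape
x, y = max_loc
if max_val > 0.6:                 # illustrative acceptance threshold
    ear_box = (x, y, w, h)        # candidate ear region (top-left corner, width, height)
```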
In the context of 3D ear detection, Chen and Bhanu [2004] used a model-based (tem-
plate matching) technique for ear detection. The model template is represented by an
averaged histogram of shape index13. The detection is a four-step process: step edge de-
tection and thresholding, image dilation, connected component labeling, and template
matching. Based on a test set of 30 subjects from the UCR database, they achieved
a 91.5% detection rate with a 2.52% false alarm rate. Later, Chen and Bhanu [2005b]
developed another shape-model-based technique for locating human ears in side face
range images where the ear shape model is represented by a set of discrete 3D ver-
13Shape index is a quantitative measure of the shape of a surface at each point, and is represented as a
function of the maximum and minimum principal curvatures.
tices corresponding to the helix and anti-helix parts. They started by locating the edge
segments and grouping them into different clusters that are potential ear candidates.
For each cluster, they register the ear shape model with the edges. The region with the
minimum mean registration error is declared to be the detected ear region. Based on
52 subjects from the UCR database, with 6 images per subject, they achieved a 92.6%
detection rate.
4.3. Shape Based
ArbabZavar and Nixon [2007] enrolled the ear based on finding the elliptical shape of
the ear using a Hough Transform (HT). They achieved a 100% detection rate using
the XM2VTS face profile database consisting of 252 images from 63 subjects, and 91%
using the UND, collection F, database.
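A minimal sketch of this kind of elliptical-shape enrollment, using scikit-image's Hough ellipse transform on a Canny edge map, is shown below; the file name, edge parameters, and size range are assumptions, and this is not the authors' own implementation.

```python
# A minimal Hough-ellipse ear localization sketch in the spirit of the method above.
from skimage import io, feature, transform

gray = io.imread("profile.png", as_gray=True)    # hypothetical file name
edges = feature.canny(gray, sigma=2.0)

# Accumulate evidence for ellipses within a plausible ear-size range (in pixels).
candidates = transform.hough_ellipse(edges, accuracy=20, threshold=100,
                                     min_size=30, max_size=120)
candidates.sort(order="accumulator")             # strongest candidate last
if len(candidates):
    best = candidates[-1]
    yc, xc = int(best["yc"]), int(best["xc"])    # approximate ear centre
    a, b = int(best["a"]), int(best["b"])        # semi-axes of the fitted ellipse
```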
In the context of 3D ear detection, Zhou et al. [2010] introduced a novel shape-based
feature set, termed the Histograms of Categorized Shapes (HCS), for robust 3D ear
detection. They used a sliding window approach and a linear Support Vector Machine
(SVM) classifier. They reported a perfect detection rate, that is, a 100% detection rate
with a 0% false positive rate, on a validation set consisting of 142 range profile images
from the UND, collection F, database.
4.4. Morphological Operators Techniques
HajSaid et al. [2008] addressed the problem of a fully automated ear segmentation
scheme by employing morphological operators. They used low computational cost
appearance-based features for segmentation, and a learning-based Bayesian classi-
fier for determining whether the output of the segmentation is incorrect or not. They
achieved a 90% accuracy on 3750 facial images corresponding to 376 subjects in the
WVU database.
4.5. Hybrid Techniques
Prakash et al. used a skin color and template-based technique for automatic ear detection
in a side profile face image [Prakash et al. 2009, 2008]. The technique first separates
skin regions from nonskin regions and then searches for the ear within the skin re-
gions using a template matching approach. Finally, the ear region is validated using a
moment-based shape descriptor. Experimentation was done on an assembled database
of 150 side profile face images, and yielded a 94% accuracy.
Watabe et al. [2008] introduced the notion of “jet space similarity” for ear detection,
which denotes the similarity between Gabor jets and reconstructed jets obtained via
Principal Component Analysis (PCA). They used the XM2VTS database for evaluation;
however, they did not report their algorithm’s accuracy.
Cummings et al. [2010] used the image ray transform, based upon an analogy to
light rays, to detect ears in an image. This transformation is capable of highlighting
tubular structures such as the helix of the ear and spectacle frames. By exploiting the
elliptical shape of the helix, this method was used to segment the ear region. This
technique achieved a detection rate of 99.6% using the XM2VTS database.
Chen and Bhanu [2007] fused skin color from color images and edges from range
images to perform ear detection. In the range images, they observed that the edge mag-
nitude is larger around the helix and the antihelix parts. They clustered the resulting
edge segments and deleted the short irrelevant edges. Using the UCR database, they
reported a correct detection rate of 99.3% (896 out of 902). Using the UND databases
(collections F and a subset of G), they reported a correct detection rate of 87.71% (614
out of 700).
Yan and Bowyer developed a fully automatic ear contour extraction algorithm
[Yan and Bowyer 2007, 2006]. First, they detected the ear pit based on the nose po-
sition and by searching within a sector. Then, they segmented the ear contour using
the active contour initialized around the ear tip. In Yan and Bowyer [2007], using only
color information, 88 out of 415 (21%) images were incorrectly segmented, while using
only depth information, 60 out of 415 (15%) images were incorrectly segmented. They
speculated that all of the incorrectly segmented images in these two situations could be
correctly segmented by using a combination of color and depth information; however,
experimental results corroborating this were not reported.
4.6. Haar Based
Islam et al. [2008b] used a cascaded Adaboost technique based on Haar features for
ear detection. This technique is widely known in the domain of face detection as the
Viola-Jones method [Viola and Jones 2004]. It is a very fast and relatively robust face
detection technique. They trained the Adaboost classifier to detect the ear region, even
in the presence of occlusions and degradation in image quality (e.g., due to motion blur).
They reported a 100% detection performance on the cascaded detector tested against
203 profile images from the UND database, with a false detection rate of 5 × 10−6. In
a second experiment, they were able to detect 54 ears out of 104 partially occluded
images from the XM2VTS database.
Yuan and Zhang [2009] used the same technique as Islam et al. They reported a very
good detection rate even when there were multiple subjects in the same image. They
used three test sets to compose a database of 434 images:
—166 images from the CAS-PEAL database with a False Rejection Rate (FRR) of 3.0%
and a False Acceptance Rate (FAR) of 3.6%;
—48 images from the UMIST database with a FRR of 2.1% and no false acceptance;
—220 images from the USTB database with a FRR of 0.5% and FAR of 2.3%.
The main drawback of the original Viola-Jones technique is the training time, which
can take several weeks in some cases. Wu et al. [2008] modified the original approach
for face detection to reduce the complexity of the training phase of the naive Adaboost
by two orders of magnitude. Abaza et al. [2010] applied the modified Viola-Jones tech-
nique for ear detection. The training phase of their approach is about 80 times faster
than the original Viola-Jones method, and achieves ∼95% accuracy on four different
test sets (>2000 profile images for ∼450 persons). They presented experiments show-
ing robust detection in the presence of partial occlusion, noise, and multiple ears at
various resolutions.
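The following minimal sketch illustrates how such a cascaded (Viola-Jones style) ear detector would be applied at run time using OpenCV's cascade framework; the cascade file "haarcascade_ear.xml" is hypothetical and would have to be trained on ear images, and the detection parameters are illustrative.

```python
# A minimal run-time sketch of cascaded-AdaBoost (Viola-Jones style) ear detection.
import cv2

ear_cascade = cv2.CascadeClassifier("haarcascade_ear.xml")   # hypothetical trained cascade
profile = cv2.imread("profile.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file name

# Multi-scale sliding-window detection; parameters are illustrative.
ears = ear_cascade.detectMultiScale(profile, scaleFactor=1.1,
                                    minNeighbors=5, minSize=(24, 36))
for (x, y, w, h) in ears:
    ear_crop = profile[y:y + h, x:x + w]   # candidate ear region for later stages
```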
5. EAR RECOGNITION SYSTEMS
In this section, we examine the various ear recognition algorithms proposed in the
literature and attempt to categorize them based on the feature extraction scheme used
to represent the ear biometric. Table II summarizes these ear recognition systems.
5.1. Intensity Based
Victor et al. [2002] and Chang et al. [2003] built a multimodal recognition system
based on face and ear. For the ear images the manually identified coordinates of the
triangular fossa and the antitragus are used for ear detection. Their ear recognition
system was based on the concept of eigen-ears, using Principal Component Analysis
(PCA). They reported a performance of 72.7% for the ear in one experiment, compared
to 90.9% for the multimodal system, using 114 subjects from the UND, collection E,
database.
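A minimal eigen-ears sketch, in the spirit of the PCA-based approach above, is given below; it assumes aligned, vectorized ear images are already available and uses plain SVD with nearest-neighbour matching, which is only an approximation of the cited systems.

```python
# A minimal "eigen-ears" sketch: PCA subspace plus nearest-neighbour matching.
import numpy as np

def train_eigen_ears(gallery, num_components=50):
    """gallery: (n_images, n_pixels) array of aligned, vectorized ear images."""
    mean = gallery.mean(axis=0)
    centered = gallery - mean
    # Principal components via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]            # (k, n_pixels)
    coeffs = centered @ basis.T            # gallery projections (n_images, k)
    return mean, basis, coeffs

def identify(probe, mean, basis, coeffs, labels):
    """Project the probe and return the label of the closest gallery projection."""
    p = (probe - mean) @ basis.T
    dists = np.linalg.norm(coeffs - p, axis=1)
    return labels[int(np.argmin(dists))]
```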
Zhang et al. [2005] built a hybrid system for ear recognition. This system combines
Independent Component Analysis (ICA) and a Radial Basis Function (RBF) network.
The original ear image database was decomposed into linear combinations of several
Table II. Various Ear Recognition Techniques
Technique Database Accuracy
Intensity-Based
PCA, Chang et al.[2003] UND, 114 sub 72.7%
ICA, Zhang et al.[2005] Carreira Perpinan, 17 (×6)
and 60 (×3) sub 94.11%
FSLDA, Yuan and Mu[2007] USTB, 79 (×7) sub 90%
IDLLE, Xie and Mu[2008] USTB, 79 sub [−10◦ to 20◦] >80%
and [0◦ to 10◦] >90%
NKDA, Zhang and Liu[2008] USTB, 60 sub R1=97.7%
Sparse Representation, UND, 32 (×6) sub 96.88%
Naseem et al.[2008] USTB, 56 (×5) sub 98.21%
Force Field (FF)
Hurley et al.[2005a] XM2VTS, 63 (×4) sub 99.2%
FF then Contour extraction 29 (×2) sub
AbdelMottaleb and Zhou[2006] against 103 sub R1=87.93%
FF then NKDFA USTB, pose 25◦ 75.3%
Dong and Mu[2008] pose 30◦ 72.2%
and pose 45◦ 48.1%
2D Curves Geometry
Choras and Choras[2006] 102 images FRR=0-9.6%
Fourier Descriptor
Abate et al.[2006] 70 sub [0◦, 15◦, 30◦] R1=[96%, 88%, 88%]
Wavelet Transformation
Sana and Gupta[2007] 600 and 350 (×3) sub > 96%
HaiLong and Mu[2009] USTB II, 77 sub 85.7%
USTB III, 79 sub 97.2%
Nosrati et al.[2007] USTB 90.5%
Carreira Perpinan, 17 (×6) sub 95.05%
Wang et al.[2008] USTB, [5◦, 20◦] [100%, 92.41%]
and [35◦, 45◦] [62.66%, 42.41%]
Gabor Filters
Yaqubi et al.[2008] USTB, 60 (×3) sub 75%
Nanni and Lumini[2009a] UND, 114 sub R1=84%
Gabor jets (EBGM)
Watabe et al.[2008] XM2VTS, 181 (×2) sub R1=98%
Log-Gabor Wavelets
Kumar and Zhang[2007] UND, 113 sub 90%
SIFT
Kisku et al. [2009.a] 400 (×2) sub 96.93%
Dewi and Yahagi[2006] Carreira Perpinan, 17(×6) sub 78.8%
3D Features
Yan et al.[2005] UND, 302 sub 98.8%
Yan and Bowyer[2007] UND, 415 sub 97.8%
Yan and Bowyer[2006] UND, sub wearing ear rings 94.2%
3D local surface patch and ICP
Chen and Bhanu[2005a] UCR, 30 sub 93.3%
Chen and Bhanu[2007] UCR, 155 and UND, 302 sub 96.8% and 96.4%
Single step ICP,
Islam et al.[2008.a] UND, 300 sub R1=93.98%
AEM, Passalis et al.[2007] UND, 415 (×2) sub, 201 94.4%
3D polygonals from 110 sub plus Time Cut
2.5D, SFS WVU, 402 galleries R1=95.0%
Cadavid and AbdelMottaleb[2008b] and 60 probes EER=3.3%
sub: different subjects, EER: Equal Error Rate, FRR: False Reject Rate, and R1: Rank one identification rate; otherwise the accuracy is the recognition rate
basic images. Then the corresponding coefficients of these combinations were fed into
an RBF network. They achieved a recognition rate of 94.11% using two databases of segmented ear images.
The first database was the Carreira-Perpinan database [Carreira-Perpinan 1995] con-
sisting of 102 grey-scale images (6 ear images for each of 17 subjects). The second
database was the USTB database I, consisting of 180 images (3 ear images for each of
60 subjects).
Yuan and Mu [2007] used an automatic ear extraction and normalization method
based on an improved Active Shape Model (ASM). Ear normalization adjusts for any
scaling and rotational variation of the ear image. Then Full-space Linear Discrimi-
nant Analysis (FSLDA) was applied to perform ear recognition. They used the USTB
database III, consisting of 79 subjects, and achieved a recognition rate of 90% for head rotations ranging from 20 degrees of left rotation to 10 degrees of right rotation.
Xie and Mu [2007] used an improved version of the locally linear embedding algo-
rithm. Local Linear Embedding (LLE) is based on projecting data in high-dimensional
space into a single global coordinate system of lower dimension, by preserving
neighboring relationships, in order to discover the underlying structure of the data
[Feng and Mu 2009]. LLE can better solve the problems of nonlinear dimensionality
reduction; however, it suffers from lack of labeled information in the dataset. The im-
proved version of LLE (IDLLE) first obtained the lower-dimensional representation of
the data points using the standard LLE algorithm, and then adopted Linear Discrimi-
nant Analysis (LDA) to resolve the problem of human ear classification. They used 79
subjects from the USTB database IV, and they did not mention how they performed
the detection and normalization steps. They reported the recognition rate of the multi-
pose ear as 60.75% compared to 43.03% using regular LLE. Later they used the same
database, with ear poses in the range [−45◦, 45◦] [Xie and Mu 2008]. Experimentally
they showed that the recognition rate of the multi-pose ear had improved using LLE,
compared to PCA and Kernel PCA. They further showed that the improved version of
the LLE algorithm is better, compared to the regular one at these poses. The recogni-
tion rate was above 80% for ear poses in the range [−10◦, 20◦], and above 90% for those in the range [0◦, 10◦].
Zhang and Liu [2008] examined the problem of multiview ear recognition. They used
a B-spline pose manifold construction in a discriminative projection space. This space is
formed by the Null Kernel Discriminant Analysis (NKDA) feature extraction scheme.
They conducted many experiments and performed comparisons to demonstrate the
effectiveness of their multiview ear recognition approach. Ears are segmented from the original images under manual supervision, and the segmented ear images are saved as a multiview ear dataset. They reported a 97.7% rank-1 recognition rate in the presence
of large pose variations using 60 subjects from the USTB database IV.
Naseem et al. [2008] proposed a general classification algorithm for (image-based)
object recognition, based on a sparse representation computed by L1-minimization.
This framework provides new insights into two crucial issues in ear recognition: feature
extraction and robustness to occlusion [Wright et al. 2009]. From each image the ear
portion is manually cropped, and no normalization of the ear region is needed. They
conducted several experiments using the UND and the USTB databases with session
variability, various head rotations, and different lighting conditions. These experiments
yielded a high recognition rate on the order of 98%.
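A minimal sketch of sparse-representation classification of this kind is shown below; it codes the probe over a column-wise gallery dictionary with an L1-regularized solver (Lasso is used here purely as a convenient solver) and assigns the class with the smallest reconstruction residual. The variable shapes and regularization weight are assumptions.

```python
# A minimal sparse-representation classification (SRC) sketch.
import numpy as np
from sklearn.linear_model import Lasso

def src_identify(probe, gallery, labels, alpha=0.01):
    """gallery: (n_pixels, n_images) column-wise dictionary of ear images;
    labels: length-n_images array of identities; probe: (n_pixels,) vector."""
    # L1-regularized coding of the probe over the gallery dictionary.
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(gallery, probe)
    x = coder.coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)           # keep coefficients of class c only
        residuals[c] = np.linalg.norm(probe - gallery @ xc)
    return min(residuals, key=residuals.get)          # class with smallest residual
```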
5.2. Force Field
Hurley et al. used force field transformations for ear recognition [Hurley et al. 2000,
2005b]. The image is treated as an array of Gaussian attractors that act as the source
of the force field (as shown in Figure 14). The directional properties of that force field
are exploited to locate a small number of potential energy wells and channels that
are used during the matching stage [Hurley et al. 2005a]. The fixed size frame was
manually adjusted by eye to surround and crop the ear images. They reported a very
Fig. 14. Force field line formed by iterations [Hurley et al. 2000].
high recognition rate of 99.2%, using 4 images each of 63 subjects selected from the
XM2VTS database.
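A minimal sketch of an inverse-square force field computation of this general kind is shown below; every pixel is treated as an attractor whose pull decays with the square of distance, and the field is obtained by convolving the image with unit-force kernels. The formulation and parameters here are assumptions rather than Hurley et al.'s exact implementation.

```python
# A minimal force field sketch: superpose the inverse-square pull of every pixel.
import numpy as np
from scipy.signal import fftconvolve

def force_field(image):
    """Return the (fy, fx) components of the force field of a grayscale image."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Displacement kernels: entry (i, j) is the offset from the kernel centre.
    ys, xs = np.mgrid[-h + 1:h, -w + 1:w].astype(float)
    r3 = (xs ** 2 + ys ** 2) ** 1.5
    r3[h - 1, w - 1] = np.inf           # a pixel exerts no force on itself
    ky, kx = -ys / r3, -xs / r3         # unit-force kernels (inverse-square law)
    # Each pixel's pull is weighted by its intensity and summed via convolution.
    fy = fftconvolve(image, ky, mode="same")
    fx = fftconvolve(image, kx, mode="same")
    return fy, fx

# The magnitude and direction of (fy, fx) can then be searched for the energy
# wells and channels that serve as matching features.
```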
AbdelMottaleb and Zhou [2006] used force field transformation followed by recogni-
tion based on contours constructed from these features. They assembled a database of
profile face images from 103 subjects. For each person, one image was used for training,
where the ear region was detected using external contour matching. The proposed ear
recognition method was applied to 58 query images corresponding to 29 subjects. They
achieved an 87.93% rank-1 recognition rate.
Dong and Mu used force field transformation and developed a two-stage ap-
proach for multi-pose ear feature extraction and recognition, that is, force field
transformation plus null-space-based kernel fisher discriminant analysis (NKFDA)
[Dong and Mu 2008]. The kernel technique can not only efficiently represent the non-
linear relation of data but also simplify the Null Linear Discriminant Analysis (NLDA).
They cropped out the ear manually from the original images and performed some preprocessing such as filtering and normalization. They used the USTB database IV and
reported a recognition rate of 75.3% for pose 25◦, 72.2% for pose 30◦, and 48.1% for pose 45◦.
5.3. 2D Ear Curves Geometry
Choras proposed an automated geometrical method [Choras 2004; 2005]. He extracted
the ear contours and centroid from the ear image, and then constructed concentric
circles using that centroid. He defined two feature vectors for the ear based on the intersection points between the various contours of the ear and the concentric circles.
Testing with an assembled database of 240 ear images (20 different views) for 12
subjects, and selecting images with very high quality and under ideal conditions of
recognition, he reported a 100% recognition rate.
Later Choras and Choras [2006] added two more geometric feature vectors extracted
using the angle-based contour representation and the geometrical parameters method.
Then they conducted a comparison study using an assembled database of 102 ear
images, where the various geometrical methods yielded false reject rates between 0% and 9.6%.
5.4. Fourier Descriptor
Abate et al. [2006] used rotation-invariant descriptors, namely GFD (Generic Fourier
Descriptor), to extract meaningful features from ear images. This descriptor is quite
robust to both ear rotations and illumination changes. They assembled their own
databases to evaluate the proposed scheme. The first dataset A contains 210 ear images
from 70 subjects, with 3 samples for each subject: (a) looking ahead (0◦ rotation), (b) looking up (15◦ rotation), and (c) looking up (30◦ rotation). Images were acquired over
two sessions. The second dataset B was also obtained over two sessions, and contains
72 ear images from 36 subjects with 2 photos per subject looking up with a free rotation
angle. Experimentally, they showed a marginally better rank-1 recognition rate (96%)
compared to the eigen-ears algorithm (95%). Further, they showed that their technique
had a rank-1 recognition rate of 88% for images obtained at 15◦ and 30◦ compared to a
rate of 50% and 20% for the eigen-ears algorithm.
5.5. Wavelet Transformation
Sana and Gupta [2007] used a discrete Haar wavelet transform to extract the textu-
ral features of the ear. The ear was first detected from a raw image using a template
matching technique. A Haar wavelet transform was then used to decompose the de-
tected image and to compute coefficient matrices of the wavelet transforms which are
clustered in its feature template. The matching score was calculated using the Ham-
ming distance. They reported a recognition accuracy of 96% based on two databases:
600 subjects (3 images per subject) from the IITK database and 350 subjects from the
Saugor database.
HaiLong and Mu [2009] used the low-frequency subimages, obtained by utilizing a
two-dimensional wavelet transform, and then extracted features by applying an or-
thogonal centroid algorithm. They used the USTB databases, and they did not mention
the detection step. They reported an average performance rate of 85.7% on the USTB
database II (77 subjects) divided into four groups, and 97.2% on the USTB database IV
(79 subjects), divided into 11 groups.
Nosrati et al. [2007] applied a 2D wavelet to the geometrically normalized (aligned)
ear image. They used template matching for ear extraction, then they found three inde-
pendent features in three directions (horizontal, vertical, and diagonal). They combined
these decomposed images to generate a single feature matrix using the weighted sum.
This technique allows one to consider the changes in the ear images simultaneously
along three basic directions. Finally they applied PCA to the feature matrix for dimen-
sionality reduction and classification. They achieved a recognition accuracy of 90.5%
and 95.05% on the USTB database and Carreira-Perpinan database, respectively.
Wang et al. [2008] used Haar wavelet transforms and Uniform Local Binary Patterns
(ULBPs) to recognize ear images. First, ear images were manually segmented and de-
composed by a Haar wavelet transform. Then ULBPs were combined simultaneously
with block-based and multiresolution methods to describe the texture features of ear
subimages transformed by the Haar wavelet. Finally, the texture features were classi-
fied into identities using the nearest-neighbor method. Using the USTB database IV
(no mention of ear detection), they conducted several experiments combining ULBPs
with the multiresolution and block-based methods. They achieved a recognition rate of
100%, 92.41%, 62.66%, and 42.41% for pose angles of 5◦, 20◦, 35◦, and 45◦, respectively.
Feng and Mu [2009] also combined Local Binary Pattern (LBP) and wavelet trans-
form. They used the nonuniform LBP8,1 operator, and evaluated the performance of various
similarity measures and two matchers (K nearest-neighbor and two-class Support Vec-
tor Machine). They used images from USTB database III (no mention of ear detection):
79 subjects with 10 images per subject at various poses (0◦, 5◦, 10◦, 15◦, 20◦).
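A minimal sketch of LBP-based ear texture description, in the spirit of the block-based methods above, is given below; the choice of uniform LBP with P=8, R=1, a 4x4 block grid, and a chi-square nearest-neighbour rule are all illustrative assumptions.

```python
# A minimal LBP block-histogram descriptor and nearest-neighbour matcher.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(ear, blocks=4, P=8, R=1):
    """Concatenated block histograms of LBP codes for a grayscale ear image."""
    codes = local_binary_pattern(ear, P, R, method="uniform")
    n_bins = P + 2                                  # uniform patterns plus "other"
    h, w = codes.shape
    hists = []
    for by in np.array_split(np.arange(h), blocks):
        for bx in np.array_split(np.arange(w), blocks):
            block = codes[np.ix_(by, bx)]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            hists.append(hist / max(hist.sum(), 1))
    return np.concatenate(hists)

def match(probe_desc, gallery_descs, labels):
    """Nearest neighbour by chi-square distance between histograms."""
    d = [np.sum((probe_desc - g) ** 2 / (probe_desc + g + 1e-12)) for g in gallery_descs]
    return labels[int(np.argmin(d))]
```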
5.6. Gabor Filters
Yaqubi et al. [2008] used a feature extraction method based on a set of Gabor filters
followed by a maximization operation over multiple scales and positions. This method
is motivated by a quantitative model of the visual cortex. Then they used Support
Vector Machine (SVM) for ear classification. They obtained a recognition rate of 75%
on a subset of the USTB ear database where 180 ear images were manually extracted
from 60 subjects.
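A minimal Gabor filter bank sketch of the kind used in these methods is shown below; the ear image is convolved with filters at several scales and orientations and the responses are pooled into a feature vector. The kernel sizes and filter parameters are assumptions.

```python
# A minimal Gabor filter bank feature extractor.
import cv2
import numpy as np

def gabor_features(ear, scales=(7, 11, 15), orientations=8):
    """Pooled responses of a Gabor filter bank applied to a grayscale ear image."""
    feats = []
    for ksize in scales:
        for k in range(orientations):
            theta = np.pi * k / orientations
            # Arguments: kernel size, sigma, orientation, wavelength, aspect ratio, phase offset.
            kernel = cv2.getGaborKernel((ksize, ksize), 0.4 * ksize, theta,
                                        0.5 * ksize, 0.5, 0)
            response = cv2.filter2D(ear.astype(np.float32), cv2.CV_32F, kernel)
            # Simple pooling: mean absolute response and its standard deviation.
            feats.extend([np.abs(response).mean(), response.std()])
    return np.asarray(feats)
```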
Fig. 15. The SIFT key points matching.
Nanni and Lumini used a multimatcher system, where each matcher was
trained using features extracted from a single subwindow of the entire 2D image
[Nanni and Lumini 2007; 2009a]. The ear was segmented using two landmarks. The
features were extracted by the convolution of each subwindow with a bank of Ga-
bor filters. Then their dimensionality was reduced using Laplacian eigen maps. The
best matchers, corresponding to the most discriminative subwindows, were selected
by running Sequential Forward Floating Selection (SFFS). Experiments were carried
out using 114 subjects from the UND database (collection E) and the sum rule was
employed for fusing the selected subwindows at the score level. They achieved a rank-1
recognition rate of ∼84% and a rank-5 recognition rate of ∼93%; for verification exper-
iments, the area under the ROC curve was ∼98.5% suggesting very good performance.
Later Nanni and Lumini [2009b] improved the performance of their ear matcher using
score normalization. In order to discriminate between genuine users and impostors, they trained a quadratic discriminant classifier. Their proposed normalization method overcomes the main drawback of the Unconstrained Cohort Normalization
(UCN), as it does not need a large number of background models. Experimentally, they
showed that for the ear modality, their proposed normalization and UCN reduces the
EER from ∼11.6% to ∼7.6%.
Watabe et al. [2008] extended the ideas from elastic graph matching and Principal
Component Analysis (PCA). For ear representation, they used an “ear graph” whose
vertices were labeled by the Gabor jets of body of the antihelix, superior anti-helix crus,
and inferior anti-helix crus. They developed a new ear detection algorithm, based on
the notion of “jet space similarity,” which denotes the similarity between Gabor jets and reconstructed jets obtained using PCA. They used 362 images, 2 per person, from
the XM2VTS database for performance evaluation. In a verification experiment, they
reported a FRR of 4% and a FAR of 0.1%, which was approximately 5 times better than
the PCA technique. Further, in an identification experiment, the rank-1 recognition
rate for their method was 98% compared to 81% for the PCA technique.
Kumar and Zhang [2007] used Log-Gabor wavelets to extract the phase information,
that is, ear codes, from the 1D gray-level signals. Thus each ear is represented by
a unique ear code or phase template. They then used the Hamming distance between the query ear code and the database templates as the matching criterion. They reported about 90% recognition using 113 subjects from the UND, collection E, database (no mention of the detection step).
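The matching step of such phase-based schemes reduces to a Hamming distance between binary templates. A minimal sketch, assuming the Log-Gabor ear codes have already been computed as boolean arrays of equal size, is:

```python
# A minimal Hamming-distance matcher for binary phase templates ("ear codes").
import numpy as np

def hamming_distance(code_a, code_b):
    """Fraction of disagreeing bits between two equally sized binary templates."""
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    return np.count_nonzero(code_a ^ code_b) / code_a.size

# Hypothetical usage: the gallery identity with the smallest distance wins.
# best_match = min(gallery.items(), key=lambda kv: hamming_distance(probe_code, kv[1]))
```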
5.7. Scale-Invariant Feature Transform (SIFT)
Dewi and Yahagi [2006] used SIFT to generate approximately 16 key-points for
each ear image. Using the Carreira-Perpinan database (segmented ear images)
[Carreira-Perpinan 1995], they reported a recognition rate of 78.8%.
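For readers unfamiliar with SIFT-based matching of the kind illustrated in Figure 15, the following hedged sketch (assuming OpenCV with SIFT available) extracts key-points from two ear images and counts ratio-test matches as a crude similarity score:

```python
# A minimal SIFT key-point matching sketch; the match count is a crude score.
import cv2

def sift_match_score(img_a, img_b, ratio=0.75):
    sift = cv2.SIFT_create()
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Lowe's ratio test keeps only distinctive correspondences.
    pairs = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)
```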
Kisku et al. [2009a] used SIFT feature descriptors for structural representation of ear
images (as shown in Figure 15). They developed an ear skin color model using Gaussian
Mixture Model (GMM) and clustered the ear color pattern using vector quantization.
Finally, they applied K-L divergence to the GMM framework for recording the color
Fig. 16. The ear recognition module using the ear helix/antihelix and the Local Surface Patch (LSP) repre-
sentations, (c) [Chen and Bhanu 2007]. IEEE, reprinted with permission.
similarity in the specified ranges by comparing color similarity between a pair of
reference models and probe ear images. After manual segmentation of ear images in
some color slice regions, they extracted SIFT key-points. These features were fused at the feature level by concatenating the extracted SIFT features into an augmented vector. They tested using
a locally collected ear database of 400 subjects with 2 images per subject, and the
experimental results showed improvements in recognition accuracy by ∼3%.
5.8. 3D Ear
The only textbook describing ear biometrics [Bhanu and Chen 2008] focuses on a sys-
tem for ear recognition using 3D shape. Chen and Bhanu [2005a] were the first to
develop and experiment with a 3D ear biometric system. They used the shape-model-
based technique for locating human ears in side face range images, and a Local
Surface Patch (LSP) representation and the Iterative Closest Point (ICP) algorithm
for ear recognition (as shown in Figure 16). In a small proof-of-concept experiment,
they achieved a 93.3% recognition rate (2 errors out of 30), using manual segmenta-
tion. Chen and Bhanu [2007] conducted a larger experiment with automatic ear detec-
tion. They achieved a 96.77% rank-1 recognition rate (150 out of 155) using the UCR
database and a 96.36% rank-1 recognition rate (291 out of 302) on the UND, collection
F, database.
Yan et al. [2005] presented 3D ear recognition using ICP-based approaches. They
used the UND, collection F, database, where they performed the segmentation step
using a two-line landmark. They reported a rank-1 recognition rate of 98.8%. Only
4 images out of 302 were incorrectly matched due to poor data quality. Later
Yan and Bowyer [2007] automated the ear detection by detecting the ear pit. They
considered two approaches for matching points from the probe image to points on the
gallery image using point-to-point and point-to-surface matching schemes. Their fi-
nal algorithm attempted to exploit the trade-off between performance and speed. The
point-to-point approach was used during the iterations to compute the transformation
matrix relating the probe image with the gallery image. In an identification scenario,
their algorithm achieved a rank-1 recognition rate of 97.8% using 415 subjects from
the UND databases with 1,386 probes. They reported another experiment showing how
the performance dropped with an increase in angle difference between the probe and
gallery. In Yan and Bowyer [2006], they reported an experiment on a subset of users
wearing earrings where the performance dropped to 94.2%.
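The point-to-point ICP scheme underlying most of these 3D matchers can be sketched in a few lines. The version below is a generic NumPy/SciPy illustration (not Yan and Bowyer's optimized algorithm) that alternates closest-point correspondence with an SVD-based rigid alignment and returns the mean residual as a match score:

```python
# A minimal point-to-point ICP sketch (NumPy/SciPy); illustrative, not optimized.
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(probe, gallery, iterations=30):
    """Align probe (Nx3) to gallery (Mx3); return the mean residual as a score."""
    src = probe.copy()
    tree = cKDTree(gallery)
    for _ in range(iterations):
        _, idx = tree.query(src)                 # closest-point correspondences
        tgt = gallery[idx]
        # Best rigid transform (Kabsch/SVD) for the current correspondences.
        mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                       # reflection-corrected rotation
        t = mu_t - R @ mu_s
        src = src @ R.T + t
    return np.mean(np.linalg.norm(src - gallery[tree.query(src)[1]], axis=1))
```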
Islam et al. [2008a] used ICP to implement a fully automated 3D ear recognition
system. For the detection, first they used a 2D Haar-based ear detector, then they
cropped the corresponding 3D segment. They used two subsets from the UND, collection F, database. The first subset consisted of 200 arbitrarily selected profile images of 100 different subjects, while the second subset consisted of 300 subjects. They achieved
a rank-1 recognition rate of 93% using single-step ICP.
Passalis et al. [2007] used a generic Annotated Ear Model (AEM) to register and fit
each ear dataset. Only the 3D geometry that resides within a sphere of a certain radius
that is centered roughly on the ear pit was automatically segmented. Then a compact
biometric signature was extracted that retains 3D information. The meta-data con-
taining this information were stored using a regular grid of lower dimension, allowing
direct comparison. They used a database containing 1,031 datasets representing 525 subjects: 830 datasets representing 415 subjects from the UND database, and 201 3D polygonal datasets from 110 subjects. They achieved a recognition rate of 94.4% on this heterogeneous database. According to the literature, computing the similarity scores for the 830 UND datasets takes 276 hours on average, whereas their proposed method took approximately 7 hours for enrollment
and a few minutes for authentication.
Cadavid and Abdel-Mottaleb described a novel approach for 3D ear biometrics from
surveillance videos [Cadavid and AbdelMottaleb 2007; 2008a]. First, they automatically segmented the ear region using template matching and reconstructed 2.5D models using the Shape from Shading (SFS) scheme. The resulting 2.5D models were then registered using the Iterative Closest Point (ICP) algorithm to compute the similarity between the reference model and every model in the database. Later
Cadavid and AbdelMottaleb [2008b] used the mathematical morphology ear detection
technique [HajSaid et al. 2008], and they reported a 95.0% rank-1 recognition rate and
3.3% Equal Error Rate (EER) on the WVU database.
6. MULTIBIOMETRICS USING THE EAR MODALITY
In a multibiometric system, fusion can be accomplished at various levels
[Ross et al. 2006]: fusion before matching (sensor and feature levels) and fusion af-
ter matching (match score, rank, and decision levels). Combining the ear biometric
with the face modality has tremendous practical potential due to the following rea-
sons: (a) the ear is part of the face; (b) the ear can be acquired using the same sensor
as the face; and (c) the same type of feature extraction and matching algorithms can
be used for both. Table III summarizes these multibiometric systems.
Table III. Ear in Multibiometric Systems

Multibiometrics | Fusion Level / Method | Database | Accuracy
Face - ear
  Victor et al. [2002], Chang et al. [2003] | Image / Data | UND, 114 sub | 90.9%
  Theoharis et al. [2008] | N/A | UND, F and G | R1 = 99.7%
  Mahoor et al. [2009] | Score / weighted SUM | WVU, 402 galleries and 60 probes | 100%
  Islam et al. [2009] | Score / weighted SUM | UND, F and G | R1 = 98.71%
  Kisku et al. [2009b] | Score / Dempster-Shafer | IITK, 400 (×4) sub | 95.53%
Face profile - ear
  Yuan et al. [2006a] | Image / Data | USTB | 96.2%
  Xu and Mu [2007b] | Score / SUM and MED | USTB | 97.62%
  Pan et al. [2008] and Xu et al. [2007] | Feature / weighted SUM | USTB | 96.84%
  Rahman and Ishikawa [2005] | Decision / "manual" | UND, 18 sub | 94.44%
  Xu et al. [2007] | Feature / weighted SUM | USTB, 38 sub | 98.68%
Face - ear - 3 fingers
  Woodard et al. [2006] | Score / MIN | UND, 85 sub | R1 = 97%
Face - ear - signature
  Monwar and Gavrilova [2009] | Rank / Borda count and Logistic regression | USTB and synthesized data | EER = 1.12%
Ear - multiple algorithms
  Yan and Bowyer [2005b] | Score / weighted SUM | UND, 302 sub | R1 = 98.7%
  Zhang and Mu [2008] | Feature / Concatenation | USTB, 79 sub | R1 = 55%
  Srinivas and Gupta [2009] | Feature / Merge | 106 (×10) sub | 95.32%
  ArbabZavar and Nixon [2011] | Score / weighted SUM | XM2VTS, 150 sub | 97.4%
Left and right ears
  Lu et al. [2006] | N/A | 56 (5 left, 5 right) sub | R1 = 95.1%

sub: different subjects; EER: Equal Error Rate; R1: rank-one identification rate; otherwise the accuracy is the recognition rate.
6.1. Frontal Face and Ear
Victor et al. [2002] and Chang et al. [2003] discussed multimodal recognition systems
using the ear and the face. They used the UND databases and reported a performance
of 69.3% for face (PCA) and 72.7% for ear (PCA), in one experiment, compared to 90.9%
for the multimodal system (PCA based on fused face and ear images). There have been
other experiments based on eigen-faces and eigen-ears using different databases and
other fusion rules.
(1) Darwish et al. [2009] fused the face and ear scores. They tested using 10 individuals
(2 images each) from MIT, ORL (AT&T), and Yale databases, and reported an overall
accuracy of 92.24%.
(2) Boodoo and Subramanian [2009] reported a similar experiment using a database
of 30 individuals (7 face and 7 ear images). They used 3 face and 3 ear images for
testing. They considered two levels of fusion. The first method combined 3 images
of the same modality using majority voting, while the second method fused the
output of the two modalities using the AND rule. They reported a recognition rate
of 96%.
(3) Luciano and Krzyzak [2009] presented a relatively wide experiment, where they used 100 subjects from the FERET database and 114 from the UND database. They reported good performance using the face as a single modality, but not the ear. Using normalized weighted scores, the best recognition rate of ∼99% was achieved with face/ear weights in the range (0.8 to 0.9)/(0.1 to 0.2), respectively; a weighted-sum fusion of this kind is sketched below.
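A minimal sketch of the normalized weighted-sum score fusion used in several of the experiments above (the min-max normalization and the 0.8/0.2 weights are illustrative choices, not prescribed by any of the cited papers):

```python
# A minimal sketch of min-max normalization followed by weighted-sum fusion.
import numpy as np

def min_max_normalize(scores):
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)

def weighted_sum_fusion(face_scores, ear_scores, w_face=0.8, w_ear=0.2):
    """Fuse two sets of match scores for the same gallery identities."""
    return w_face * min_max_normalize(face_scores) + w_ear * min_max_normalize(ear_scores)
```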
Middendorff and Bowyer [2007] and Middendorff et al. [2007] presented an overview
of combining the frontal face and ear modalities, where they suggested several fusion
methods for combining 2D and 3D data.
Theoharis et al. [2008] used a unified approach that fused 3D facial and ear data.
An annotated deformable model was fitted to the data using ICP and Simulated
Annealing (SA). Wavelet coefficients were computed from the geometry image and
used as a biometric signature. The method was evaluated using the largest publicly
available databases (FRGC v2 3D face database and the corresponding ears from the
UND database, collections F and G). They reported a 99.7% rank-1 recognition rate
but did not describe the fusion method.
Mahoor et al. [2009] used a multimodal 2.5D ear and 2D face biometric fused at
score level. For 2.5D ear recognition, a series of frames was extracted from a video clip.
The ear segment in each frame was independently reconstructed using the shape from
shading method. Then various ear contours were extracted and registered using the
iterative closest point algorithm. For 2D face recognition, a set of facial landmarks were
extracted from frontal facial images using an active shape model. Then, the responses
of facial images to a series of Gabor filters at the locations of facial landmarks were
calculated, and used for recognition. They used the WVU database and reported a rank-
1 identification rate of 81.67%, 95%, and 100% for face, ear, and fusion, respectively.
Islam et al. [2009] fused 3D local features for ear and face at score level, using the
weighted sum rules. They used the FRGC v2 3D face database and the corresponding
ears from the UND databases, collections F and G, and achieved a rank-1 identifi-
cation rate of 98.71% and a verification rate of 99.68% (at 0.001 FAR) for neutral
face expression. For other types of facial expressions, they achieved 98.1% and 96.83%
identification and verification rates, respectively.
Kisku et al. [2009b] used Gabor filters to extract features of landmarked images of
face and ear. They used a locally collected database of 1600 images from 400 subjects.
Also they used a synthesized database where the face frontal images were taken from
BANCA database [BaillyBailliere et al. 2003], and the ear images from the Carreira-
Perpinan database [Carreira-Perpinan 1995]. They fused the scores using Dempster-
Shafer (DS) decision theory, and reported an overall accuracy of 95.53%.
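Dempster-Shafer fusion combines per-modality "mass" assignments with Dempster's rule of combination. The sketch below illustrates the rule over the two-class frame {genuine, impostor} with made-up masses; how Kisku et al. map Gabor match scores to masses is not reproduced here:

```python
# A minimal sketch of Dempster's rule over the frame {"G" (genuine), "I" (impostor)}.
def dempster_combine(m1, m2):
    """m1, m2: dicts mapping frozensets over {'G','I'} to masses summing to 1."""
    combined, conflict = {}, 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + pa * pb
            else:
                conflict += pa * pb
    # Normalize by the non-conflicting mass (Dempster's rule).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses derived from face and ear match scores.
face_mass = {frozenset({"G"}): 0.7, frozenset({"I"}): 0.2, frozenset({"G", "I"}): 0.1}
ear_mass = {frozenset({"G"}): 0.6, frozenset({"I"}): 0.3, frozenset({"G", "I"}): 0.1}
print(dempster_combine(face_mass, ear_mass))
```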
6.2. Face Profile and Ear
Yuan et al. [2006a] used face profile images that include the ear (assuming fusion at
sensor / data level) and applied a full space linear discriminant analysis (i.e., using
eigenvectors corresponding to positive eigenvalues). They used the USTB database
and achieved a recognition rate of 96.2%.
Xu and Mu [2007b] used the same technique (full space linear discriminant analysis)
for combining the face profile with the ear. They carried out decision fusion using the
product, sum, and median rules according to Bayesian theory and a modified vote
rule for two classifiers. They used the USTB database [USTB 2005] and achieved a
recognition rate of 97.62% using the sum and median rules compared to 94.05% for the
ear alone and 88.10% for the face profile alone.
Pan et al. [2008] and Xu et al. [2007] presented a modified FDA technique by applying kernels to the feature vectors. They fused the face profile and ear at the feature level (using average, product, and weighted-sum rules). They used the USTB database [USTB 2005] and achieved a recognition rate of 96.84% using the weighted-sum rule.
Xu and Mu [2007a] used Kernel Canonical Correlation Analysis (KCCA) for combin-
ing the face profile and the ear. They carried out decision fusion using the weighted-sum
rule, where the weights are obtained by solving the corresponding Lagrangian. They
used 38 subjects from the USTB database and achieved a recognition rate of 98.68%.
Rahman and Ishikawa [2005] used the PCA technique to combine the face profile
with the ear. They used a subset of 18 subjects (5 images each) from the UND database.
They reported an identification rate of 94.44%.
6.3. Face, Ear, and Third Modality
Woodard et al. [2006] proposed combining 3D face images with ear and finger surfaces
using score-level fusion. They reported a 97% rank-1 recognition rate based on a subset
of 85 subjects from the UND 3D databases.
Monwar and Gavrilova [2008] developed a multimodal biometric system that used
face, ear, and signature features extracted by PCA or Fisher’s linear discriminant meth-
ods. The fusion is conducted at rank level. The ranks of individual matchers were com-
bined using the Borda count method, the logistic regression method, or a modified Borda
count method. To test this system, Monwar and Gavrilova used a chimeric database
consisting of faces, ears, and signatures. For the face database, they used the Olivetti
Research Lab (ORL) database [Samaria and Harter 1994], which contains 400 images,
10 each of 40 different subjects. For ear, they used the Carreira-Perpinan database
[Carreira-Perpinan 1995]. For signatures, they used 160 signatures (8 each from 20 individuals) from the University of Rajshahi database [RUSign 2005]; those signatures were then scanned. The results indicated that fusing the individual modalities using the weighted Borda count improved the overall Equal Error Rate (EER) to 9.76%, compared to an average of 16.78% for the individual modalities. Later, Monwar and Gavrilova [2009] extended their experiment by including more data from the USTB database. For the signatures, they used 500 signatures (10 each from 50 individuals) from the Rajshahi database. They achieved an EER of 1.12% using the logistic regression rank fusion scheme.
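Rank-level fusion with the Borda count is simple to state: each matcher awards an identity N − rank points, and identities are re-ranked by their totals. A minimal, self-contained sketch (the matcher outputs and identity labels are hypothetical):

```python
# A minimal Borda-count rank-level fusion sketch.
def borda_count(rankings):
    """rankings: list of lists, each ordering identity labels from best to worst."""
    n = len(rankings[0])
    totals = {}
    for ranking in rankings:
        for rank, identity in enumerate(ranking):
            totals[identity] = totals.get(identity, 0) + (n - rank)
    # Fused ranking: identities sorted by descending Borda score.
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical output of face, ear, and signature matchers for four identities.
print(borda_count([["A", "B", "C", "D"], ["B", "A", "D", "C"], ["A", "C", "B", "D"]]))
```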
6.4. Multi-Algorithmic Ear Recognition
Yan and Bowyer [2005b] used 2D PCA along with 3 different 3D ear recognition algo-
rithms to combine the evidence due to 2D and 3D ear images. They used the UND
databases, collection E, consisting of 1,884 (2D and 3D images) from 302 subjects. With
the same database, using an improved ICP algorithm, they obtained a 98.7% rank-1
recognition rate by adopting a multi-instance approach on the 3D images.
Zhang and Mu [2008] extracted global features using the Kernel Principal Compo-
nent Analysis (KPCA) technique and extracted local features using the Independent
Component Analysis (ICA) technique. Then they established a correlation criterion
function between two groups of feature vectors and extracted their canonical corre-
lation features according to this criterion which could be viewed as fusion at feature
level. They tested using the USTB database and achieved a rank-1 recognition rate of
55%, compared to 45% for the KPCA and 30% for the ICA alone.
Srinivas and Gupta [2009] used SIFT to extract the features from ear images at
different poses and merged them according to a fusion rule in order to produce a
single feature vector called the fused template. The similarity of SIFT features of the
probe image and the enrolled user template was measured by their Euclidean distance.
They collected 1060 images from 106 subjects. They captured 2 images at each of the
following poses for the right ear: [−40°, −20°, 0°, +20°, +40°]. The images obtained were normalized to a size of 648 × 486. For training, they used three images per person for enrollment: images at poses [−40°, 0°, and +40°]. They tested using the remaining
7 images, and reported an accuracy of 95.32% for the fused template versus 88.33% for
the nonfused template.
ArbabZavar and Nixon [2011] used a part-wise description model of the ear derived
by a stochastic clustering on a set of scale-invariant features of a training set. They
further enhanced the performance of this guided model description by incorporating a
wavelet-based ear recognition technique [ArbabZavar and Nixon 2008]. This wavelet-
based analysis aims to capture information in the ear’s boundary structures, which
can augment discriminant variability. They presented several experiments using a
weighted sum of the normalized distances based on the guided model and the wavelet-
based technique. They tested these methods using 458 images of 150 subjects from the
XM2VTS database. They achieved a recognition rate of ∼97.4% compared to ∼89.1%
for the guided model and ∼91.9% for the wavelet-based model.
6.5. Right Ear + Left Ear
Lu et al. [2006] extracted ear shape features using Active Shape Models (ASMs). They
modeled the shape and local appearance of the ear in a statistical manner. In addition,
steerable features were extracted from the ear image. Steerable features encode rich
discriminant information of the local structural texture and provide guidance for shape
location. The eigen-ear shape technique was used for final classification. Lu et al. [2006]
conducted a small experiment to demonstrate how fusion of the results of the two ears
can enhance the results.14 They collected 10 images from each of 56 individuals: 5 images
for the left ear and 5 for the right ear, corresponding to 5 different poses. The difference
in angle between two adjacent poses was 5 degrees. They achieved a rank-1 recognition
rate of 95.1% via fusion versus 93.3% for the left ear or right ear alone.
7. OPEN RESEARCH AREAS
Research in ear biometrics is beginning to move out of its infancy
[Hurley et al. 2007]. While early research focused on ear recognition in constrained en-
vironments, the benefits of the ear biometric cannot be realized until the accompanying
systems can work on large datasets in unconstrained surveillance-like environments.
As a result, a number of research areas remain relatively unexplored with respect to this biometric.
7.1. Hair Occlusion
When the ear is partially occluded by hair or other artifacts, methods for ear recognition can be severely impacted. Based on visual assessment of 200 occluded ear
images from the FERET database, we determined that 81% of the hair occlusion is at
the top of the ear, 17% from the side, and 2% at the bottom and other portions of the
ear.
Burge and Burger [2000] suggested the use of thermogram images to detect occlusion
due to hair and mask it out of the image. A thermogram image is one in which the
surface heat of the subject is used to form an image. Figure 17 is a thermogram of
the external ear. The subject’s hair in this case has an ambient temperature between
27.2°C and 29.7°C, while the external anatomy of the ear ranges from 30.0°C to 37.2°C. Removing the hair is accomplished by segmenting out the low-temperature areas.
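The segmentation idea amounts to a temperature threshold. A minimal sketch, assuming the thermogram is available as an array of surface temperatures in degrees Celsius and using the ranges quoted above:

```python
# A minimal sketch of temperature thresholding to mask hair in an ear thermogram.
import numpy as np

def mask_hair(thermogram_c, ear_min_temp=30.0):
    """Boolean mask that keeps only pixels warm enough to be ear tissue."""
    return np.asarray(thermogram_c) >= ear_min_temp

# Hypothetical usage: zero out hair-covered (cooler) regions before detection.
# ear_only = np.where(mask_hair(thermogram), thermogram, 0.0)
```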
Yuan et al. [2006b] proposed an Improved Nonnegative Matrix Factorization with
Sparseness Constraints (INMFSC) by imposing an additional constraint on the objec-
tive function of NMFSC. This improvement can control the sparseness of both the basis
vectors and the coefficient matrix simultaneously. This proposed INMFSC was applied
on normal images as well as partially occluded images. Experiments showed that their
enhanced technique yielded better performance even with partially occluded images
(as shown in Figure 18). Later Yuan et al. [2010] separated the normalized ear image
into 28 subwindows as shown in Figure 19. Then, they used neighborhood-preserving
embedding for feature extraction on each subwindow, and selected the most discrim-
inative subwindows according to the recognition rate. Finally, they applied weighted
14Lu et al. [2006] did not provide details about the fusion level or method used.
Fig. 17. Ear thermogram, (c) [Burge and Burger 2000]. IEEE, reprinted with permission.
Fig. 18. (a) Example of occluded ears; (b) splitting the ear image into three subregions; (c) [Yuan et al.
2006b]. IEEE, reprinted with permission.
Fig. 19. Ear image is divided into subwindows.
majority voting for fusion at the decision level. They evaluated ear images that were partially occluded at the top, middle, bottom, left, and right, respectively, as shown in Figure 20.
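Decision-level fusion by weighted majority voting can be sketched as follows; the subwindow decisions and weights here are hypothetical, whereas in Yuan et al.'s scheme the weights come from per-subwindow recognition rates:

```python
# A minimal weighted majority voting sketch over per-subwindow classifiers.
from collections import defaultdict

def weighted_majority_vote(subwindow_decisions, subwindow_weights):
    """subwindow_decisions: predicted identity per subwindow; weights: same length."""
    votes = defaultdict(float)
    for identity, weight in zip(subwindow_decisions, subwindow_weights):
        votes[identity] += weight
    return max(votes, key=votes.get)

# Hypothetical example: two subwindows agree despite one occluded, low-weight window.
print(weighted_majority_vote(["subject_12", "subject_12", "subject_07"], [0.9, 0.8, 0.4]))
```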
Kocaman et al. [2009] applied Principal Component Analysis (PCA), Fisher Linear
Discriminant Analysis (FLDA), Discriminative Common Vector Analysis (DCVA), and
Locality-Preserving Projections (LPP) for ear recognition in the presence of occlusions.
The error and hit rates of the four algorithms were calculated by random subsampling and k-fold cross-validation for various occlusion scenarios, using a mask covering 15% of the test images.
ArbabZavar et al. [2007] used a Scale-Invariant Feature Transform (SIFT) to detect
the features within the ear images. During recognition, given a profile image of the
human head, the ear was enrolled and recognized from the various features selected
by the model. They presented a comparison with PCA to show the advantage of the
proposed model in handling occlusions. Later Bustard and Nixon [2008] evaluated the technique using various occlusion ratios: rank-1 rates of 92% and 74% for 20% and 30% top-ear occlusion, respectively, and rank-1 rates of 92% and 66% for 20% and 30% left-side occlusion, respectively.
In the context of constructing 3D models for the ear and face, one issue is that
training data may contain noise and partial occlusion. Rather than exclude these re-
gions manually, Bustard and Nixon developed a classifier which automates this process
Fig. 20. Test images using mask covering: (a) 33% top; (b) 33% middle; (c) 33% bottom; (d) 50% left; (e) 50%
right.
Fig. 21. Left ear; right ear; concatenated left and right ears; left ear and its mirror image.
[Bustard and Nixon 2010]. When combined with a robust registration algorithm, the resulting system enabled full-head morphable models to be constructed efficiently using less constrained data.
7.2. Ear Symmetry
Yan and Bowyer [2005a] conducted a small experiment to test ear symmetry using 119
subjects. The right ear of the subject was used as the gallery and the left ear was used
as the probe. They concluded that most people’s left and right ears are symmetric to a
good extent, but that some people’s left and right ears have different shapes.
Xiaoxun and Yunde [2007] conducted an experiment where the left and right ears
were concatenated into a single image before mirror transformation (as shown in Fig-
ure 21). This concatenated image showed a 1% to 2% improvement in performance compared to using the left or right ear alone.
Abaza and Ross [2010] presented a detailed analysis of the bilateral symmetry of
human ears. They assessed the ear symmetry geometrically using symmetry operators
and Iannarelli’s measurements, where they studied the contribution of individual ear
regions to the overall symmetry. Next, to assess the ear symmetry (or asymmetry)
from a biometric recognition system perspective, they conducted several experiments
using the WVU database. These experiments suggested the existence of some degree
of symmetry in the human ears that can perhaps be systematically exploited in the
future design of commercial ear recognition systems. Finally, they presented a case
study in which they fused (using the weighted-sum rule) the scores of the right-ear-to-right-ear comparison with those obtained using the left ear image as the gallery and the reflected right ear as the probe. This fusion enhanced the overall performance by about 3%.
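One simple way to exploit such symmetry, in the spirit of the case study above though not its exact protocol, is to mirror the right-ear probe before comparing it against a left-ear gallery. The sketch below uses plain normalized correlation on aligned, equal-sized images:

```python
# A minimal sketch of symmetry-based matching: mirror the right ear, then correlate.
import numpy as np

def mirrored_correlation(right_probe, left_gallery):
    """Normalized correlation between the mirrored right ear and a left-ear image."""
    mirrored = np.fliplr(right_probe).astype(float).ravel()
    gallery = np.array(left_gallery, dtype=float).ravel()
    mirrored -= mirrored.mean()
    gallery -= gallery.mean()
    denom = np.linalg.norm(mirrored) * np.linalg.norm(gallery) + 1e-12
    return float(mirrored @ gallery / denom)
```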
7.3. Earprint
Earprints, or earmarks, are marks left by secretions from the outer ears when someone
presses up against a wall, door, or window. They are found in up to ∼15% of crime scenes
[Rutty et al. 2005]. There have been several court cases in the U.S. and other countries
where earprints have been used as physical evidence [Lynch 2000; Bamber 2001]; how-
ever, some convictions that relied on earprints have been overturned [Morgan 1999;
Ede 2004]. The National Academy of Sciences report, “Strengthening Forensic Science in the United States: A Path Forward,” provides a candid view of the limitations of forensic science and discusses the admissibility of forensic evidence in litigation.
Fig. 22. Examples of earprints in which various anatomical features are indicated. (1). helix; (2). crus of
helix; (3-6). parts of antihelix; (7). tragus; (8). antitragus; (9). incisure intertragic; (10). lobe [Meijerman
2006].
It is commonly argued that earprints must be unique to an individual since the
structure of the ear is unique to an individual. This line of reasoning has been dis-
puted by others [Champod et al. 2001], on the basis that high variability in a three-
dimensional malleable organ does not necessarily imply high variability in a two-
dimensional mark produced by that organ. The parts of the ear which are most fre-
quently found in earprints are the helix, anti-helix, tragus, and anti-tragus, while
lesser-seen features include the earlobe, and the crus of helix (as shown in Figure 22)
[Meijerman et al. 2004]. Specific details in these structures may contribute to the in-
dividualization of earprints. Such details include notches and angles in imprinted
features, the positions of moles, folds, and wrinkles, and the position of pressure points
[Meijerman et al. 2005]. However, individualization is confounded by several factors
that cause substantial variation in earprints of the same ear.
—Variable deformations caused by the force applied by the ear to the surface during listening [Meijerman et al. 2004, 2006a];
—The duration of the ear's contact with the surface [Meijerman 2006];
—Ornamental modifications to the ear, such as piercings [Meijerman 2006];
—Changes in the shape and size of the ear due to aging [Meijerman et al. 2007].
Due to these factors, no two prints of the same ear are exactly alike. In order for
earprint recognition to be a viable biometric for forensics, intra-individual variation in
earmarks must be distinguishable or significantly less than inter-individual variation.
This is still an open and active area of research.
Though earprint recognition research is still at the initial stages, a few groups have
developed methods for semiautomated or automated earprint recognition. Rutty et al.
[2005] illustrated the concept of how to develop a computerized earprint identification
system. First, they used a grid system using two anatomical landmarks to standardize
the localization of the tags. Tags were allocated to each print at the sites of intersection
of the grid lines with the anatomical structures. They did not report performance testing; rather, they simplified the problem to whether or not the system and method employed could match a suspect print to one known within the database.
Meijerman et al. [2006b] proposed the first system for fully automated earmark
recognition, and tested it with a small database of earprints taken from six sets of
identical twins. Their method detected key-points using the Difference of Gaussian (DoG) operator and then used the SIFT algorithm to transform each detected region into a 120-dimensional feature vector. Each key-point was then selected in turn and compared with all key-points in the candidate print in order to look for matches, the best match being the one with the minimum Euclidean distance in the SIFT feature space. A geometric transformation was found that maximizes the number of matches, and the final similarity metric was defined as the number of key-point matches found between the pair of prints.
The Forensic Ear IDentification (FearID) project (funded by the 6th EU research
framework) proposed to use weighted width, angular development, and anatomical
annotation as distinctive features (semi-automatic system). Manual annotations on
the prints and marks were performed before the matching process to facilitate the
segmentation of the images and to locate anatomical points. With a set of 7,364 prints and 216 marks from 1,229 donors, Alberink and Ruifrok [2007] reported an EER of 3.9% for lab-quality prints and an EER of 9.3% for matching simulated marks against the prints database.
7.4. Ear Individuality
While the uniqueness (individuality) of the ear has been generally accepted to be
true based on empirical results [Iannarelli 1989], the underlying scientific basis of ear
individuality has not been formally established. As a result, the validity of ear (or
earprint) evidence is now being challenged in several court cases.
In response to the U.S. appeals court ruling, a large-scale study involving 10,000 sub-
jects has been proposed by Prof. Andre Moenssens to determine the variability of the ear
across the population. In 1906, Imhofer studied a set of 500 ears and noted that he could
clearly distinguish between each ear based on only 4 features [Hoogstrate et al. 2001].
Iannarelli [1989] also examined the difference in ear structures between fraternal and
identical twins. He showed that even though their ear structures were similar, they
were still clearly distinguishable. The results of these studies support the hypothesis
that the ear has a unique physiological structure.
Burge and Burger [2000] presented a study of ear individuality using Iannarelli's features [Iannarelli 1989]. Assuming an average standard deviation of four units in the population, the 12 Iannarelli measurements provide a space with 4^12 (fewer than 17 million) distinct points. Purkait and Singh [2008] presented a
preliminary study to test the individuality of human ear patterns. They manually
extracted 12 inter-landmark linear distances from a set of 700 male and female indi-
viduals. They found that this 12-dimensional feature space could clearly distinguish
more than 99.9% of ear pairs, where very few pairs had distances which fell below the
safe distinction limit.
8. SUMMARY AND CONCLUSIONS
In this article, we have presented a survey of the ear biometric. We have considered the two main stages of an ear recognition system: the detection stage and the features used for recognition. We have categorized the methods developed for
these two steps, discussed their characteristics, and reported their performance. Even
though current ear detection and recognition systems have reached a certain level of
maturity, their success is limited to controlled indoor conditions. For example, to the
best of our knowledge, ear biometrics have yet to be tested outdoors. In addition to
variation in illumination, other open research problems include occlusion due to hair,
ear symmetry, earprint forensics, ear classification, and ear individuality.
For any researcher who wants to start working on ear biometrics, we have presented
a systematic discussion that includes available databases, detection and feature extrac-
tion techniques, as well as a survey of some unsolved ear recognition problems. The
benefits of ear recognition are yet to be fully realized and we anticipate an extensive
use of this biometric in next-generation face recognition systems.
REFERENCES
ABATE,A.,NAPPI,M.,RICCIO,D.,AND RICCIARDI, S. 2006. Ear recognition by means of a rotation invariant
descriptor. In Proceedings of the 18th IEEE International Conference on Pattern Recognition (ICPR).
437–440.
ABAZA, A. 2008. High performance image processing techniques in automated identification systems. Ph.D.
thesis, West Virginia University, Morgantown-WV.
ABAZA,A.,HEBERT,C.,AND HARRISON, M. F. 2010. Fast learning ear detection for real-time surveillance. In
Proceedings of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
ABAZA,A.AND ROSS, A. 2010. Towards understanding the symmetry of human ears: A biometric perspective.
In Proceedings of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
ABDELMOTTALEB,M.AND ZHOU, J. 2006. Human ear recognition from face profile images. In Proceedings of the
2nd International Conference on Biometrics (ICB). 786–792.
ALBERINK,I.AND RUIFROK, A. 2007. Performance of the fearid earprint identification system. Forensic Sci. Int.
166, 145–154.
ALVAREZ,L.,GONZALEZ,E.,AND MAZORRA, L. 2005. Fitting ear contour using an ovoid model. In Proceedings of
the IEEE International Carnahan Conference on Security Technology. 145–148.
ANSARI,S.AND GUPTA, P. 2007. Localization of ear using outer helix curve of the ear. In Proceedings of the
IEEE International Conference on Computing: Theory and Applications. 688–692.
ARBABZAVAR,B.AND NIXON, M. 2007. On shape-mediated enrolment in ear biometrics. In Proceedings of the
International Symposium on Visual Computing (ISVC). 549–558.
ARBABZAVAR,B.AND NIXON, M. 2008. Robust log-gabor filter for ear biometrics. In Proceedings of the 18th IEEE
International Conference on Pattern Recognition (ICPR).
ARBABZAVAR,B.AND NIXON, M. 2011. On guided model-based analysis for ear biometrics. Comput. Vision Image
Understand. 115, 4, 487–502.
ARBABZAVAR,B.,NIXON,M.,AND HURLEY, D. 2007. On model-based analysis of ear biometrics. In Proceedings
of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
BAILLYBAILLIERE,E.,BENGIO,S.,BIMBOT,F.,HAMOUZ,M.,KITTLER,J.,MARIETHOZ,J.,MATAS,J.,MESSER,K.,
POPOVICI,V.,POREE,F.,RUIZ,B.,AND THIRAN, J.-P. 2003. The banca database and evaluation protocol. In
Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authenti-
cation. Vol. 2688. 625–638.
BAMBER, D. 2001. Prisoners to appeal as unique ‘earprint’ evidence is discredited. http://www.telegraph.
co.uk/news/uknews/1364060/Prisoners-to-appeal-as-unique-earprint-evidence-is-discredited.html
BERTILLON, A. 1896. Signaletic Instructions Including: The Theory and Practice of Anthropometrical Identifi-
cation. R.W. McClaughry translation, The Werner Company.
BHANU,B.AND CHEN, H. 2008. Human Ear Recognition by Computer 1st Ed. Springer.
BOODOO,N.B.AND SUBRAMANIAN, R. K. 2009. Robust multibiometric recognition using face and ear images.
Int. J. Comput. Sci. Inf. Secur. 6,2.
BURGE,M.AND BURGER, W. 1997. Ear biometrics for machine vision. In Proceedings of the 21st Workshop of
the Austrian Association for Pattern Recognition.
BURGE,M.AND BURGER, W. 2000. Ear biometrics in computer vision. In Proceedings of the 15th IEEE Interna-
tional Conference on Pattern Recognition (ICPR). 826–830.
BUSTARD,J.AND NIXON, M. 2008. Robust 2D ear registration and recognition based on SIFT point matching.
In Proceedings of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
BUSTARD,J.AND NIXON, M. 2010. 3D morphable model construction for robust ear and face recognition. In
Proceedings of the IEEE Conference on Computer Vision and Patern Recognition (CVPR).
CADAVID,S.AND ABDELMOTTALEB, M. 2007. Human identification based on 3D ear models. In Proceedings of the
1st IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS). 1–6.
CADAVID,S.AND ABDELMOTTALEB, M. 2008a. 3D ear modeling and recognition from video sequences using shape
from shading. In Proceedings of the 19th IEEE International Conference on Pattern Recognition (ICPR).
1–4.
CADAVID,S.AND ABDELMOTTALEB, M. 2008b. 3D ear modeling and recognition from video sequences using shape
from shading. IEEE Trans. Inf. Forens. Secur. 3, 4, 709–718.
CARREIRA-PERPINAN, M. A. 1995. Compression neural networks for feature extraction: Application to human recognition from ear images. M.S. thesis, Faculty of Informatics, Technical University of Madrid, Spain.
CHAMPOD,C.,EVETT,I.,AND KUCHLER, B. 2001. Earmarks as evidence: A critical review. Forens. Sci. 46, 6,
1275–1284.
CHANG,K.,BOWYER,K.,SARKAR,S.,AND VICTOR, B. 2003. Comparison and combination of ear and face images
in appearance-based biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1160–1165.
CHEN,H.AND BHANU, B. 2004. Human ear detection from side face range images. In Proceedings of the IEEE
International Conference on Pattern Recognition (ICPR). 574–577.
CHEN,H.AND BHANU, B. 2005a. Contour matching for 3D ear recognition. In Proceedings of the IEEE Workshops
on Application of Computer Vision (WACV). 123–128.
CHEN,H.AND BHANU, B. 2005b. Shape model-based 3D ear detection from side face range images. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 122–127.
CHEN,H.AND BHANU, B. 2007. Human ear recognition in 3D. IEEE Trans. Pattern Anal. Mach. Intell. 29, 4,
718–737.
CHORAS, M. 2004. Human ear identification based on image analysis. In Proceedings of the 7th IEEE Interna-
tional Conference on Artificial Intelligence and Soft Computing (ICAISC).
CHORAS, M. 2005. Ear biometrics based on geometrical feature extraction. Electron. Lett. Comput. Vis. Image
Anal. 5, 3, 84–95.
CHORAS, M. 2007. Image feature extraction methods for ear biometrics – A survey. In Proceedings of the 6th
IEEE International Conference on Computer Information Systems and Industrial Management Applica-
tions. 261–265.
CHORAS,M.AND CHORAS, R. 2006. Geometrical algorithms of ear contour shape representation and feature
extraction. In Proceedings of the 6th IEEE International Conference on Intelligent Systems Design and
Applications (ISDA).
CUMMINGS,A.,NIXON,M.,AND CARTER, J. 2010. A novel ray analogy for enrollment of ear biometrics. In
Proceedings of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
DARWISH, A. A., ABDELGHAFAR, R., AND ALI, A. F. 2009. Multimodal face and ear images. J. Comput. Sci. 5, 5, 374–379.
DEWI,K.AND YAHAGI, T. 2006. Ear photo recognition using scale invariant keypoints. In Proceedings of the
International Computational Intelligence Conference. 253–258.
DONG,J.AND MU, Z. 2008. Multi-Pose ear recognition based on force field transformation. In Proceedings of
the 2nd IEEE International Symposium on Intelligent Information Technology Application. 771–775.
EDE, R. 2004. Wrongful convictions put forensic science in the dock. The Times (London), February 3.
FAHMY,G.,ELSHERBEENY,A.,MANDALA,S.,ABDELMOTTALEB,M.,AND AMMAR, H. 2006. The effect of lighting direc-
tion/condition on the performance of face recognition algorithms. In Proceedings of the SPIE Conference
on Human Identification.
FENG,J.AND MU, Z. 2009. Texture analysis for ear recognition using local feature descriptor and transform
filter. Proc. SPIE 7496,1.
FERET. 2003. Color FERET database. http://face.nist.gov/colorferet/
FIELDS,C.,FALLS,H.C.,WARREN,C.P.,AND ZIMBEROFF, M. 1960. The ear of newborn as an identification
constant. Obstetr. Gynecol. 16, 98–102.
GAO, W., CAO, B., SHAN, S., CHEN, X., ZHOU, D., ZHANG, X., AND ZHAO, D. 2008. The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Trans. Syst. Man Cybernet. Part A Syst. Hum. 38, 1, 149–161.
GAO, W., CAO, B., SHAN, S., ZHOU, D., ZHANG, X., AND ZHAO, D. 2004. CAS-PEAL. http://www.jdl.ac.cn/peal/home.htm
GRAHAM,D.AND ALLISON, N. 1998. Characterizing virtual eigen-signatures for general-purpose face recogni-
tion. In Face Recognition: From Theory to Applications, Springer, 446–456.
HAILONG,Z.AND MU, Z. 2009. Combining wavelet transform and orthogonal centroid algorithm for ear recog-
nition. In Proceedings of the 2nd IEEE International Conference on Computer Science and Information
Technology.
HAJSAID,E.,ABAZA,A.,AND AMMAR, H. 2008. Ear segmentation in color facial images using mathematical
morphology. In Proceedings of the 6th IEEE Biometric Consortium Conference (BCC).
HOOGSTRATE,A.,VANDEN HEUVEL,H.,AND HUYBEN, E. 2001. Ear identification based on surveillance camera
images. Sci. Justice 41, 3, 167–172.
HURLEY,D.,ARBABZAVAR,B.,AND NIXON, M. 2007. The ear as a bio-metric. In Handbook of Biometrics,Springer,
131–150.
HURLEY,D.,NIXON,M.,AND CARTER, J. 2000. Automatic ear recognition by force field transformations. In
Proceedings of the IEE Colloquium on Visual Biometrics. 7/1–7/5.
HURLEY,D.,NIXON,M.,AND CARTER, J. 2005a. Ear biometrics by force field convergence. In Proceedings of
the 5th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA).
386–394.
HURLEY,D.,NIXON,M.,AND CARTER, J. 2005b. Force field feature extraction for ear biometrics. Comput. Vis.
Image Understand. 98, 3, 491–512.
IANNARELLI, A. 1989. Ear Identification, Forensic Identification Series. Paramount Publishing Company, Fre-
mont, CA.
ISLAM, S., BENNAMOUN, M., AND DAVIES, R. 2008a. Fast and fully automatic ear detection using cascaded
adaboost. In Proceedings of the IEEE Workshop on Applications of Computer Vision. 1–6.
ISLAM, S., BENNAMOUN, M., MIAN, A., AND DAVIES, R. 2008b. A fully automatic approach for human recognition
from profile images using 2D and 3D ear data. In Proceedings of the 4th International Symposium on
3D Data Processing Visualization and Transmission.
ISLAM, S., BENNAMOUN, M., MIAN, A., AND DAVIES, R. 2009. Score level fusion of ear and face local 3D features
for fast and expression-invariant human recognition. In Proceedings of the 6th International Conference
on Image Analysis and Recognition. 387–396.
ISLAM, S., BENNAMOUN, M., OWENS, R., AND DAVIES, R. 2007. Biometric approaches of 2D-3D ear and face: A
survey. In Advances in Computer and Information Sciences and Engineering, Springer, 509–514.
JAIN,A.,ROSS,A.,AND PRABHAKAR, S. 2004. An introduction to biometric recognition. IEEE Trans. Circ. Syst.
Video Technol. 14, 1, 4–20.
KISKU,D.R.,GUPTA,P.,MEHROTRA,H.,AND SING, J. K. 2009b. Multimodal belief fusion for face and ear
biometrics. Intell. Inf. Manag. 1,3.
KISKU,D.R.,MEHROTRA,H.,GUPTA,P.,AND SING, J. K. 2009a. SIFT-Based ear recognition by fusion of detected
key-points from color similarity slice regions. In Proceedings of the IEEE International Conference on
Advances in Computational Tools for Engineering Applications (ACTEA). 380–385.
KOCAMAN,B.,KIRCI,M.,GUNES,E.O.,CAKIR,Y.,AND OZBUDAK, O. 2009. On ear biometrics. In Proceedings of
the IEEE Region 8 Conference (EUROCON).
KUMAR,A.AND ZHANG, D. 2007. Ear authentication using log-gabor wavelets. In SPIE Defence and Security
Symposium. Vol. 6539.
LAMMI, H. 2004. Ear biometrics. Tech. rep., Lappeenranta University of Technology.
LU,L.,ZHANG,X.,ZHAO,Y.,AND JIA, Y. 2006. Ear recognition based on statistical shape model. In Proceedings
of the 1st IEEE International Conference on Innovative Computing, Information and Control. 353–356.
LUCIANO,L.AND KRZYZAK, A. 2009. Automated multimodal biometrics using face and ear. In Proceedings of
the 6th International Conference on Image Analysis and Recognition (ICIAR). 451–460.
LYNCH, C. 2000. Ear-Prints provide evidence in court. Glasgow University News.
MAHOOR,M.,CADAVID,S.,AND ABDELMOTTALEB, M. 2009. Multimodal ear and face modeling and recognition. In
Proceedings of the IEEE International Conference on Image Processing (ICIP).
MEIJERMAN, L. 2006. Inter- and intra individual variation in earprints. Ph.D. thesis, University Leiden.
MEIJERMAN,L.,NAGELKERKE,N.,VAN BASTEN,R.,VANDER LUGT,C.,DECONTI,F.,DRUSINI,A.,GIACON,M.,SHOLL,
S., VANEZIS,P.,AND MAAT, G. 2006a. Inter and Intra-individual variation in applied force when listening
at a surface, and resulting variation in earprints. Med. Sci. Law 46, 141–151.
MEIJERMAN,L.,SHOLL,S.,DECONTI,F.,GIACON,M.,VANDER LUGT,C.,DRUSINI,A.,VANEZIS,P.,AND MAAT, G. 2004.
Exploratory study on classification and individualization of earprints. Forens. Sci. Int. 140, 91–99.
MEIJERMAN, L., THEAN, A., AND MAAT, G. 2005. Earprints in forensic investigations. Forens. Sci. Med. Pathol. 1, 4, 247–256.
MEIJERMAN, L., THEAN, A., VAN DER LUGT, C., VAN MUNSTER, R., VAN ANTWERPEN, G., AND MAAT, G. 2006b. Individualization of earprints: Variation in prints of monozygotic twins. Forens. Sci. Med. Pathol. 2, 1, 39–49.
MEIJERMAN,L.,VANDER LUGT,C.,AND MAAT, G. 2007. Cross-Sectional anthropometric study of the external ear.
Forens. Sci. 52, 286–293.
MESSER,K.,MATAS,J.,KITTLER,J.,LUETTIN,J.,AND MAITRE, G. 1999. XM2VTSDB: The extended M2VTS
database. In Proceedings of the 2nd International Conference on Audio and Video-Based Biometric Person
Authentication.
MID. 1994. NIST mugshot identification database. http://www.nist.gov/srd/nistsd18.cfm
MIDDENDORFF,C.AND BOWYER, K. 2007. Multibiometrics using face and ear. In Handbook of Biometrics,
Springer, Chapter 16, 315–334.
MIDDENDORFF,C.,BOWYER,K.W.,AND YAN, P. 2007. Multimodal biometrics involving the human ear. In
Multimodal Surveillance: Sensors, Algorithms and Systems, Artech House, Boston, Chapter 8, 177–190.
MONWAR,M.M.AND GAVRILOVA, M. 2008. FES: A system for combining face, ear and signature biometrics using
rank level fusion. In Proceedings of the 3rd IEEE International Conference on Information Technology:
New Generations. 922–927.
MONWAR,M.M.AND GAVRILOVA, M. 2009. Multimodal biometric system using rank-level fusion approach.
IEEE Trans. Syst. Man Cybern. B 39, 4.
MORENO,B.,SANCHEZ,A.,AND VELEZ, J. 1999. On the use of outer ear images for personal identification in
security applications. In Proceedings of the 33rd IEEE International Conference on Security Technology.
469–476.
MORGAN, J. 1999. State v. Kunze, court of appeals of washington, division 2. 97 Wash.App. 832, 988 p.2d 977.
http://www.forensic-evidence.com/site/ID/ID Kunze.html
MU,Z.,YUAN,L.,XU,Z.,XI,D.,AND QI, S. 2004. Shape and structural feature based ear recognition. In
Proceedings of the 5th Chinese Conference on Biometric Recognition. 663–670.
NANNI,L.AND LUMINI, A. 2007. A multi-matcher for ear authentication. Pattern Recogn. Lett. 28, 16, 2219–
2226.
NANNI,L.AND LUMINI, A. 2009a. Fusion of color spaces for ear authentication. Pattern Recogn. 42, 9, 1906–
1913.
NANNI,L.AND LUMINI, A. 2009b. A supervised method to discriminate between impostors and genuine in
biometry. Expert Syst. Appl. 36, 7, 10401–10407.
NASEEM,I.,TOGNERI,R.,AND BENNAMOUN, M. 2008. Sparse representation for ear biometrics. In Proceedings
of the 4th International Symposium on Advances in Visual Computing (ISVC), Part II. 336–345.
NOSRATI,M.,FAEZ,K.,AND FARADJI, F. 2007. Using 2D wavelet and principal component analysis for per-
sonal identification based on 2D ear structure. In Proceedings of the IEEE International Conference on
Intelligent and Advanced Systems.
PAN,X.,CAO,Y.,XU,X.,LU,Y.,AND ZHAO, Y. 2008. Ear and face based multimodal recognition based on
KFDA. In Proceedings of the IEEE International Conference on Audio, Language and Image Processing
(ICALIP). 965–969.
PASSALIS,G.,KAKADIARIS,I.,THEOHARIS,T.,TODERICI,G.,AND PAPAIOANNOU, T. 2007. Towards fast 3D ear recog-
nition for real-life biometric applications. In Proceedings of the IEEE Conference on Advanced Video and
Signal Based Surveillance. 39–44.
PHILLIPS,P.J.,WECHSLER,H.,HUANG,J.,AND RAUSS, P. J. 1998. The feret database and evaluation procedure
for face recognition algorithms. Image Vis. Comput. 16, 5, 295–306.
PHILLIPS,P.,MOON,H.,RIZVI,S.A.,AND RAU SS, P. J. 2000. The feret evaluation methodology for face recognition
algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 22, 10, 1090–1104.
PRAKASH,S.,JAYARAMAN,U.,AND GUPTA, P. 2008. Ear localization from side face images using distance transform
and template matching. In Proceedings of the 1st IEEE Workshops on Image Processing Theory, Tools
and Applications (IPTA).
PRAKASH,S.,JAYARAMAN,U.,AND GUPTA, P. 2009. A skin-color and template based technique for automatic ear
detection. In Proceedings of the 7th IEEE International Conference on Advances in Pattern Recognition
(ICAPR).
PUN,K.AND MOON, Y. 2004. Recent advances in ear biometrics. In Proceedings of the IEEE International
Conference on Automatic Face and Gesture Recognition (AFGR). 164–169.
PURKAIT,R.AND SINGH, P. 2008. A test of individuality of human external ear pattern: Its application in the
field of personal identification. Forens. Sci. Int. 178, 112–118.
RAHMAN,M.M.AND ISHIKAWA, S. 2005. Proposing a passive biometric system for robotic vision. In Proceedings
of the 10th International Symposium on Artificial Life and Robotics (AROB).
ROSS,A.,NANDAKUMAR,K.,AND JAIN, A. 2006. Handbook of Multibiometrics. Springer.
RUSIGN. 2005. Signature database. University of Rajshahi, Bangladesh.
RUTTY,G.,ABBAS,A.,AND CROSSLING, D. 2005. Could earprint identification be computerised? An illustrated
proof of concept paper. Int. J. Legal Med. 119, 333–343.
SAMARIA,F.AND HARTER, A. 1994. Parameterization of a stochastic model for human face identification. In
Proceedings of the 2nd IEEE Workshop on Application of Computer Vision.
SANA,A.AND GUPTA, P. 2007. Ear biometrics: A new approach. In Proceedings of the 6th International Confer-
ence on Advances in Pattern Recognition.
SRINIVAS,B.G.AND GUPTA, P. 2009. Feature level fused ear biometric system. In Proceedings of the 17th IEEE
International Conference on Advances in Pattern Recognition.
THEOHARIS,T.,PASSALIS,G.,TODERICI,G.,AND KAKADIARIS, I. 2008. Unified 3D face and ear recognition using
wavelets on geometry images. Pattern Recogn. 41, 3, 796–804.
UMIST. 1998. UMIST database. http://www.shef.ac.uk/eee/research/iel/research/face.html.
USTB. 2005. University of science and technology beijing USTB database. http://www1.ustb.edu.cn/resb/
en/index.htm
VICTOR,B.,BOWYER,K.,AND SARKAR, S. 2002. An evaluation of face and ear biometrics. In Proceedings of the
16th IEEE International Conference on Pattern Recognition (ICPR). 429–432.
VIOLA,P.AND JONES, M. 2004. Robust real-time face detection. Int. J. Comput. Vis. 57, 2, 137–154.
WANG,Y.,MU,Z.,AND ZENG, H. 2008. Block-Based and multi-resolution methods for ear recognition using
wavelet transform and uniform local binary patterns. In Proceedings of the 19th IEEE International
Conference on Pattern Recognition (ICPR). 1–4.
WATABE,D.,SAI,H.,SAKAI,K.,AND NAKAMURA, O. 2008. Ear biometrics using jet space similarity. In Proceedings
of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE).
WOODARD, D., FALTEMIER, T., YAN, P., FLYNN, P., AND BOWYER, K. 2006. A comparison of 3D biometric modalities. In
Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
57–61.
WRIGHT,J.,YANG,A.Y.,GANESH,A.,SASTRY,S.S.,AND MA, Y. 2009. Robust face recognition via sparse represen-
tation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 2, 210–227.
WU,J.,BRUBAKER,S.C.,MULLIN,M.D.,AND REHG, J. M. 2008. Fast asymmetric learning for cascade face
detection. IEEE Trans. Pattern Analysis Mach. Intell. 30, 3, 369–382.
XIAOXUN, Z. AND YUNDE, J. 2007. Symmetrical null space LDA for face and ear recognition. Neurocomput. 70,
4-6, 842–848.
XIE,Z.AND MU, Z. 2007. Improved locally linear embedding and its application on multi-pose ear recognition.
In Proceedings of the IEEE International Conference on Wavelet Analysis and Pattern Recognition.
XIE,Z.AND MU, Z. 2008. Ear recognition using lle and idlle algorithm. In Proceedings of the 19th IEEE
International Conference on Pattern Recognition (ICPR). 1–4.
XM2VTSDB. 1999. XM2VTSDB database. http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/
XU,X.AND MU, Z. 2007a. Feature fusion method based on kcca for ear and profile face based multimodal
recognition. In Proceedings of the IEEE International Conference on Automation and Logistics. 620–623.
XU, X. AND MU, Z. 2007b. Multimodal recognition based on fusion of ear and profile face. In Proceedings of the
4th IEEE International Conference on Image and Graphics (ICIG). 598–603.
XU,X.,MU,Z.,AND YUAN, L. 2007. Feature-Level fusion method based on kfda for multimodal recognition
fusing ear and profile face. In Proceedings of the IEEE International Conference on Wavelet Analysis and
Pattern Recognition (ICWAPR). Vol. 3. 1306–1310.
YAN,P.AND BOWYER, K. 2005a. Empirical evaluation of advanced ear biometrics. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
YAN,P.AND BOWYER, K. 2005b. Multibiometrics 2D and 3D ear recognition. In Proceedings of the Audio-and
Video-Based Person Authentication Conference (AVBPA). 503–512.
YAN,P.AND BOWYER, K. 2006. An automatic 3D ear recognition system. In Proceedings of the 3rd IEEE
International Symposium on 3D Data Processing Visualization and Transmission. 326–333.
YAN,P.AND BOWYER, K. 2007. Biometric recognition using 3D ear shape. IEEE Trans. Pattern Anal. Mach.
Intell. 29, 8, 1297–1308.
YAN,P.,BOWYER,K.,AND CHANG, K. 2005. ICP-Based approaches for 3D ear recognition. In Proc. SPIE,
Biometric Technol. Hum. Identif. II. 282–291.
YAQUBI,M.,FAEZ,K.,AND MOTAMED, S. 2008. Ear recognition using features inspired by visual cortex and
support vector machine technique. In Proceedings of the IEEE International Conference on Computer
and Communication Engineering.
YUAN,L.AND MU, Z. 2007. Ear recognition based on 2D images. In Proceedings of the 1st IEEE International
Conference on Biometrics: Theory, Applications, and Systems (BTAS).
YUAN,L.,MU,Z.,AND LIU, Y. 2006a. Multimodal recognition using face profile and ear. In Proceedings of
the IEEE International Symposium on Systems and Control in Aerospace and Astronautics (ISSCAA).
887–891.
YUAN,L.,MU,Z.,ZHANG,Y.,AND LIU, K. 2006b. Ear recognition using improved non-negative matrix fac-
torization. In Proceedings of the 18th IEEE International Conference on Pattern Recognition (ICPR).
501–504.
YUAN,L.,WANG,Z.,AND MU, Z. 2010. Ear recognition under partial occlusion based on neighborhood preserving
embedding. Proc. SPIE, Biometric Technol. Hum. Identif. VII 7667.
YUAN, L. AND ZHANG, F. 2009. Ear detection based on improved adaboost algorithm. In Proceedings of the 8th
IEEE International Conference on Machine Learning and Cybernetics (ICMLC).
YUIZONO,T.,WANG,Y.,SATOH,K.,AND NAKAYAMA, S. 2002. Study on individual recognition for ear images
by using genetic local search. In Proceeding of the IEEE Congress on Evolutionary Computation (CEC).
237–242.
ZHANG,H.AND MU, Z. 2008. Ear recognition method based on fusion features of global and local features. In
Proceedings of the IEEE International Conference on Wavelet Analysis and Pattern Recognition.
ZHANG,H.,MU,Z.,QU,W.,LIU,L.,AND ZHANG, C. 2005. A novel approach for ear recognition based on ICA
and RBF network. In Proceedings of the 4th IEEE International Conference on Machine Learning and
Cybernetics. 4511–4515.
ZHANG,Z.AND LIU, H. 2008. Multi-View ear recognition based on b-spline pose manifold construction. In
Proceedings of the 7th IEEE World Congress on Intelligent Control and Automation.
ZHOU,J.,CADAVID,S.,AND ABDELMOTTALEB, M. 2010. Histograms of categorized shapes for 3D ear detection. In
Proceedings of the IEEE Conference on Biometrics: Theory, Applications, and Systems (BTAS).
Received March 2011; revised October 2011; accepted October 2011