Fusion of Visual and Thermal Signatures with Eyeglass Removal
for Robust Face Recognition
Jingu Heo, Seong G. Kong, Besma R. Abidi, and Mongi A. Abidi
Imaging, Robotics, and Intelligent Systems Laboratory
Department of Electrical and Computer Engineering
The University of Tennessee
Knoxville, TN 37996-2100, U.S.A.
Abstract – This paper describes a fusion of visual and
thermal infrared (IR) images for robust face
recognition. Two types of fusion methods are
discussed: data fusion and decision fusion. Data fusion
produces an illumination-invariant face image by
adaptively integrating registered visual and thermal
face images. Decision fusion combines matching scores
of individual face recognition modules. In the data
fusion process, eyeglasses, which block thermal energy,
are detected from thermal images and replaced with an
eye template. Three fusion-based face recognition
techniques are implemented and tested: Data fusion of
visual and thermal images (Df), Decision fusion with
highest matching score (Fh), and Decision fusion with
average matching score (Fa). A commercial face
recognition software FaceIt® is used as an individual
recognition module. Comparison results show that
fusion-based face recognition techniques outperformed
individual visual and thermal face recognizers under
illumination variations and facial expressions.
I. INTRODUCTION

Despite a significant level of maturity and a few
practical successes, face recognition remains a highly
challenging task in pattern recognition and computer
vision. Face recognition based only on the visual
spectrum has shown difficulties in performing consistently
under uncontrolled operating conditions. Recognition
accuracy degrades quickly when the lighting is dim or
when it does not uniformly illuminate the face. Light
reflected from human faces also varies with the skin
color of people from different ethnic groups.
The use of thermal infrared (IR) images can improve
the performance of face recognition under uncontrolled
illumination conditions. The thermal IR spectrum,
comprising the mid-wave IR (3-5μm) and long-wave IR
(8-12μm) bands, has been suggested as an alternative source
of information for the detection and recognition of faces.
Thermal IR sensors measure heat energy emitted, not
reflected, from objects. Hence thermal imaging has
great advantages for face recognition in low-illumination
conditions or even in total darkness, where visual face
recognition techniques fail. However, thermal imaging
must overcome several challenging problems. Thermal
signatures change with body temperature, which varies
with physical exercise and ambient temperature.
Eyeglasses may cause a loss of useful information
around the eyes in thermal face images, since glass
blocks a large portion of thermal energy.
In this paper, the fusion of visual and thermal IR images
is presented for enhancing robustness of face recognition.
Fusion exploits the synergistic integration of information
obtained from multiple sources. Two types of fusion-based
face recognition techniques are developed and
compared: data fusion and decision fusion. Data fusion
refers to a sensor-level fusion of visual and thermal face
images to produce a new face image that is invariant to
illumination conditions. When eyeglasses are present,
eyeglass regions are detected with an ellipse fitting method
and replaced with template eye patterns to retain the
details useful for face recognition. Experiments show that
the data fusion method with eyeglass removal improves
the recognition accuracy. Decision fusion combines the
matching scores generated from the individual face
recognition modules. The decision fusion with average
matching score produced the highest recognition rate.
II. FUSION-BASED FACE RECOGNITION
A. Fusion of Visual and Thermal Face Recognition
Fusion techniques take advantage of the merits of
multiple information sources to improve the overall
recognition accuracy. Low-level data fusion integrates the
data from different imaging modalities to produce a new
data that contains more details. High-level decision fusion
combines the decisions from multiple classification
modules. Decision fusion can be accomplished with
majority voting, ranked-list combination, and the use
of Dempster-Shafer theory. Several fusion methods have
been attempted in face recognition. Biometric systems that
integrate face and fingerprint data, or face and speech
signals, improved the performance of personal
identification. Fusion of local and global features in the
face increased face recognition accuracy.
The combined use of visual and thermal IR image data
provides a viable means of improving the performance of
face recognition techniques. Face recognition
algorithms applied to the fusion of visible and thermal IR
images consistently demonstrated better performance than
when applied to either visible or thermal IR imagery
alone. Wilder et al. showed improved recognition
performance from the fusion of visual and thermal images
at the decision level.
B. Proposed Fusion Approach
This paper implements and tests fusion-based face
recognition techniques. Figure 1 shows a schematic diagram
of the face recognition approaches discussed in this paper.
Data fusion (Df) produces illumination-invariant face
images by adaptively integrating visual and thermal face
images. Decision fusion schemes refine the classification
based on the average matching score (Fa) or on the highest
matching score (Fh). Although the concept of decision
fusion can have a much broader interpretation, the
decision fusion discussed here combines the matching
scores obtained from individual face recognition modules.
Registered visual and thermal images of the same size
are normalized using the eye coordinates extracted from
the visual image. When eyeglasses are present in the
images, eyeglass regions are found by the use of ellipse
fitting and replaced, in the thermal images, with an
average eye template to enhance data fusion. FaceIt®, a
commercial face recognition software package highly
ranked in the Face Recognition Vendor Test (FRVT),
is used as an individual face recognition module
for generating matching scores.
Figure 1: Visual and thermal face recognition techniques.
Visual (Vi), thermal (Th), data fusion (Df), and decision
fusion based on average matching score (Fa) and highest
matching score (Fh).
III. VISUAL AND THERMAL IMAGE FUSION
A. Weighted Averaging for Data Fusion
A simple data fusion can be represented as a weighted
sum of pixel intensities from the individual sensor data:

F(x,y) = a(x,y)V(x,y) + b(x,y)T(x,y)

where F(x,y) denotes the fused output of a visual image
V(x,y) and a thermal image T(x,y). The coefficients a(x,y)
and b(x,y) represent the weights of each pixel, with a(x,y) +
b(x,y) = 1. Figure 2 shows the image fusion based on
average intensity using both images (a(x,y) = b(x,y) = 0.5).
In general, weight factors can be determined according to
brightness intensity distributions. When a subject is
measured in low-illumination conditions, the weight
factors will be adjusted so that a(x,y) < b(x,y). When the
overall thermal contour of the face exceeds the average
contour measured in a normal room temperature range, the
weights will need to be a(x,y) > b(x,y).
(a) V(x,y) (b) T(x,y) (c) F(x,y)
Figure 2: Data fusion of visual and thermal images. (a)
Visual image, (b) Thermal image, and (c) Data-fused
image of (a) and (b) with a(x,y) = b(x,y) = 0.5.
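For illustration, the weighted-average fusion above can be sketched in NumPy as follows. This is a minimal sketch, not the paper's implementation; the function name and the sample arrays are illustrative only.

```python
import numpy as np

def fuse(V, T, a):
    """Pixel-wise weighted average F = a*V + (1 - a)*T.

    V, T: registered visual and thermal images (float arrays in [0, 1]).
    a: per-pixel weight a(x,y) in [0, 1]; b(x,y) = 1 - a(x,y), so the
       constraint a(x,y) + b(x,y) = 1 holds by construction.
    """
    a = np.clip(np.asarray(a, dtype=float), 0.0, 1.0)
    return a * V + (1.0 - a) * T

# Equal weights a(x,y) = b(x,y) = 0.5, as in Figure 2(c):
V = np.full((4, 4), 0.8)   # bright visual image
T = np.full((4, 4), 0.2)   # thermal image
F = fuse(V, T, 0.5)
```

Under low illumination the weight map would be lowered (a < b) so the thermal term dominates, as described in the text above.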
B. Eyeglass Detection using Ellipse Fitting
The eyeglass regions in thermal face images can be
represented by ellipses. A thermal image, binarized with a
threshold, provides data points for fitting with ellipses.
After morphological filtering for noise reduction, the data
points in the binarized image are connected using
Freeman chain coding with 8-connectivity. A non-iterative
ellipse-fitting algorithm is applied to each set
of connected components to produce an ellipse. Figure 3
shows an ellipse with the parameters used for eyeglass
detection in thermal face images. The center of the ith
ellipse is denoted by Ci; 2αi and 2βi are the lengths of the
major axis and the minor axis, respectively; and θi indicates
the orientation angle of the ellipse in the range –π/2 < θi ≤ π/2.
Figure 3: Ellipse parameters.
Similarities of the ellipses within the face region, or
inside the biggest ellipse, are tested for possible eyeglass
regions. Among all the candidate ellipses, a pair of
similar shape and size is considered as eyeglasses in
thermal images. In this paper, a similarity measure Sij
between the ith and jth ellipses is computed from their
axis lengths and the angle θij of the line segment that
connects the centers Ci and Cj of the two ellipses. We
assume that αjβj > αiβi, so that the similarity measure Sij
is less than 1. For a shape constraint, the ellipses must
have a ratio of major to minor axis (α/β) in the range
0.5 < α/β < 1.5. For a size constraint, the ratio of the
major axis to that of the face must satisfy 0.2 < α/αF < 0.8,
and the ratio of the minor axes 0.4 < β/βF < 0.8, where αF
and βF indicate the major and minor axes of the biggest
ellipse. The two ellipses with the highest similarity
measure, provided Sij > 0.7, are considered as eyeglasses.
Figure 4 illustrates an example of detecting eyeglasses in
thermal images using the ellipse fitting. Among the
ellipses generated from each connected component, the
biggest ellipse (C1) corresponds to the face. Ellipses
outside the face region (C2, C3, C7, C8, and C9) are not
considered for similarity checking. For the three ellipses
inside the face region, the similarities are calculated as
S45 = 0.96, S46 = 0.38, and S56 = 0.40. As a result, the
two ellipses C4 and C5 with the highest similarity are
identified as eyeglasses.
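Since the exact similarity formula does not survive in this copy of the paper, the pairing logic can still be illustrated with a stand-in measure. The sketch below uses a hypothetical Sij combining the area ratio of the two ellipses (so Sij ≤ 1 when αjβj ≥ αiβi) with the alignment of both major axes to the center-connecting segment at angle θij; only the shape and size constraints are taken from the text, and all names are illustrative.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Ellipse:
    cx: float   # center x
    cy: float   # center y
    a: float    # semi-major axis length (alpha)
    b: float    # semi-minor axis length (beta)
    theta: float  # orientation angle, -pi/2 < theta <= pi/2

def similarity(ei, ej):
    """Hypothetical Sij in (0, 1]: area ratio of the smaller to the larger
    ellipse, discounted by how well both major axes align with the line
    segment joining the centers (angle theta_ij)."""
    if ei.a * ei.b > ej.a * ej.b:
        ei, ej = ej, ei                       # enforce a_i*b_i <= a_j*b_j
    theta_ij = math.atan2(ej.cy - ei.cy, ej.cx - ei.cx)
    align = abs(math.cos(ei.theta - theta_ij)) * abs(math.cos(ej.theta - theta_ij))
    return (ei.a * ei.b) / (ej.a * ej.b) * align

def find_eyeglasses(face, candidates, s_min=0.7):
    """Return the candidate pair with the highest similarity above s_min,
    after the paper's shape constraint (0.5 < a/b < 1.5) and size
    constraints (0.2 < a/aF < 0.8, 0.4 < b/bF < 0.8) vs. the face ellipse."""
    ok = [e for e in candidates
          if 0.5 < e.a / e.b < 1.5
          and 0.2 < e.a / face.a < 0.8
          and 0.4 < e.b / face.b < 0.8]
    best, best_s = None, s_min
    for ei, ej in combinations(ok, 2):
        s = similarity(ei, ej)
        if s > best_s:
            best, best_s = (ei, ej), s
    return best
```

With two lens-like ellipses lying on a horizontal line and a non-elliptical distractor, the pair of lenses is selected and the distractor is rejected by the shape constraint.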
Figure 5: Performance of eyeglass detection.
Table 1 summarizes the performance of the eyeglass
detection algorithm with the ellipse fitting method. The
correct detection rate was 86.6% for subjects wearing
eyeglasses. For face images with no eyeglasses, a true
negative rate of 97.1% was achieved. False positive and
false negative rates were 2.9% and 13.4%, respectively.
The database used in this experiment comprises
thermal images from the database developed by the
National Institute of Standards and Technology (NIST)
and Equinox Corporation.
Table 1: Performance of eyeglass detection
Eyeglass → Eyeglass (True Positive): 445/514, 86.6%
No Eyeglass → No Eyeglass (True Negative): 97.1%
No Eyeglass → Eyeglass (False Positive): 2.9%
Eyeglass → No Eyeglass (False Negative): 13.4%
(a) (b) (c)
Figure 4: Eyeglass detection example using ellipse fitting.
(a) Original image, (b) Connected components of the
binary image, (c) Eyeglass regions detected using the
ellipse fitting method.
C. Data Fusion with Eyeglass Removal
Detected eyeglass regions are replaced with an average
eye template in the thermal images to enhance the visual
quality around the eyes in data-fused images. Template
eye regions are obtained from the average of all thermal
face images without eyeglasses. Figure 6 shows the result
of eyeglass replacement with a template eye pattern. A
geometrical transformation of the eye templates is performed
to fit the templates to the eyeglass regions detected by the
ellipse fitting. The eye templates for the left and the
right eyeglasses are superimposed on the eyeglass regions
after rotation and resizing.
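The geometrical transformation described above (rotating and resizing the average eye template to fit a detected region) can be sketched with a nearest-neighbor inverse mapping. The function names, the bounding-box interface, and the nearest-neighbor choice are assumptions for illustration, not details from the paper.

```python
import numpy as np

def warp_template(template, out_h, out_w, angle):
    """Resize and rotate a 2-D `template` to an out_h x out_w patch.

    Inverse mapping: each output pixel is rotated back by `angle` (radians)
    about the patch center, scaled into template coordinates, and sampled
    with nearest-neighbor interpolation.
    """
    H, W = template.shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    yc = ys - (out_h - 1) / 2.0            # centered output coordinates
    xc = xs - (out_w - 1) / 2.0
    c, s = np.cos(-angle), np.sin(-angle)
    xt = (c * xc - s * yc) * (W / out_w) + (W - 1) / 2.0
    yt = (s * xc + c * yc) * (H / out_h) + (H - 1) / 2.0
    xt = np.clip(np.round(xt).astype(int), 0, W - 1)
    yt = np.clip(np.round(yt).astype(int), 0, H - 1)
    return template[yt, xt]

def replace_eyeglass(image, template, bbox, angle=0.0):
    """Overwrite the bounding box (y0, x0, y1, x1) of a detected eyeglass
    region with the warped average eye template."""
    y0, x0, y1, x1 = bbox
    image[y0:y1, x0:x1] = warp_template(template, y1 - y0, x1 - x0, angle)
    return image
```

With angle 0 and matching output size, the warp reduces to an identity copy, which makes the mapping easy to verify.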
Figure 5 shows the performance of the eyeglass detection
method discussed above, in terms of false acceptance rate
(FAR) and false rejection rate (FRR), as a function of the
binarization threshold over the intensity range [0, 1]. The
threshold can be chosen where the false rejection rate
reaches its minimum. In this paper, the threshold was
selected to be 0.57.
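The threshold-selection rule above amounts to an argmin over the sampled FRR curve. The sketch below uses a synthetic curve with its minimum near 0.57; the data are illustrative only, not the measurements behind Figure 5.

```python
import numpy as np

def select_threshold(thresholds, frr):
    """Pick the binarization threshold at which the FRR is minimal."""
    return float(thresholds[int(np.argmin(frr))])

# Synthetic FRR curve for illustration:
thresholds = np.linspace(0.0, 1.0, 101)
frr = np.abs(thresholds - 0.57)
best = select_threshold(thresholds, frr)
```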
Figure 7 shows an example of an adaptive data fusion
result with eyeglass detection and replacement by the eye
template in the thermal image. Eyeglass removal enhances
the visual quality of the data-fused images. Figure 8 shows
an example of face recognition using data-fused visual and
thermal images (Df). The first five matches are shown in
descending order of matching score.
D. Decision Fusion
Decision fusion produces a new ranked list by
combining confidence measures from the individual face
recognition modules. In this paper, matching scores are
used to determine the ranked lists. Matching scores
generated by FaceIt® measure the degree to which the
probe image and a gallery image are similar. The matching
score MF of decision fusion is derived from the individual
scores of the visual recognition module (MV) and the
thermal recognition module (MT).
Decision fusion with average matching score (Fa)
determines the matching score as a weighted sum of MV
and MT:

MF = wV MV + wT MT

where wV and wT denote the weight factors for the
matching scores of the visual and thermal face recognition
modules. In this paper, wV = wT = 0.5. Decision fusion
with highest matching score (Fh) takes the larger of the
two matching scores:

MF = max(MV, MT)

Figure 6: Eyeglass removal. (a) Eyeglasses detected, (b)
Eyeglasses replaced by eye templates after rotation and
resizing.
(a) (b) (c)
Figure 7: Adaptive data fusion with eyeglass removal. (a)
Original image, (b) Direct fusion of visual and thermal
images without eyeglass removal, and (c) Fused image
after eyeglass removal.
(a) Probe (b) 9.26 (c) 7.81
(d) 7.09 (e) 7.01 (f) 6.74
Figure 8: Face recognition with data fusion (Df) with
eyeglass removal. (a) Probe, (b)-(f) First five matches.
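The two decision-fusion rules (Fa and Fh) can be sketched directly. The score values and gallery IDs below are made up for illustration; only the fusion rules themselves come from the text.

```python
def fuse_scores(mv, mt, method="Fa", wv=0.5, wt=0.5):
    """Combine visual (mv) and thermal (mt) matching scores.

    "Fa": weighted average MF = wV*MV + wT*MT (here wV = wT = 0.5).
    "Fh": highest score,   MF = max(MV, MT).
    """
    if method == "Fa":
        return wv * mv + wt * mt
    if method == "Fh":
        return max(mv, mt)
    raise ValueError(method)

def ranked_list(visual_scores, thermal_scores, method="Fa"):
    """Re-rank gallery IDs by fused matching score, best match first."""
    fused = {gid: fuse_scores(visual_scores[gid], thermal_scores[gid], method)
             for gid in visual_scores}
    return sorted(fused, key=fused.get, reverse=True)
```

Note that the two rules can disagree: a gallery face with one very high single-modality score wins under Fh, while Fa favors faces that score well in both modalities.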
IV. PERFORMANCE EVALUATION
The National Institute of Standards and Technology and
Equinox Corporation built an extensive database of face
images using registered broadband-visible/IR camera
sensors for experimentation and statistical performance
evaluations. The NIST/Equinox database used for the
evaluation of fusion-based face recognition performance
consists of 3,244 visual and thermal IR face images (1,622
per modality) from 90 individuals. One image of each
face, taken under a frontal lighting condition, is used for
the gallery. Probe images are divided according to
different conditions. The original 12-bit gray-level thermal
images were converted to 8 bits and histogram
equalized. Table 2 describes the NIST/Equinox database
of visual and thermal IR face images used in the experiments.
Table 2: The NIST/Equinox database of visual and thermal
IR face images
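The preprocessing step described above (12-bit thermal images converted to 8 bits, then histogram equalized) can be sketched as follows. The bit-shift conversion and the standard CDF-based equalization are assumed details, not specified in the paper.

```python
import numpy as np

def to_8bit_equalized(img12):
    """Convert a 12-bit gray-level image to 8 bits, then histogram-equalize.

    img12: integer array with values in [0, 4095].
    """
    img8 = (np.asarray(img12).astype(np.uint16) >> 4).astype(np.uint8)
    hist = np.bincount(img8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(cdf)][0]          # first occupied bin
    scale = max(img8.size - cdf_min, 1)
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255.0), 0, 255)
    return lut.astype(np.uint8)[img8]
```

For a uniform 12-bit intensity ramp, the equalization leaves the 8-bit conversion unchanged, since the histogram is already flat.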
B. Performance Comparison
The recognition results of fusion-based techniques (Df,
Fa, and Fh) are compared with the single modality cases
(Vi and Th) at various lighting directions. The three probe
sets 1, 2, and 3 contain 1,018 images in total with no
eyeglasses. Figure 9 demonstrates the first 10 best matches
of different recognition methods in terms of matching
scores. Visual face recognition under-performed due to
illumination variations, while fusion-based methods
yielded reliable recognition results.
Figure 9: Performance evaluation of fusion-based face
recognition when no eyeglasses are present (probes 1, 2, 3).
Figure 10 compares the performances of the five face
recognition techniques when the subjects wear eyeglasses
(probes 4, 5, and 6). The set contains 514 images in total.
In Figure 10(a), thermal face recognition and data fusion
without eyeglass removal show unsatisfactory results due
to the energy blocking effect of eyeglasses. Eyeglasses
slightly affect the performance of visual face recognition
while affecting that of thermal face recognition
significantly. Figure 10(b) demonstrates that eyeglass
removal greatly improves the recognition performance in
thermal and data fusion techniques. Decision fusion with
average matching score gives the best performance.
Figure 10: Performance evaluation of fusion-based face
recognition when eyeglasses are present (probes 4, 5, and 6).
(a) Without eyeglass removal, (b) With eyeglasses
replaced with templates.
Figure 11: Performance comparison in terms of the first
match success rates for the face images with eyeglasses
before and after eyeglass removal.