Fusion of Visual and Thermal Signatures with Eyeglass Removal
for Robust Face Recognition
Jingu Heo, Seong G. Kong, Besma R. Abidi, and Mongi A. Abidi
Imaging, Robotics, and Intelligent Systems Laboratory
Department of Electrical and Computer Engineering
The University of Tennessee
Knoxville, TN 37996-2100, U.S.A.
Abstract – This paper describes a fusion of visual and
thermal infrared (IR) images for robust face
recognition. Two types of fusion methods are
discussed: data fusion and decision fusion. Data fusion
produces an illumination-invariant face image by
adaptively integrating registered visual and thermal
face images. Decision fusion combines matching scores
of individual face recognition modules. In the data
fusion process, eyeglasses, which block thermal energy,
are detected from thermal images and replaced with an
eye template. Three fusion-based face recognition
techniques are implemented and tested: Data fusion of
visual and thermal images (Df), Decision fusion with
highest matching score (Fh), and Decision fusion with
average matching score (Fa). FaceIt®, a commercial face
recognition software package, is used as the individual
recognition module. Comparison results show that
fusion-based face recognition techniques outperformed
individual visual and thermal face recognizers under
illumination variations and facial expressions.
I. INTRODUCTION
Despite a significant level of maturity and a few
practical successes, face recognition is still a highly
challenging task in pattern recognition and computer
vision. Face recognition based only on the visual
spectrum has shown difficulties in performing consistently
under uncontrolled operating conditions. Face recognition
accuracy degrades quickly when the lighting is dim or
when it does not uniformly illuminate the face. Light
reflected from human faces also varies depending on the
skin color of people from different ethnic groups.
The use of thermal infrared (IR) images can improve
the performance of face recognition under uncontrolled
illumination conditions. The thermal IR spectrum,
comprising mid-wave IR (3-5 μm) and long-wave IR
(8-12 μm) bands, has been suggested as an alternative source
of information for detection and recognition of faces.
Thermal IR sensors measure heat energy emitted, not
reflected, from the objects. Hence thermal imaging has
great advantages in face recognition in low illumination
conditions or even in total darkness, where visual face
recognition techniques fail. However, thermal imaging
poses several challenges of its own. Thermal
signatures change with body temperature, which varies
with physical exercise and ambient
temperature. Eyeglasses may result in loss of useful
information around the eyes in thermal face images since
glass material blocks a large portion of thermal energy.
In this paper, the fusion of visual and thermal IR images
is presented for enhancing robustness of face recognition.
Fusion exploits synergistic integration of information
obtained from multiple sources. Two types of fusion-based
face recognition techniques are developed and
compared: data fusion and decision fusion. Data fusion
refers to a sensor-level fusion of visual and thermal face
images to produce a new face image that is invariant to
illumination conditions. When eyeglasses are present,
eyeglass regions are detected with an ellipse fitting method
and replaced with template eye patterns to retain the
details useful for face recognition. Experiments show that
the data fusion method with eyeglass removal improves
the recognition accuracy. Decision fusion combines the
matching scores generated from the individual face
recognition modules. The decision fusion with average
matching score produced the highest recognition rate.
II. FUSION-BASED FACE RECOGNITION
A. Fusion of Visual and Thermal Face Recognition
Fusion techniques take advantage of the merits of
multiple information sources to improve the overall
recognition accuracy. Low-level data fusion integrates the
data from different imaging modalities to produce new
data that contain more detail. High-level decision fusion
combines the decisions from multiple classification
modules. Decision fusion can be accomplished with
majority voting, ranked-list combination, and the use
of Dempster-Shafer theory. Several fusion methods have
been attempted in face recognition. Biometric systems that
integrate face and fingerprint data and face and speech
signals improved the performance of personal
identification. Fusion of local and global features in the
face increased face recognition accuracy.
The combined use of visual and thermal IR image data
offers a viable means for improving the performance of
face recognition techniques. Face recognition
algorithms applied to the fusion of visible and thermal IR
images consistently demonstrated better performance than
when applied to either visible or thermal IR imagery alone.
Wilder et al. showed improved recognition
performance from the fusion of visual and thermal images at
the decision level.
B. Proposed Fusion Approach
This paper implements and tests fusion-based face
recognition techniques. Figure 1 shows a schematic diagram
of the face recognition approaches discussed in this paper.
Data fusion (Df) produces illumination-invariant face
images by adaptively integrating visual and thermal face
images. Decision fusion schemes refine the classification
based on the average matching score (Fa) or on the highest
matching score (Fh). Although the concept of decision
fusion can have a much broader interpretation, the
decision fusion discussed combines the matching scores
obtained from individual face recognition modules.
Registered visual and thermal images of the same size
are normalized using the eye coordinates extracted from
the visual image. When eyeglasses are present in the
images, eyeglass regions are found by the use of ellipse
fitting and replaced, in the thermal images, with an
average eye template to enhance data fusion. FaceIt®, a
commercial face recognition software package highly
ranked in the Face Recognition Vendor Test (FRVT),
is used as an individual face recognition module
for generating matching scores.
Figure 1: Visual and thermal face recognition techniques.
Visual (Vi), thermal (Th), data fusion (Df), and decision
fusion based on average matching score (Fa) and highest
matching score (Fh).
III. VISUAL AND THERMAL IMAGE FUSION
A. Weighted Averaging for Data Fusion
A simple data fusion can be represented as a weighted
sum of pixel intensities from individual sensor data:
$$F(x,y) = a(x,y)\,V(x,y) + b(x,y)\,T(x,y)$$
where F(x,y) denotes the fused output of a visual image
V(x,y) and a thermal image T(x,y). The coefficients a(x,y)
and b(x,y) represent the weights of each pixel (a(x,y) +
b(x,y) = 1). Figure 2 shows the image fusion based on
average intensity using both images (a(x,y) = b(x,y) = 0.5).
In general, weight factors can be determined according to
brightness intensity distributions. When a subject is
measured in low-illumination conditions, the weight
factors will be adjusted so that a(x,y) < b(x,y). When the
overall thermal contour of the face exceeds the average
contour measured in a normal room temperature range, the
weights will need to be a(x,y) > b(x,y).
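The weighted sum above can be illustrated with a minimal sketch. It assumes the two images are already registered and are given as plain nested lists of gray levels; the function name `fuse_images` and the parameter `weight_visual` are illustrative and do not come from the paper. Only the constraint a(x,y) + b(x,y) = 1 is taken from the text; how the weights are adapted to brightness is left to the caller.

```python
def fuse_images(visual, thermal, weight_visual=0.5):
    """Pixel-wise weighted sum F = a*V + b*T with b = 1 - a.

    `visual` and `thermal` are registered grayscale images of equal size,
    given as lists of rows. `weight_visual` is either one scalar a or a
    per-pixel map a(x, y) with the same shape as the images.
    """
    if len(visual) != len(thermal):
        raise ValueError("images must have the same number of rows")
    fused = []
    for r, (v_row, t_row) in enumerate(zip(visual, thermal)):
        if len(v_row) != len(t_row):
            raise ValueError("images must have the same number of columns")
        out_row = []
        for c, (v, t) in enumerate(zip(v_row, t_row)):
            # Scalar weight or per-pixel weight map (an illustrative choice).
            if isinstance(weight_visual, list):
                a = weight_visual[r][c]
            else:
                a = weight_visual
            if not 0.0 <= a <= 1.0:
                raise ValueError("weights must lie in [0, 1]")
            out_row.append(a * v + (1.0 - a) * t)
        fused.append(out_row)
    return fused


# Invented 2x2 gray-level images, for illustration only.
visual = [[100, 200], [50, 0]]
thermal = [[200, 100], [150, 255]]
equal = fuse_images(visual, thermal)              # a = b = 0.5, as in Figure 2
dark_scene = fuse_images(visual, thermal, 0.25)   # a < b for low illumination
```

With a = 0.25 the thermal image dominates, which corresponds to the low-illumination case a(x,y) < b(x,y) described above.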
(a) V(x,y) (b) T(x,y) (c) F(x,y)
Figure 2: Data fusion of visual and thermal images. (a)
Visual image, (b) Thermal image, and (c) Data-fused
image of (a) and (b) with a(x,y) = b(x,y) = 0.5.
B. Eyeglass Detection using Ellipse Fitting
The eyeglass regions in thermal face images can be
represented by ellipses. A thermal image, binarized with a
threshold, provides data points for fitting with ellipses.
After morphological filtering for noise reduction, the data
points in the binarized image are connected using the
Freeman chain coding with 8-connectivity. A non-iterative
ellipse-fitting algorithm is applied to each set
of connected components to produce an ellipse. Figure 3
shows an ellipse with the parameters used for eyeglass
detection in thermal face images. The center of the ith
ellipse is denoted by Ci; 2αi and 2βi are the lengths of the
major axis and the minor axis, respectively, and θi indicates
the orientation angle of the ellipse in the range of –π/2 < θi
Figure 3: Ellipse parameters.
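As a rough illustration of how the parameters (Ci, αi, βi, θi) can be obtained from one connected component, the sketch below estimates them from second-order central moments. This is a substitute technique chosen for brevity, not the non-iterative ellipse-fitting algorithm used in the paper, and it assumes the component is a uniformly filled elliptical region. The function name `ellipse_from_region` is hypothetical.

```python
import math


def ellipse_from_region(points):
    """Estimate (center, alpha, beta, theta) of a filled elliptical region.

    `points` is a list of (x, y) pixel coordinates of one connected
    component. Assumes a uniformly filled ellipse, for which the variance
    along a semi-axis of length s is s**2 / 4.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Eigenvalues of the 2x2 covariance matrix give the axis variances.
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    alpha = 2.0 * math.sqrt(mean + spread)            # semi-major axis
    beta = 2.0 * math.sqrt(max(mean - spread, 0.0))   # semi-minor axis
    # Orientation of the major axis, in (-pi/2, pi/2].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), alpha, beta, theta


# Synthetic check: a filled, axis-aligned ellipse with semi-axes 20 and 10.
region = [(x, y) for x in range(-25, 26) for y in range(-15, 16)
          if (x / 20.0) ** 2 + (y / 10.0) ** 2 <= 1.0]
center, alpha, beta, theta = ellipse_from_region(region)
```

For the synthetic region the estimates come out close to the true semi-axes, with a small error caused by pixel discretization.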
Similarities of the ellipses within the face region, or
inside the biggest ellipse, are tested for possible eyeglass
regions. Among all the candidate glasses, a pair of ellipses
of similar shape and size is considered as eyeglasses in
thermal images. In this paper, the similarity of the ith and jth
ellipses is defined as:
where θij represents the angle of the line segment that
connects the centers of the two ellipses Ci and Cj. We
assume that αjβj > αiβi so the similarity measure Sij is less
than 1. For a shape constraint, the ellipses must have the
ratio of major and minor axis (α/β ) in the range of 0.5 <
α/β < 1.5. For a size constraint, the ratio of major axis to
the face height is 0.2 < α/αF < 0.8 and the ratio of minor
axis to face height is 0.4 < β/βF < 0.8, where αF and βF
indicate the major and minor axes of the biggest ellipse. Two
ellipses with the highest similarity measure of Sij > 0.7 are
considered as eyeglasses. Figure 4 illustrates an example
of detecting eyeglasses in thermal images using the ellipse
fitting. Among the ellipses generated from each connected
component, the biggest ellipse (C1) corresponds to the
face. Ellipses outside the face region (C2, C3, C7, C8, and
C9) are not considered for similarity checking. For the
three ellipses inside the face region, the similarities are
calculated as S45 = 0.96, S46 = 0.38, and S56 = 0.40. As a
result, the two ellipses C4 and C5 with the highest
similarity are identified as eyeglasses.
Figure 5: Performance of eyeglass detection.
Table 1 summarizes the performance of the eyeglass
detection algorithm with the ellipse fitting method. The correct
detection rate was 86.6% for the subjects wearing
eyeglasses. For the face images with no eyeglasses, 97.1%
true negative accuracy was achieved. False positive and
false negative errors were 2.9% and 13.4%, respectively.
The database used in this experiment comprises
thermal images from the database developed by the
National Institute of Standards and Technology (NIST)
and Equinox Corporation.
Table 1: Performance of eyeglass detection
Actual → Detected                            Matched/Total   Rate (%)
Eyeglass → Eyeglass (True Positive)          445/514         86.6
No Eyeglass → No Eyeglass (True Negative)                    97.1
No Eyeglass → Eyeglass (False Positive)                      2.9
Eyeglass → No Eyeglass (False Negative)                      13.4
Figure 4: Eyeglass detection example using ellipse fitting.
(a) Original image, (b) Connected components of the
binary image, (c) Eyeglass regions detected using the
ellipse fitting method.
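The candidate filtering and pair selection described in this subsection can be sketched as follows. The thresholds are the ones quoted in the text; the similarity values Sij are taken as precomputed inputs because the similarity formula is not reproduced here. The function names, the argument names `alpha_face` and `beta_face` (axes of the biggest ellipse), and the pair label (4, 6) are assumptions made for illustration.

```python
def passes_constraints(alpha, beta, alpha_face, beta_face):
    """Shape and size constraints quoted in the text for a candidate ellipse."""
    shape_ok = 0.5 < alpha / beta < 1.5
    size_ok = (0.2 < alpha / alpha_face < 0.8) and (0.4 < beta / beta_face < 0.8)
    return shape_ok and size_ok


def select_eyeglass_pair(similarities, min_similarity=0.7):
    """Return the ellipse pair with the highest similarity above the threshold.

    `similarities` maps (i, j) index pairs to precomputed S_ij values.
    Returns None when no pair exceeds `min_similarity`.
    """
    if not similarities:
        return None
    pair, score = max(similarities.items(), key=lambda item: item[1])
    return pair if score > min_similarity else None


# Similarity values from the Figure 4 discussion; the (4, 6) label is inferred.
scores = {(4, 5): 0.96, (4, 6): 0.38, (5, 6): 0.40}
best = select_eyeglass_pair(scores)
```

Here `best` is the pair (4, 5), matching the ellipses C4 and C5 identified as eyeglasses in the example.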
C. Data Fusion with Eyeglass Removal
Detected eyeglass regions are replaced with an average
eye template in the thermal images to enhance visual
quality around the eyes in data-fused images. Template
eye regions are obtained from the average of all thermal
face images without glasses. Figure 6 shows the result of
eyeglass replacement with a template eye pattern. A
geometrical transformation of eye templates is performed
to fit the templates to the eyeglass regions detected by the
use of ellipse fitting. The eye templates for the left and the
right eyeglasses are superimposed on the eyeglass regions
after rotating and resizing.
Figure 5 shows the performance of the eyeglass detection
method discussed above as a function of threshold intensity
in the range of [0,1] in terms of false acceptance rate (FAR)
and false rejection rate (FRR). The threshold can be found
where the false rejection rate reaches the minimum. In this
paper, the threshold was selected to be 0.57.
Figure 7 shows an example adaptive data fusion result
by eyeglass detection and replacement with the eye
template in thermal images. Eyeglass removal enhances
the visual quality of data-fused images. Figure 8 shows an
example of face recognition using data-fused visual and
thermal images (Df). The five returned faces
are listed in descending order of matching scores.
Figure 6: Eyeglass removal. (a) Eyeglasses detected, (b)
Eyeglasses replaced by eye templates after rotation and
resizing.
Figure 7: Adaptive data fusion with eyeglass removal. (a)
Original image, (b) Direct fusion of visual and thermal
images without eyeglass removal, and (c) Fused image
after eyeglass removal.
(a) Probe (b) 9.26 (c) 7.81
(d) 7.09 (e) 7.01 (f) 6.74
Figure 8: Face recognition with data fusion (Df) with
eyeglass removal. (a) Probe, (b)-(f) First five matches.
D. Decision Fusion
Decision fusion produces a new ranked list by
combining confidence measures from individual face
recognition modules. In this paper, matching scores are
used to determine ranked lists. Matching scores generated
by FaceIt® measure the degree to which the probe image
and the gallery image are similar. The matching score (MF)
of decision fusion can be derived using the individual
scores of the visual recognition module (MV) and of the
thermal recognition module (MT).
Decision fusion with average matching score (Fa)
determines the matching score as a weighted sum of MV
and MT:
$$M_F = w_V M_V + w_T M_T$$
where wV and wT denote weight factors for the matching
scores of visual and thermal face recognition modules. In
this paper, wV = wT = 0.5.
Decision fusion with highest matching score (Fh) takes the
largest matching score of the two:
$$M_F = \max(M_V, M_T)$$
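The two decision fusion rules of this section can be sketched as below. The sketch assumes each recognition module returns one matching score per gallery identity for a given probe; the function names, the dictionary layout, and the example scores are invented for illustration and are not FaceIt® output.

```python
def fuse_average(m_visual, m_thermal, w_visual=0.5, w_thermal=0.5):
    """Fa: weighted sum of the visual and thermal matching scores."""
    return w_visual * m_visual + w_thermal * m_thermal


def fuse_highest(m_visual, m_thermal):
    """Fh: the larger of the two matching scores."""
    return max(m_visual, m_thermal)


def rank_gallery(visual_scores, thermal_scores, fuse):
    """Rank gallery identities by fused score, best match first.

    Both inputs map a gallery identity to the matching score returned by
    one recognition module for the same probe.
    """
    shared = visual_scores.keys() & thermal_scores.keys()
    fused = {g: fuse(visual_scores[g], thermal_scores[g]) for g in shared}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three gallery subjects (illustrative only).
visual_scores = {"s01": 9.0, "s02": 6.0, "s03": 3.0}
thermal_scores = {"s01": 2.0, "s02": 6.5, "s03": 4.0}
ranked_fa = rank_gallery(visual_scores, thermal_scores, fuse_average)
ranked_fh = rank_gallery(visual_scores, thermal_scores, fuse_highest)
```

In this toy case the two rules disagree on the top match: Fa ranks s02 first because both modules agree on it moderately, whereas Fh ranks s01 first on the strength of a single high visual score.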
IV. PERFORMANCE EVALUATION
The National Institute of Standards and Technology and
Equinox Corporation built an extensive database of face
images using registered broadband-visible/IR camera
sensors for experimentation and statistical performance
evaluations. The NIST/Equinox database used for
evaluation of fusion-based face recognition performances
consists of visual and thermal IR images of 3,244 (1,622
per modality) faces from 90 individuals. One image for
each face taken with a frontal lighting condition is used for
the gallery. Probe images are divided according to
different conditions. The original 12-bit gray-level thermal
images were converted to 8 bits and histogram
equalized. Table 2 describes the NIST/Equinox database
of visual and thermal IR face images used in the
experiments.
Table 2: The NIST/Equinox database of visual and thermal
IR face images
(Thermal) Eyeglass Lighting Expression
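The thermal preprocessing step (12-bit to 8-bit conversion followed by histogram equalization) can be sketched as follows. The paper does not state how the bit-depth conversion was performed, so dropping the four least significant bits is an assumption, as are the function names; the equalization is the standard global method based on the cumulative histogram.

```python
def to_8bit(image_12bit):
    """Map 12-bit gray levels (0..4095) to 8 bits by dropping the low 4 bits."""
    return [[min(max(p, 0), 4095) >> 4 for p in row] for row in image_12bit]


def equalize_histogram(image_8bit):
    """Global histogram equalization of an 8-bit image given as a list of rows."""
    pixels = [p for row in image_8bit for p in row]
    if not pixels:
        return []
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:          # constant image: nothing to stretch
        return [row[:] for row in image_8bit]
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in image_8bit]


raw = [[1600, 1616], [1632, 1648]]   # invented narrow band of 12-bit values
eight_bit = to_8bit(raw)
equalized = equalize_histogram(eight_bit)
```

The narrow band of input values is spread over the full 8-bit range after equalization, which is the contrast-stretching effect the preprocessing relies on.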
B. Performance Comparison
The recognition results of fusion-based techniques (Df,
Fa, and Fh) are compared with the single modality cases
(Vi and Th) at various lighting directions. The three probe
sets 1, 2, and 3 contain 1,018 images in total with no
eyeglasses. Figure 9 demonstrates the first 10 best matches
of different recognition methods in terms of matching
scores. Visual face recognition performed relatively poorly
due to illumination variations. Fusion-based methods yield
reliable recognition results.
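One common way to summarize the best matches over a probe set is the cumulative match rate, that is, the fraction of probes whose correct identity appears within the first k matches. The sketch below assumes this reading of the evaluation; the function names and the toy scores are invented and are not results from the paper.

```python
def rank_of_correct_match(scores, true_identity):
    """1-based rank of the true identity when gallery scores are sorted descending."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_identity) + 1


def cumulative_match_rates(probe_results, max_rank=10):
    """Fraction of probes whose correct match appears within ranks 1..max_rank.

    `probe_results` is a list of (scores, true_identity) pairs, where
    `scores` maps each gallery identity to a matching score.
    """
    ranks = [rank_of_correct_match(s, t) for s, t in probe_results]
    return [sum(r <= k for r in ranks) / len(ranks)
            for k in range(1, max_rank + 1)]


# Toy probes with invented scores (illustrative only).
probes = [
    ({"a": 9.0, "b": 7.0, "c": 2.0}, "a"),   # correct at rank 1
    ({"a": 6.0, "b": 8.0, "c": 1.0}, "a"),   # correct at rank 2
    ({"a": 3.0, "b": 5.0, "c": 7.0}, "a"),   # correct at rank 3
    ({"a": 4.0, "b": 9.5, "c": 1.0}, "b"),   # correct at rank 1
]
rates = cumulative_match_rates(probes, max_rank=3)
```

The first entry of `rates` is the first-match success rate; later entries show how quickly the correct identity is recovered when more candidates are considered.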
Figure 9: Performance evaluation of fusion-based face
recognition when no eyeglasses are present (probes 1, 2, 3).
Figure 10 compares the performances of the five face
recognition techniques when the subjects wear eyeglasses
(probes 4, 5, and 6). There are 514 images in total in the set.
In Figure 10(a), thermal face recognition and data fusion
without eyeglass removal show unsatisfactory results due
to the energy blocking effect of eyeglasses. Eyeglasses
slightly affect the performance of visual face recognition
while affecting that of thermal face recognition
significantly. Figure 10(b) demonstrates that eyeglass
removal greatly improves the recognition performance in
thermal and data fusion techniques. Decision fusion with
average matching score gives the best performance.
Figure 10: Performance evaluation of fusion-based face
recognition when eyeglasses are present (probes 4, 5, 6).
(a) Without eyeglass removal, (b) With eyeglasses
replaced with templates.
Figure 11: Performance comparison in terms of the first
match success rates for the face images with eyeglasses
before and after eyeglass removal.