Example code: https://it.mathworks.com/matlabcentral/fileexchange/58783-gender-recognition-from-face-images-with-trainable-cosfire-filters
Published version: http://ieeexplore.ieee.org/document/7738068/
Gender recognition from face images with trainable COSFIRE filters
George Azzopardi
University of Malta
george.azzopardi@um.edu.mt
Antonio Greco, Mario Vento
University of Salerno
agreco@unisa.it, mvento@unisa.it
Abstract
Gender recognition from face images is an important application in the fields of security, retail advertising and marketing. We propose a novel descriptor based on COSFIRE filters for gender recognition. A COSFIRE filter is trainable, in that its selectivity is determined in an automatic configuration process that analyses a given prototype pattern of interest. We demonstrate the effectiveness of the proposed approach on a new dataset called GENDER-FERET with 474 training and 472 test samples and achieve an accuracy rate of 93.7%. It also outperforms an approach that relies on handcrafted features and an ensemble of classifiers. Furthermore, we perform another experiment by using the images of the Labeled Faces in the Wild (LFW) dataset to train our classifier and the test images of the GENDER-FERET dataset for evaluation. This experiment demonstrates the generalization ability of the proposed approach and it also outperforms two commercial libraries, namely Face++ and Luxand.
Keywords. Gender recognition, COSFIRE, trainable filters, faces
1. Introduction
In recent years the recognition of gender from face images has attracted interest in both fundamental and applied research. From the fundamental point of view, it is intriguing that gender recognition is an effortless operation performed very rapidly by human beings, yet can be very challenging for a computer vision algorithm. The difficulties emerge from the possible variations of a face captured by a camera [1], which depend on the image acquisition process (pose of the face, image illumination and contrast, background), the intrinsic differences between people's faces (expression, age, race), as well as occlusions (sunglasses, scarves, hats). From the applied research point of view, there is commercial interest in systems that can automatically recognize gender from face images. Examples include surveillance systems that can assist in restricting areas to one gender only, faster processing in biometric systems that rely on face recognition, custom user interfaces that depend on the gender of the person interacting with them, smart billboards designed to attract the attention of a male or female audience, and systems for the collection of data in support of market analysis.

Figure 1. The average face of (a) men and (b) women computed from a subset of the FERET dataset [2].
In Fig. 1 we illustrate the average face images of men and women generated from a subset of the FERET dataset [2]. From these images, one may observe differences in the intensity distribution, especially in the hair and eye regions. Based on these observations, many researchers use the pixel intensity values of the faces to train a binary classifier for gender recognition [3, 4, 5].
Further differences can be observed in terms of texture. This could be due to the softer facial features and more pronounced eyebrows of women, while men have rougher skin, especially in the presence of a beard. The most popular texture descriptors for face images are histograms of local binary patterns (LBP) [6, 7, 8].
One may also observe a variation in the shape of the face. The face of a woman is generally more rounded, while the face of a man is more elliptical. In [9], the authors exploited this aspect and proposed the use of the histogram of gradients (HOG) descriptor [10] for the recognition of gender. In other works, shape-based features have been combined with other types of features in order to obtain a more robust classifier [11, 12, 13, 14].
Finally, there are also many subtle differences in the geometry of faces. The average face of a man has closer-set eyes, a thinner nose and a narrower mouth. These observations triggered the investigation of what are known as facial fiducial distances, which are essentially the distances between certain facial landmarks (e.g. nose, eye contours, eyebrows) [15]. The fiducial points may be detected using an active shape model [16] or deep learning techniques [17, 18].

Figure 2. (a) A training face image of size 128 × 128 pixels. The encircled region indicates a prototype pattern of interest which is used to configure a COSFIRE filter. The plus marker indicates the center of the prototype. (b) The superimposed (inverted) response maps of a bank of Gabor filters with 16 orientations (θ = {0, π/8, ..., 15π/8}) and a single scale (λ = 4). (c) The structure of a COSFIRE filter that is configured to be selective for the prototype pattern shown in (a). (d) The (inverted) response map of the concerned COSFIRE filter to the input image in (a). The darker the pixel, the higher the response.
We propose to use trainable COSFIRE (Combination of Shifted Filter Responses) filters [19, 20] for gender recognition from face images. COSFIRE filters have already been found to be highly effective in different computer vision tasks, including contour detection [21, 22], retinal vessel segmentation [23], object localization and recognition [24, 25], and handwritten digit classification [26]. COSFIRE filters are trainable shape detectors. The term trainable refers to the ability of determining their selectivity in an automatic configuration process that analyses a given prototype pattern of interest in terms of its dominant orientations and their mutual spatial arrangement. Our hypothesis is that by configuring multiple COSFIRE filters that are selective for different parts of the face we can capture the subtle differences that distinguish the faces of men and women.
The remaining part of the paper is organized as follows. In Section 2 we describe how we form COSFIRE-based descriptors. In Section 3 we evaluate their performance on a subset of FERET, compare them with an approach that relies on handcrafted features, and discuss certain aspects of the proposed approach. Finally, we draw conclusions in Section 4.
2. Method
In the following we give an overview of the trainable COSFIRE approach and show how we use it to form face descriptors. For further technical details on COSFIRE filters we refer the reader to [19].
2.1. COSFIRE filter configuration
The selectivity of a COSFIRE filter is determined in an automatic configuration process that analyses the shape properties of a given prototype pattern of interest. This procedure consists of the following steps. First, it applies a bank of Gabor filters of different orientations and scales to the given prototype image. Second, it considers a set of concentric circles around the prototype center and chooses the local maximum Gabor responses along these circles. The number of circles and their radii values are given by the user. For each local maximum point i, the configuration procedure determines four parameter values: the scale λ_i and the orientation θ_i of the Gabor filter that achieves the maximum response at that position, along with the polar coordinates (ρ_i, φ_i) with respect to the prototype center. Finally, it groups the parameter values of all points in a set of 4-tuples:

S_f = \{ (\lambda_i, \theta_i, \rho_i, \varphi_i) \mid i = 1, \dots, n \}    (1)

where f denotes the given prototype pattern and n represents the number of local maximum points.
In Fig. 2a we show an image of a face. We use the encircled region as a prototype to configure a COSFIRE filter to be selective for the same and similar patterns. In Fig. 2b we show the superimposed response maps of the bank of Gabor filters which is used in the configuration stage, and in Fig. 2c we illustrate the structure of the resulting COSFIRE filter. The ellipses represent the properties of the determined contour parts. Their sizes and orientations indicate the parameters λ_i and θ_i of the concerned Gabor filters. We explain the function of the white blobs in the next section.
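To make the configuration step concrete, the following is a minimal Python sketch of Eq. 1 (the authors' reference implementation is in Matlab, so this is an illustration, not their code). It uses OpenCV's Gabor kernels; for simplicity it keeps every sufficiently strong circle position rather than selecting only the local maxima along each circle as in [19], and the kernel size, Gabor sigma and circle sampling step are assumptions of this sketch.

```python
import cv2
import numpy as np

def gabor_responses(img, lam=4.0, n_orient=16, ksize=11, sigma=2.0):
    """Apply a bank of Gabor filters with 16 orientations and one scale."""
    img = img.astype(np.float32)
    out = {}
    for k in range(n_orient):
        theta = k * 2 * np.pi / n_orient          # theta in {0, pi/8, ..., 15pi/8}
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta, lam, 0.5, 0)
        out[theta] = np.maximum(cv2.filter2D(img, -1, kern), 0)
    return out

def configure_cosfire(proto, center, radii=(0, 3, 6, 9), t1=0.1, lam=4.0):
    """Return the tuple set S_f of Eq. 1 for a prototype patch.

    Simplification: keeps all circle positions whose strongest Gabor
    response exceeds t1 times the global maximum, instead of selecting
    only the local maxima along each circle.
    """
    resp = gabor_responses(proto, lam)
    rmax = max(r.max() for r in resp.values())
    cx, cy = center
    S_f = []
    for rho in radii:
        phis = [0.0] if rho == 0 else np.arange(0, 2 * np.pi, np.pi / 8)
        for phi in phis:
            x = int(round(cx + rho * np.cos(phi)))   # image y-axis points down
            y = int(round(cy - rho * np.sin(phi)))
            theta = max(resp, key=lambda t: resp[t][y, x])
            if resp[theta][y, x] > t1 * rmax:
                S_f.append((lam, theta, rho, phi))   # (lambda_i, theta_i, rho_i, phi_i)
    return S_f
```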
2.2. COSFIRE filter response
The response of a COSFIRE filter is computed by combining the responses of the involved Gabor filters indicated in the set S_f. For each tuple i in S_f, a Gabor filter with scale λ_i and orientation θ_i is applied. Then, the next step considers the respective Gabor responses at the locations indicated by the polar coordinates (ρ_i, φ_i) and applies a multi-variate function to them to obtain a COSFIRE response in every location (x, y) of an input image. For efficiency purposes, in practice the Gabor response maps are shifted by the corresponding distance parameter value ρ_i in the direction opposite to φ_i. In this way, all the concerned Gabor responses meet at the support center of the filter.

In order to allow for some tolerance with respect to the preferred positions, the Gabor response maps are also blurred by taking the maximum of their neighbouring responses weighted by Gaussian function maps. The standard deviation σ_i of such a Gaussian function depends linearly on the distance ρ_i from the support center: σ_i = σ_0 + αρ_i, where σ_0 and α are constants determined empirically on the training set. In Fig. 2c the white blobs indicate the Gaussian function maps that are used to blur the response maps of the corresponding Gabor filters. The standard deviations of the Gaussian functions increase with increasing distance from the support center of the COSFIRE filter.
Finally, the response of a COSFIRE filter r_{S_f} in a location (x, y) is achieved by combining the blurred and shifted Gabor filter responses s_{λ_i,θ_i,ρ_i,φ_i}(x, y) by geometric mean:

r_{S_f}(x, y) = \left( \prod_{i=1}^{n} s_{\lambda_i, \theta_i, \rho_i, \varphi_i}(x, y) \right)^{1/n}    (2)
In Fig. 2d we illustrate the (inverted) response map of the configured COSFIRE filter to the image in Fig. 2a. For clarity purposes the zero values are rendered as white pixels and the non-zero values are rendered as shades of gray. The darker the pixel, the higher the COSFIRE response. The maximum response is correctly obtained in the center of the prototype that was used to configure the concerned COSFIRE filter. The filter, however, achieves other responses (lower than the maximum) to patterns that are similar to the prototype. In general, the filter responds to features that consist of a horizontal edge surrounded by two curvatures pointing outwards.
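Continuing the sketch above, the response computation of Eq. 2 can be approximated as follows. Note that plain Gaussian convolution stands in here for the paper's Gaussian-weighted maximum blurring; that substitution is an assumption of this sketch.

```python
from scipy.ndimage import gaussian_filter, shift

def cosfire_response(img, S_f, sigma0=0.67, alpha=0.1, lam=4.0):
    """Blur and shift each Gabor response map, then combine by the
    geometric mean of Eq. 2 to obtain a response at every (x, y)."""
    resp = gabor_responses(img, lam)
    stack = []
    for (_, theta, rho, phi) in S_f:
        g = gaussian_filter(resp[theta], sigma=sigma0 + alpha * rho)
        # shift opposite to phi so all responses meet at the filter center
        g = shift(g, (rho * np.sin(phi), -rho * np.cos(phi)), order=1)
        stack.append(np.maximum(g, 1e-12))           # guard the logarithm
    # geometric mean, computed stably in the log domain
    return np.exp(np.mean(np.log(np.asarray(stack)), axis=0))
```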
2.3. Face descriptor
We form a descriptor for face images by using the maximum responses of a collection of COSFIRE filters that are selective for different parts of a face. In the example illustrated in Fig. 2 we demonstrate the configuration and application of one COSFIRE filter that is selective for the central region of the lips. Similarly, we may use other parts of the face to configure more COSFIRE filters. For a given test image we then apply all COSFIRE filters and consider a spatial pyramid of three levels. In level zero we consider only one tile, which is the same size as the given image; in level one we consider four tiles in a 2×2 spatial arrangement; and in level two we consider 16 tiles in a 4×4 grid. For each of the 21 tiles we take the maximum value of every COSFIRE filter. This means that for k COSFIRE filters the descriptor results in a 21k-element vector. We normalize to unit length the set of k COSFIRE filter responses in each tile. Fig. 3 shows the computation of the 21-element vector after the application of a single filter.

Figure 3. Example of the COSFIRE face descriptor using a single filter. The circles indicate the locations of the maximum filter responses in a three-level spatial pyramid, while the bar plots represent the values of the maximum responses.
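A sketch of this pyramid pooling, reusing cosfire_response from above: max-pool each of the k response maps over 1 + 4 + 16 tiles and L2-normalize the k values within every tile.

```python
def face_descriptor(img, filters):
    """Compute the 21k-element COSFIRE descriptor of a face image,
    where `filters` is a list of k tuple sets S_f."""
    maps = [cosfire_response(img, S_f) for S_f in filters]
    h, w = img.shape
    feats = []
    for g in (1, 2, 4):                               # pyramid levels 0, 1, 2
        for i in range(g):
            for j in range(g):
                tile = np.array([m[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g].max()
                                 for m in maps], dtype=np.float32)
                norm = np.linalg.norm(tile)
                feats.append(tile / norm if norm > 0 else tile)
    return np.concatenate(feats)                      # 21 tiles x k filters
```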
The proposed approach that uses the responses of multiple COSFIRE filters for the description of a face is inspired by the hypothesis of population coding in neuroscience. Neurophysiologists believe that a shape is described by the collective response of a set of shape-selective neurons in visual cortex [27]. Further inspiration was obtained from the spatial pyramid matching approach with bags of features [28].
2.4. Classification model
We use the resulting descriptors from the images in a given training set to learn an SVM classification model with the following chi-squared kernel K(x_i, y_j):

K(x_i, y_j) = \frac{(x_i - y_j)^2}{\frac{1}{2}(x_i + y_j) + \varepsilon}    (3)

where x_i and y_j are the descriptors of training images i and j, and the parameter ε represents a very small value (in Matlab, the function eps) in order to avoid division-by-zero errors. In practice, we use the libsvm library [29] with the above mentioned custom kernel, and for the remaining parameters we use the default values.
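In Python, a similar model can be sketched with scikit-learn's precomputed-kernel SVC. Its additive_chi2_kernel computes -Σ(x−y)²/(x+y), i.e. the negated distance of Eq. 3 up to the ½ factor and the ε guard, so this is an approximation rather than the exact libsvm setup of the paper; the random matrices are placeholders for the descriptor data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import additive_chi2_kernel

# X_train, X_test: non-negative descriptor matrices; y_train: 0/1 gender labels.
# Tiny random placeholders so the snippet runs stand-alone.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((20, 210)), rng.integers(0, 2, 20)
X_test = rng.random((5, 210))

clf = SVC(kernel='precomputed')
clf.fit(additive_chi2_kernel(X_train, X_train), y_train)   # Gram matrix: train vs train
pred = clf.predict(additive_chi2_kernel(X_test, X_train))  # rows: test, cols: train
```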
3. Evaluation
3.1. Dataset
To the best of our knowledge there is not yet a standard dataset for the evaluation of gender recognition algorithms. Most of the available datasets are designed for face recognition purposes, and hence they do not make the gender labels available. For this reason we decided to use a subset of the FERET dataset [2], which is publicly available under the name GENDER-FERET (http://mivia.unisa.it/database/gender-feret.zip).
In Fig. 4 we show some examples of the faces available in our new dataset, which consists of 946 frontal faces (473 male, 473 female). We randomly divided the dataset into a training set that consists of 237 men and 237 women, and a test set containing 236 men and 236 women. In both the training and the test sets there are faces with different expressions, illumination, skin colour, and backgrounds. The face of a person is represented either in the training set or in the test set, but not in both.
3.2. Preprocessing
We applied the Viola-Jones algorithm [30] to every image in the dataset and resized the detected faces to a fixed size of 128 × 128 pixels.
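This step can be reproduced, for example, with OpenCV's implementation of the Viola-Jones detector and its bundled frontal-face Haar cascade; the detection parameters below are common defaults, not the paper's own settings.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def crop_face(path):
    """Detect the (largest) face in an image and resize it to 128 x 128."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest detection
    return cv2.resize(gray[y:y + h, x:x + w], (128, 128))
```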
3.3. Experiments with COSFIRE filters
In the following we evaluate the effectiveness of the proposed approach on the GENDER-FERET dataset. We performed a number of experiments by configuring and using an increasing number of COSFIRE filters. In the first experiment we configured 10 filters with the following procedure. First, we randomly chose five training faces of men and five training faces of women. Then, for each randomly picked face we chose a random region of size 19 × 19 pixels and used it as a prototype to configure a COSFIRE filter. If the selected prototype resulted in a COSFIRE filter with fewer than 5 tuples, we considered it not salient enough and chose a new one. The filters were configured with the default parameters t_1 = 0.1, t_2 = 0.75, σ_0 = 0.67 and α = 0.1, as proposed in [19]. We only mention that in the configuration of the filters we considered Gabor filter responses along three concentric circles and the center point: ρ = {0, 3, 6, 9}. The sizes of the prototype patterns together with the number and radii of the concentric circles were determined empirically on the training set.
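This selection procedure can be sketched as follows, reusing configure_cosfire from Section 2.1; alternating between male and female faces is left to the caller, and all names are illustrative.

```python
import random

def sample_filters(train_faces, n_filters=10, patch=19, min_tuples=5):
    """Configure COSFIRE filters from random 19 x 19 regions, discarding
    prototypes that yield fewer than 5 tuples (not salient enough)."""
    filters = []
    while len(filters) < n_filters:
        img = random.choice(train_faces)
        x = random.randrange(img.shape[1] - patch)
        y = random.randrange(img.shape[0] - patch)
        S_f = configure_cosfire(img[y:y + patch, x:x + patch],
                                center=(patch // 2, patch // 2))
        if len(S_f) >= min_tuples:
            filters.append(S_f)
    return filters
```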
Then we executed further experiments by incrementing the set of COSFIRE filters by 10 at a time, up to 250. In Fig. 5 we plot the accuracy rate as a function of the number of filters used. For each set of COSFIRE filters we plot two values: the training accuracy rate achieved by 10-fold cross validation on the training set, and the accuracy rate obtained on the test set. With only 10 filters, which result in a feature vector of (21 × 10 =) 210 elements, we achieved 83.79% and 81.4% accuracy rates on the training and test sets, respectively.
Figure 5. Experimental results in the form of accuracy rate as a function of the number of COSFIRE filters used. The square markers indicate the accuracy rate on the training set with 10-fold cross validation, while the circles indicate the accuracy rates on the test set. The solid square marker indicates the maximum accuracy rate on the training set, which is achieved with 180 filters.
The accuracy increased rapidly up to 60 filters and then increased slowly until it reached a plateau. The maximum accuracy rate of 93.68% on the training set was achieved with 180 COSFIRE filters. By using the same 180 filters we achieved 93.66% accuracy on the test set.
3.4. Comparison with handcrafted features
We compared the proposed trainable approach with an approach that relies on two handcrafted feature descriptors, namely the histogram of gradients (HOG) [10] and local binary patterns (LBP) [6], as well as raw pixel values. The selection of these types of features is motivated by the fact that they extract different information from a given image. The HOG descriptors extract information about the edges that essentially describe the shape, LBP descriptors extract information about the texture, and the raw pixel values describe the intensity distribution.
3.4.1 Raw pixels
First, we rescaled the intensity values to the range [0, 1] by dividing by 255. Then we reshaped each face image of size 128 × 128 pixels to a vector of 16384 values.
3.4.2 LBP features
The LBP-based descriptor compares every pixel to its eight neighbours. This results in a binary string of eight bits, which we converted to a scalar decimal value. Since we used eight neighbours, the decimal values are in the range [0, 255]. We used a spatial grid of 3×3 tiles and for each tile we generated an L2-normalized histogram of 256 bins. Finally, we concatenated all the histograms in a feature vector with (256 × 9 =) 2304 elements.
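A compact sketch of this descriptor: 8-neighbour LBP codes followed by a 3×3 grid of L2-normalized 256-bin histograms.

```python
def lbp_descriptor(img, grid=3):
    """Basic 8-neighbour LBP with a 3x3 grid of 256-bin histograms,
    giving the 2304-element vector described above."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]                               # all pixels with 8 neighbours
    codes = np.zeros_like(c)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.int32) << bit    # one bit per neighbour
    h, w = codes.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = codes[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist = np.bincount(tile.ravel(), minlength=256).astype(np.float32)
            norm = np.linalg.norm(hist)
            feats.append(hist / norm if norm > 0 else hist)
    return np.concatenate(feats)                      # 9 tiles x 256 bins = 2304
```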
Figure 4. Examples of face images in the GENDER-FERET dataset. The square boxes indicate the face detections by Viola-Jones [30].
3.4.3 HOG features
For the HOG-based descriptor, we first divided a face image into 49 blocks of 32 × 32 pixels that overlap by 50%. Then we divided each block into 4 non-overlapping tiles, and for each tile we generated an L2-normalized histogram of orientations with nine bins. We clipped the normalized histograms at 0.2 and normalized again. The result is a feature vector of (49 × 4 × 9 =) 1764 elements.
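For reference, scikit-image's hog reproduces this layout on a 128×128 face: 16×16-pixel cells grouped in 2×2-cell (32×32-pixel) blocks with 50% overlap give 7×7 = 49 blocks, and its 'L2-Hys' normalization is exactly the L2-normalize, clip-at-0.2, renormalize scheme described above. The zero image is a placeholder for a preprocessed face.

```python
import numpy as np
from skimage.feature import hog

face = np.zeros((128, 128))                           # placeholder face image
descriptor = hog(face, orientations=9, pixels_per_cell=(16, 16),
                 cells_per_block=(2, 2), block_norm='L2-Hys')
assert descriptor.size == 49 * 4 * 9                  # 1764 elements
```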
3.4.4 Experiments
For the descriptor that is based on raw pixels we learned an SVM with a linear kernel. For the HOG- and LBP-based descriptors, which generate histograms of features, we learned an SVM with a histogram intersection kernel for each of them. We evaluated all possible combinations of these three types of features by fusing the results of the corresponding SVM classifiers. Fusion was achieved by summing up the corresponding output probabilities of the classifiers. If the total male probability was larger than the total female probability then the image was classified as a man, otherwise it was classified as a woman.

Table 1 reports the accuracy rates that we achieved on the test set for different combinations of features.
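This score-level fusion can be sketched as follows, assuming three fitted scikit-learn SVMs created with probability=True (with identical class orderings) and one test matrix per descriptor; all names are illustrative.

```python
import numpy as np

def fuse_predict(classifiers, test_sets):
    """Sum the per-class probabilities of the classifiers and pick the
    class (man/woman) with the larger total."""
    total = sum(clf.predict_proba(X) for clf, X in zip(classifiers, test_sets))
    return np.argmax(total, axis=1)

# e.g. fuse_predict([clf_raw, clf_lbp, clf_hog], [X_raw, X_lbp, X_hog])
```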
3.5. Testing generalization capability
We performed another experiment to test the generalization capability of the proposed COSFIRE-based descriptors. We applied the same 180 COSFIRE filters to the training images of the Labeled Faces in the Wild (LFW) dataset [31] and learned an SVM classification model with the chi-squared kernel given in Eq. 3. The LFW dataset is designed for studying the problem of unconstrained face recognition. The face images in that dataset present challenging variations in pose, lighting, race, occlusions, and background. Moreover, the LFW dataset is imbalanced: it consists of 7508 images of men and 2296 images of women.

We then applied the resulting classification model to the test images of the GENDER-FERET dataset, and achieved an accuracy rate of 90% (Table 1). This result gives a good indication of the generalization capability of our method. As a matter of fact, it is slightly higher than what the commercial libraries Face++ [32] and Luxand [33] achieve.
3.6. Discussion
The proposed approach with trainable COSFIRE filters outperformed the combined approach of handcrafted features and raw pixels. Our approach achieved an accuracy rate of 93.7%, while the best accuracy rate achieved with the other features was 92.6%. The three types of features that were used in the latter approach are complementary to each other, as the accuracy increased substantially when they were combined.

The COSFIRE-based descriptor that we propose is much more versatile than the handcrafted features. It is based on the configuration of COSFIRE filters with randomly selected local patterns from training images. The filters do not require domain knowledge and only expect as input the size of the local patterns used for configuration, something which can be determined empirically. This characteristic makes the proposed COSFIRE-based approach suitable for other computer vision applications.

In Table 1 we report the results obtained with different configurations together with the results of two commercial libraries, namely Luxand [33] and Face++ [32]. These two libraries provide pre-trained classifiers which we used to evaluate the performance on the GENDER-FERET test set.
Table 1. Experimental results. The first column indicates the name of the dataset that was used for training: GF stands for GENDER-FERET and LFW stands for Labeled Faces in the Wild. The headings of the middle four columns indicate the features used to learn SVM classification models with the indicated linear, histogram intersection (H.Int) and chi-squared (χ²) kernels. The check marks indicate which types of features are used to obtain the corresponding accuracy.

Training   Raw      LBP     HOG     COSFIRE   Acc
Dataset    Linear   H.Int   H.Int   χ²        %
GF         ✓                                  88.3
GF                  ✓                         85.2
GF                          ✓                 90.0
GF         ✓        ✓                         90.3
GF         ✓                ✓                 91.9
GF                  ✓       ✓                 91.5
GF         ✓        ✓       ✓                 92.6
GF                                  ✓         93.7
LFW                                 ✓         90.0
Mixture    Face++ [32]                        89.6
Mixture    Luxand [33]                        89.2
The accuracy rates that they achieve are lower than that of our method. We must point out, however, that Luxand and Face++ were trained with a set of images different from the one that we used for our approach. In order to simulate their scenario, we performed another experiment where we used the training images of the LFW dataset and the test images of the GENDER-FERET dataset. Also in this experiment, our method was more effective than the Face++ and Luxand libraries.
The comparison with Face++ and Luxand is interesting because these two libraries use a geometric approach for the detection of facial landmarks and gender recognition. The Face++ library detects the gender by evaluating the position of the fiducial points, identified using a multi-layer convolutional neural network [18]. Luxand FaceSDK can automatically identify a subject's gender based on a still image or motion stream. The SDK uses the coordinates of 66 facial feature points, including eyes, eye contours, eyebrows, lip contours and nose tip [33].
There are various directions for future work. One direction is to evaluate the performance of the proposed method on a larger dataset that also provides variations in pose. Another direction is to use a keypoint detection technique, such as the Harris affine detector [34] or fiducial points [35], and use the corresponding local patterns as prototypes to configure COSFIRE filters. This approach would provide more informative and possibly more distinctive prototype patterns in comparison to the random region selection approach that we use in this work. Moreover, it would be interesting to investigate various functions to transform a COSFIRE filter response map into a descriptor. Here we kept it simple and only used the maximum values in a spatial pyramid.
4. Conclusion
The proposed method, which is based on trainable COSFIRE filters combined with an SVM with a chi-squared kernel, is highly effective for gender recognition from face images. It outperforms an ensemble of three classifiers that rely on the HOG and LBP handcrafted features along with raw pixel values.

The approach that we propose does not rely upon domain knowledge and thus it is suitable for various image classification tasks.
References
[1] C. B. Ng, Y. H. Tay, and B. M. Goi. A review of facial gender recognition. Pattern Analysis and Applications, 18(4):739–755, 2015.
[2] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090–1104, 2000.
[3] B. Moghaddam and M. Yang. Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):707–711, 2002.
[4] S. Baluja and H. A. Rowley. Boosting sex identification performance. International Journal of Computer Vision, 71(1):111–119, 2007.
[5] J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, 2004.
[6] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[7] Z. Yang and H. Ai. Demographic classification with local binary patterns. In Advances in Biometrics, pages 464–473. Springer, 2007.
[8] C. Shan. Learning local binary patterns for gender classification on real-world face images. Pattern Recognition Letters, 33(4):431–437, 2012.
[9] V. Singh, V. Shokeen, and M. B. Singh. Comparison of feature extraction algorithms for gender classification from face images. In International Journal of Engineering Research and Technology, volume 2. ESRSA Publications, 2013.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 886–893. IEEE, 2005.
[11] L. A. Alexandre. Gender recognition: A multiscale decision fusion approach. Pattern Recognition Letters, 31(11):1422–1427, 2010.
[12] J. E. Tapia and C. A. Perez. Gender classification based on fusion of different spatial scale features selected by mutual information from histogram of LBP, intensity, and shape. IEEE Transactions on Information Forensics and Security, 8(3):488–499, 2013.
[13] J. Bekios-Calfa, J. M. Buenaposada, and L. Baumela. Robust gender recognition by exploiting facial attributes dependencies. Pattern Recognition Letters, 36:228–234, 2014.
[14] G. Azzopardi, A. Greco, and M. Vento. Gender recognition from face images using a fusion of SVM classifiers. In International Conference on Image Analysis and Recognition, pages 533–538. Springer, 2016.
[15] R. Brunelli and T. Poggio. Face recognition: Features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1042–1052, 1993.
[16] S. Milborrow and F. Nicolls. Locating facial features with an extended active shape model. In Computer Vision – ECCV 2008, pages 504–513. Springer, 2008.
[17] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3476–3483, 2013.
[18] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 386–391, 2013.
[19] G. Azzopardi and N. Petkov. Trainable COSFIRE filters for keypoint detection and pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):490–503, Feb 2013.
[20] G. Azzopardi, L. Fernandez Robles, E. Alegre, and N. Petkov. Increased generalization capability of trainable COSFIRE filters with application to machine vision. In 23rd International Conference on Pattern Recognition (ICPR), 2016, in print.
[21] G. Azzopardi and N. Petkov. A CORF computational model of a simple cell that relies on LGN input outperforms the Gabor function model. Biological Cybernetics, 106:177–189, 2012.
[22] G. Azzopardi, A. Rodriguez-Sanchez, J. Piater, and N. Petkov. A push-pull CORF model of a simple cell with antiphase inhibition improves SNR and contour detection. PLoS ONE, 9(7):e98424, 2014.
[23] G. Azzopardi, N. Strisciuglio, M. Vento, and N. Petkov. Trainable COSFIRE filters for vessel delineation with application to retinal images. Medical Image Analysis, 19(1):46–57, 2014.
[24] G. Azzopardi and N. Petkov. Automatic detection of vascular bifurcations in segmented retinal images using trainable COSFIRE filters. Pattern Recognition Letters, 34:922–933, 2013.
[25] G. Azzopardi and N. Petkov. Ventral-stream-like shape representation: from pixel intensity values to trainable object-selective COSFIRE models. Frontiers in Computational Neuroscience, 8, 2014.
[26] G. Azzopardi and N. Petkov. A shape descriptor based on trainable COSFIRE filters for the recognition of handwritten digits. In Computer Analysis of Images and Patterns (CAIP 2013), Lecture Notes in Computer Science, pages 9–16. Springer, 2013.
[27] A. Pasupathy and C. E. Connor. Population coding of shape in area V4. Nature Neuroscience, 5(12):1332–1338, 2002.
[28] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), volume 2, pages 2169–2178. IEEE Computer Society, 2006.
[29] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[30] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
[31] LFW. Labeled Faces in the Wild. Available: http://vis-www.cs.umass.edu/lfw/, 2007.
[32] Face++. Leading face recognition on cloud. Available: http://www.faceplusplus.com/, 2014.
[33] Luxand. Facial feature detection technologies. Available: https://www.luxand.com/, 2015.
[34] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. In Computer Vision – ECCV 2002, pages 128–142. Springer, 2002.
[35] Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, 2014.
Applications such as human–computer interaction, surveillance, biometrics and intelligent marketing would benefit greatly from knowledge of the attributes of the human subjects under scrutiny. The gender of a person is one such significant demographic attribute. This paper provides a review of facial gender recognition in computer vision. It is certainly not a trivial task to identify gender from images of the face. We highlight the challenges involved, which can be divided into human factors and those introduced during the image capture process. A comprehensive survey of facial feature extraction methods for gender recognition studied in the past couple of decades is provided. We appraise the datasets used for evaluation of gender classification performance. Based on the results reported, good performance has been achieved for images captured under controlled environments, but certainly there is still much work that can be done to improve the robustness of gender recognition under real-life environments.