Vision-based Human Gender Recognition: A Survey
Choon Boon Ng, Yong Haur Tay, Bok Min Goi
Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia.
{ngcb,tayyh,goibm}@utar.edu.my
Abstract. Gender is an important demographic attribute of people. This paper
provides a survey of human gender recognition in computer vision. A review of
approaches exploiting information from face and whole body (either from a still
image or gait sequence) is presented. We highlight the challenges faced and
survey the representative methods of these approaches. Based on the results, good
performance has been achieved for datasets captured under controlled environments,
but there is still much work that can be done to improve the robustness of gender
recognition under real-life environments.
Keywords: Gender recognition, gender classification, sex identification, sur-
vey, face, gait, body.
1 Introduction
Identifying demographic attributes of humans such as age, gender and ethnicity using
computer vision has been given increased attention in recent years. Such attributes
can play an important role in many applications such as human-computer interaction,
surveillance, content-based indexing and searching, biometrics, demographic studies
and targeted advertising.
Studies have shown that humans can easily differentiate between male and female
faces (above 95% accuracy [1]), but this remains a challenging task for computer
vision. Moreover, such attribute classification problems have not been as well
studied as the more popular problem of individual recognition. In this
paper, we survey the methods used for human gender recognition in images and vid-
eos using computer vision techniques.
In many applications, the system is non-intrusive or should not require the human
subject’s cooperation, physical contact or attention. Using human parts such as iris,
hand or fingerprint would require some cooperation from the human and thus limits their
applicability. We focus our attention on easily observable characteristics of a human.
Most researchers have relied on facial analysis as a means of determining gender,
while some work have been reported on using the whole body, either from a still im-
age or using gait sequences. Also, we concentrate on approaches which make use of
2-D (rather than the more costly 3-D) data in the form of still image or videos. Audio
cues such as voice are not considered in our survey as we are interested in methods
using computer vision.
In general, a pattern recognition problem such as gender recognition, when tackled
with a supervised learning technique, can be broken down into several steps: object
detection, preprocessing, feature extraction and classification.
In the detection phase, given an image, the human subject or face region is detected
and the image is cropped. This will be followed by some preprocessing, for example
to normalize against variations in scale and illumination. A widely used method for
face detection is by Viola and Jones [2], which has an OpenCV implementation. The
benchmark for human detection is based on using Histogram of Oriented Gradients
(HOG) [3]. In the case of gait analysis, many methods use a binary silhouette of the
human which is extracted using background subtraction.
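As an illustration of the detection and cropping step, a minimal sketch using OpenCV's Python bindings is given below; it is not taken from any of the surveyed works, and the input file name is a placeholder assumption.

```python
# Illustrative sketch of the detection step, assuming OpenCV's Python bindings.
import cv2

img = cv2.imread("scene.jpg")                       # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Viola-Jones face detector shipped with OpenCV [2]
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
face_crops = [gray[y:y + h, x:x + w] for (x, y, w, h) in faces]

# HOG-based pedestrian detector in the spirit of [3]
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
people, _ = hog.detectMultiScale(gray, winStride=(8, 8))
```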
In feature extraction, representative descriptors of the image are found and selec-
tion of the most discriminative features may be made. In some cases when the number
of features is too high, dimension reduction can be applied. As this step is perhaps the
most important to achieve high recognition accuracy, we will provide a more detailed
review in later sections.
Lastly, the classifier is trained and validated with a dataset. Gender recognition is a
within-object classification problem [4]. The subject is to be classified as either male
or female, therefore a binary classifier is used. Examples of classifiers that have been
widely used to perform gender recognition are Support Vector Machine (SVM), Ada-
boost, neural networks and Bayesian classifier. From our survey, SVM is the most
widely used face gender classifier (usually using a non-linear kernel such as the radial
basis function), followed by boosting approaches such as Adaboost. Nearest neighbor
classifier and Markov models are also popular for gait-based gender classifiers.
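The following is a minimal, hedged sketch of such a binary classifier, an RBF-kernel SVM evaluated with cross-validation; the feature matrix X, the labels y and their shapes are placeholder assumptions rather than data from any surveyed work.

```python
# Minimal sketch of the classification step with an RBF-kernel SVM,
# assuming X holds one feature vector per face and y holds gender labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 256)            # placeholder feature vectors
y = np.random.randint(0, 2, 200)        # placeholder labels (0 = female, 1 = male)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=5)
print("mean classification rate:", scores.mean())
```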
The rest of this paper is organized as follows: In Section 2, potential applications
for gender recognition are identified. Sections 3, 4 and 5 review the aspects of gender
recognition by face, gait and body, respectively. This is followed by concluding re-
marks in Section 6.
2 Applications
We identify several potential application areas where gender recognition would be
useful. They are listed as follows:
1. Human-computer interaction systems. More sophisticated human-computer interaction
systems can be built if they are able to identify a human's attributes such as
gender. The system can be made more human-like and respond appropriately. A
simple scenario would be a robot interacting with a human; it would require the
knowledge of gender to address the human appropriately (e.g. as Mr. or Miss).
2. Surveillance systems. In smart surveillance systems, gender recognition can assist in restricting
areas to one gender only, such as in a train coach or hostel. Automated surveillance
systems may also choose to pay more attention or assign a higher threat level to a
specific gender.
3. Content-based indexing and searching. With the widespread use of consumer elec-
tronic devices such as cameras, a large amount of photos and videos are being pro-
duced. Indexing or annotating information such as the number of people in the im-
age or video, their age and gender will become easier with automated systems us-
ing computer vision. On the other hand, for content-based searching such as look-
ing for a photo of a person, identifying gender as a preprocessing step will reduce
the amount of search required in the database.
4. Biometrics. In biometric systems using face recognition, the time for searching the
face database can be cut down and separate face recognizers can be trained for
each gender to improve accuracy [5].
5. Demographic collection. Demographic study systems aim to collect statistics about
customers, including demographic information such as gender, for example as they
walk into a store or look at a billboard. A system using computer vision can be
used to automate the task.
6. Targeted advertising. An electronic billboard system is used to present advertise-
ments on flat panel displays. Targeted advertising is used to display advertisement
relevant to the person looking at the billboard based on attributes such as gender.
For example, the billboard may choose to show ads for wallets when a male is
detected, or handbags in the case of a female. In Japan, vending machines that use age
and gender information of customers to recommend drinks have seen increased
sales [6].
3 Gender Recognition by Face
Facial images are probably the most common biometric characteristic used by humans
to make a personal recognition [7]. The face region, which may include external
features such as the hair and neck region, is used for gender identification.
3.1 Challenges
The image of a person’s face exhibits many variations which may affect the ability of
a computer vision system to recognize the gender. We can categorize these variations
as being caused by the human or the image capture process.
Human factors are due to the characteristics of a person, such as age, ethnicity and
facial expressions (neutral, smiling, closed eyes etc.), and the accessories being worn
(such as eye glasses and hat). Factors due to the image capture process are the per-
son’s head pose, lighting or illumination, and image quality (blurring, noise, low reso-
lution). Head pose refers to the orientation of the head relative to the view of the im-
age capturing device. The human head is limited to three degrees of freedom, as de-
scribed by the pitch, roll and yaw angles [8].
The impact of age and ethnicity on the accuracy of gender classification has been
observed. Benabdelkader and Griffin [9], after testing their classifier with a set of
12,964 face images, found that a disproportionately large number of elderly females
and young males were misclassified. In empirical studies by Guo et al. [10] using
several classification methods on a large face database, it was found that gender classi-
fication accuracy was significantly affected by age, with adult faces having higher
accuracies than young or senior faces. In [11], when a generic gender classifier
trained for all ethnicities was tested on a specific ethnicity, the result was not as good
as a classifier trained specifically for that ethnicity.
3.2 Preprocessing
After the face is segmented from the image, some preprocessing may be applied. It
helps to reduce the sensitivity of the classifier to variations such as illumination, pose
and detection inaccuracies. Graf and Wichmann [12] pointed out that cues such as
brightness and size will be learnt by classifiers such as SVM, producing artificially
better performance.
Preprocessing that may be applied to the face image includes:
- Normalization of contrast and brightness (e.g. using histogram equalization)
- Removal of external features such as the hair and neck region
- Geometric alignment (either manually or using automatic methods)
- Downsizing to reduce the number of pixels
For efficiency, it is preferable that the face image does not undergo alignment as it
requires significant time [13]. In a study by Mäkinen and Raisamo [14], it was found
that automatic alignment methods did not increase gender classification rate while
manual alignment increased the classification rate a little. They concluded that auto-
matic alignment methods need to be improved. Also, alignment is best done before
downsizing. If alignment is not done, deliberately adding misaligned faces to the
training data seems to help make the classifier robust to face misalignments [15].
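A minimal preprocessing sketch along these lines is given below, assuming an 8-bit grayscale face crop and OpenCV; the target size is an arbitrary assumption.

```python
# Illustrative preprocessing sketch: contrast/brightness normalization by
# histogram equalization followed by downsizing (target size is an assumption).
import cv2

def preprocess_face(gray_face, size=(24, 24)):
    equalized = cv2.equalizeHist(gray_face)                 # contrast/brightness
    resized = cv2.resize(equalized, size, interpolation=cv2.INTER_AREA)
    return resized.astype("float32") / 255.0                # scale to [0, 1]
```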
3.3 Facial feature extraction
We broadly categorize feature extraction methods for face gender classification into
geometric-based and appearance-based methods, following [9][16]. The former is
based on measurements of facial landmarks. Geometric relationships between these
points are maintained but other useful information may be thrown away [9] and the
process of extracting the point locations needs to be accurate [17]. Appearance-based
methods are based on some operation or transformation performed on the pixels of an
image. This can be done at the global (holistic) or local level. At the local level, the
face may be divided into defined regions such as eyes, nose and mouth or regularly
spaced windows. In appearance-based methods, the geometric relationships are natu-
rally maintained [9], which is advantageous when the gender discriminative features
are not exactly known. However, they are sensitive to variations in appearance (due to
view, illumination, expression, etc.) [9] and the large number of features [17]. All
methods mentioned in the following, other than those under fiducial distances, can be
categorized as appearance-based.
Fiducial distances. Important points that mark features of the face, such as the nose,
mouth, hair, ears and eyes, are called facial landmarks or fiducial points.
Fiducial distances are the distances between these points. Psychophysical studies
using human subjects established the importance of these distances in discriminating
gender. Brunelli and Poggio [18] used 18 point-to-point distances to train a hyper
basis function network classifier. Fellous [19] selected 40 points to calculate 22 nor-
malized vertical and horizontal fiducial distances. These points were extracted ma-
nually from images of frontal faces by a human operator. From these distances, five
dimensions were derived using discriminant analysis and used to classify gender.
Mozzafari et al. [20] used the aspect ratio of face ellipse fitting and rms distance be-
tween the ellipse and face contour as geometric features to complement their appear-
ance-based method.
Pixel intensity values. Pixel intensity values are used directly as input to train a clas-
sifier such as neural network (e.g. in the early works of [21] [22] [23]) or support
vector machine (SVM). As a preprocessing step, the images (after cropping the head),
are usually normalized to compensate for geometric and lighting variations, and final-
ly down-sampled to lower sizes. Gutta et al. [24] used a mixture of experts consisting
of ensembles of radial basis functions (RBFs) combined with inductive decision trees
for classification of 64 x 72 pixel face images. Moghaddam and Yang [25] used 21 x
12 pixel images to train an SVM classifier. Gaussian RBF kernel was found to give
the best performance for the SVM. Their results were considered as state-of-the-art
for some time. Pyramidal neural network architecture [26] and shunting inhibitory
convolutional neural network [27] have also been used as classifiers. These works used
down-sampled frontal face images.
Baluja and Rowley [28] proposed a fast method that matched the accuracy of SVM
classifier. Simple pixel comparison operations were used as features for weak clas-
sifiers which were combined using AdaBoost to achieve performance 50 times faster.
The face images were normalized to 20 x 20.
Using pixel intensity values results in a large number of features, which increases
proportionally with the size of the image. Dimension reduction methods such as Prin-
cipal Component Analysis (PCA) obtain a representation of an image in reduced di-
mension space. Early studies on gender recognition [22][21] used PCA. Genetic Al-
gorithm (GA) was used by Sun et al. [29] to remove eigenvectors that did not
seem to encode gender information. Castrillón et al. [30] used the PCA representa-
tion to train an SVM and used majority voting for temporal fusion in video sequences
obtained from webcam recording. A study by Buchala et al. [31] found that different
components of PCA encode different properties of the face such as gender, ethnicity
and age. Bui et al. [32] concatenated the vectors obtained from pixels of the whole face
image, local face regions and the gradient image into a single vector, on which PCA
was then applied.
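As an illustrative sketch of this family of methods (not a reimplementation of any particular paper), flattened pixel intensities can be projected with PCA before training an SVM; the image size and number of components below are assumptions.

```python
# Sketch of the pixel-intensity pipeline discussed above: flatten face crops,
# reduce dimension with PCA, then train an SVM. Sizes are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

faces = np.random.rand(500, 21, 12)      # placeholder 21 x 12 face crops
labels = np.random.randint(0, 2, 500)

X = faces.reshape(len(faces), -1)        # one row of pixel intensities per face
model = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
model.fit(X, labels)
```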
Two-dimensional PCA (2DPCA) [33] has also been used for dimension reduction.
Lu and Shi [34] applied 2DPCA on three facial regions to obtain features. SVM was
used on each region and the classification result was based on consensus decision.
Another dimension reduction method, Independent Component Analysis (ICA),
was studied by Jain et al. [35]. Curvilinear Component Analysis (CCA) was proposed
by Demartines and Herault [36] to reduce the dimensionality of nonlinear data. Bu-
chala et al.[37] showed that CCA reduced the dimension of face image data more
effectively compared to PCA with comparable gender classification rate.
Rectangle features. Viola and Jones[38] introduced rectangle features for rapid face
detection. Figure 1 shows examples of these rectangles, which are also known as
Haar-like features. The sum of the pixels which lie within the white rectangles is
subtracted from the sum of pixels in the grey rectangles to obtain the value of a rec-
tangle feature. An integral image representation can be used to compute the rectangle
features rapidly. Adaboost is used to select the features and produce a perceptron
classifier. Shakhnarovich et al. [13] used these features for fast gender and ethnicity
classification of videos in real-time. Xu et al. [39] combined it with fiducial distances.
Fig. 1. Rectangle features [38]
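The following sketch illustrates how a single two-rectangle feature can be evaluated from an integral image, following the description above; the window size and rectangle coordinates are arbitrary assumptions.

```python
# Illustrative two-rectangle feature computed via an integral image;
# the rectangle coordinates are arbitrary assumptions.
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the rectangle spanning rows r0..r1 and columns c0..c1."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

window = np.random.rand(24, 24)          # placeholder detection window
ii = integral_image(window)
# Grey (right) region minus white (left) region, as described above.
feature = rect_sum(ii, 0, 6, 11, 11) - rect_sum(ii, 0, 0, 11, 5)
```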
Local binary patterns. Ojala et al. [40] introduced local binary patterns (LBP) for
grayscale and rotation invariant texture classification. Each pixel in an image is la-
beled by applying the LBP operator, which thresholds the pixel’s local neighborhood
at its grayscale value into a binary pattern. The local neighborhood is a circular sym-
metric set of any radius and number of pixels. A subset of the patterns, called “uni-
form patterns”, is defined to be patterns with at most 2 bitwise transitions (0/1 or 1/0).
They detect microstructures such as edges, corners and spots. The histogram of these
patterns is then used as a feature to describe texture. Lian and Lu [41] used LBP with
SVM for multi-view gender classification while Yang and Ai [42] applied it for clas-
sifying age, gender and ethnicity. Alexandre et al. [43] combined LBP with intensity
and shape feature (histogram of edge directions) in a multi-scale fusion approach,
while Ylioinas et al. [44] combined it with contrast information. Shan [45] used Ada-
boost to learn discriminative LBP histogram bins.
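A minimal sketch of the basic LBP histogram feature, assuming scikit-image is available, is shown below; the neighbourhood parameters are typical but arbitrary choices.

```python
# Minimal LBP sketch using scikit-image: "uniform" patterns over an 8-pixel,
# radius-1 neighbourhood, summarized as a histogram feature vector.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, p=8, r=1):
    codes = local_binary_pattern(gray_face, P=p, R=r, method="uniform")
    # the "uniform" mapping produces p + 2 distinct labels
    hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)
    return hist
```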
Other variants inspired by LBP have been proposed, such as Local Gabor Binary
Mapping Pattern [46][47][48], centralized Gabor gradient binary pattern [49], Local
Directional Pattern [50] and Interlaced Derivative Pattern [51].
Scale Invariant Feature Transform (SIFT). SIFT features are invariant to image
scaling, translation and rotation, and partially invariant to illumination changes and
affine projection [52]. Using these descriptors, objects can be reliably recognized even
from different views or under occlusion. The advantage of using invariant features
such as SIFT is that the preprocessing stage, including accurate face alignment, is not
required [53]. Demirkus et al. [54] exploited these characteristics, using a Markovian
model to classify face gender from unconstrained video in natural scenes. Wang et al.
[55] extracted SIFT descriptors at regular image grid points and combined them with
global shape contexts of the face, adopting Adaboost for classification. In another
work, they combined the SIFT descriptors with Gabor features [56].
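As a rough sketch of grid-based SIFT extraction (assuming a recent OpenCV build that includes SIFT), descriptors can be computed at regularly spaced keypoints; the file path and grid step are placeholder assumptions.

```python
# Sketch of SIFT descriptors extracted at regular grid points, in the spirit of
# the grid-based extraction described above (grid step is an assumption).
import cv2

gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder path
sift = cv2.SIFT_create()
step = 8
keypoints = [cv2.KeyPoint(float(x), float(y), float(step))
             for y in range(step, gray.shape[0] - step, step)
             for x in range(step, gray.shape[1] - step, step)]
keypoints, descriptors = sift.compute(gray, keypoints)   # 128-D descriptor per point
```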
Gabor wavelets. Research in neurophysiology has shown that Gabor filters fit the
spatial response profile of certain neurons in the visual cortex of the mammalian
brain. Lee [57] derived a family of 2-D Gabor wavelets for image representation. A
Gabor wavelet is defined by frequency, orientation and scale. Wiskott et al. [58] used
Gabor wavelets to label the nodes of an elastic graph which was used to represent the
face. Lian et al. [59], following the method of Hosoi et al. [60], used Gabor wavelets
from different facial points located using retina sampling. Leng and Wang [61] ex-
tracted Gabor wavelets of five different scales and eight orientations from each pixel
of the image as features, which were then selected using Adaboost. Scalzo et al. [62]
extracted a large set of features using Gabor and Laplace filters, which were used in a
feature fusion framework whose structure was determined by a genetic algorithm.
Gabor filters have also been used to obtain the simple cell units in biologically in-
spired features (BIF) which were proposed by Riesenhuber and Poggio [63] for object
recognition and later extended by Meyers and Wolf [64] for face processing. This
model contains simple (S) and complex (C) cell units arranged in hierarchical layers
of S1, C1, S2 and C2, with an S2FF layer for face processing. Guo et al. [10] found
that, for face gender recognition, the C2 and S2 layers degraded performance.
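A simple sketch of a Gabor filter bank is given below for illustration; the kernel sizes, filter parameters and pooling scheme are assumptions and do not reproduce any specific method above.

```python
# Sketch of a Gabor filter bank (5 scales x 8 orientations assumed, following
# the configuration cited above) with simple mean/std pooling per response.
import cv2
import numpy as np

def gabor_features(gray_face, ksizes=(5, 7, 9, 11, 13), n_orient=8):
    feats = []
    for ksize in ksizes:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            kernel = cv2.getGaborKernel((ksize, ksize), 2.0, theta,
                                        ksize / 2.0, 0.5)
            response = cv2.filter2D(gray_face, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])
    return np.array(feats)
```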
Others. Other facial representations that have been used include a generic patch-
based representation [4], regression function [65], multiscale filter banks in the BIF
model [66], Ranking Labels [67], subspace of random crops [68], Discrete Cosine
Transform [69][70], wavelets of Radon transform [71] and intensity statistics such as
mean, variance, skew and kurtosis [15].
For work using non-visible light images, the experimental results of [72] indicate the
possibility of performing gender classification using near infrared images although
the performance was slightly inferior to visible spectrum images.
External cues. Features external to the face region such as hair, neck region and
clothes are also cues used by humans to identify gender. Ueki and Kobayashi [73]
integrated color and edge features from the face and neck regions, since the neck region
contains cues such as neck size, jawline and clothing type. Li et al. [16] included
features from hair and upper body clothing. Gallagher and Chen [74] used social con-
text information based on position of a person’s face in a group of people to help infer
gender in photographs.
3.4 Face Datasets
Table 1 summarizes several publicly available datasets that have been used for eva-
luating gender recognition. These datasets tend to be those collected for use in
face recognition or detection evaluation. Some researchers take only a subset of the
datasets (excluding unsuitable images) or, in order to obtain a large number of images,
combine several datasets, including using their own collection (for example, collected
from the web). One common practice is to use a face detector to obtain face crops.
However, this may cause the data to be biased, for example if the detector successful-
ly detects only frontal and near-frontal faces. None of these public datasets were de-
signed specifically for gender recognition evaluation. As for private datasets, exam-
ples are WIT-DB [73], BUAA-IRIP [61], UCL [75], YGA [10] and BCMI[16].
Table 1. Public Face Datasets
| Dataset | No. of images | No. of unique individuals | Controlled variations |
|---|---|---|---|
| AR [76] | >4000 | 126 (70m, 56f) # | X, L, O |
| XM2VTS [77] | 5900 | 295 | P, L |
| FERET [78] | 14126 | 1199 | P, L, X |
| BioID [79] | 1521 | 23 | L, face size, background |
| CMU-PIE [80] | 41368 | 68 # | P, L, X |
| FRGC [81] | 50000 | 688 | L, X, background |
| UND Biometrics-B [82] | 33287 | 487 | L, X |
| MORPH-2 [83] | 55285 | 13660 (46767m, 8518f) # | age |
| LFW [84] | 13233 | 5749 (10256m, 2977f) | Uncontrolled |
| CAS-PEAL-R1 [85] | 30900 | 1040 (595m, 445f) # | P, X, L, O |
| Images of Groups [74] | 5080 | < 28231 # | Uncontrolled |

Notes on the table:
Under No. of unique individuals, the breakdown of male and female faces is given in brackets,
where known; for example, 500m, 500f refers to 500 male and female faces each.
# indicates gender is labeled.
The controlled variations are indicated as follows:
P – pose, view; L – lighting, illumination; X – expression; O – occlusion.
FERET is a widely used dataset for evaluation of face recognition algorithms, and has
also been used by many researchers for face gender recognition. It contains 14,126
images of 1199 individuals [78]. The faces have a variety of poses, and some variation in
illumination and expression. Baluja and Rowley [28] noted that good results were achieved
with this database because the images are noise-free, have consistent lighting, and
without background clutter. Also, earlier researchers may have made the mistake of
including the same person in both the training and test set. The gender information is
not labeled, although for a subset of images (with 212 males and 199 females), it has
been made available by Mäkinen and Raisamo [14].
The CAS-PEAL face database is a large scale Chinese face database which con-
tains 99,594 images of 1040 individuals (595 males and 445 females) with vary-
ing pose, expression, accessories and lighting [85]. A subset of the database, CAS-
PEAL-R1, which contains 30,900 images, is available for research purposes on request.
The images were taken in a controlled environment with lighting and cameras placed
at various angles. Various accessories (glasses and hats) were worn by the subjects
and they were also asked to make various expressions. Different backgrounds were
also used to capture the effect of change in white balance of the camera. Gender in-
formation is contained in the image file name.
LFW (Labeled Faces in the Wild) was compiled to aid the study of unconstrained
face recognition. The dataset contains faces that show a large range of variation typi-
cally encountered in everyday life, exhibiting natural variability in factors such as
pose, lighting, race, accessories, occlusions, and background [84]. Males outnumber
females, and some individuals appear more than once.
In many datasets, the images are not annotated with gender information. Therefore
researchers had to manually label the ground truth using visual inspection, either by
themselves or with the help of others.
As a conclusion, no large, publicly available dataset specifically designed for the
problem of face gender recognition has been established. Recently, the MORPH-2
[83] and LFW datasets have been proposed as the standards for controlled and uncon-
trolled face gender classification, respectively [86]. Dago-casas et al. [87] recommend
Gallagher’s Images of Groups dataset over LFW for uncontrolled conditions, as the
former is more gender-balanced.
3.5 Evaluation and results
A list of representative works on face gender recognition is compiled in Table 2. The
table compares the features and classifier used, training and test dataset used and av-
erage total classification rate, as reported by the authors. The dataset characteristics in
terms of the controlled variety of images are also indicated.
The average classification rate (also referred to as classification accuracy or recog-
nition rate) obtained from the results of cross-validation is usually reported. The clas-
sification rate is the ratio of correctly classified test samples to the total number of test
samples. Five-fold cross validation is often used. Some researchers may test their
method on different databases for generalization ability (i.e. train with dataset A, test
with dataset B). It is good practice to ensure faces of the same individual do not ap-
pear on both the training and test set. This is to prevent the classifier from recognizing
individual faces rather than gender [28]. Some researchers also ensure the same num-
ber (or ratio) of male and female faces are kept in the training and test sets.
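One way to realize this practice is to split folds by subject identity; the sketch below (with placeholder data) uses grouped cross-validation so that no individual appears in both training and test folds.

```python
# Sketch of subject-disjoint five-fold evaluation: GroupKFold keeps all images
# of an individual in one fold, so identity cannot leak between training and
# test sets. X, y and subject_ids are assumed placeholders.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(300, 100)
y = np.random.randint(0, 2, 300)
subject_ids = np.random.randint(0, 60, 300)     # one identity label per image

scores = cross_val_score(SVC(kernel="rbf"), X, y,
                         groups=subject_ids, cv=GroupKFold(n_splits=5))
print("mean classification rate:", scores.mean())
```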
Because of the different datasets and parameters used for evaluation, a straight
comparison between the methods is difficult. Some researchers use frontal face im-
ages only, while other may include non-frontal faces (variation in pose or view). The
variation (e.g. age, expression, illumination, etc.) that are present in the datasets also
differ. We have provided a review of some of the datasets in the previous section. It is
noted that the FERET dataset is the most often used. Even then, the subset of images
used varies between researchers.
Table 2. Face gender recognition
| First Author, Year | Feature extraction | Classifier | Training data | Test data | Ave. Acc. % | Dataset variety |
|---|---|---|---|---|---|---|
| Gutta, 2000 [24] | Pixel values | RBF + decision tree | FERET fa, fb: 1906m 1100f | k-fold CV | 96 | F, E |
| Moghaddam, 2002 [25] | Pixel values | SVM-RBF | FERET: 1044m 711f | 5-CV | 96.62 | F |
| Shakhnarovich, 2002 [13] | Haar-like | Adaboost | Web images | 5-CV; video seqs. | 79; 90 | P (<30˚), A, E, L |
| Sun, 2002 [29] | PCA with GA | SVM | UNR: 300m 300f | 3-CV | 95.3 | E, X, L, S |
| Castrillon, 2003 [30] | PCA | SVM + temporal fusion | Video frames: 798m 231f | 8123m 1755f | 98.57 | U, F |
| Buchala, 2005 [88] | PCA | SVM-RBF | Mix (FERET, AR, BioID): 200m 200f | 5-CV | 92.25 | F |
| Jain, 2005 [35] | ICA | SVM | FERET: 100m 100f | FERET: 150m 150f | 95.67 | F, S |
| Baluja, 2006 [28] | Pixel comparisons | Adaboost | FERET fa, fb: 1495m 914f | 5-CV | 94.3 | F, S |
| Lapedriza, 2006 [66] | BIF multi-scale filters | Jointboost | FRGC: 3440t; FRGC: 1886t | 10-CV; 10-CV | 96.77; 91.72 | Uniform background; cluttered background |
| Lian, 2006 [41] | LBP histogram | SVM-polynomial | CAS-PEAL: 1800m 1800f | CAS-PEAL: 10784t | 94.08 | P (up to 30˚ yaw & pitch) |
| Fok, 2006 [26] | Pixel values | Convolutional neural net. | FERET fa: 1152m 610f | 5-CV | 97.2 | F |
| Yang, 2007 [42] | LBP histogram | Real Adaboost | Chinese shots: 4696m 3737f | 5-CV; FERET: 3540t; PIE: 696t | 96.32; 93.3; 91.1 | U (X, O); F, X, O; F, X, O |
| Makinen, 2008 [5] | Various: pixels, LBPH, Haar-like | Various (ANN, SVM, Adaboost) in combination | FERET: 304m 304f; web images: 1523m 1523f | FERET: 76m 76f; web images: 381m 381f | 92.86; 83.14 | F, S; F |
| Leng, 2008 [61] | Gabor | Fuzzy SVM | FERET: 160m 140f; CAS-PEAL: 400m 400f; BUAA-IRIP: 150m 150f | 5-CV; 5-CV; 5-CV | 98; 89; 93 | F; F; F |
| Xu, 2008 [39] | Haar-like, fiducial distances | SVM-RBF | Mix (FERET, AR, Web): 500m 500f | 5-CV | 92.38 | F, E, A, L, S |
| Xia, 2008 [46] | LGBMP hist. | SVM-RBF | CAS-PEAL: 1800m 1800f | CAS-PEAL: 10784t | 94.96 | P (up to 30˚ yaw & pitch) |
| Scalzo, 2008 [62] | Gabor & Laplace | Kernel spectral regression | UNR: 200m 200f | 3-CV | 96.2 | E, X, L, S |
| Zafeiriou, 2008 [89] | Pixel values | SVM variant | XM2VTS: 1256m 1104f | 5-CV | 97.14 | S |
| Aghajanian, 2009 [4] | Patch-based | Bayesian | Web images: 16km 16kf | Web images: 500m 500f | 89 | U |
| Li, 2009 [70] | DCT | Spatial GMM | YGA: 6096t | YGA: 1524t | 92.5 | F, A, S |
| Lu, 2009 [34] | 2D PCA | SVM-RBF | FERET: 400m 400f; CAS-PEAL: 300m 300f | 5-CV; CAS-PEAL: 1800t | 94.85; 95.33 | F; F, X |
| Demirkus, 2010 [54] | SIFT | Bayesian | FERET: 1780m 1780f | Video seqs. (15m 15f) | 90 | U (P, X, O, L) |
| Wang, 2010 [56] | SIFT, Gabor | Adaboost | Mix (FERET, CAS-PEAL, Yale, I2R): 4659t | 10-CV | ~97 | F, X, L, O |
| Lee, 2010 [65] | Regression function | SVM | FERET fa: 1158m 615f; web images: 3000t | 5-CV; web images: 3000t | 98.8; 88.1 | F; A, E |
| Alexandre, 2010 [43] | Intensity, hist. of edge dir., LBP | SVM-linear | FERET: 152m 152f; UND set B: 130m 130f | FERET: 60m 47f; UND set B: 171m 56f | 99.07; 91.19 | F, S; F, S |
| Li, 2011 [16] | LBP (+ hair & clothing features) | SVM | FERET: 227m 227f; BCMI: 821m 821f | FERET: 114m 114f; BCMI: 274m 274f | 95.8; 95.3 | F; F |
| [47] | LGBP | SVM-RBF | CAS-PEAL: 2142m 2142f | CAS-PEAL: 2023m 996f | ~91-97 per set | P (up to 67˚ yaw), S |
| Zheng, 2011 [48] | LGBP-LDA | SVMAC | CAS-PEAL: 2706m 2706f (of 9 sets); FERET: 282m 282f; BCMI: 361m 361f | CAS-PEAL: 2175m 1164f; FERET: 307m 121f; BCMI: 168m 155f | 99.8 per set; 99.1; 99.7 | P (up to 30˚ yaw & pitch), S; F; F |
| Shan, 2012 [45] | LBP hist. bins | SVM-RBF | LFW: 4500m 2943f | 5-CV | 94.81 | F, U, S |
Notes on the table:-
Training data and testing data gives information on the dataset from which the images were taken for
training and testing the classifier, respectively. The breakdown of male and female faces is also given; for
example, 500m 500f refers to 500 male and female faces each. Where the breakdown could not be deter-
mined or was not given, the total faces used are given (e.g. 1000t.)
When the classification rate or accuracy is based on cross-validation result, this is indicated in the testing
data field; for example, 5-CV refers to five-fold cross validation, and the average rate from validation
results are given in the Ave. Acc. field. If classification rate for a separate or different test set is given, this
is used and the dataset is indicated.
Under dataset variety, the variations controlled or the variety available in the dataset, as mentioned by the
authors, are indicated as follows:
F – frontal only; A – age; E – ethnicity; P – pose, view; L – lighting, illumination; X – expression; O – occlusion; U – uncontrolled
S – indicates the same individual does not appear on both training and test set
For face images from the FERET dataset, the best results are obtained by Zheng et al.
[48] and Alexandre et al. [43], with a classification rate of 99.1%. However, only
frontal faces from the dataset were used. Zheng et al. [48] achieved near 100% for
pose variations up to 30˚ yaw and pitch on the CAS-PEAL dataset. However, separate
classifiers had to be trained for each pose. For images taken in uncontrolled environ-
ments, Shan [45] obtained 94.8% on the LFW dataset which contains frontal and near
frontal faces.
In order to standardize the evaluation of facial analysis techniques, including the
gender classification problem, an international collaborative effort named BeFIT [90]
has promoted benchmarking activities by proposing standard evaluation protocols and
datasets. The MORPH-II and LFW datasets were proposed for constrained and un-
constrained face gender classification, respectively. Five-fold cross validation should
be used, with images of individual subjects appearing in only one fold at a time. The
distribution of age, gender and ethnicity in the folds should be similar to that of the
whole dataset. More robust evaluation metrics (in addition to the traditionally-used
metric mentioned above) were recommended: TPR (true positive rate), TNR (true
negative rate) and ACR (average correct rate, defined as the average of TPR and TNR).
To deal with datasets having imbalanced gender, AUC (the area under the receiver
operating characteristic curve) should be used [86]. It will be interesting to compare the state-of-the-art
methods based on this recently established protocol.
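For illustration, the recommended metrics can be computed as follows; the labels, predictions and scores below are toy values, and the positive-class coding is an assumption.

```python
# Sketch of the metrics named in the BeFIT protocol: TPR, TNR, their average
# (ACR), and ROC AUC for gender-imbalanced datasets. Labels are toy values.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])     # 1 = male, 0 = female (assumed coding)
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([0.9, 0.4, 0.2, 0.1, 0.8, 0.6, 0.7, 0.3])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)            # true positive rate
tnr = tn / (tn + fp)            # true negative rate
acr = (tpr + tnr) / 2.0         # average correct rate
auc = roc_auc_score(y_true, y_score)
```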
3.6 3-D approaches
In this section, we briefly review 3-D approaches. Some studies have shown that
gender information is also contained in the depth profile of the face [12] [91] and that
gender classification is more effective with three dimensional head structure than with
image intensity information [92]. The shape information in 3-D face data (in the form
of points or meshes) from head scans is exploited by several researchers. Han et al.
[93] calculated the ratio of the surface area and volume of prominent facial features
(eyes, nose, mouth, cheek) in comparison to the whole face. Toderici et al. [94] used
Haar wavelet coefficients derived from the geometry-image representation of a fitted
deformable model. Lu et al. [95] combined registered range data (3-D points from
head scans) with intensity images, using SVM as classifier, to show better results
compared to each individual modality.
Appearance-based methods use 2-D images projected from the 3-D data obtained
from scanners. Tariq et al. [96] and Yang et al. [97] used the contour of the face pro-
file. Shen et al. [98] found that LBP was more effective than the raw pixels of the 3-D
images. Hu et al. [99] extracted 15 facial landmarks using 3-D information and then
obtained five facial regions to train SVM classifiers for each region.
These approaches using 3-D data have the disadvantage of requiring expensive
scanners and high computational complexity [100]. A 2.5D representation based on
facial surface normals, called facial needle-maps [101], can be retrieved from 2D
images and contains 3-D facial shape information from a fixed viewpoint [100]. The
method requires a statistical face model constructed from laser scan data.
4 Gender Recognition by Gait
Gait is defined to be the coordinated, cyclic combination of movements that result in
human locomotion [102]. This would include walking, running, jogging and climbing
stairs. However, in computer vision research, it is usually restricted to walking. The
main advantages of using gait as a biometric are that it is non-obtrusive and can be
captured at a distance [103], in public places, without requiring cooperation or even
awareness of the subject [102]. The use of gait as a biometric has a relatively young
history compared to methods that use voice, fingerprints or faces [102].
Classifying gender based on gait can be useful in some situations such as when the
face is not clear, not visible or heavily occluded.
4.1 Challenges
Many factors affect the gait of a person, such as load, footwear, walking surface, in-
jury, fatigue, drunkenness, mood, and changes over time. Video-based analysis of gait
would also need to contend with the person’s clothing, camera view, speed of walking
and background clutter. For example, Hu et al. [104] obtained lower gender classifica-
tion rates when the subject is carrying a bag or wearing an overcoat. Makihara et al. [105]
found that better results could be achieved if the age group was restricted to young
adults. The problem of view dependence has also been studied by many researchers
[105][106][107][108][109][110][111].
In a video sequence of a person walking, the gait cycle can be referred to as the
time interval between two consecutive left/right mid-stances [103]. Thus, a sequence
could contain one or more gait cycles. Gait is thus a dynamic biometric that contains
additional temporal and frequency information.
4.2 Feature extraction
Early work on gait analysis used point lights attached to the body’s joints. It was
found that, based on the motion of the point lights during walking, the identity and gender
of a person could be identified [112][113][114]. The reader can refer to [115] for a
survey on these early works. In this section, we review the representation and features
that have been used for gait-based gender recognition.
Human gait representation can be divided into appearance-based (model-free) or
model-based [116][117]. Appearance-based approaches have lower computational
cost while model-based approaches suffer from difficulty in extracting the features
robustly as they rely on accurate estimation of joints [117][118] and require high
quality gait sequences [103] where the body parts need to be tracked in each frame.
Moreover, model-based approaches ignore width information of the human body
[118]. However, they are view and scale invariant [103].
Model-based. Yoo et al. [119] obtained 2D stick figures from the body contour,
guided by anatomical knowledge. The sequence of stick figures from one gait cycle
was taken as a gait signature (Figure 2). Temporal, spatial and kinematic parameters,
and moments were considered as features and selected using a statistical distance
measure.
Fig. 2. Gait signature [119]
Appearance-based. In many methods, a silhouette of the walking human is obtained
first from the images in a gait sequence. Lee and Grimson [120] divided each human
silhouette into 7 regions and fitted ellipses into each region (Figure 3). The mean and
standard deviation of the ellipse centroid, major axis orientation and aspect ratio of
major and minor axis, together with the centroid height of the whole silhouette, was
taken across time to form the gait average appearance features. The features were then
selected using ANOVA. The advantage of the feature is robustness to silhouette
noise. However, it will be affected by viewpoint, clothing and gait changes [120].
Felez et al. [121] improvised by using a different regionalization of 8 parts to obtain
more realistic ellipses and meaningful feature space. However, such regional features
are vulnerable to deformed silhouettes caused by frequent occlusion [118]. Thus,
equal partitions formed by 2x2 and 4x4 grids were used by Hu et al. [118]. The ellipse
fit parameters were fused with the stance indexes as spatial and temporal features
respectively to train a mixed conditional random field (MRCF).
Fig. 3. Partitioned silhouette and the fitted ellipses [120]
Zhang and Wang [110] used frieze patterns of Liu et al. [122] to study multi-view
gender classification, in which they analyzed and compared the class separability of
these features from different view angles using Fisher linear discriminant (FLD) analysis.
A frieze pattern is a two-dimensional pattern that repeats along one dimension (Figure
4). The gait representation is generated by projecting the silhouette along its columns
and rows, then stacking these 1-D projections over time [122]. These patterns enable
viewpoint estimation and can be used in model-based analysis for locating body parts
in video frames.
Fig. 4. Frieze patterns [122]
Shan et al. [123] showed that the Gait Energy Image (GEI) by Han & Bhanu [124]
was an effective representation for gender recognition. They fused gait with face fea-
tures using canonical correlation analysis (CCA) to improve performance. A GEI
[124] represents human motion in a single image while preserving temporal informa-
tion. It is obtained by averaging the silhouette images in one or more gait cycles to
produce a grayscale image (Figure 5), thus saving on storage and computational cost.
GEIs are also robust to silhouette noise in individual frames. Liu and Sarkar [125]
proposed a similar representation, averaged on one gait cycle, which they called the
average gait image (AGI).
Fig. 5. Two examples of GEI [124]
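A minimal sketch of GEI computation, assuming aligned and size-normalized binary silhouettes are already available, is given below.

```python
# Sketch of a Gait Energy Image: average the aligned binary silhouettes of one
# or more gait cycles into a single grayscale image, as described above.
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: sequence of equally sized binary (0/1) silhouette frames."""
    frames = np.stack([s.astype(np.float32) for s in silhouettes])
    return frames.mean(axis=0)   # values in [0, 1]; brighter = more often foreground
```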
Yu et al. [117] divided the GEI into 5 different components, with each given a
weight based on the results from psychophysical experiments. Li et al. [126] parti-
tioned the AGI into 7 components corresponding to body parts, while Chen et al.
[109] used 8 components based on their consideration of walking patterns. Chang et
al.[107] obtained the GEI by estimating from a whole gait sequence, thus eliminating
the need to detect the gait cycle frequency. Hu and Wang [127] proposed a novel gait
pattern called Gait Principal Component Image (GPCI) which was obtained using
PCA. Lu and Tan [106] obtained the difference GEI from different views and intro-
duced uncorrelated discriminant simplex analysis (UDSA) to project the GEIs into
lower dimensional feature subspace to increase view-invariant gender discrimination.
Signal transformation methods such as Radon, Fourier and wavelet transforms
have also been used. Chen et al. [128] applied Radon transform on the human sil-
houettes in a gait cycle and then used Relevant Component Analysis (RCA) for fea-
ture transformation. Oskuie and Faez [129] applied Radon Transform on the Mean
Gait Energy Image of Chen et al. [130] and then extracted the Zernike moments. Ma-
kihara et al. [105] used frequency-domain features obtained from the silhouette using
Discrete Fourier Transform (DFT) and reduced the dimension using Singular Value
Decomposition. Handri et al. [131] applied wavelet decomposition using Daubechies
function on the silhouette contour width and found it performed better than Fourier
Transform.
Some methods work directly on the image of the human subject instead of extract-
ing the silhouette. Chang and Wu [108] used DCT coefficients as texture features to
train a classifier based on embedded hidden Markov models. Hu et al. [104] applied
Gabor filter banks of 3 different scales and 6 orientations on the image to extract the
features. Maximization of Mutual Information (MMI) was used to learn the discri-
minative low dimensional representation. The Gabor-MMI feature vectors were used
to train two Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) to
perform classification.
There has also been work based on the fusion of face and gait features [123][132].
Both of these works use the frontal face and a side view of the gait to extract features,
which implies that two cameras would be needed in a real-world implementation.
4.3 Gait datasets
Datasets designed for the evaluation of gait recognition have been used for gender recognition.
Table 3 gives a summary of these datasets.
Table 3. Gait datasets
| Dataset | No. of sequences | No. of unique individuals | Controlled variations | Remarks |
|---|---|---|---|---|
| SOTON Large Gait [133] | >600 | >100 | View (front, side), scene (outdoor, indoor, treadmill) | only 15 females [117] |
| Human ID [134] | 1870 | 122 (85 males, 37 females) | View, shoe, walking surface, carrying briefcase, elapsed time | |
| CASIA Gait Set B [135] | 13640 | 124 (93 males, 31 females) | View (11 azimuths), walking status (normal, carrying bag, wearing overcoat) | gender labeled |
| BUAA-IRIP Gait [132] | 4800 | 60 (32 males, 28 females) | View (7 azimuths) | private |
| OU-ISIR [105] | >4200 | 168 (88 males, 80 females) | View (2 heights, 12 azimuths and overhead), age | only a portion in public |
The Human ID dataset from University of South Florida contains 1870 sequences of
122 subjects. Each subject was asked to walk multiple circuits around an ellipse
outdoors, with the last circuit taken for the dataset. However, the set is imbalanced in
gender, with 85 males and only 37 females.
The Institute of Automation, Chinese Academy of Sciences (CASIA) produced the
CASIA Gait Database for gait recognition research, consisting of three sets: Dataset A,
Dataset B and Dataset C. Dataset A is a small database with only 20 persons, while
Dataset C contains images taken using a thermal infrared camera. Dataset B has 124 persons,
mostly Asian (except 1 European), with more males than females (93 vs. 31).
The BUAA-IRIP gait dataset was introduced by Zhang and Wang [110] to study multi-view
gender recognition using gait. It contains an almost equal number of male and
females, but the dataset is currently not publicly available. Recently, Osaka Universi-
ty introduced the OU-ISIR dataset, which has 168 subjects with almost equal numbers
of males and females. The subjects cover a large range of age groups, from 4 to 75
years old. Currently, only a portion of the dataset is available for download, in the
form of silhouette sequences.
As a summary, compared to face datasets, gait datasets are currently smaller in the
number of subjects, perhaps due to acquisition costs. There is a need for publicly
available datasets with a larger number of subjects, balanced in gender, and also with
gait sequences captured in uncontrolled environments.
4.4 Evaluation and results
Table 4 shows a list of works on gender recognition based on gait. The table com-
pares the feature extraction methods, classifier, training and test dataset, and average
total classification rate, as reported by the authors. The dataset characteristic in terms
of the controlled variety of sequences are also indicated. Generally, the average cor-
rect classification rate obtained from the results of cross-validation is reported.
Table 4. Gait-based gender recognition
| First Author, Year | Feature extraction | Classifier | Training data | Test data | Ave. Acc. % | Dataset variety |
|---|---|---|---|---|---|---|
| Lee, 2002 [120] | Ellipse fittings | SVM | Private: 14m 10f (194) | evenly split | 84.5 | N |
| Yoo, 2005 [119] | 2D stick figures | SVM-polynomial | SOTON: 84m 16f | 10-CV | 96 | N |
| Huang, 2007 [111] | Ellipse fittings | SVM | CASIA B: 25m 25f (300) | CASIA B: 5m 5f (60) | 89.5 | M (0˚, 90˚, 180˚) |
| Shan, 2008 [123] | GEI + PCA + LDA | Nearest neighbour | CASIA B: 88m 31f (2380) | 5-CV | 94.5 | N |
| Chen, 2009 [128] | Radon transform of silhouette + relevant component analysis | Mahalanobis distance | IRIP: 32m 28f (300) | LOO-CV | 95.7 | N |
| Chen, 2009 [109] | AGI | Euclidean distance | IRIP: 32m 28f (300 per angle) | LOO-CV | 93.3 (~73-92 per view) | M (0˚-180˚) |
| Yu, 2009 [117] | GEI | SVM | CASIA B: 31m 31f (372) | 31-CV | 95.97 | N |
| Chang, 2009 [107] | GEI + PCA + LDA | Fisher boost | CASIA B: 93m 31f (8856) | 124-CV; videos: 2m 2f (32) | 96.79; 84.38 | M (0˚-180˚); M (U) |
| Hu, 2009 [127] | GPCI | k-NN | IRIP: 32m 28f (300) | LOO-CV | 92.33 | N |
| Chang, 2010 [108] | DCT | EHMM | CASIA B: 25m 25f | 5-CV | 94 | M (0˚-180˚) |
| Lu, 2010 [136] | GEI + UDSA | Nearest neighbour | CASIA B: 31m 31f (4092) | LOO-CV | 83-93 (per view) | M (0˚-180˚) |
| Felez, 2010 [121] | Ellipse fittings | SVM-linear | CASIA B: 93m 31f (744) | 10-CV | 94.7 | N |
| Hu, 2010 [104] | Gabor + MMI | GMM-HMM | CASIA B: 31m 31f (372) | 31-CV | 96.77 | N |
| Hu, 2011 [118] | Ellipse fittings & stance indexes | MRCF | CASIA B: 31m 31f (372); IRIP: 32m 28f (300) | 31-CV; LOO-CV | 98.39; 98.33 | N; N |
| Handri, 2011 [131] | Wavelet transform of silhouette contour width + Modest AdaBoost | k-NN | Private: 29m 14f (>172) | LOO-CV | 94.3 | N, A |
| Makihara, 2011 [105] | DFT of silhouette + SVD | k-NN | OU-ISIR: 20m 20f | 20-CV | ~70-80 (per view) | M (0˚-360˚, overhead) |
| Oskuie, 2011 [129] | RTMGEI + Zernike moments | SVM | CASIA B: 93m 31f | CASIA B: 93m 31f | 98.5; 98.94 | N; N, W, C |
Notes on the table:-
Refer to notes for Table 2 for information regarding Training data , Test data and the Ave. Acc. fields.
The figure in brackets after a dataset is the total number of sequences used.
LOO-CV refers to leave-one-out cross validation.
Under dataset variety, the variations controlled are indicated as follows:
N – side view only; M – multi-view (the range of angles is also given); A – various age; W – wearing overcoat; C – carrying bag
For the CASIA gait dataset, Hu et al. [118] reported state of the art performance of
98.39% using side view sequences only. Oskuie and Faez [129] achieved a slightly
higher result of 98.5%, but their evaluation method is slightly different. They achieved
their best result of 98.94% when clothing and load variations were included. Chang et al. [107]
evaluated their method on real-time videos to achieve 84.38% after using the CASIA
dataset for training. Chang and Wu [108] achieved 94% average accuracy for multi-
view sequences without requiring prior knowledge of the view angle.
For the IRIP gait dataset, Hu et al. [118] reported state of the art performance of
98.33% using side view sequences only. For multiview sequences, Chen et al. [109]
achieved 93.3% from fusion of views, which would require using a camera for each
view. This would increase the computing complexity of the system and limit real-
world application [136].
In cross database testing, Yu et al. [117] found that the performance decreases. For
example, training with the CASIA dataset, only 87.15% accuracy could be achieved
with the SOTON dataset, and 87.9% vice versa. The two datasets differ in the ethnicity
of subjects, as well as in clothing and capture conditions.
As a conclusion, gait-based gender recognition can achieve high classification rates
on controlled datasets, especially with a single view, usually the side view. There is a need
for more investigation into generalization ability (cross database testing) and performance
on datasets containing larger numbers of subjects with sequences taken under
unconstrained environments.
5 Gender Recognition by Body
Here, we refer to the use of the static human body (either partially or as a whole) in an
image to infer gender, as opposed to using the face region only. Results from a se-
quence of several images may be fused. Classifying gender based on the human body,
like gait, is useful in situations where using the face is not possible or preferred, for
example, insufficient resolution or back view [137].
5.1 Challenges
Gender recognition based on human body is challenging in several aspects. To infer
the gender of a person, humans use not only body shape and hairstyle, but additional
cues such as type of clothes and accessories [138]. However, people of the same
gender may choose different styles of clothes [139]. On the other hand, the clothing
styles worn by both males and females may be somewhat similar. The same is true for
hairstyles. The classifier should also be robust to variations in pose, articulation and
occlusion of the person. Images of a person may also be taken under different illumination
and against background clutter. Perhaps due to the difficulty caused by such variety, there
are few works in the literature on body-based gender recognition.
5.2 Feature extraction
The first attempt to recognize gender from full body images was by Cao et al. [139].
The human image is first centered and aligned so the height is normalized as a pre-
processing step, and then partitioned into patches corresponding to some parts of the
body. Each part was represented using Histogram of Oriented Gradients (HOG) fea-
ture, which was previously developed for human detection in images [3].
To compute HOG features, the image gradient is first obtained by using a 1-D
mask [-1,0,1] in horizontal and vertical direction. Next, the image is divided into cells
and the orientation histogram of gradients is computed for each cell. The cells are
grouped into larger overlapping blocks and each block is normalized for contrast. The
histograms in a block are concatenated to form a feature vector. The HOG feature is
able to capture local shape information from the gradient structure with easily control-
lable degree of invariance to translations or rotations [3].
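A minimal sketch of HOG extraction for a person crop (assuming scikit-image) is given below; the window size and cell/block parameters are assumptions in the spirit of [3].

```python
# Minimal HOG sketch with scikit-image, mirroring the steps above: gradient
# orientation histograms per cell, block-wise contrast normalization, then
# concatenation into one descriptor. Parameter values are assumptions.
import numpy as np
from skimage.feature import hog

body_crop = np.random.rand(128, 64)        # placeholder grayscale person crop
descriptor = hog(body_crop,
                 orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
```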
Collins et al. [140] proposed their descriptor called PixelHOG (PiHOG) using
dense HOG features computed from a custom edge map. In addition, color informa-
tion was captured using a histogram computed based on the hue and saturation value
of the pixels. Their descriptor was found to perform better when compared to those
based on Pyramid Histogram of Orientation Gradients [141] and Pyramid Histogram
of Words [142].
Bourdev et al. [138] used a set of patches they call poselet for inferring attributes
of people in unconstrained environments. HOG features, color histogram and skin
features were used to represent the poselets. The poselets were used to train attribute
classifiers which were combined together to infer gender using context information.
Their approach is robust to variations in pose and occlusion, but requires a training
dataset with detailed annotations of keypoints of the human body.
Biologically-inspired features (BIF) were used for human body gender recognition
by Guo et al. [137]. Only C1 features were used, as it was found that C2 features de-
graded performance (as in the case of face gender recognition). Various manifold
learning techniques were applied to the features. The best results were obtained by first
classifying the view (front, back, or mixed) using BIF with PCA, followed by the
gender classifier. PCA was more effective for the mixed view, while LSDA (Locality
Sensitive Discriminant Analysis) was more effective for the front and back views.
5.3 People Datasets
Table 5 summarizes several publicly available datasets that have been used. The
MIT and VIPeR datasets contain images of upright people only. The more challeng-
ing Attributes of People dataset contains images taken from the H3D dataset [143]
and the PASCAL 2010 trainval set for the person category (using the high resolution
versions from Flickr where available). The gender is labeled for 5760 individuals but
for the rest it remains unspecified.
Table 5. Public people datasets
| Dataset | No. of images | No. of unique individuals | Views |
|---|---|---|---|
| MIT CBCL [144] | 924 | < 924 | Front, back |
| VIPeR [145] | 1264 | 632 | Front, back, side, diagonal |
| PASCAL VOC – Person [146] | 4015 | 9218 | Uncontrolled |
| Attributes of People [138] | 8035 | 8035 (5760 gender labeled) | Uncontrolled |
5.4 Evaluation and results
Table 6 summarizes the results obtained from works on body-based gender recogni-
tion.
Table 6. Body-based gender recognition
| First Author, Year | Feature extraction | Classifier | Training data | Test data | Ave. Acc. % | Dataset variety |
|---|---|---|---|---|---|---|
| Cao, 2008 [139] | HOG | Adaboost variant | MIT-CBCL: 600m 288f | 5-CV | 75 | View (frontal, back) |
| Collins, 2009 [140] | PiHOG, colour | SVM-linear | MIT-CBCL: 123m 123f; VIPeR: 292m 291f | 5-CV; 5-CV | 76; 80.62 | View (frontal); view (frontal) |
| Guo, 2009 [137] | BIF + PCA/LSDA | SVM-linear | MIT-CBCL: 600m 288f | 5-CV | 80.6 | View (frontal, back) |
| Bourdev, 2011 [138] | HOG, colour histogram, skin pixels | SVM | Attributes of People: 3395m 2365f | split between training, validation & test sets | 82.4 | Unconstrained |
Bourdev et al. [138] achieved 82.4 % accuracy with unconstrained images of
people, but the ratio of male and female in their dataset is not balanced. Collins et al.
[140] achieved 80.6 % accuracy on a more balanced but small dataset with frontal
view only. From these results, there is still room for improvement, although it is not
clear how much.
5.5 3-D methods
Balan and Black [147] proposed a method to estimate the 3-D shape of a person from
images of the person wearing clothing. Multiple poses are required from the views of
four synchronized cameras and a model of human body shapes learned from a data-
base of range scans is used to infer 3-D body shape. Using a dataset with a small
number of subjects, gender was classified using body shape with 94.3% accuracy. In a
further work, Guan et al. [148] [149] proposed methods to estimate body shape from a
single image. However, the approximate viewing direction and pose are assumed to
be known, and the method would be vulnerable to occlusions [148].
Wuhrer and Rioux [150] proposed a posture invariant technique to classify gender
based on human body shape using triangular meshes obtained from laser range scan-
ners. Geodesic distances between landmarks on the body were used as features for an
SVM classifier, which achieved at least 93% classification accuracy.
Tang et al. [151] investigated gender recognition using 3-D human body shapes
from laser scanning. The mesh normal distribution, curvature information (including mean
and Gaussian curvatures), and circle-based Fourier descriptors were used as features.
When these features were combined and an SVM with RBF kernel was used as the classifier,
98.3% accuracy was achieved on a dataset of 1225 males and 1259 females in stand-
ing posture.
6 Conclusion
In this paper, we have presented a comprehensive survey on human gender recogni-
tion using computer vision-based methods, focusing on 2-D approaches. A lot of
work has been done utilizing facial information, with comparatively less exploiting
the whole body, whether from motion sequences such as gait or still images. We have
also highlighted the challenges and confounding factors, as well as provided a review
of the commonly-used features. Face-based gender recognition can be categorized
into geometric-based and appearance-based methods, with the latter dominating for
the past decade. Good performance has been achieved for images of frontal faces. For
multi-view situations (involving both frontal and non-frontal faces), there is room for
improvement, especially in uncontrolled conditions, as required in many practical
applications. The impact of age on classification rate has been identified in several
studies, and more work can be done to improve on this.
In situations where facial analysis is unsuitable, we can turn to using the whole body. Current gait-based gender recognition, whether model-based or appearance-based, depends on the availability of one or more complete gait sequences. High classification rates have been achieved on controlled datasets, especially with side views. Investigation of the generalization ability of these methods, through cross-database testing, is called for. Performance on datasets containing larger numbers of subjects, with sequences captured in unconstrained environments, has not yet been established; to this end, suitable datasets need to be collected and made available. Some work has also been done on inferring gender from the static human body (either partially or as a whole) in an image.
References
1. V. Bruce et al., “Sex discrimination: how do we tell the difference between male and fe-
male faces?,” Perception, vol. 22, no. 2, pp. 131-152, 1993.
2. P. Viola and M. J. Jones, “Robust Real-Time Face Detection,” International Journal of
Computer Vision, vol. 57, no. 2, pp. 137-154, May 2004.
3. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Com-
puter Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Confe-
rence on, 2005, vol. 1, pp. 886–893.
4. J. Aghajanian, J. Warrell, S. J. D. Prince, J. L. Rohn, and B. Baum, “Patch-based within-
object classification,” in 2009 IEEE 12th International Conference on Computer Vision,
2009, pp. 1125-1132.
5. E. Mäkinen and R. Raisamo, “An experimental comparison of gender classification me-
thods,” Pattern Recognition Letters, vol. 29, no. 10, pp. 1544-1556, Jul. 2008.
6. “Vending machines recommend based on face recognition,” Biometric Technology To-
day, vol. 2011, no. 1, p. 12, Jan. 2011.
7. A. K. Jain, A. Ross, and S. Prabhakar, “An Introduction to Biometric Recognition,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 4-20, 2004.
8. E. Murphy-Chutorian and M. M. Trivedi, “Head pose estimation in computer vision: A
survey,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no.
4, pp. 607–626, 2009.
9. C. Benabdelkader and P. Griffin, “A Local Region-based Approach to Gender Classifica-
tion From Face Images,” in Computer Vision and Pattern Recognition-Workshops, 2005.
CVPR Workshops. IEEE Computer Society Conference on, 2005, p. 52.
10. G. Guo, C. R. Dyer, Y. Fu, and T. S. Huang, “Is gender recognition affected by age?,” in
Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Confe-
rence on, 2009, pp. 2032–2039.
11. W. Gao and H. Ai, “Face gender classification on consumer images in a multiethnic envi-
ronment,” Advances in Biometrics, pp. 169–178, 2009.
12. A. B. A. Graf and F. A. Wichmann, “Gender classification of human faces,” Biologically
Motivated Computer Vision, pp. 491-500, 2002.
13. G. Shakhnarovich, P. Viola, and B. Moghaddam, “A unified learning framework for real
time face detection and classification,” in Proceedings of Fifth IEEE International Con-
ference on Automatic Face Gesture Recognition, 2002, pp. 16-23.
14. E. Makinen and R. Raisamo, “Evaluation of gender classification methods with automat-
ically detected and aligned faces,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 30, no. 3, pp. 541–547, 2008.
15. M. Mayo and E. Zhang, “Improving face gender classification by adding deliberately mi-
saligned faces to the training data,” in Image and Vision Computing New Zealand, 2008.
IVCNZ 2008. 23rd International Conference, 2008, pp. 1–5.
16. B. Li, X.-C. Lian, and B.-L. Lu, “Gender classification by combining clothing, hair and
facial component classifiers,” Neurocomputing, pp. 1-10, 2011.
17. H. Kim, D. Kim, and Z. Ghahramani, “Appearance-based gender classification with
Gaussian processes,” Pattern Recognition Letters, vol. 27, pp. 618-626, 2006.
18. R. Brunelli and T. Poggio, “Face recognition: features versus templates,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052,
1993.
19. J. M. Fellous, “Gender discrimination and prediction on the basis of facial metric infor-
mation,” Vision Research, vol. 37, no. 14, pp. 1961–1973, 1997.
20. S. Mozaffari, H. Behravan, and R. Akbari, “Gender Classification Using Single Frontal
Image Per Person: Combination of Appearance and Geometric Based Features,” in 2010
20th International Conference on Pattern Recognition, 2010, pp. 1192-1195.
21. B. A. Golomb, D. T. Lawrence, and T. J. Sejnowski, “Sexnet: A neural network identi-
fies sex from human faces,” Advances in Neural Information Processing Systems, vol. 3,
pp. 572-577, 1991.
22. H. Abdi, D. Valentin, and B. Edelman, “More about the difference between men and
women: evidence from linear neural network and the principal-component approach,”
Perception, vol. 24, no. 1993, pp. 539–539, 1995.
23. S. Tamura, H. Kawai, and H. Mitsumoto, “Male/female identification from 8x6 very low
resolution face images by neural network,” Pattern Recognition, vol. 29, no. 2, pp. 331-
335, 1996.
24. S. Gutta, J. Huang, and P. Jonathon, “Mixture of experts for classification of gender, eth-
nic origin, and pose of human faces,” Neural Networks, IEEE Transactions on, vol. 11,
no. 4, pp. 948-960, 2000.
25. B. Moghaddam and M. H. Yang, “Learning gender with support faces,” Pattern Analysis
and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 707–711, 2002.
26. S. L. Phung and A. Bouzerdoum, “A pyramidal neural network for visual pattern recog-
nition.,” IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 329-43, Mar. 2007.
27. T. H. C. Fok and A. Bouzerdoum, “A Gender Recognition System using Shunting Inhibi-
tory Convolutional Neural Networks,” in The 2006 IEEE International Joint Conference
on Neural Network Proceedings, 2006, pp. 5336-5341.
28. S. Baluja and H. A. Rowley, “Boosting sex identification performance,” International
Journal of Computer Vision, vol. 71, no. 1, pp. 111–119, 2007.
29. Z. Sun, G. Bebis, X. Yuan, and S. J. Louis, “Genetic feature subset selection for gender
classification: A comparison study,” in Applications of Computer Vision, 2002.(WACV
2002). Proceedings. Sixth IEEE Workshop on, 2002, pp. 165–170.
30. M. Castrillon, O. Deniz, D. Hernandez, and A. Dominguez, “Identity and gender recogni-
tion using the encara real-time face detector,” in Conferencia de la Asociación Española
para la Inteligencia Artificial, 2003, vol. 3.
31. S. Buchala, N. Davey, and T. Gale, “Principal component analysis of gender, ethnicity,
age, and identity of face images,” in Proc. IEEE ICMI, 2005.
32. L. Bui, D. Tran, X. Huang, and G. Chetty, “Face Gender Recognition Based on 2D Prin-
cipal Component Analysis and Support Vector Machine,” in Network and System Securi-
ty (NSS), 2010 4th International Conference on, 2010, pp. 579-582.
33. J. Yang, D. Zhang, A. F. Frangi, and J.-yu Yang, “Two-dimensional PCA: a new ap-
proach to appearance-based face representation and recognition.,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-7, Jan. 2004.
34. L. Lu and P. Shi, “A novel fusion-based method for expression-invariant gender classifi-
cation,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE Interna-
tional Conference on, 2009, pp. 1065-1068.
35. A. Jain, J. Huang, and S. Fang, “Gender identification using frontal facial images,” in
Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, 2005, p. 4.
36. P. Demartines and J. Herault, “Curvilinear component analysis: a self-organizing neural
network for nonlinear mapping of data sets.,” IEEE Transactions on Neural Networks,
vol. 8, no. 1, pp. 148-54, Jan. 1997.
37. S. Buchala, N. Davey, and T. M. Gale, “Analysis of linear and nonlinear dimensionality
reduction methods for gender classification of face images,” International Journal of
Systems Science, vol. 36, no. 14, pp. 931–942, 2005.
38. P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple fea-
tures,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of
the 2001 IEEE Computer Society Conference on, 2001, vol. 1, pp. I-511-518.
39. Z. Xu, L. Lu, and P. Shi, “A hybrid approach to gender classification from face images,”
in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, 2008, pp.
1–4.
40. T. Ojala and M. Pietikainen, “Multiresolution gray-scale and rotation invariant texture
classification with local binary patterns,” Pattern Analysis and Machine Intelligence,
IEEE Transactions on, vol. 24, no. 7, pp. 971-987, 2002.
41. H. C. Lian and B. L. Lu, “Multi-view gender classification using local binary patterns
and support vector machines,” Advances in Neural Networks-ISNN 2006, pp. 202–209,
2006.
42. Z. Yang and H. Ai, “Demographic classification with local binary patterns,” Advances in
Biometrics, pp. 464–473, 2007.
43. L. A. Alexandre, “Gender recognition: A multiscale decision fusion approach,” Pattern
Recognition Letters, vol. 31, no. 11, pp. 1422-1427, 2010.
44. J. Ylioinas, A. Hadid, and M. Pietikäinen, “Combining contrast information and local bi-
nary patterns for gender classification,” Image Analysis, pp. 676–686, 2011.
45. C. Shan, “Learning local binary patterns for gender classification on real-world face im-
ages,” Pattern Recognition Letters, vol. 33, no. 4, pp. 431-437, Mar. 2012.
46. B. Xia, H. Sun, and B.-liang Lu, “Multi-view Gender Classification based on Local Ga-
bor Binary Mapping Pattern and Support Vector Machines,” in Neural Networks, 2008.
IJCNN 2008.(IEEE World Congress on Computational Intelligence). IEEE International
Joint Conference on, 2008, pp. 3388-3395.
47. T.-X. Wu, X.-C. Lian, and B.-L. Lu, “Multi-view gender classification using symmetry
of facial images,” Neural Computing and Applications, pp. 1–9, May 2011.
48. J. Zheng and B.-liang Lu, “A support vector machine classifier with automatic confi-
dence and its application to gender classification,” Neurocomputing, vol. 74, no. 11, pp.
1926-1935, May 2011.
49. X. Fu, G. Dai, C. Wang, and L. Zhang, “Centralized Gabor gradient histogram for facial
gender recognition,” in Natural Computation (ICNC), 2010 Sixth International Confe-
rence on, 2010, vol. 4, no. Icnc, pp. 2070–2074.
50. T. Jabid, M. Hasanul Kabir, and O. Chae, “Gender Classification using Local Directional
Pattern (LDP),” in Pattern Recognition (ICPR), 2010 20th International Conference on,
2010, pp. 2162–2165.
51. A. Shobeirinejad and Y. Gao, “Gender Classification Using Interlaced Derivative Pat-
terns,” in 2010 20th International Conference on Pattern Recognition, 2010, pp. 1509-
1512.
52. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International
Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
53. R. N. Rojas-Bello, L. F. Lago-Fernandez, G. Martinez-Munoz, and M. A. Sánchez-
Montañés, “A comparison of techniques for robust gender recognition,” in 2011 18th
IEEE International Conference on Image Processing, 2011, pp. 561-564.
54. M. Demirkus, M. Toews, J. J. Clark, and T. Arbel, “Gender classification from uncon-
strained video sequences,” in Computer Vision and Pattern Recognition Workshops
(CVPRW), 2010 IEEE Computer Society Conference on, 2010, pp. 55–62.
55. J. G. Wang, J. Li, W. Y. Yau, and E. Sung, “Boosting dense SIFT descriptors and shape
contexts of face images for gender recognition,” in Computer Vision and Pattern Recog-
nition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, 2010, pp. 96–
102.
56. J. G. Wang, J. Li, C. Y. Lee, and W. Y. Yau, “Dense SIFT and Gabor descriptors-based
face representation with applications to gender recognition,” in Control Automation Ro-
botics & Vision (ICARCV), 2010 11th International Conference on, 2010, no. December,
pp. 1860–1864.
57. T. Lee, “Image representation using 2D Gabor wavelets,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 18, no. 10, pp. 959-971, 1996.
58. L. Wiskott, J.-M. F. Fellous, N. Kruger, and C. von der Malsburg, “Face recognition
and gender determination,” in Proceedings of the International Workshop on Automatic
Face and Gesture Recognition,, 1995, pp. 92-97.
59. H. Lian, B. Lu, and E. Takikawa, “Gender recognition using a min-max modular support
vector machine,” Advances in Natural Computation, pp. 433-436, 2005.
60. S. Hosoi, E. Takikawa, and M. Kawade, “Ethnicity estimation with facial images,” in Au-
tomatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International
Conference on, 2004, pp. 195–200.
61. X. M. Leng and Y. D. Wang, “Improving generalization for gender classification,” in
Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, 2008, pp.
1656–1659.
62. F. Scalzo, G. Bebis, M. Nicolescu, L. Loss, and A. Tavakkoli, “Feature fusion hierarchies
for gender classification,” in Pattern Recognition, 2008. ICPR 2008. 19th International
Conference on, 2008, no. 2, pp. 1–4.
63. M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,”
Nature Neuroscience, vol. 2, no. 11, pp. 1019-25, Nov. 1999.
64. E. Meyers and L. Wolf, “Using Biologically Inspired Features for Face Processing,” In-
ternational Journal of Computer Vision, vol. 76, no. 1, pp. 93-104, Jul. 2007.
65. P. H. Lee, J. Y. Hung, and Y. P. Hung, “Automatic Gender Recognition Using Fusion of
Facial Strips,” in Pattern Recognition (ICPR), 2010 20th International Conference on,
2010, pp. 1140-1143.
66. A. Lapedriza and M. Marin-Jimenez, “Gender recognition in non controlled environ-
ments,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on,
2006, vol. 3, pp. 834-837.
67. Y. Andreu, R. Mollineda, and P. Garcia-Sevilla, “Gender recognition from a partial view
of the face using local feature vectors,” Pattern Recognition and Image Analysis, pp.
481-488, 2009.
68. W. S. Chu, C. R. Huang, and C. S. Chen, “Identifying gender from unaligned facial im-
ages by set classification,” in Pattern Recognition (ICPR), 2010 20th International Con-
ference on, 2010, pp. 2636–2639.
69. M. Nazir, M. Ishtiaq, A. Batool, and M. A. Jaffar, “Feature selection for efficient gender
classification,” in Proceedings of the 11th WSEAS international conference on neural
networks and 11th WSEAS international conference on evolutionary computing and 11th
WSEAS international conference on Fuzzy systems, 2010, pp. 70–75.
70. Z. Li and X. Zhou, “Spatial gaussian mixture model for gender recognition,” in Image
Processing (ICIP), 2009 16th IEEE International Conference on, 2009, pp. 45-48.
71. P. Rai and P. Khanna, “Gender classification using Radon and Wavelet Transforms,” in
Industrial and Information Systems (ICIIS), 2010 International Conference on, 2010, pp.
448–451.
72. A. Ross and C. Chen, “Can Gender Be Predicted from Near-Infrared Face Images?,” Im-
age Analysis and Recognition, pp. 120-129, 2011.
73. K. Ueki and T. Kobayashi, “Gender Classification Based on Integration of Multiple Clas-
sifiers Using Various Features of Facial and Neck Images,” Information and Media
Technologies, vol. 3, no. 2, pp. 479-485, 2008.
74. A. C. Gallagher and T. Chen, “Understanding images of groups of people,” in 2009 IEEE
Conference on Computer Vision and Pattern Recognition, 2009, pp. 256-263.
75. S. J. D. Prince and J. Aghajanian, “Gender classification in uncontrolled settings using
additive logistic models,” in Image Processing (ICIP), 2009 16th IEEE International
Conference on, 2009, pp. 2557-2560.
76. A. M. Martinez and R. Benavante, “The AR face database,” CVC Technical Report,
1998.
77. K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSdb: The Extended
M2VTS Database,” in Proceedings 2nd Conference on Audio and Video-base Biometric
Personal Verification (AVBPA99), 1999.
78. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodolo-
gy for face-recognition algorithms,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 22, no. 10, pp. 1090-1104, 2000.
79. O. Jesorsky, K. Kirchberg, and R. Frischholz, “Robust Face Detection Using the Haus-
dorff Distance,” in Audio-and Video-Based Biometric Person Authentication, 2001, pp.
90-95.
80. T. Sim, S. Baker, and M. Bsat, “The CMU Pose, Illumination, and Expression (PIE) da-
tabase,” in Proceedings of Fifth IEEE International Conference on Automatic Face Ges-
ture Recognition, 2002, pp. 53-58.
81. P. Phillips, P. Flynn, and T. Scruggs, “Overview of the face recognition grand chal-
lenge,” Computer vision and pattern recognition, 2005. CVPR 2005. IEEE computer so-
ciety conference on, vol. 1, pp. 947-954, 2005.
82. P. Flynn, K. Bowyer, and P. Phillips, “Assessment of time dependency in face recogni-
tion: An initial study,” in Audio-and Video-Based Biometric Person Authentication,
2003, p. 1057.
83. K. Ricanek and T. Tesafaye, “Morph: A longitudinal image database of normal adult
age-progression,” in Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th In-
ternational Conference on, 2006, pp. 341-345.
84. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled Faces in the Wild: A
Database for Studying Face Recognition in Unconstrained Environments,” University of
Massachusetts Amherst Technical Report 07, vol. 49, no. 07-49, pp. 1-11, 2007.
85. W. Gao, B. Cao, S. Shan, and X. Chen, “The CAS-PEAL large-scale Chinese face data-
base and baseline evaluations,” Systems, Man and Cybernetics, Part A: Systems and Hu-
mans, IEEE Transactions on, vol. 38, no. 1, pp. 149-161, 2008.
86. T. Gehrig, M. Steiner, and H. K. Ekenel, “Draft: Evaluation Guidelines for Gender Clas-
sification and Age Estimation,” 2011.
87. P. Dago-Casas, D. Gonzalez-Jimenez, L. L. Yu, and J. L. Alba-Castro, “Single-and
cross-database benchmarks for gender classification under unconstrained settings,” Com-
puter Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp.
2152-2159, 2011.
88. S. Buchala, M. J. Loomes, N. Davey, and R. J. Frank, “The role of global and feature
based information in gender classification of faces: a comparison of human performance
and computational models,” International Journal of Neural Systems, vol. 15, pp. 121-
128, 2005.
89. S. Zafeiriou, A. Tefas, and I. Pitas, “Gender determination using a support vector ma-
chine variant,” in 16th European Signal Processing Conference (EUSIPCO-2008), Lau-
sanne, Switzerland, 2008, no. Eusipco, pp. 2-6.
90. “BeFIT - Benchmarking Facial Image Analysis Technologies.” [Online]. Available:
http://fipa.cs.kit.edu/412.php. [Accessed: 15-Mar-2012].
91. V. Bruce, T. Valentine, and A. Baddeley, “The basis of the 3/4 view advantage in face
recognition,” Applied Cognitive Psychology, vol. 1, no. 2, pp. 109-120, 1987.
92. A. J. O’Toole, T. Vetter, N. F. Troje, and H. H. Bulthoff, “Sex classification is better
with three-dimensional head structure than with image intensity information,” Percep-
tion, vol. 26, p. 75, 1997.
93. X. Han, H. Ugail, and I. Palmer, “Gender classification based on 3D face geometry fea-
tures using SVM,” in CyberWorlds, 2009. CW’09. International Conference on, 2009,
pp. 114–118.
94. G. Toderici, S. M. O’Malley, G. Passalis, T. Theoharis, and I. A. Kakadiaris, “Ethnicity-
and Gender-based Subject Retrieval Using 3-D Face-Recognition Techniques,” Interna-
tional Journal of Computer Vision, vol. 89, no. 2-3, pp. 382-391, Apr. 2010.
95. X. Lu, H. Chen, and A. Jain, “Multimodal facial gender and ethnicity identification,” Ad-
vances in Biometrics, pp. 554–561, 2005.
96. U. Tariq, Y. Hu, and T. S. Huang, “Gender and ethnicity identification from silhouetted
face profiles,” in Image Processing (ICIP), 2009 16th IEEE International Conference on,
2009, pp. 2441-2444.
97. W. Yang, A. Sethuram, E. Patterson, K. Ricanek, and C. Sun, “Gender Classification
Using the Profile,” Advances in Neural Networks–ISNN 2011, pp. 288–295, 2011.
98. H. Shen, L. Ma, and Q. Zhang, “Gender categorization based on 3D faces,” in 2010 2nd
International Conference on Advanced Computer Control, 2010, pp. 617-620.
99. Y. Hu, J. Yan, and P. Shi, “A fusion-based method for 3d facial gender classification,” in
Computer and Automation Engineering (ICCAE), 2010 The 2nd International Confe-
rence on, 2010, vol. 5, pp. 369–372.
100. J. Wu, W. A. P. Smith, and E. R. Hancock, “Gender discriminating models from facial
surface normals,” Pattern Recognition, vol. 44, no. 12, pp. 2871-2886, Dec. 2011.
101. J. Wu, W. Smith, and E. Hancock, “Gender classification based on facial surface nor-
mals,” in 2008 19th International Conference on Pattern Recognition, 2008, pp. 1-4.
102. J. E. Boyd and J. J. Little, “Biometric Gait Recognition,” Biometrics, pp. 19-42, 2005.
103. V. Boulgouris, D. Hatzinakos, and K. N. Plataniotis, “Gait Recognition: A challenging
signal processing technology for biometric identification,” IEEE Signal Processing Mag-
azine, no. November, pp. 78-90, 2005.
104. M. Hu, Y. Wang, Z. Zhang, and Y. Wang, “Combining Spatial and Temporal Informa-
tion for Gait Based Gender Classification,” in 2010 20th International Conference on
Pattern Recognition, 2010, pp. 3679-3682.
105. Y. Makihara, H. Mannami, and Y. Yagi, “Gait analysis of gender and age using a large-
scale multi-view gait database,” Computer Vision - ACCV 2010, pp. 440-451, 2011.
106. J. Lu and Y.-P. Tan, “Uncorrelated discriminant simplex analysis for view-invariant gait
signal computing,” Pattern Recognition Letters, vol. 31, no. 5, pp. 382-393, Apr. 2010.
107. P.-C. Chang, M.-C. Tien, J.-L. Wu, and C.-S. Hu, “Real-time Gender Classification from
Human Gait for Arbitrary View Angles,” in 2009 11th IEEE International Symposium on
Multimedia, 2009, pp. 88-95.
108. C. Y. Chang and T. H. Wu, “Using gait information for gender recognition,” in Intelli-
gent Systems Design and Applications (ISDA), 2010 10th International Conference on,
2010, pp. 1388–1393.
109. L. Chen, Y. Wang, and Y. Wang, “Gender Classification Based on Fusion of Weighted
Multi-View Gait Component Distance,” in 2009 Chinese Conference on Pattern Recog-
nition, 2009, pp. 1-5.
110. D. Zhang and Y. Wang, “Investigating the separability of features from different views
for gait based gender classification,” in Pattern Recognition, 2008. ICPR 2008. 19th In-
ternational Conference on, 2008, pp. 3-6.
111. G. Huang and Y. Wang, “Gender classification based on fusion of multi-view gait se-
quences,” in Computer Vision – ACCV 2007, 2007, pp. 462–471.
112. G. Johansson, “Visual motion perception.,” Scientific American, vol. 232, no. 6, p. 76,
1975.
113. J. E. Cutting and L. T. Kozlowski, “Recognizing friends by their walk: Gait perception
without familiarity cues,” Bulletin of the Psychonomic Society, vol. 9, no. 5, pp. 353-356,
1977.
114. L. T. Kozlowski and J. E. Cutting, “Recognizing the sex of a walker from a dynamic
point-light display,” Attention, Perception, & Psychophysics, vol. 21, no. 6, pp. 575-580,
1977.
115. J. W. Davis and H. Gao, “An expressive three-mode principal components model for
gender recognition,” Journal of Vision, vol. 4, no. 5, pp. 362-377, 2004.
116. M. S. Nixon and J. N. Carter, “Automatic Recognition by Gait,” Proceedings of the
IEEE, vol. 94, no. 11, pp. 2013-2024, Nov. 2006.
117. S. Yu, T. Tan, K. Huang, K. Jia, and X. Wu, “A study on gait-based gender classifica-
tion.,” IEEE Transactions on Image Processing, vol. 18, no. 8, pp. 1905-10, Aug. 2009.
118. M. Hu, Y. Wang, Z. Zhang, and D. Zhang, “Gait-based gender classification using mixed
conditional random field.,” IEEE Transactions on Systems, Man, and Cybernetics. Part
B, Cybernetics, vol. 41, no. 5, pp. 1429-39, Oct. 2011.
119. J. H. Yoo, D. Hwang, and M. Nixon, “Gender classification in human gait using support
vector machine,” in Advanced Concepts for Intelligent Vision Systems, 2005, pp. 138–
145.
120. L. Lee and W. Grimson, “Gait analysis for recognition and classification,” Proceedings
of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pp.
155-162, 2002.
121. R. Martin-Felez, R. A. Mollineda, and J. S. Sanchez, “Towards a More Realistic Appear-
ance-Based Gait Representation for Gender Recognition,” in 2010 20th International
Conference on Pattern Recognition, 2010, pp. 3810-3813.
122. Y. Liu, R. Collins, and Y. Tsin, “Gait sequence analysis using frieze patterns,” Computer
Vision—ECCV 2002, pp. 733-736, Apr. 2006.
123. C. Shan, S. Gong, and P. W. McOwan, “Fusing gait and face cues for human gender rec-
ognition,” Neurocomputing, vol. 71, no. 10-12, pp. 1931-1938, Jun. 2008.
124. J. Han and B. Bhanu, “Individual recognition using gait energy image.,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 316-22, Feb.
2006.
125. Z. Liu and S. Sarkar, “Simplest Representation Yet for Gait Recognition: Averaged Sil-
houette,” Pattern Recognition, no. 130768.
126. X. Li, S. J. Maybank, S. Yan, D. Tao, and D. Xu, “Gait Components and Their Applica-
tion to Gender Recognition,” IEEE Transactions on Systems, Man, and Cybernetics, Part
C (Applications and Reviews), vol. 38, no. 2, pp. 145-155, Mar. 2008.
127. M. Hu and Y. Wang, “A New Approach for Gender Classification Based on Gait Analy-
sis,” in 2009 Fifth International Conference on Image and Graphics, 2009, pp. 869-874.
128. L. Chen, Y. Wang, Y. Wang, and D. Zhang, “Gender Recognition from Gait Using Ra-
don Transform and Relevant Component Analysis,” in Emerging Intelligent Computing
Technology and Applications: 5th International Conference on Intelligent Computing,
ICIC 2009 Ulsan, South Korea, September 16-19, 2009 Proceedings, 2009, pp. 92-101.
129. F. B. Oskuie and K. Faez, “Gender Classification Using a Novel Gait Template: Radon
Transform of Mean Gait Energy Image,” in Proceedings of the 8th International Confe-
rence on Image Analysis and Recognition-Volume Part II, 2011, pp. 161-169.
130. X.-tao Chen, Z.-hui Fan, H. Wang, and Z.-qing Li, “Automatic Gait Recognition Using
Kernel Principal Component Analysis,” Science And Technology, 2010.
131. S. Handri, S. Nomura, and K. Nakamura, “Determination of Age and Gender Based on
Features of Human Motion Using AdaBoost Algorithms,” International Journal of So-
cial Robotics, vol. 3, no. 3, pp. 233-241, Jan. 2011.
132. D. Zhang and Y. H. Wang, “Gender recognition based on fusion on face and gait infor-
mation,” in Machine Learning and Cybernetics, 2008 International Conference on, 2008,
vol. 1, no. July, pp. 62–67.
133. J. D. Shutler, M. G. Grant, M. S. Nixon, and J. N. Carter, “On a large sequence-based
human gait database,” in Proceedings of the 4th International Conference on Recent Ad-
vances in Soft Computing, 2002, pp. 66-71.
134. S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The huma-
nID gait challenge problem: data sets, performance, and analysis.,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162-77, Feb. 2005.
135. S. Yu, D. Tan, and T. Tan, “A Framework for Evaluating the Effect of View Angle,
Clothing and Carrying Condition on Gait Recognition,” in 18th International Conference
on Pattern Recognition (ICPR’06), 2006, pp. 441-444.
136. J. Lu and Y.-P. Tan, “Gait-based human age estimation,” in 2010 IEEE International
Conference on Acoustics, Speech and Signal Processing, 2010, vol. 5, no. 4, pp. 1718-
1721.
137. G. Guo, G. Mu, and Y. Foo, “Gender from body: A biologically-inspired approach with
manifold learning,” in Proceedings of the 9th Asian Conference on Computer Vision-
Volume Part III, 2009, no. 1, pp. 236–245.
138. L. Bourdev, S. Maji, and J. Malik, “Describing People: A Poselet-Based Approach to
Attribute Classification,” in Computer Vision (ICCV), 2011 IEEE International Confe-
rence on, 2011, pp. 1543-1550.
139. L. Cao, M. Dikmen, Y. Fu, and T. S. Huang, “Gender recognition from body,” in Pro-
ceeding of the 16th ACM International Conference on Multimedia, 2008, pp. 725–728.
140. M. Collins, J. Zhang, and P. Miller, “Full body image feature representations for gender
profiling,” in 2009 IEEE 12th International Conference on Computer Vision Workshops,
ICCV Workshops, 2009, pp. 1235-1242.
141. A. Bosch and A. Zisserman, “Representing shape with a spatial pyramid kernel,” in Pro-
ceedings of the 6th ACM international conference on Image and video retrieval, vol.
2007, pp. 401-408.
142. S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid
Matching for Recognizing Natural Scene Categories,” in 2006 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR’06), vol. 2,
pp. 2169-2178.
143. L. Bourdev and J. Malik, “Poselets: Body part detectors trained using 3D human pose
annotations,” in Computer Vision, 2009 IEEE 12th International Conference on, 2009,
pp. 1365–1372.
144. M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, “Pedestrian detection us-
ing wavelet templates,” in Proceedings of IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, 1997, pp. 193-199.
145. D. Gray, S. Brennan, and H. Tao, “Evaluating appearance models for recognition, reac-
quisition, and tracking,” in Performance Evaluation of Tracking and Surveillance
(PETS). IEEE International Workshop on, 2007.
146. M. Everingham, L. Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal Vis-
ual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol.
88, no. 2, pp. 303-338, Sep. 2009.
147. A. Balan and M. Black, “The naked truth: Estimating body shape under clothing,” in Eu-
ropean Conf. on Computer Vision, 2008, pp. 15–29.
148. P. Guan, O. Freifeld, and M. Black, “A 2d human body model dressed in eigen clothing,”
Computer Vision–ECCV 2010, pp. 285–298, 2010.
149. P. Guan, A. Weiss, A. O. Balan, and M. J. Black, “Estimating human shape and pose
from a single image,” in Computer Vision, 2009 IEEE 12th International Conference on,
2009, pp. 1381–1388.
150. S. Wuhrer and M. Rioux, “Posture invariant gender classification for 3D human models,”
in 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recogni-
tion Workshops, 2009, pp. 33-38.
151. J. Tang, X. Liu, H. Cheng, and K. M. Robinette, “Gender Recognition Using 3-D Human
Body Shapes,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applica-
tions and Reviews), vol. 41, no. 6, pp. 898-908, Nov. 2011.