Article

Facial Age Estimation by Adaptive Label Distribution Learning


Abstract

Lack of sufficient and complete training data is one of the most prominent challenges in facial age estimation. Because faces at close ages look similar, the face images at neighboring ages may be utilized while learning a particular age. As a result, the training images for each age are boosted without actually increasing the total number of training images. This is achieved by assigning a label distribution, instead of a single label of the chronological age, to each face image. The label distribution should accord with the tendency of facial aging, which might be significantly different at different ages, e.g., the facial appearance during childhood and senior age generally changes faster than that during middle age. In this paper, two adaptive label distribution learning (ALDL) algorithms, IIS-ALDL and BFGS-ALDL, are proposed to automatically learn the label distributions adapted to different ages. Experimental results show that the ALDL algorithms perform remarkably better than the compared state-of-the-art algorithms.
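The idea of replacing a single age label with a label distribution can be illustrated with a minimal sketch. It assumes a discretized Gaussian centered at the chronological age, which is one common choice and not necessarily the exact form used in the paper. The function name `gaussian_label_distribution` and the sigma values below are hypothetical; in the ALDL algorithms the spread is learned per age, not fixed by hand.

```python
import math

def gaussian_label_distribution(chronological_age, sigma, ages):
    """Discretized Gaussian over candidate ages, normalized to sum to 1.

    Assumption: a Gaussian shape and a hand-picked sigma, for illustration only.
    """
    weights = [
        math.exp(-((a - chronological_age) ** 2) / (2.0 * sigma ** 2))
        for a in ages
    ]
    total = sum(weights)
    return [w / total for w in weights]

ages = list(range(0, 70))

# Faster facial change (e.g. childhood): neighboring ages help less, so a narrow spread.
narrow = gaussian_label_distribution(5, 1.0, ages)

# Slower facial change (e.g. middle age): neighboring ages help more, so a wider spread.
wide = gaussian_label_distribution(40, 3.0, ages)
```

Each image then contributes to the learning of its neighboring ages in proportion to the description degree the distribution assigns to them.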

... MLL assigns training instances a set of labels rather than a single label to solve the problem of label ambiguity. Geng et al. [5,21] labeled each facial image with multiple age labels following a label distribution, showing a marked advantage over single-label methods. ...
... For the MLL problem, each image is allowed to carry multiple labels. Following the works [5,21], each face image is labeled by an age label distribution, that is, a set of probabilities representing the description degree of each label. One face image x includes some possible discrete age labels L = {l_1, ..., l_k} in our aging model. ...
... One face image x includes some possible discrete age labels L = {l_1, ..., l_k} in our aging model. Let P(l_c, l_i) denote the probability of the possible label l_i corresponding to the chronological age l_c, where [...]. There are some works [5,21] that allow each face image to be labeled by a Gaussian distribution for age estimation. Following those works, we also model the aging problem with the Gaussian distribution, which is used to calculate the integral of an age interval. ...
Conference Paper
In this paper, we propose a novel approach based on a single convolutional neural network (CNN) for age estimation. In our proposed network architecture, we first model the randomness of aging with the Gaussian distribution, which is used to calculate the Gaussian integral of an age interval. Then, we present a soft softmax regression function used in the network. The new function applies the aging model to compute the loss. Compared with the traditional softmax function, the new function considers not only the chronological age but also the interval near the true age. Moreover, owing to the complexity of the Gaussian integral in the soft softmax function, a lookup table is built to accelerate this process. All the integrals of age values are calculated offline in advance. We evaluate our method on two public datasets, MORPH II and the Cross-Age Celebrity Dataset (CACD), and experimental results show that the proposed method achieves superior performance compared to the state of the art.
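The lookup-table idea described above can be sketched as follows. The abstract does not give the exact loss, so this sketch assumes unit-width age intervals [a - 0.5, a + 0.5], a fixed sigma, and a cross-entropy between the softmax output and the Gaussian interval masses. The names `build_lookup_table` and `soft_cross_entropy`, the age range 16 to 77, and sigma = 2.0 are all hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2), via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def build_lookup_table(ages, sigma):
    """table[c][i] = mass of N(ages[c], sigma^2) on [ages[i] - 0.5, ages[i] + 0.5].

    Computed once, offline, so training only needs to index a row by true age.
    """
    return [
        [normal_cdf(a + 0.5, c, sigma) - normal_cdf(a - 0.5, c, sigma) for a in ages]
        for c in ages
    ]

def soft_cross_entropy(logits, target_row):
    """Cross-entropy between softmax(logits) and a soft target row (assumed loss form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target_row, logits))

AGES = list(range(16, 78))           # illustrative age range
TABLE = build_lookup_table(AGES, sigma=2.0)

# At training time the soft target is a table lookup, not a fresh integral.
target = TABLE[AGES.index(40)]
```

Precomputing the table trades a small amount of memory (one row per age) for avoiding repeated evaluation of the Gaussian integral in every training step.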
... The larger the value is, the more important the label. In past research, label distribution learning has been applied in facial age estimation [7,8], head pose estimation [9], facial expression recognition [10], text sentiment classification [11], and so on. ...
... The other parameter is the inclusion threshold, which is set as 0.55 in our experiments. The label ratio is selected as 0.1. [Table residue: index lists for the Yeast_spo, Yeast_spo5 and Yeast_spoem datasets; measures Canberra, Chebyshev, Clark, Kullback-Leibler, Cosine and Intersection.] All algorithms are run on a personal computer with Windows 10 and Core(TM) I5-4590S CPU 3.0 GHz, and 8 GB memory. ...
Article
Full-text available
Label distribution learning, as a new learning paradigm under the machine learning framework, is widely applied to address label ambiguity. However, most existing label distribution learning methods require complete supervised information, which is obtained through costly and laborious efforts to label the data. In reality, the annotation information may be incomplete and traditional methods cannot directly deal with the incomplete data. Hence, a new theoretical framework is proposed to handle the limited labeled data, which is called the local rough set. In addition, label distribution learning also experiences the “curse of dimensionality” problem, and it is essential to adopt some pre-processing methods, such as feature selection, to reduce the data dimensionality. Nevertheless, few feature selection algorithms are designed for handling label distribution data. Motivated by this, a model based on local rough set and neighborhood granularity, which can effectively and efficiently work with incompletely labeled data, is introduced in this paper. Furthermore, a local rough set-based incomplete label distribution feature selection algorithm is proposed to reduce the data dimensionality. Experimental results on 12 real-world label distribution datasets indicate that the proposed method outperforms the global rough set in computational efficiency and achieves better classification performance than the other five methods.
... Thus classification does not coincide with the subjective perception problem. Recently, Label Distribution Learning (LDL) algorithms were proposed successively [5]-[10]. In LDL, each label is associated with a description degree that describes how likely the instance belongs to the label, and the values over all labels form the distribution of an instance. ...
... AA-BP [9] used a three-layer back-propagation (BP) neural network: the number of input units was the dimension of the feature vector, and each unit in the last layer output the description degree of one label. For 'Specialized Algorithm', the SA-IIS algorithm was designed for facial age estimation [9]. Label distributions were generated from single labels by the Gaussian distribution, and the algorithm optimized the Kullback-Leibler loss function of a maximum entropy model to predict the distribution. SA-BFGS [10] was based on SA-IIS and improved the optimization method. The Conditional Probability Neural Network (CPNN) [9] was based on a three-layer neural network: both features and labels were input into the network, which output the probability of each label. ...
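The Kullback-Leibler loss of a maximum entropy model mentioned in this snippet can be sketched in a few lines. This is a minimal illustration under assumptions, not the SA-IIS or SA-BFGS implementation: the model is taken to be p(y | x) proportional to exp(theta_y · x), and the function names `max_entropy_predict` and `kl_divergence` are hypothetical. The optimizer (IIS or BFGS) is not shown.

```python
import math

def max_entropy_predict(theta, x):
    """Maximum entropy model: p(y | x) proportional to exp(theta[y] . x).

    theta is a list of weight rows, one row per label (assumed parameterization).
    """
    scores = [sum(w * f for w, f in zip(row, x)) for row in theta]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(d, p, eps=1e-12):
    """KL(d || p) between a target label distribution d and a prediction p."""
    return sum(
        di * math.log((di + eps) / (pi + eps))
        for di, pi in zip(d, p)
        if di > 0
    )

# Three labels, two features; zero weights give a uniform prediction.
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
prediction = max_entropy_predict(theta, x)
loss = kl_divergence([0.7, 0.2, 0.1], prediction)
```

Training would adjust `theta` to reduce the summed loss over all training instances.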
... We compared our proposed algorithm with several state-of-the-art LDL algorithms, including PT-Bayes [5], PT-SVM [5], AA-kNN [8], AA-BP [9], SA-IIS [9], SA-BFGS [10] and CNNR [20]. Features are from VGGnet and reduced to 300 dimensions. The experimental results are shown in Table Ⅲ. ...
Article
Full-text available
Image emotion analysis attracts considerable attention with the increasing demand for opinion mining in social networks. The emotion evoked by an image is always ambiguous because emotion is subjective. Unlike previous research on image emotion classification, the Label Distribution Learning framework, which assigns a set of labels with degree values to an instance, describes emotions more explicitly. However, in our study, we find that some labels have co-occurrence relations with others and that the labels together exhibit some structural forms. To make use of these relations as complementary information to the holistic distribution of labels, we analyze the correlations among emotion labels and then propose a method based on the Structural Learning framework, which learns a mapping from images to the label distributions with correlations. In addition, images usually contain some emotion-unrelated content. To extract features that represent image emotion as fully as possible, we propose a cropping method that selects the emotional region from the images with the help of Fully Convolutional Networks. Extensive experiments on two widely used datasets show the advantages of our methods.
... A growing body of work has used this approach, e.g., to predict beauty in images [31] and rate movies [14], leading to more nuanced prediction results and insights. There is also evidence that, even in situations where ground truth exists but is difficult to obtain, predicting label distributions is more informative and accurate than aggregating the opinions of multiple labelers into a single (set of) discrete choice(s) [16]. ...
... Geng pioneered the systematic study of label distribution learning [13], where the objects to be predicted are probabilities of labels/classes. He and colleagues studied applications of LDL in many settings, some of which are related to predicting population-level distributions [14,16,31] while others are not [12,15]. Several of these studies acknowledge the difficulty of obtaining valid label distributions that represent the underlying beliefs of human annotators; in fact, most of them were based on data and labels originally collected for the purpose of conventional (i.e., non-probabilistic labels) supervised learning problems. ...
Conference Paper
Machine learning problems are often subjective or ambiguous. That is, humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. In supervised learning, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This hides a rich source of diversity and subjectivity of opinions about the labels. Label distribution learning associates for each data item a probability distribution over the labels for that item, thus can preserve the diversity that conventional learning hides or ignores. We introduce a strategy for learning label distributions with only five-to-ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. Our results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
... Distribution learning is a learning method proposed to solve the problem of label ambiguity [10], which has been utilized in a number of recognition tasks, such as head pose estimation [12,8], and age estimation [41,20]. Geng et al. [13,11] proposed two adaptive label distribution learning (ALDL) algorithms, i.e. IIS-ALDL and BFGS-ALDL, to iteratively learn the estimation function parameters and the label distributions variance. ...
... We select images of several persons at different ages with their adapted variances in Fig. 4(a). As [11] mentioned, the age variances of younger or older people tend to be smaller than those of middle age. And the variances vary between people in the same age group. ...
Chapter
Full-text available
Estimating age from a single facial image is a classic and challenging topic in computer vision. One of its most intractable issues is label ambiguity, i.e., face images from adjacent age of the same person are often indistinguishable. Some existing methods adopt distribution learning to tackle this issue by exploiting the semantic correlation between age labels. Actually, most of them set a fixed value to the variance of Gaussian label distribution for all the images. However, the variance is closely related to the correlation between adjacent ages and should vary across ages and identities. To model a sample-specific variance, in this paper, we propose an adaptive variance based distribution learning (AVDL) method for facial age estimation. AVDL introduces the data-driven optimization framework, meta-learning, to achieve this. Specifically, AVDL performs a meta gradient descent step on the variable (i.e. variance) to minimize the loss on a clean unbiased validation set. By adaptively learning proper variance for each sample, our method can approximate the true age probability distribution more effectively. Extensive experiments on FG-NET and MORPH II datasets show the superiority of our proposed approach to the existing state-of-the-art methods.
... But this method ignored the texture information of images. Label distribution is applied to age estimation in [11] and [12]. The learning method of adaptive label distribution makes a face image contribute to not only the learning of its chronological age, but also the learning of its neighboring ages. ...
... The KNN algorithm is used to calculate the distance between the testing sample and each age class. The calculation method is defined as follows: [...] (11) where N_r denotes the number of the r-th class training sample in a S . The distances between the testing sample and the N_r training samples are sorted in ascending order. ...
Conference Paper
Full-text available
Feature extraction and the estimation method are two key components of age estimation. This paper proposes a novel age estimation method based on a Multi Levels Gaussian Mixture Model (ML-GMM) and a double layers estimation model. In the feature extraction phase, ML-GMM is used to construct different GMMs for features at different levels, which can reflect both the global and the local age features of facial images. In the estimation phase, a double layers estimation model based on SVM-KNN is proposed. The first layer roughly divides age groups by using SVM. The second layer adopts KNN theory to find K images of consecutive ages which have the minimum sum of distances to the testing sample. The specific age is obtained by weighting these K age values. Experiments are performed on a mixed age database built from the FG-NET and MORPH-II databases. The mean absolute error of age estimation is 3.22 years. Experimental results show that the proposed method is more effective than other state-of-the-art methods for age estimation of facial images. It can extract richer and more complete age features, improve the generalization ability of age estimation, and reduce the mean absolute error.
... At present, the parameter model for LDL is mainly based on Kullback-Leibler divergence [12]. Different models can be used to train the parameters, such as maximum entropy [13] or logistic regression [14], although there is no particular evidence to support their use. To some extent, the LDL process ignores the linear relationship between the features and the label distribution. ...
... The continued efforts of researchers have led to various LDL algorithms being proposed [9,13,15]. There are three main design strategies in the literature [10]: problem transformation, algorithm adaptation, and specialized algorithm design. ...
Article
Full-text available
Multilabel learning that focuses on an instance of the corresponding related or unrelated label can solve many ambiguity problems. Label distribution learning (LDL) reflects the importance of the related label to an instance and offers a more general learning framework than multilabel learning. However, the current LDL algorithms ignore the linear relationship between the distribution of labels and the feature. In this paper, we propose a regularized sample self-representation (RSSR) approach for LDL. First, the label distribution problem is formalized by sample self-representation, whereby each label distribution can be represented as a linear combination of its relevant features. Second, the LDL problem is solved by L2 -norm least-squares and L2,1 -norm least-squares methods to reduce the effects of outliers and overfitting. The corresponding algorithms are named RSSR-LDL2 and RSSR-LDL21. Third, the proposed algorithms are compared with four state-of-the-art LDL algorithms using 12 public datasets and five evaluation metrics. The results demonstrate that the proposed algorithms can effectively identify the predictive label distribution and exhibit good performance in terms of distance and similarity evaluations.
... LDL problems arise in various domains including biology [3], sociology [43,44], security [6], image processing [33,45,46] or multimedia [5]. It appears, for instance, in gene expression domain [3], e.g. ...
... In the case of LDL formulation of this task, a distribution of ratings can be modeled. Other real-life applications of LDL include head pose estimation [45,48], facial landmark detection [49], crowd counting in public video surveillance [6] or facial age estimation [44,50]. Furthermore, multilabel classification tasks can be considered as special cases of LDL, so all kinds of problems defined as multilabel learning tasks can, in principle, be transformed to a more general LDL framework. ...
Article
Full-text available
Label Distribution Learning (LDL) is a new learning paradigm with numerous applications in various domains. It is a generalization of both standard multiclass classification and multilabel classification. Instead of a binary value, in LDL, each label is assigned a real number which corresponds to a degree of membership of the object being classified to a given class. In this paper a new neural network approach to Label Distribution Learning (Duo-LDL), which considers pairwise class dependencies, is introduced. The method is extensively tested on 15 well-established benchmark sets, against 6 evaluation measures, proving its competitiveness to state-of-the-art non-neural LDL approaches. Additional experimental results on artificially generated data demonstrate that Duo-LDL is especially effective in the case of most challenging benchmarks, with extensive input feature representations and numerous output classes.
... After the concept of label distribution learning was proposed, several effective label enhancement methods appeared in the literature. The existing label enhancement algorithms are roughly divided into three types [22], i.e., fuzzy theory-based methods [31,32], graph model-based methods [33,34], and prior knowledge-based methods [35,36]. We give a brief introduction to these three kinds of label enhancement methods as follows. ...
... In the application of facial age estimation, the lack of facial images with definite age labels makes traditional age prediction algorithms inefficient. Based on the prior knowledge of the facial similarity between adjacent human ages, Geng et al. [36] recovered label distribution from ground-truth age and its adjacent ages and proposed an adaptive label distribution learning model to learn the human age ambiguity. Furthermore, Zhang et al. [37] presented a developed prior assumption of facial age correlation, which limits age label distribution that only covers a reasonable number of neighboring ages. ...
Article
Full-text available
Emotion Distribution Learning (EDL) is a recently proposed multiemotion analysis paradigm, which identifies basic emotions with different degrees of expression in a sentence. Different from traditional methods, EDL quantitatively models the expression degree of the corresponding emotion on the given instance in an emotion distribution. However, emotion labels are crisp in most existing emotion datasets. To utilize traditional emotion datasets in EDL, label enhancement aims to convert logical emotion labels into emotion distributions. This paper proposed a novel label enhancement method, called Emotion Wheel and Lexicon-based emotion distribution Label Enhancement (EWLLE), utilizing the affective words’ linguistic emotional information and the psychological knowledge of Plutchik’s emotion wheel. The EWLLE method generates separate discrete Gaussian distributions for the emotion label of sentence and the emotion labels of sentiment words based on the psychological emotion distance and combines the two types of information into a unified emotion distribution by superposition of the distributions. The extensive experiments on 4 commonly used text emotion datasets showed that the proposed EWLLE method has a distinct advantage over the existing EDL label enhancement methods in the emotion classification task.
... After features extracted by local image descriptors, classification or regression methods would be employed to obtain the predicted age, such as BIF+SVM [8], BIF+SVR [8], BIF+CCA [12]. More recently, Geng et al. [6], [37] allowed each face image labeled with a label distribution rather than a single age label, where both the real age and its adjacent ages would contribute to the learning. The work in [38], [39] also integrate the idea of label distribution into deep learning framework and achieve promising performance. ...
Article
Full-text available
Different ages are closely related especially among the adjacent ages because aging is a slow and extremely non-stationary process with much randomness. To explore the relationship between the real age and its adjacent ages, an age group-n encoding (AGEn) method is proposed in this paper. In our model, adjacent ages are grouped into the same group and each age corresponds to n groups. The ages grouped into the same group would be regarded as an independent class in the training stage. On this basis, the original age estimation problem can be transformed into a series of binary classification sub-problems. And a deep Convolutional Neural Networks (CNN) with multiple classifiers is designed to cope with such sub-problems. Later, a Local Age Decoding (LAD) strategy is further presented to accelerate the prediction process, which locally decodes the estimated age value from ordinal classifiers. Besides, to alleviate the imbalance data learning problem of each classifier, a penalty factor is inserted into the unified objective function to favor the minority class. To compare with state-of-the-art methods, we evaluate the proposed method on FG-NET, MORPH II, CACD and Chalearn LAP 2015 Databases and it achieves the best performance.
... There are also some age classification methods developed on the basis of SVM and SVR [48][49][50][51]. In addition, X.Geng et al. proposed a number of age classification methods, primarily using LDL (Label Distribution Learning) age classification method [52][53][54][55]. Recently, the DCNN has been widely used for face age identification [56][57][58][59][60][61][62][63]. ...
Article
Full-text available
Face age estimation, a computer vision task facing numerous challenges due to its potential applications in identity authentication, human–computer interface, video retrieval and robot vision, has been attracting increasing attention. In recent years, the deep convolutional neural networks (DCNN) have achieved state-of-the-art performance in age classification of face images. We propose a deep hybrid framework for age classification by exploiting DCNN as the raw feature extractor along with several effective methods, including fine-tuning the DCNN into a fine-tuned deep age feature extraction (FDAFE) model, introducing a new method of feature extracting, applying the maximum joint probability classifier to age classification and a strategy to incorporate information from face images more effectively to improve estimation capabilities further. In addition, we pre-process the original image to represent age information more accurately. Based on the discriminative and compact framework, state-of-the-art performance on several face image data sets has been achieved in terms of classification accuracy.
... With the application of deep learning, people also use the CNN model to extract the features of the image and classify the face age. In recent years, X. Geng et al. have proposed a number of age classification methods, mainly the Label Distribution Learning (LDL) age classification method [12][13][14][15]. The CNN is widely used for face age identification, such as [16][17][18][19]. ...
... In order to use these historical contexts, we propose to synthesize historical knowledge into the image label via the label distribution learning (LDL) [4,11-15,40], which is further employed to generate a proper label distribution in each modality. Multiple label distributions are finally encapsulated into our learning framework, which can significantly assist visual feature learning in CNN, thus leading to largely improved classification performance. ...
Conference Paper
Analyzing and categorizing the style of visual art images, especially paintings, is gaining popularity owing to its importance in understanding and appreciating the art. The evolution of painting style is both continuous, in a sense that new styles may inherit, develop or even mutate from their predecessors and multi-modal because of various issues such as the visual appearance, the birthplace, the origin time and the art movement. Motivated by this peculiarity, we introduce a novel knowledge distilling strategy to assist visual feature learning in the convolutional neural network for painting style classification. More specifically, a multi-factor distribution is employed as soft-labels to distill complementary information with visual input, which extracts from different historical context via label distribution learning. The proposed method is well-encapsulated in a multi-task learning framework which allows end-to-end training. We demonstrate the superiority of the proposed method over the state-of-the-art approaches on Painting91, OilPainting, and Pandora datasets.
... According to the paper [9], we set AA-BP [9], PT-SVM [3], and BFGS-LLD [1] to their best settings. For the GMML-kLDL algorithm, we set k=30 (in the k-nearest neighbor classifier), K from 1 to 20 with a step size of 1, and t from 0.1 to 1 with a step of 0.1; M_0 is the identity matrix and λ = 0.1. Table 4 and Table 3 show the experimental results of several algorithms on different evaluation indicators on multiple datasets. ...
Chapter
Full-text available
Label distribution learning is an extended multi-label learning paradigm that can preserve the significance of the labels and the related information among the labels. Many studies have shown that label distribution learning has important applications in label ambiguity. However, some classification information in the labels is not effectively utilized. In this paper, we use the classification information in the labels and combine it with geometric mean metric learning to learn a new metric in the feature space. Under the new metric, samples that are similar in the label space are as close as possible, and dissimilar samples are as far apart as possible. Finally, the GMML-kLDL model is proposed by combining the classification information in the labels and the neighbor information in the features. The experimental results show that the model is effective in label distribution learning and can effectively improve the prediction performance.
... Geng et al. [14,13] proposed the AGES algorithm based on the subspace trained on a data structure called aging pattern vector. This algorithm was further extended to Adaptive Label Distribution Learning (ALDL) by Geng et al. [10]. Yang et al. [22] tried to combine the LDL with deep learning methods called Deeply Label Distribution Learning (DLDL) to deal with the apparent age estimation problem. ...
Article
Full-text available
In this paper, a novel type of saliency region detection method is proposed based on the recurrent learning of context. It aims to find the image regions that can represent the main content. It differs from previous definitions, whose goal is either to find fixation points or to seek the dominant object. The regions should carry semantic information, which makes this a challenging task for computer vision, especially when the imaging quality is poor, with complicated background clutter and uncontrolled viewing conditions. To improve attribute recognition given small-sized training data with poor-quality images, we formulate a joint recurrent learning model for exploring context and correlation, based on which salient regions can be detected. Moreover, by incorporating semantic information of image contents, an object-oriented pooling strategy is proposed to further improve the performance. We conduct experiments on several challenging publicly available saliency detection datasets, and the results demonstrate the effectiveness of our proposed saliency region detection method.
Article
As a recently arisen framework, Label Distribution Learning (LDL) is one of the most appropriate machine learning paradigms to solve the label ambiguity problems. Due to the high cost, it is intractable to directly collect annotated distribution-level data. Therefore, Label Enhancement (LE) is proposed to obtain the label distribution for training LDL model by mining the information hidden in the logical labels. Accordingly, LE is usually taken as the pre-processing of LDL algorithm to learn with logical labels in previous methods. These two-stage learning methods may reduce the performance of LDL. To this end, we propose a unified framework called L2 which simultaneously conducts Label Enhancement and Label Distribution Learning on samples and logical labels to fully exploit the implicit information for learning optimal LDL model. Specifically, the recovery of label distribution benefits from not only the optimization of the conventional LE objective function but also the feedback of LDL loss. What’s more, the recovered distribution labels can be directly applied to the supervision of LDL training in an end-to-end way. Extensive experiments illustrate that L2 can correctly recover the distribution-level data from the logical labels, and the trained LDL model can perform favorably against state-of-the-art LDL algorithms with the recovered distribution data.
Article
Label Distribution Learning (LDL) is a popular scenario for solving label ambiguity problems by learning the relative importance of each label to a particular instance. Nevertheless, the label is often incomplete due to the difficulty in annotating label distribution. In this mixing label case with complete and incomplete labels, it is often expected that the learning method can achieve better performance than the baseline method merely utilizing complete labeled data. However, the usage of incomplete labeled data may degrade the performance in real applications. Therefore, it is vital to design a safe incomplete LDL method, which will not deteriorate the performance when exploiting incomplete labeled data. To tackle this important but rarely studied problem, we propose a Safe Incomplete LDL method (SILDL), which learns a classifier that can prevent incomplete labeled instances from worsening the performance. Concretely, we learn predictions from multiple incomplete supervised learners and design an efficient solving algorithm by formulating it as a convex quadratic program. Theoretically, we prove that SILDL can obtain the maximal performance gain against the best one of the multiple baseline methods with mild conditions. Extensive experimental results validate the safeness of the proposed approach and show improvements in performance.
Conference Paper
This paper proposes an age estimation algorithm that refines the label distribution in a deep learning framework. There are two tasks during the training period of our algorithm. The first finds the optimal parameters of a supervised deep CNN, given the label distribution of the training sample as the ground truth, while the second estimates the variances of the label distribution to fit the output of the CNN. These two tasks are performed alternately, and both are treated as supervised learning tasks. The AlexNet and ResNet-50 architectures are adopted as the classifiers, and a Gaussian form of the label distribution is assumed. Experiments show that the accuracy of age estimation can be improved by refining the label distribution.
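As a rough illustration of the Gaussian label distribution assumed in the abstract above, the sketch below turns a single age label into a discrete distribution over integer ages. The function name, the age range 0 to 100, and the fixed `sigma` are assumptions made for this example; the paper estimates the variances during training rather than fixing them by hand.

```python
import math


def gaussian_label_distribution(age, sigma, min_age=0, max_age=100):
    """Discrete Gaussian label distribution centred on `age`.

    Hypothetical helper for illustration: `min_age`, `max_age` and the
    fixed `sigma` are assumptions, not values taken from the paper.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ages = range(min_age, max_age + 1)
    # Unnormalised Gaussian weight for every candidate age label.
    weights = [math.exp(-((a - age) ** 2) / (2.0 * sigma ** 2)) for a in ages]
    total = sum(weights)
    # Normalise so that the description degrees sum to one.
    return [w / total for w in weights]


dist = gaussian_label_distribution(age=30, sigma=2.0)
```

A larger `sigma` spreads more mass onto neighbouring ages, which is the quantity the second training task described above adjusts.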
Chapter
Accurately estimating facial age remains a difficult task because of insufficient training data. In this paper, a hybrid model is proposed to estimate facial age by means of an extreme learning machine (ELM) and a label distribution support vector regressor (LDSVR). In the proposed method, bio-inspired features are adopted to estimate the facial age because of their prominent performance. To improve accuracy and reduce the computational burden, the ratio of a feature’s between-category to within-category sums of squares (BW) is used as the criterion for selecting features. To define the category of each sample, the training data is divided into several sets according to age group, and a different virtual class label is assigned to the samples of each set. Given the reduced data, a multiple-input-single-output ELM regression model is established to estimate the facial ages. Moreover, a label distribution support vector regressor is adopted to estimate facial age based on a multiple-input-multiple-output regression model. After obtaining the outputs of ELM and LDSVR, a linear weighting strategy is devised to compute the final estimation of facial age. Experimental results on a well-known facial image database demonstrate the feasibility and efficiency of the proposed hybrid model.
Conference Paper
Annotating age classes for humans’ facial images according to their appearance is very challenging because of dynamic, person-specific ageing patterns, and thus leads to a set of unreliable apparent age labels for each image. To utilise ambiguous label annotations, an intuitive strategy is to generate a pseudo age for each image, typically the average of the manually annotated ages, which is then fed into standard supervised learning frameworks designed for chronological age estimation. Alternatively, inspired by the recent success of label distribution learning, this paper introduces a novel concept of ambiguous label distribution for apparent age estimation, which is developed from the observations that (1) soft labelling is beneficial for alleviating the effect of inaccurate annotations and (2) more reliable annotations should contribute more. To achieve this goal, the label distributions of the sparse age annotations for each image are weighted according to their reliability and then combined to construct an ambiguous label distribution. In this light, the proposed learning framework not only inherits the advantages of conventional learning with label distributions in capturing latent label correlation but also exploits annotation reliability to improve the robustness against inconsistent age annotations. Experimental evaluation on the FG-NET age estimation benchmark verifies its effectiveness and superior performance over the state-of-the-art frameworks for apparent age estimation.
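The weighting-and-combining step described in the abstract above can be sketched as a reliability-weighted mixture of per-annotation label distributions. This is only an illustrative reading: the function name, the use of a simple convex combination, and the example weights are assumptions, and the paper may define reliability and combination differently.

```python
def combine_label_distributions(distributions, reliabilities):
    """Reliability-weighted mixture of label distributions.

    Hypothetical helper: each entry of `distributions` is a list of
    description degrees over the same age labels, and `reliabilities`
    holds one non-negative weight per annotation.
    """
    if not distributions or len(distributions) != len(reliabilities):
        raise ValueError("need one reliability per distribution")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    combined = [0.0] * len(distributions[0])
    for dist, weight in zip(distributions, reliabilities):
        for i, degree in enumerate(dist):
            # More reliable annotations contribute more to the mixture.
            combined[i] += (weight / total) * degree
    return combined


# Two annotators over three age labels; the first is trusted three times more.
ambiguous = combine_label_distributions(
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    [3.0, 1.0],
)
```

Because the weights are normalised, the result is again a valid distribution whenever every input distribution sums to one.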
Article
Label Distribution Learning (LDL) is a machine learning paradigm recently proposed to deal with label ambiguity. This paradigm assigns a distribution-level label to an instance so that it can exploit the relative importance of every candidate label to a particular instance. Previous studies concentrate on methods under strong supervision, which require a large amount of labeled training data. In real-world applications, it is usually difficult to collect numerically precise labels owing to the large cost in labor and time spent on label annotation. To this end, this paper proposes a novel algorithm named Semi-Supervised Label Distribution Learning with Co-regularization (S2LDL-CO). To benefit from all available information, an ensemble of two different models is utilized to deal with the labeled and unlabeled data, respectively. More specifically, the co-regularization framework is adopted to combine these two different models, which can process both the labeled and unlabeled data with good robustness and consistency. Moreover, manifold regularization and the l2,1-norm are also added to the objective function, which can fully exploit the implicit information in instances. Finally, the objective function is optimized by an Alternating Direction Method of Multipliers (ADMM) algorithm. Experimental results on thirteen benchmark datasets illustrate its effectiveness over several state-of-the-art methods.
Article
Label distribution learning (LDL) is a general learning framework, which assigns a distribution over a set of labels to an instance rather than a single label or multiple labels. Current LDL methods have either restricted assumptions on the expression form of the label distribution or limitations in representation learning. This paper presents label distribution learning forests (LDLFs) - a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) Decision trees have the potential to model any general form of label distributions by the mixture of leaf node predictions. 2) The learning of differentiable decision trees can be combined with representation learning, e.g., to learn deep features in an end-to-end manner. We define a distribution-based loss function for forests, enabling all the trees to be learned jointly, and show that an update function for leaf node predictions, which guarantees a strict decrease of the loss function, can be derived by variational bounding. The effectiveness of the proposed LDLFs is verified on two LDL problems, including age estimation and crowd opinion prediction on movies, showing significant improvements to the state-of-the-art LDL methods.
Article
As an important and challenging problem in computer vision, face age estimation is typically cast as a classification or regression problem over a set of face samples with respect to several ordinal age labels, which have intrinsically cross-age correlations across adjacent age dimensions. As a result, such correlations usually lead to the age label ambiguities of the face samples. Namely, each face sample is associated with a latent label distribution that encodes the cross-age correlation information on label ambiguities. Motivated by this observation, we propose a totally data-driven label distribution learning approach to adaptively learn the latent label distributions. The proposed approach is capable of effectively discovering the intrinsic age distribution patterns for cross-age correlation analysis on the basis of the local context structures of face samples. Without any prior assumptions on the forms of label distribution learning, our approach is able to flexibly model the sample-specific context aware label distribution properties by solving a multi-task problem, which jointly optimizes the tasks of age-label distribution learning and age prediction for individuals. Experimental results demonstrate the effectiveness of our approach.
Article
Age estimation from face images is an important yet difficult task in computer vision. Its main difficulty lies in how to design aging features that remain discriminative in spite of large facial appearance variations. Meanwhile, due to the difficulty of collecting and labeling datasets that contain sufficient samples for all possible ages, the age distributions of most benchmark datasets are often imbalanced, which makes this problem more challenging. In this work, we try to solve these difficulties by means of mainstream deep learning techniques. Specifically, we use a convolutional neural network which can learn discriminative aging features from raw face images without any handcrafting. To combat the sample imbalance problem, we propose a novel cumulative hidden layer which is supervised by a point-wise cumulative signal. With this cumulative hidden layer, our model is learnt indirectly using faces with neighboring ages and thus alleviates the sample imbalance problem. In order to learn more effective aging features, we further propose a comparative ranking layer which is supervised by a pair-wise comparative signal. This comparative ranking layer facilitates aging feature learning and improves the performance of the main age estimation task. In addition, since one face can be included in many different training pairs, we can make full use of the limited training data. Both of these novel layers are differentiable, so our model is end-to-end trainable. Extensive experiments on two of the largest benchmark datasets show that our deep age estimation model gains a notable advantage in accuracy when compared against existing methods.
Article
In the environment of smart cities, human facial age estimation has become an important research topic due to its wide applications. Although a variety of methods have been proposed to depict facial biological attributes, the underlying relationships between facial appearance and its biological aging process have not been fully explored. Although relationship learning methods have been constructed in terms of modeling either the facial representations or the facial attributes, none of them model the relationships simultaneously. In this paper, we propose a unified model to explore these relationships simultaneously, coined JREAE. In JREAE, two covariance matrices are symmetrically constructed to capture the underlying correlations from both aspects of input facial features and output age labels. In this way, the potential relationships of both features and labels are not only modeled explicitly, but also explored adaptively from the training data, which is significantly different from those methods that define the relationships manually. Then, we extend the JREAE model with a deep convolutional architecture (deep-JREAE) for more powerful discrimination. In addition, we present optimization algorithms to solve the proposed models, with theoretical convergence and complexity proofs. Finally, through extensive experiments, we not only validate the effectiveness and superiority of the proposed methods in performance, but also visualize and analyze the resulting relationships.
Article
In this paper, we propose a knowledge distillation approach with two teachers for facial age estimation. Due to the nonstationary patterns of the facial aging process, the relative order of age labels provides more reliable information than exact age values for facial age estimation. Thus the first teacher is a novel ranking method capturing the ordinal relation among age labels. Specifically, it formulates ordinal relation learning as a task of recovering the original ordered sequences from shuffled ones. The second teacher adopts the same model as the student, which treats facial age estimation as a multi-class classification task. The proposed method leverages the intermediate representations learned by the first teacher and the softened outputs of the second teacher as supervisory signals to improve the training procedure and final performance of the compact student for facial age estimation. Hence, the proposed knowledge distillation approach is capable of distilling the ordinal knowledge from the ranking model and the dark knowledge from the multi-class classification model into a compact student, which facilitates the implementation of facial age estimation on platforms with limited memory and computation resources, such as mobile and embedded devices. Extensive experiments involving several well-known datasets for age estimation have demonstrated the superior performance of our proposed method over several existing state-of-the-art methods.
Article
Label distribution learning (LDL) is an interpretable and general learning paradigm that has been applied in many real-world applications. In contrast to the simple logical vector in single-label learning (SLL) and multi-label learning (MLL), LDL assigns labels with a description degree to each instance. In practice, two challenges exist in LDL, namely, how to address the dimensional gap problem during the learning process of LDL and how to exactly recover label distributions from existing logical labels, i.e., Label Enhancement (LE). For most existing LDL and LE algorithms, the fact that the dimension of the input matrix is much higher than that of the output one is always ignored and it typically leads to the dimensional reduction owing to the unidirectional projection. The valuable information hidden in the feature space is lost during the mapping process. To this end, this study considers bidirectional projections function which can be applied in LE and LDL problems simultaneously. More specifically, this novel loss function not only considers the mapping errors generated from the projection of the input space into the output one but also accounts for the reconstruction errors generated from the projection of the output space back to the input one. This loss function aims to potentially reconstruct the input data from the output data. Therefore, it is expected to obtain more accurate results. Experiments on several real-world datasets are carried out to demonstrate the superiority of the proposed method for both LE and LDL. Specifically, BD-LE achieves optimal performance in 85.71% cases and renders sub-optimal in 13.09% cases. BD-LDL ranks 1st in 90.48% cases across six evaluation measurements. Compared with the baseline algorithms, the bidirectional projection methods can outperform the best baselines over 7.38% and 9.98% on average for LE and LDL, respectively.
Article
Age estimation plays an important role in human-computer interaction systems. The lack of a large number of facial images with definite age labels makes age estimation algorithms inefficient. Deep label distribution learning (DLDL), which employs convolutional neural networks (CNN) and label distribution learning to learn ambiguity from the ground-truth age and adjacent ages, has been proven to outperform the current state-of-the-art frameworks. However, DLDL assumes a rough label distribution which covers all ages for any given age label. In this paper, a more practical label distribution paradigm is proposed: we limit the age label distribution so that it only covers a reasonable number of neighboring ages. In addition, we explore different label distributions to improve the performance of the proposed learning model. We employ a CNN and the improved label distribution learning to estimate age. Experimental results show that, compared to DLDL, our method is more effective for facial age recognition.
Article
Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on the instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets only contain simple logical labels rather than label distributions due to the difficulty of obtaining the label distributions directly. To solve this problem, one way is to recover the label distributions from the logical labels in the training set via leveraging the topological information of the feature space and the correlation among the labels. Such process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world LDL datasets show clear advantages of GLLE over several existing LE algorithms. Furthermore, experimental results on eleven multi-label learning datasets validate the advantage of GLLE over the state-of-the-art multi-label learning approaches.
Article
Label distribution learning (LDL) is proposed for solving the label ambiguity problem in recent years, which can be seen as an extension of multi-label learning. To improve the performance of label distribution learning, some existing algorithms exploit label correlations in a global manner that assumes the label correlations are shared by all instances. However, the instances in different groups may share different label correlations, and few label correlations are globally applicable in real-world tasks. In this paper, two novel label distribution learning algorithms are proposed by exploiting label correlations on local samples, which are called GD-LDL-SCL and Adam-LDL-SCL, respectively. To utilize the label correlations on local samples, the influence of local samples is encoded, and a local correlation vector is designed as the additional features for each instance, which is based on the different clustered local samples. Then the label distribution for an unseen instance can be predicted by exploiting the original features and the additional features simultaneously. Extensive experiments on some real-world data sets validate that our proposed methods can address the label distribution problems effectively and outperform state-of-the-art methods.
Article
Label distribution learning (LDL) is one of the paradigms for dealing with label ambiguity, and it can learn the relative importance of each label to a particular instance. Most of the existing LDL approaches require strong supervision information, which always involves high cost of the data-labeling process. To this end, we propose a novel transductive weakly supervised LDL algorithm based on matrix completion, which simultaneously explores the discriminant knowledge of both training instances and testing instances. Moreover, the manifold regularization is exploited to capture the samples’ relevance to enhance the performance of matrix completion. Experimental results on real-world data sets with various missing percentages validate the effectiveness of our method.
Chapter
In this paper, we construct a multi-task deep learning model to simultaneously predict the number of people and the level of crowd density. Motivated by the success of applying “ambiguous labelling” to the age estimation problem, we also employ this strategy for the people counting problem. We show that it is a reasonable strategy since the people counting problem is similar to the age estimation problem. Also, by applying “ambiguous labelling”, we are able to augment the size of the training dataset, which is a desirable property when training deep learning models. In a series of experiments, we show that the “ambiguous labelling” strategy can not only improve the performance of deep learning but also enhance the prediction ability of traditional computer vision methods such as Random Projection Forest with hand-crafted features.
Article
Age estimation has been a challenging research topic in recent years. Existing approaches usually use only appearance features for age estimation. Personalized aging patterns, i.e., sequences of personal features, have been shown to be an important factor for improving age estimation accuracy, but are not considered in those approaches. We propose a novel model named recurrent age estimation (RAE) to make full use of appearance features as well as personalized aging patterns. RAE uses the CNN-LSTM architecture. Convolutional neural networks (CNNs) are trained to extract discriminative appearance features from face images, and long short-term memory networks (LSTMs) are employed to learn personalized aging patterns from sequences of personal features. Furthermore, we integrate the label distribution learning (LDL) scheme into LSTMs to exploit ambiguity from the real age and adjacent ages. The superiority of RAE compared with existing approaches is shown by experimental results.
Conference Paper
Emotion recognition from facial expressions is an interesting and challenging problem and has attracted much attention in recent years. Substantial previous research has only been able to address the ambiguity of “what describes the expression”, which assumes that each facial expression is associated with one or more predefined affective labels while ignoring the fact that multiple emotions always have different intensities in a single picture. Therefore, to depict facial expressions more accurately, this paper adopts a label distribution learning approach for emotion recognition that can address the ambiguity of “how to describe the expression” and proposes an emotion distribution learning method that exploits label correlations locally. Moreover, a local low-rank structure is employed to capture the local label correlations implicitly. Experiments on benchmark facial expression datasets demonstrate that our method can better address the emotion distribution recognition problem than state-of-the-art methods.
Article
Although research on facial attribute analysis has been under way for decades, estimation of the chronological age attribute remains a big challenge. Previous researchers have found that some facial attributes (e.g., gender and race) are closely connected with the age attribute, so estimating age under specific conditions determined by combinations of those age-related attributes should be more reasonable. In this paper, we propose a generic framework based on a convolutional neural network (CNN) which can consider different conditions for age estimation and jointly output age and age-related facial attributes in the end. Compared with conventional methods, it is more efficient and universal. Besides, we view age estimation as a special multi-class ordinal classification problem and use a combination of losses to optimize the predicted probability distribution over individual age classes. These operations further improve the performance of age estimation. Finally, the proposed method achieves state-of-the-art results on both controlled and wild face datasets.
Article
Label distribution learning (LDL) is a new machine learning paradigm to solve label ambiguity and has drawn increasing attention in recent years. The importance of all labels needs to be considered under the LDL settings. A series of approaches have been proposed to deal with the LDL problem by considering the correlation of labels or instances. However, none of them focuses on finding interpretable bases to reduce the dimensions of the feature space. Inspired by the semi-nonnegative matrix factorization (semi-NMF) method, we propose a new LDL learning framework to deal with the problem through learning nonnegative components. The key insight is to explore the bases, each of which represents a class, through the label distribution and to transform the input matrix into a coefficient matrix of the space constructed by the bases. Consequently, a maximum entropy model can be adopted to learn the label distribution from the coefficient matrix. Experimental results on real-world datasets comparing our method with several state-of-the-art methods validate the performance of our approach.
Conference Paper
A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
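To make the cumulative attribute idea in the abstract above concrete, the sketch below uses a simple binary encoding in which dimension i is switched on when the scalar label is at least i. This particular encoding, the function names, and the decoding by summation are assumptions chosen for illustration; the abstract itself only states that the attributes change cumulatively with the output value.

```python
def cumulative_attributes(value, max_value):
    """Encode an integer label as a cumulative binary attribute vector.

    Hypothetical encoding for illustration: dimension i (1-based) is 1
    when `value >= i`, so neighbouring labels share most attributes.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value must lie in [0, max_value]")
    return [1 if i <= value else 0 for i in range(1, max_value + 1)]


def decode_cumulative(attributes):
    """Recover the scalar label by counting the active attributes."""
    return sum(attributes)


encoded = cumulative_attributes(3, max_value=5)
decoded = decode_cumulative(encoded)
```

Under this encoding, a sample labelled 3 and one labelled 4 differ in a single dimension, which is one way sparse labels can still inform their neighbours.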
Conference Paper
In this paper, we propose an ordinal hyperplane ranking algorithm called OHRank, which estimates human ages via facial images. The design of the algorithm is based on the relative order information among the age labels in a database. Each ordinal hyperplane separates all the facial images into two groups according to the relative order, and a cost-sensitive property is exploited to find better hyperplanes based on the classification costs. Human ages are inferred by aggregating a set of preferences from the ordinal hyperplanes with their cost sensitivities. Our experimental results demonstrate that the proposed approach outperforms conventional multiclass-based and regression-based approaches as well as recently developed ranking-based age estimation approaches.
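The aggregation step described in the abstract above can be sketched as counting how many ordinal hyperplanes place a face on the "older than k" side. This is a simplified reading, not OHRank itself: the linear hyperplanes, the unweighted vote, and all names below are assumptions, and the cost-sensitive training and weighting are omitted.

```python
def predict_age(features, hyperplanes, min_age=0):
    """Infer an age by aggregating binary 'older than k' decisions.

    Hypothetical sketch: `hyperplanes` is a list of (weights, bias)
    pairs, one per age threshold in increasing order.
    """
    votes = 0
    for weights, bias in hyperplanes:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        if score > 0:
            # This hyperplane judges the face to be older than its threshold.
            votes += 1
    return min_age + votes


# Toy 1-D example with thresholds at 0.5, 1.5, 2.5, 3.5 and 4.5.
toy_hyperplanes = [([1.0], -(k + 0.5)) for k in range(5)]
toy_age = predict_age([3.2], toy_hyperplanes)
```

In the toy example the first three hyperplanes vote "older" and the last two do not, so the aggregated estimate is 3.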
Conference Paper
One of the main difficulties in facial age estimation is the lack of sufficient training data for many ages. Fortunately, the faces at close ages look similar since aging is a slow and smooth process. Inspired by this observation, in this paper, instead of considering each face image as an example with one label (age), we regard each face image as an example associated with a label distribution. The label distribution covers a number of class labels, representing the degree that each label describes the example. Through this way, in addition to the real age, one face image can also contribute to the learning of its adjacent ages. We propose an algorithm named IIS-LLD for learning from the label distributions, which is an iterative optimization process based on the maximum entropy model. Experimental results show the advantages of IIS-LLD over the traditional learning methods based on single-labeled data. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
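A minimal sketch of the ingredients named in the abstract above: a maximum entropy (softmax-form) model that outputs a distribution over age labels, and a divergence between the target label distribution and the prediction. The use of Kullback-Leibler divergence as the training criterion, the parameter layout `theta`, and the function names are assumptions for illustration; the iterative scaling updates of IIS-LLD are not reproduced here.

```python
import math


def maxent_predict(features, theta):
    """Maximum entropy model: p(y | x) proportional to exp(theta_y . x).

    `theta` is a hypothetical parameter matrix with one row per label.
    """
    scores = [sum(t * x for t, x in zip(row, features)) for row in theta]
    top = max(scores)
    # Subtract the maximum score for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    partition = sum(exps)
    return [e / partition for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass are skipped."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


features = [1.0, 0.5]
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # untrained: uniform output
predicted = maxent_predict(features, theta)
loss = kl_divergence([0.2, 0.6, 0.2], predicted)
```

Training would adjust `theta` to reduce this loss summed over all training images, so that each image also shapes the predictions for ages adjacent to its real age.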
Conference Paper
In this paper, we present an automatic web image mining system towards building a universal human age estimator based on facial information, which is applicable to all ethnic groups and various image qualities. First, a large (∼391k) yet noisy human aging image dataset is crawled from the photo sharing website Flickr and Google image search engine based on a set of human age related text queries. Then, within each image, several human face detectors of different implementations are used for robust face detection, and all the detected faces with multiple responses are considered as the multiple instances of a bag (image). An outlier removal step with Principal Component Analysis further refines the image set to about 220k faces, and then a robust multi-instance regressor learning algorithm is proposed to learn the kernel-regression based human age estimator under the scenarios with possibly noisy bags. The proposed system has the following characteristics: 1) no manual human age labeling process is required, and the age information is automatically obtained from the age related queries, 2) the derived human age estimator is universal owing to the diversity and richness of Internet images and thus has good generalization capability, and 3) the age estimator learning process is robust to the noises existing in both Internet images and corresponding age labels. This automatically derived human age estimator is extensively evaluated on three popular benchmark human aging databases, and without taking any images from these benchmark databases as training samples, comparable age estimation accuracies with the state-of-the-art results are achieved.
Conference Paper
In this paper, we take the human age and pose estimation problems as examples to study the automatic design of a regressor from training samples with uncertain nonnegative labels. First, the nonnegative label is predicted as the squared norm of a matrix, which is bilinearly transformed from the nonlinear mappings of the candidate kernels. Two transformation matrices are then learned for deriving such a matrix by solving a semidefinite programming (SDP) problem, in which the uncertain label of each sample is expressed as two inequality constraints. The objective function of the SDP controls the ranks of these two matrices, and consequently automatically determines the structure of the regressor. The whole framework for automatically designing a regressor from samples with uncertain nonnegative labels has the following characteristics: 1) the SDP formulation makes full use of the uncertain labels, instead of using conventional fixed labels; 2) regression with a matrix norm naturally guarantees the nonnegativity of the labels, and greater prediction capability is achieved by integrating the squares of the matrix elements, which act as weak regressors; and 3) the regressor structure is automatically determined by the pursuit of simplicity, which potentially promotes the algorithmic generalization capability. Extensive experiments on two human age databases, FG-NET and Yamaha, as well as the Pointing'04 pose database, demonstrate encouraging estimation accuracy improvements over conventional regression algorithms.
Article
An interior-point method for nonlinear programming is presented. It enjoys the flexibility of switching between a line search method that computes steps by factoring the primal-dual equations and a trust region method that uses a conjugate gradient iteration. Steps computed by direct factorization are always tried first, but if they are deemed ineffective, a trust region iteration that guarantees progress toward stationarity is invoked. To demonstrate its effectiveness, the algorithm is implemented in the Knitro [6,28] software package and is extensively tested on a wide selection of test problems.
Article
A large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding or its linear/kernel/tensor extension of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called marginal Fisher analysis (MFA), in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional linear discriminant analysis (LDA) algorithm due to data distribution assumptions and available projection directions. Real face recognition experiments show the superiority of our proposed MFA in comparison to LDA, also for the corresponding kernel and tensor extensions.
Conference Paper
Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
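To make the parameter estimation problem above concrete, the following is a minimal sketch of plain gradient ascent on the conditional log-likelihood of a small ME model. It illustrates only one of the estimation methods the abstract names; the function names (`predict`, `fit`), the toy data, the learning rate, and the step count are all assumptions chosen for illustration and do not come from the cited paper.

```python
import math


def predict(weights, x):
    """Return p(y | x) for each label under a conditional ME (softmax) model."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fit(data, n_labels, n_features, lr=0.5, steps=200):
    """Gradient ascent on the average conditional log-likelihood.

    The gradient for each weight is the empirical feature expectation
    minus the expectation under the current model.
    """
    weights = [[0.0] * n_features for _ in range(n_labels)]
    for _ in range(steps):
        grad = [[0.0] * n_features for _ in range(n_labels)]
        for x, y in data:
            p = predict(weights, x)
            for k in range(n_labels):
                coeff = (1.0 if k == y else 0.0) - p[k]
                for j in range(n_features):
                    grad[k][j] += coeff * x[j]
        for k in range(n_labels):
            for j in range(n_features):
                weights[k][j] += lr * grad[k][j] / len(data)
    return weights


# Hypothetical toy data: two features, two labels.
data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
weights = fit(data, n_labels=2, n_features=2)
```

Calling `predict(weights, [1.0, 0.0])` returns a probability vector that favours label 0 on this toy data. Iterative scaling and limited-memory variable metric methods optimise the same objective with different update rules.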
Article
While recognition of most facial variations, such as identity, expression and gender, has been extensively studied, automatic age estimation has rarely been explored. In contrast to other facial variations, aging variation presents several unique characteristics which make age estimation a challenging task. This paper proposes an automatic age estimation method named AGES (AGing pattErn Subspace). The basic idea is to model the aging pattern, which is defined as the sequence of a particular individual's face images sorted in time order, by constructing a representative subspace. The proper aging pattern for a previously unseen face image is determined by the projection in the subspace that can reconstruct the face image with minimum reconstruction error, while the position of the face image in that aging pattern will then indicate its age. In the experiments, AGES and its variants are compared with the limited existing age estimation methods (WAS and AAS) and some well-established classification methods (kNN, BP, C4.5, and SVM). Moreover, a comparison with human perception ability on age is conducted. It is interesting to note that the performance of AGES is not only significantly better than that of all the other algorithms, but also comparable to that of the human observers.
Conference Paper
We propose a craniofacial growth model that characterizes growth-related shape variations observed in human faces during formative years. The model draws inspiration from the ‘revised’ cardioidal strain transformation model proposed in psychophysical studies related to craniofacial growth. The model takes into account anthropometric evidence collected on facial growth and hence is in accordance with the observed growth patterns in human faces across years. We characterize facial growth by means of growth parameters defined over facial landmarks often used in anthropometric studies. We illustrate how the age-based anthropometric constraints on facial proportions translate into linear and non-linear constraints on facial growth parameters and propose methods to compute the optimal growth parameters. The proposed craniofacial growth model can be used to predict one’s appearance across years and to perform face recognition across age progression. This is demonstrated on a database of age-separated face images of individuals under 18 years of age.
Conference Paper
This paper details MORPH, a longitudinal face database developed for researchers investigating all facets of adult age-progression, e.g. face modeling, photo-realistic animation, face recognition, etc. This database contributes to several active research areas, most notably face recognition, by providing: the largest set of publicly available longitudinal images; longitudinal spans from a few months to over twenty years; and the inclusion of key physical parameters that affect aging appearance. The direct contribution of this data corpus for face recognition is highlighted in the evaluation of a standard face recognition algorithm, which illustrates the impact that age-progression has on recognition rates. The efficacy of this algorithm is assessed against the variables of gender and racial origin. This work further concludes that the problem of age-progression on face recognition (FR) is not unique to the algorithm used in this work.
Article
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.
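The training criterion mentioned above, the Kullback-Leibler divergence between an empirical distribution and a model distribution, can be sketched for the discrete case as follows. This is an illustrative helper only: the function name `kl_divergence`, the direction of the divergence (empirical relative to model), and the example distributions are assumptions, not details taken from the cited paper.

```python
import math


def kl_divergence(empirical, model):
    """Discrete KL divergence D(empirical || model) in nats.

    Terms with zero empirical probability contribute nothing; the model
    is assumed to be strictly positive wherever the empirical mass is.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(empirical, model)
        if p > 0.0
    )


# Hypothetical distributions over three outcomes.
empirical = [0.5, 0.3, 0.2]
uniform_model = [1.0 / 3.0] * 3
divergence = kl_divergence(empirical, uniform_model)
```

The divergence is zero only when the two distributions coincide, so driving it down by adjusting the feature weights pulls the model toward the training data.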
Article
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
Article
One of the main difficulties in facial age estimation is that the learning algorithms cannot expect sufficient and complete training data. Fortunately, the faces at close ages look quite similar since aging is a slow and smooth process. Inspired by this observation, instead of considering each face image as an instance with one label (age), this paper regards each face image as an instance associated with a label distribution. The label distribution covers a certain number of class labels, representing the degree to which each label describes the instance. In this way, one face image can contribute not only to the learning of its chronological age, but also to the learning of its adjacent ages. Two algorithms, named IIS-LLD and CPNN, are proposed to learn from such label distributions. Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, either specially designed for age estimation or for general purpose.
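As a rough illustration of what a label distribution for one face image might look like, the sketch below spreads the description degree over neighbouring ages with a discretised Gaussian centred on the chronological age. The Gaussian shape, the `sigma` value, the age range, and the function name `gaussian_label_distribution` are assumptions made for this example; the abstract does not specify how the distribution is built.

```python
import math


def gaussian_label_distribution(age, ages, sigma=2.0):
    """Return description degrees over `ages`, peaked at `age`, summing to 1."""
    weights = [
        math.exp(-((a - age) ** 2) / (2.0 * sigma ** 2))
        for a in ages
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical label set: ages 0 through 69.
ages = list(range(0, 70))
dist = gaussian_label_distribution(30, ages)
```

Here `dist[30]` is the largest entry, and ages 29 and 31 receive equal, smaller degrees, so a single image labelled 30 also acts as a weaker training example for its adjacent ages.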
Conference Paper
Extensive recent studies on human faces reveal significant potential applications of automatic age estimation via face image analysis. Due to the temporal features of age progression, aging face images display a sequential pattern with a low-dimensional distribution. Through manifold analysis of face pictures, we developed a novel age estimation framework. The manifold learning methods are applied to find a sufficient embedding space and model the low-dimensional manifold data with a multiple linear regression function. Experimental results on a large age database demonstrate the effectiveness of the framework. To the best of our knowledge, this is the first work involving the manifold ways of age estimation.
We investigate the biologically inspired features (BIF) for human age estimation from faces. As in previous bio-inspired models, a pyramid of Gabor filters is used at all positions of the input image for the S1 units. But unlike previous models, we find that the pre-learned prototypes for the S2 layer and then progressing to C2 cannot work well for age estimation. We also propose to use Gabor filters with smaller sizes and suggest determining the number of bands and orientations in a problem-specific manner, rather than using a predefined number. More importantly, we propose a new operator "STD" to encode the aging subtlety on faces. Evaluated on the large database YGA with 8,000 face images and the publicly available FG-NET database, our approach achieves significant improvements in age estimation accuracy over the state-of-the-art methods. By applying our system to some Internet face images, we show the robustness of our method and the potential of cross-race age estimation, which has not been explored by any studies before.
Conference Paper
Age Specific Human-Computer Interaction (ASHCI) has vast potential applications in daily life. However, automatic age estimation technique is still underdeveloped. One of the main reasons is that the aging effects on human faces present several unique characteristics which make age estimation a challenging task that requires non-standard classification approaches. According to the speciality of the facial aging effects, this paper proposes the AGES (AGing pattErn Subspace) method for automatic age estimation. The basic idea is to model the aging pattern, which is defined as a sequence of personal aging face images, by learning a representative subspace. The proper aging pattern for an unseen face image is then determined by the projection in the subspace that can best reconstruct the face image, while the position of the face image in that aging pattern will indicate its age. The AGES method has shown encouraging performance in the comparative experiments either as an age estimator or as an age range estimator.
Human age estimation has recently become an active research topic in computer vision and pattern recognition, because of many potential applications in reality. In this paper we propose to use the kernel partial least squares (KPLS) regression for age estimation. The KPLS (or linear PLS) method has several advantages over previous approaches: (1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques; (2) the KPLS can find a small number of latent variables, e.g., 20, to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications; and (3) the KPLS regression has an output vector that can contain multiple labels, so that several related problems, e.g., age estimation, gender classification, and ethnicity estimation can be solved altogether. This is the first time that the kernel PLS method is introduced and applied to solve a regression problem in computer vision with high accuracy. Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
Article
Human age, as an important personal trait, can be directly inferred by distinct patterns emerging from the facial appearance. Derived from rapid advances in computer graphics and machine vision, computer-based age synthesis and estimation via faces have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as forensic art, electronic customer relationship management, security control and surveillance monitoring, biometrics, entertainment, and cosmetology. Age synthesis is defined to rerender a face image aesthetically with natural aging and rejuvenating effects on the individual face. Age estimation is defined to label a face image automatically with the exact age (year) or the age group (year range) of the individual face. Because of their particularity and complexity, both problems are attractive yet challenging to computer-based application system designers. Large efforts from both academia and industry have been devoted in the last few decades. In this paper, we survey the complete state-of-the-art techniques in the face image-based age synthesis and estimation topics. Existing models, popular algorithms, system performances, technical difficulties, popular face aging databases, evaluation protocols, and promising future directions are also provided with systematic discussions.
Article
This paper is a summary of findings of adult age-related craniofacial morphological changes. Our aims are two-fold: (1) through a review of the literature we address the factors influencing craniofacial aging, and (2) the general ways in which a head and face age in adulthood. We present findings on environmental and innate influences on face aging, facial soft tissue age changes, and bony changes in the craniofacial and dentoalveolar skeleton. We then briefly address the relevance of this information to forensic science research and applications, such as the development of computer facial age-progression and face recognition technologies, and contributions to forensic sketch artistry.
Article
Recently, extensive studies on human faces in the human-computer interaction (HCI) field reveal significant potential for designing automatic age estimation systems via face image analysis. The success of such research may bring in many innovative HCI tools used for the applications of human-centered multimedia communication. Due to the temporal property of age progression, face images with aging features may display some sequential patterns with low-dimensional distributions. In this paper, we demonstrate that such aging patterns can be effectively extracted from a discriminant subspace learning algorithm and visualized as distinct manifold structures. Through the manifold method of analysis on face images, the dimensionality redundancy of the original image space can be significantly reduced with subspace learning. A multiple linear regression procedure, especially with a quadratic model function, can be facilitated by the low dimensionality to represent the manifold space embodying the discriminative property. This processing has been evaluated by extensive simulations and compared with the state-of-the-art methods. Experimental results on a large aging database demonstrate the effectiveness and robustness of our proposed framework.
Article
Estimating human age automatically via facial image analysis has lots of potential real-world applications, such as human computer interaction and multimedia communication. However, it is still a challenging problem for the existing computer vision systems to automatically and effectively estimate human ages. The aging process is determined by not only the person's genes, but also many external factors, such as health, living style, living location, and weather conditions. Males and females may also age differently. The current age estimation performance is still not good enough for practical use and more effort has to be put into this research direction. In this paper, we introduce the age manifold learning scheme for extracting face aging features and design a locally adjusted robust regressor for learning and prediction of human ages. The novel approach improves the age estimation accuracy significantly over all previous methods. The merit of the proposed approaches for image-based age estimation is shown by extensive experiments on a large internal age database and the publicly available FG-NET database.
Article
The process of aging causes significant alterations in the facial appearance of individuals. When compared with other sources of variation in face images, appearance variation due to aging displays some unique characteristics. Changes in facial appearance due to aging can even affect discriminatory facial features, resulting in deterioration of the ability of humans and machines to identify aged individuals. We describe how the effects of aging on facial appearance can be explained using learned age transformations and present experimental results to show that reasonably accurate estimates of age can be made for unseen images. We also show that we can improve our results by taking into account the fact that different individuals age in different ways and by considering the effect of lifestyle. Our proposed framework can be used for simulating aging effects on new face images in order to predict how an individual might look in the future or how he/she used to look in the past. The methodology presented has also been used for designing a face recognition system, robust to aging variation. In this context, the perceived age of the subjects in the training and test images is normalized before the training and classification procedure so that aging variation is eliminated. Experimental results demonstrate that, when age normalization is used, the performance of our face recognition system can be improved.