Conference Paper

Automatic face annotation by multilinear AAM with Missing Values

Abstract

It has been shown that multilinear subspace analysis is a powerful tool for overcoming the difficulties posed by viewpoint, illumination and expression variations in the Active Appearance Model (AAM). However, the Higher-Order Singular Value Decomposition (HOSVD) used in multilinear analysis requires a training tensor built from face images under all combinations of these variations, and such a complete training tensor is hard to obtain in practical applications. In this paper, we propose a multilinear AAM that can be generated from an incomplete training tensor using Multilinear Subspace Analysis with Missing Values (M2SA). In addition, the 2D appearance is used directly to train the appearance tensor, reducing memory requirements. Experimental results on the Multi-PIE face database show the efficiency of the proposed method.
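For readers who want a concrete picture of the incomplete training tensor described in the abstract, the following sketch (illustrative only, with assumed dimensions and variable names, and not the authors' code) organises 2D appearance samples into an identity x pose x illumination tensor and marks unobserved combinations with NaN:

import numpy as np

# Illustrative dimensions (assumptions, not taken from the paper):
# identities x poses x illuminations x appearance rows x appearance columns.
n_id, n_pose, n_illum, h, w = 20, 5, 6, 64, 64

# Training tensor; unavailable (identity, pose, illumination) combinations
# stay marked as NaN instead of being discarded.
T = np.full((n_id, n_pose, n_illum, h, w), np.nan, dtype=np.float32)

def add_sample(tensor, identity, pose, illum, appearance_2d):
    """Place one observed 2D appearance patch into the training tensor."""
    tensor[identity, pose, illum] = appearance_2d

# Example: only a sparse subset of all combinations is actually observed.
rng = np.random.default_rng(0)
for _ in range(200):
    ident, pose, illum = rng.integers(n_id), rng.integers(n_pose), rng.integers(n_illum)
    add_sample(T, ident, pose, illum, rng.standard_normal((h, w)).astype(np.float32))

observed = ~np.isnan(T[..., 0, 0])   # mask of available combinations
print(f"observed combinations: {observed.sum()} / {observed.size}")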

... During the past decades, the most widely used feature learning and dimensionality reduction methods include Principal Component Analysis (PCA) [12] and Linear Discriminant Analysis (LDA) [13]. In addition, some other techniques such as manifold learning [14], sparse representation [15][16][17], low rank representation [18], tensor decomposition [19,20] and Nonnegative Matrix Factorization (NMF) [21,22] have also been widely used for high dimensional data processing and compression. ...
Article
Full-text available
Autoencoders have been successfully used to build deep hierarchical models of data. However, a deep architecture usually needs further supervised fine-tuning to obtain better discriminative capacity. To improve the discriminative capacity of deep hierarchical features, this paper proposes a new deterministic autoencoder, trained with a label consistency constraint algorithm that injects discriminative information into the network. We introduce the center loss as the label consistency constraint to learn the hidden features of data and add it to the Sparse AutoEncoder to form a new autoencoder, namely Label Consistency Constrained Sparse AutoEncoders (LCCSAE). Specifically, the center loss learns the center of each class and simultaneously penalizes the distances between the features and their corresponding class centers. In the end, autoencoders are stacked to form a deep architecture of LCCSAE for image classification tasks. To validate the effectiveness of LCCSAE, we compare it with other autoencoders in terms of the deeply learned features and the subsequent classification tasks on the MNIST and CIFAR-bw datasets. Experimental results demonstrate the superiority of LCCSAE over other methods.
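As background to the label consistency constraint, the center loss can be written as a penalty on the squared distance between each hidden feature and its class center. The numpy sketch below follows the standard center-loss formulation; the function names and update rule are assumptions, not the LCCSAE implementation:

import numpy as np

def center_loss(features, labels, centers):
    """Mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]                 # (N, D)
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center towards the mean of the features assigned to it."""
    new_centers = centers.copy()
    for c in np.unique(labels):
        members = features[labels == c]
        new_centers[c] += alpha * (members.mean(axis=0) - centers[c])
    return new_centers

# Toy usage with random 16-dimensional hidden features of a 3-class problem.
rng = np.random.default_rng(0)
feats = rng.standard_normal((30, 16))
labels = rng.integers(0, 3, 30)
centers = np.zeros((3, 16))
for _ in range(10):
    centers = update_centers(feats, labels, centers)
print("center loss:", center_loss(feats, labels, centers))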
... Issues relating to both model construction and model fitting have been investigated [11][12][13][14][15][16]. To extend the capacity of 2D models so as to capture different modes of shape and appearance variations, the early frontal-face 2D morphable models have been generalised to multiview [17,18] and, most recently, multilinear models [19,20]. However, as in the deep learning network case, the development of these multimodal 2D face models is severely limited by the lack of training data. ...
Article
3D Morphable Face Models (3DMM) have been used in pattern recognition for some time now. They have been applied as a basis for 3D face recognition, as well as in an assistive role for 2D face recognition to perform geometric and photometric normalisation of the input image, or in 2D face recognition system training. The statistical distribution underlying 3DMM is Gaussian. However, the single-Gaussian model seems at odds with reality when we consider different cohorts of data, e.g. Black and Chinese faces. Their means are clearly different. This paper introduces the Gaussian Mixture 3DMM (GM-3DMM) which models the global population as a mixture of Gaussian subpopulations, each with its own mean. The proposed GM-3DMM extends the traditional 3DMM naturally, by adopting a shared covariance structure to mitigate small sample estimation problems associated with data in high dimensional spaces. We construct a GM-3DMM, the training of which involves a multiple cohort dataset, SURREY-JNU, comprising 942 3D face scans of people with mixed backgrounds. Experiments in fitting the GM-3DMM to 2D face images to facilitate their geometric and photometric normalisation for pose and illumination invariant face recognition demonstrate the merits of the proposed mixture of Gaussians 3D face model.
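The shared-covariance idea behind the GM-3DMM can be illustrated as follows: each cohort keeps its own mean, while one covariance structure (approximated here by the principal components of the pooled, mean-subtracted data) is estimated from all cohorts together. The cohort names and dimensions below are purely illustrative:

import numpy as np

def fit_shared_covariance_model(cohorts, n_components=5):
    """cohorts: dict mapping cohort name -> (N_i, D) array of shape vectors.
    Returns the per-cohort means plus principal components and variances of
    the pooled, mean-subtracted data (a proxy for the shared covariance)."""
    means = {name: X.mean(axis=0) for name, X in cohorts.items()}
    centred = np.vstack([X - means[name] for name, X in cohorts.items()])
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    variances = (S[:n_components] ** 2) / (len(centred) - 1)
    return means, Vt[:n_components], variances

# Toy usage: two illustrative cohorts whose mean shapes clearly differ.
rng = np.random.default_rng(1)
cohorts = {"cohort_A": rng.standard_normal((40, 60)) + 2.0,
           "cohort_B": rng.standard_normal((50, 60)) - 1.0}
means, components, variances = fit_shared_covariance_model(cohorts)
print(components.shape, variances.round(2))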
... For example, AAM uses the intensity difference between an input image and a texture model as the loss for model optimisation. These models and their extensions have achieved great success in facial landmarking for faces with controlled appearance variations [1,9,20,21,23,33,40]. However, for faces in the wild, the PCA-based shape model may miss some shape details and in consequence it is not able to represent complex shapes faithfully. ...
Conference Paper
Full-text available
We present a framework for robust face detection and landmark localisation of faces in the wild, which has been evaluated as part of 'the 2nd Facial Landmark Localisation Competition'. The framework has four stages: face detection, bounding box aggregation, pose estimation and landmark localisation. To achieve a high detection rate, we use two publicly available CNN-based face detectors and two proprietary detectors. We aggregate the detected face bounding boxes of each input image to reduce false positives and improve face detection accuracy. A cascaded shape regressor, trained using faces with a variety of pose variations, is then employed for pose estimation and image pre-processing. Last, we train the final cascaded shape regressor for fine-grained landmark localisation, using a large number of training samples with limited pose variations. The experimental results obtained on the 300W and Menpo benchmarks demonstrate the superiority of our framework over state-of-the-art methods.
... T-AAM has avoided this problem by focusing on relatively slight pose deviations (up to 22.5°) from the frontal pose. 4) The classical T-AAM fitting algorithm is gradient-descent-based (Lee and Kim, 2009; Feng et al., 2012), and relies on the estimation of the state of each variation mode for a new face. It can easily be trapped in local minima, especially when the state estimation of a face is inaccurate. ...
Article
Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. In contrast with the classical Tensor-based AAM (T-AAM), the proposed UT-AAM has four advantages: First, for each type of face information, namely shape and texture, we construct a tensor model capturing all relevant appearance variations. This unified tensor model contrasts with the variation-specific models of T-AAM. Second, a strategy for dealing with self-occluded faces is proposed to obtain consistent shape and texture representations of faces across large pose variations. Third, our UT-AAM is capable of constructing the model from an incomplete training dataset, using tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting. With these improvements, the utility of UT-AAM in practice is considerably enhanced in comparison with the classical T-AAM. As an example, we demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise a large number of virtual samples. Experimental results obtained using the Multi-PIE and 300-W face datasets demonstrate the merits of the proposed approach.
... It would be time-consuming if the appearance were formed as a (184 × 194) × 1 vector. Therefore, the 2D appearance representation proposed in [Feng et al. 2012] is used to speed up the computation. Following the procedures described in Section 2, we produce the comparison between original images and de-identified ones, as shown in Fig. 4. ...
Conference Paper
This paper addresses the problem of privacy protection in face synthesis. We propose a new face synthesis approach based on tensor decomposition. By using the mathematical properties of tensor analysis, we decompose a face image into multiple factors so that the synthesis process can concentrate only on privacy-related information. Then, we generate a new face image by altering the privacy-related factors and keeping the other ones untouched. Compared to previous algorithms, our approach has the advantage of producing a synthetic face image without the risk of privacy leakage. We conduct experiments on different datasets and factors to show the flexibility of the proposed approach. After generating the synthesised images, we apply automatic expression and face recognition algorithms to them. The experimental results demonstrate the effectiveness of our approach.
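A minimal sketch of the factor-swap idea, assuming a multilinear model with a core tensor and per-factor coefficient vectors: the privacy-related factor (here an identity vector) is replaced before re-synthesis while the other factors are left untouched. All names and dimensions are illustrative, not the authors' implementation:

import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along the given mode."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def synthesise(core, identity_vec, expression_vec, pixel_basis):
    """Rebuild an appearance vector from per-factor coefficients."""
    img = mode_product(core, identity_vec[None, :], 0)
    img = mode_product(img, expression_vec[None, :], 1)
    img = mode_product(img, pixel_basis, 2)
    return img.squeeze()

# Toy model: identity x expression x basis core, random for illustration.
rng = np.random.default_rng(2)
core = rng.standard_normal((4, 3, 8))
pixel_basis = rng.standard_normal((100, 8))   # maps basis coefficients to pixels
expression = rng.standard_normal(3)
surrogate_identity = rng.standard_normal(4)   # privacy-related factor is replaced

protected = synthesise(core, surrogate_identity, expression, pixel_basis)
print(protected.shape)   # (100,) appearance with the identity factor altered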
... The representative FLD methods that marked the early milestones include the Active Shape Model (ASM) [8], Active Appearance Model (AAM) [7] and Constrained Local Model (CLM) [10]. These algorithms and their extensions have achieved excellent FLD results in constrained scenarios [17]. As a result, the current trend is to develop more robust FLD for unconstrained faces that are rich in appearance variations. ...
Conference Paper
Full-text available
We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate facial landmarks. Then, an online dynamic model selection method is used to choose appropriate domain-specific CSRs for further landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant mechanism, using fuzzy set sample weighting for attention-controlled domain-specific model training. Moreover, we advocate data augmentation with a simple but effective 2D profile face generator, and context-aware feature extraction for better facial feature representation. Experimental results obtained on challenging datasets demonstrate the merits of our DAC-CSR over the state-of-the-art.
... To address the above issues, in this paper we develop a method to extend an existing dictionary using a generative 3DMM. Compared to 2D generative models such as Active Appearance Models (AAM) [16], [24], a 3DMM is capable of generating diverse face instances with arbitrary pose and illumination variations. It has already been widely used in computer vision applications. ...
Article
The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification. To this end, we first fit a 3DMM to the 2D face images of a dictionary to reconstruct the 3D shape and texture of each image. The 3D faces are used to render a number of virtual 2D face images with arbitrary pose variations to augment the training data, by merging the original and rendered virtual samples to create an extended dictionary. Second, to reduce the information redundancy of the extended dictionary and improve the sparsity of reconstruction coefficient vectors using collaborative-representation-based classification (CRC), we exploit an on-line elimination scheme to optimise the extended dictionary by identifying the most representative training samples for a given query. The final goal is to perform pose-invariant face classification using the proposed dictionary integration method and the on-line pruning strategy under the CRC framework. Experimental results obtained for a set of well-known face datasets demonstrate the merits of the proposed method, especially its robustness to pose variations.
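For reference, the coding step of collaborative-representation-based classification (CRC) has a closed form, which makes the role of the (extended) dictionary easy to see. The sketch below implements the standard ridge-regularised CRC decision rule only; the paper's on-line elimination scheme is not reproduced, and the dictionary here is random toy data:

import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Collaborative representation classification.
    D: (d, n) dictionary whose columns are (possibly augmented) training samples.
    labels: (n,) class label per column.  y: (d,) query sample."""
    # Closed-form ridge coding: alpha = (D^T D + lam * I)^-1 D^T y
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Class-wise reconstruction residual, normalised by coefficient energy.
        residuals[c] = (np.linalg.norm(y - D[:, mask] @ alpha[mask])
                        / (np.linalg.norm(alpha[mask]) + 1e-12))
    return min(residuals, key=residuals.get)

# Toy usage with a random two-class dictionary.
rng = np.random.default_rng(3)
D = rng.standard_normal((50, 20))
labels = np.repeat([0, 1], 10)
query = D[:, 3] + 0.05 * rng.standard_normal(50)   # built from a class-0 column
print(crc_classify(D, labels, query))              # should print 0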
... The landmarks of a face are important for many practical applications, such as face alignment, face tracking, face recognition and 3D face reconstruction from 2D images. Popular facial landmark detection algorithms include Active Shape Model (ASM) [13], Active Appearance Model (AAM) [12,19], Constrained Local Model (CLM) [14] and cascaded-regression-based methods [15,18]. In recent years, cascaded-regression-based methods have become very popular because they can provide accurate landmarks for unconstrained faces in the wild [62,9,18,17]. ...
Conference Paper
3D Morphable Face Models (3DMM) have been used in face recognition for some time now. They can be applied in their own right as a basis for 3D face recognition and analysis involving 3D face data. However, their prevalent use over the last decade has been as a versatile tool in 2D face recognition to normalise the pose, illumination and expression of 2D face images. It has the generative capacity to augment the training and test databases for various 2D face processing related tasks. It can be used to expand the gallery set for pose-invariant face matching. For any 2D face image it can furnish complementary information, in terms of its 3D face shape and texture. It can also aid multiple frame fusion by providing the means of registering a set of 2D images. A key enabling technology for this versatility is 3D face model to 2D face image fitting. In this paper, recent developments in 3D face modelling and model fitting are overviewed, and their merits in the context of diverse applications are illustrated on several examples, including pose and illumination invariant face recognition, and 3D face reconstruction from video.
... The main motivations for the proposed algorithm are as follows: (1) until now, AAMs have only been applied to 2D medical images, which cannot effectively segment lung volumes with adjacent lesions; (2) some tensor-based 3D AAMs [33] used in face recognition lack an efficient combination algorithm for the shape and appearance parameters, which affects the convergence of the AAM and increases the computational complexity. ...
Article
Full-text available
An Active Appearance Model (AAM) is a computer vision model which can be used to effectively segment lung fields in CT images. However, the fitting result is often inadequate when the lungs are affected by high-density pathologies. To overcome this problem, we propose a Higher-order Singular Value Decomposition (HOSVD)-based Three-dimensional (3D) AAM. An evaluation was performed on 310 diseased lungs from the Lung Image Database Consortium Image Collection. Other contemporary AAMs operate directly on patterns represented by vectors, i.e., before applying the AAM to a 3D lung volume, it has to be vectorized first into a vector pattern by some technique such as concatenation. However, some implicit structural or local contextual information may be lost in this transformation. According to the nature of the 3D lung volume, HOSVD is introduced to represent and process the lung in tensor space. Our method can not only operate directly on the original 3D tensor patterns, but also efficiently reduce computer memory usage. The evaluation resulted in an average Dice coefficient of 97.0 % ± 0.59 %, a mean absolute surface distance error of 1.0403 ± 0.5716 mm, a mean border positioning error of 0.9187 ± 0.5381 pixels, and a Hausdorff Distance of 20.4064 ± 4.3855. Experimental results showed that our method delivered significantly better segmentation results compared with three other model-based lung segmentation approaches, namely 3D Snake, 3D ASM and 3D AAM.
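As a side note, the Dice coefficient reported above measures volumetric overlap between a predicted segmentation and the ground truth; a minimal computation on binary 3D masks is shown below (array names and sizes are illustrative):

import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

# Toy usage on two overlapping 3D binary masks.
truth = np.zeros((32, 32, 32), dtype=bool)
truth[8:24, 8:24, 8:24] = True
pred = np.zeros_like(truth)
pred[10:26, 8:24, 8:24] = True
print(f"Dice: {dice_coefficient(pred, truth):.3f}")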
... Liu et al. [29] formulated the 3D X-ray transform within a multilinear framework and proposed a multilinear X-ray transform feature representation. Feng et al. [30] proposed a multilinear AAM built from an incomplete training tensor. In this work, maximum margin projection with tensor representation (MMPTR) is applied to gait recognition and micro-expression recognition because these recognition problems have some common ground: (1) the beginning and final frames should be labeled when a single recognized sample is defined; thus, the single sample can be viewed as a third-order tensor including all the frames between the beginning and final frames; (2) a dimensionality reduction approach for these tensorial data is required to reduce excess redundancy and extract discriminant features for the recognition task. ...
Article
In this paper, we design a novel algorithm called maximum margin projection with tensor representation (MMPTR). This algorithm is able to recognize gait and micro-expressions represented as third-order tensors. By maximizing the inter-class Laplacian scatter and minimizing the intra-class Laplacian scatter, MMPTR seeks a tensor-to-tensor projection that directly extracts discriminative and geometry-preserving features from the original tensorial data. We show the validity of MMPTR through extensive experiments on the CASIA(B) gait database, the TUM GAID gait database, and the CASME micro-expression database. The proposed MMPTR generally obtains higher accuracy than MPCA, GTDA and the state-of-the-art DTSA algorithm. The experimental results included in this paper suggest that MMPTR is especially effective in such tensorial object recognition tasks.
... Typical generative models are ASM [4], AAM [5] and their extensions [3], [6], [7], [39], [43], [44]. A common characteristic of ASM and AAM is a parametric PCA-based shape model that is constrained by the corresponding eigenvalues when fitting the models to an input image. ...
Article
A large amount of training data is usually crucial for successful supervised learning. However, the task of providing training samples is often time-consuming, involving a considerable amount of tedious manual work. Also, the amount of training data available is often limited. As an alternative, in this paper we discuss how best to augment the available data for the application of automatic facial landmark detection (FLD). We propose the use of a 3D morphable face model to generate synthesised faces for regression-based detector training. Benefiting from the large synthetic training set, the learned detector is shown to exhibit a better capability to detect the landmarks of a face with pose variations. Furthermore, the synthesised training dataset provides accurate and consistent landmarks as compared to using manual landmarks, especially for occluded facial parts. The synthetic data and real data are from different domains; hence the detector trained using only synthesised faces does not generalise well to real faces. To deal with this problem, we propose a cascaded collaborative regression (CCR) algorithm, which generates a cascaded shape updater that is able to overcome the difficulties caused by pose variations, as well as achieving better accuracy when applied to real faces. The training is based on a mix of synthetic and real image data, with the mixing controlled by a dynamic mixture weighting schedule. Initially, the training relies heavily on the synthetic data, as this can model the gross variations between the various poses. As the training proceeds, progressively more of the natural images are incorporated, as these can model finer detail. To improve the performance of the proposed algorithm further, we design a dynamic multi-scale local feature extraction method, which captures more informative local features for detector training. An extensive evaluation on both controlled and uncontrolled face datasets demonstrates the merit of the proposed algorithm.
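The dynamic mixture weighting schedule can be pictured as a per-stage weight on synthetic versus real samples that shifts towards real data as training proceeds. The linear schedule below is a plausible sketch under that reading, not the authors' exact formula:

import numpy as np

def mixture_weights(stage, n_stages):
    """Synthetic-data weight decays linearly over the cascade stages;
    the real-image weight grows correspondingly (illustrative schedule)."""
    w_synthetic = 1.0 - stage / (n_stages - 1)
    return w_synthetic, 1.0 - w_synthetic

def sample_batch(synthetic, real, stage, n_stages, batch_size, rng):
    """Draw a training batch mixing synthetic and real samples by stage weight."""
    w_syn, _ = mixture_weights(stage, n_stages)
    n_syn = int(round(batch_size * w_syn))
    idx_syn = rng.choice(len(synthetic), n_syn, replace=True)
    idx_real = rng.choice(len(real), batch_size - n_syn, replace=True)
    return np.vstack([synthetic[idx_syn], real[idx_real]])

# Toy usage: 500 synthetic and 100 real 10-dimensional training samples.
rng = np.random.default_rng(4)
synthetic = rng.standard_normal((500, 10))
real = rng.standard_normal((100, 10))
for stage in range(5):
    batch = sample_batch(synthetic, real, stage, 5, 32, rng)
    print(stage, mixture_weights(stage, 5), batch.shape)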
... Once we obtain the state estimation results, we can train the fused shape and appearance models using the training approach in [17]: ...
Conference Paper
In practical applications of pattern recognition and computer vision, the performance of many approaches can be improved by using multiple models. In this paper, we develop a common theoretical framework for multiple model fusion at the feature level using multilinear subspace analysis (also known as tensor algebra). One disadvantage of the multilinear approach is that it is hard to obtain enough training observations for tensor decomposition algorithms. To overcome this difficulty, we adopt the M2SA algorithm to reconstruct the missing entries of the incomplete training tensor. Furthermore, we apply the proposed framework to the problem of face image analysis using the Active Appearance Model (AAM) to validate its performance. Evaluations of the AAM using the proposed framework are conducted on the Multi-PIE face database with promising results.
Conference Paper
We present a method to estimate, based on the horizontal symmetry, an intrinsic coordinate system of faces scanned in 3D. We show that this coordinate system provides an excellent basis for subsequent landmark positioning and model-based refinement such as Active Shape Models, outperforming other (explicit) landmark localisation methods, including the commonly used ICP+ASM approach.
Article
Full-text available
Multilinear subspace analysis (MSA) is a promising methodology for pattern-recognition problems due to its ability to decompose data formed from the interaction of multiple factors. MSA requires a large training set, well organized in a single tensor consisting of data samples with all possible combinations of the contributory factors. However, such a "complete" training set is difficult (or impossible) to obtain in many real applications. The missing-value problem is therefore crucial to the practicality of MSA but has hardly been investigated to date. To solve the problem, this paper proposes an algorithm named M2SA, which is advantageous in real applications because: 1) it inherits the ability of MSA to decompose the interlaced semantic factors; 2) it does not depend on any assumptions about the data distribution; and 3) it can deal with a high percentage of missing values. M2SA is evaluated by face image modeling on two typical multifactorial applications, i.e., face recognition and facial age estimation. Experimental results show the effectiveness of M2SA even when the majority of the values in the training tensor are missing.
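One way to picture how missing values can be handled in this setting is an alternating scheme that imputes the missing entries, fits a truncated multilinear (HOSVD) model, and re-imputes from the reconstruction. The sketch below illustrates that general idea only; it is not the M2SA algorithm itself:

import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: mode-n fibres become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def refold(M, shape, mode):
    """Inverse of unfold for the given original tensor shape."""
    full = (shape[mode],) + tuple(np.delete(shape, mode))
    return np.moveaxis(M.reshape(full), 0, mode)

def hosvd_project(X, ranks):
    """Project X onto the leading left singular vectors of each unfolding."""
    approx = X
    for mode, r in enumerate(ranks):
        M = unfold(approx, mode)
        U = np.linalg.svd(M, full_matrices=False)[0][:, :r]
        approx = refold(U @ (U.T @ M), X.shape, mode)
    return approx

def complete_tensor(X_obs, observed, ranks, n_iters=100):
    """Alternate between a low-multilinear-rank fit and re-imputation of the
    missing entries (an illustration of the general idea, not M2SA itself)."""
    X = np.where(observed, X_obs, X_obs[observed].mean())
    for _ in range(n_iters):
        X = np.where(observed, X_obs, hosvd_project(X, ranks))
    return X

# Toy usage: a multilinear-rank-(2, 2, 2) tensor with roughly 40% entries missing.
rng = np.random.default_rng(5)
A, B, C = (rng.standard_normal((10, 2)),
           rng.standard_normal((12, 2)),
           rng.standard_normal((8, 2)))
T_true = np.einsum('ir,jr,kr->ijk', A, B, C)
observed = rng.random(T_true.shape) > 0.4
T_hat = complete_tensor(np.where(observed, T_true, 0.0), observed, (2, 2, 2))
gap = np.linalg.norm((T_hat - T_true)[~observed]) / np.linalg.norm(T_true[~observed])
print(f"relative error on the missing entries: {gap:.3f}")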
Conference Paper
Full-text available
Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing ensembles of images resulting from the interaction of any number of underlying factors. We present a dimensionality reduction algorithm that enables subspace analysis within the multilinear framework. This N-mode orthogonal iteration algorithm is based on a tensor decomposition known as the N-mode SVD, the natural extension to tensors of the conventional matrix singular value decomposition (SVD). We demonstrate the power of multilinear subspace analysis in the context of facial image ensembles, where the relevant factors include different faces, expressions, viewpoints, and illuminations. In prior work we showed that our multilinear representation, called TensorFaces, yields superior facial recognition rates relative to standard, linear (PCA/eigenfaces) approaches. We demonstrate factor-specific dimensionality reduction of facial image ensembles. For example, we can suppress illumination effects (shadows, highlights) while preserving detailed facial features, yielding a low perceptual error.
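Concretely, the N-mode SVD amounts to taking an ordinary SVD of each mode-n unfolding to obtain one orthogonal factor matrix per mode, then contracting the data tensor with their transposes to form the core tensor. A compact numpy sketch, with an assumed toy ensemble, follows:

import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: arrange the mode-n fibres as columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def n_mode_svd(T):
    """N-mode SVD (HOSVD): one orthogonal factor matrix per mode plus a core."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
               for m in range(T.ndim)]
    core = T
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)
    return core, factors

# Toy ensemble: people x viewpoints x illuminations x pixels (sizes assumed).
rng = np.random.default_rng(6)
faces = rng.standard_normal((15, 5, 4, 256))
core, factors = n_mode_svd(faces)

# Without truncation the decomposition reconstructs the ensemble exactly.
recon = core
for m, U in enumerate(factors):
    recon = mode_product(recon, U, m)
print(np.allclose(recon, faces))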
Article
Full-text available
Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities. Fitting an AAM to an image consists of minimising the error between the input image and the closest model instance; i.e. solving a nonlinear optimisation problem. We propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. We show that the effects of appearance variation during fitting can be precomputed ("projected out") using this algorithm and how it can be extended to include a global shape normalising warp, typically a 2D similarity transformation. We evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.
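The 'project out' step can be shown in a few lines: the fitting error is measured in the subspace orthogonal to the appearance basis, so appearance variation is removed from the residual before the shape parameters are updated. The sketch below illustrates only that projection, with all inputs assumed:

import numpy as np

def project_out(residual, appearance_basis):
    """Remove the part of the residual that the appearance model can explain.
    appearance_basis: (n_pixels, n_modes) with orthonormal columns."""
    coefficients = appearance_basis.T @ residual
    return residual - appearance_basis @ coefficients

# Toy usage: an orthonormal appearance basis obtained from random data.
rng = np.random.default_rng(7)
A = np.linalg.qr(rng.standard_normal((500, 10)))[0]   # orthonormal columns
warped_image = rng.standard_normal(500)
template = rng.standard_normal(500)
error = project_out(warped_image - template, A)
print(np.allclose(A.T @ error, 0.0))   # residual is orthogonal to the basis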
Article
Full-text available
We propose to use statistical models of shape and texture as deformable anatomical atlases. By training on sets of labelled examples, these can represent both the mean structure and appearance of anatomy in medical images and the allowable modes of deformation. Given enough training examples, such a model should be able to synthesise any image of normal anatomy. By finding the parameters which minimise the difference between the synthesised model image and the target image, we can locate all the modelled structure. This potentially time-consuming step can be solved rapidly using the Active Appearance Model (AAM). In this paper we describe the models and the AAM algorithm and demonstrate the approach on structures in MR brain cross-sections.
Article
The Active Appearance Model (AAM) is a well-known model that can represent a non-rigid object effectively. However, the fitting result is often unsatisfactory when an input image deviates from the training images, due to its fixed shape and appearance model. To obtain more robust AAM fitting, we propose a tensor-based AAM that can handle a variety of subjects, poses, expressions, and illuminations in the tensor algebra framework, which consists of an image tensor and a model tensor. The image tensor estimates image variations such as the pose, expression, and illumination of the input image using two different variation estimation techniques: discrete and continuous variation estimation. The model tensor generates variation-specific AAM basis vectors from the estimated image variations, which leads to more accurate fitting results. To validate the usefulness of the tensor-based AAM, we performed variation-robust face recognition using the tensor-based AAM fitting results. To do so, we propose an indirect AAM feature transformation. Experimental results show that the tensor-based AAM with continuous variation estimation outperforms that with discrete variation estimation, as well as the conventional AAM, in terms of average fitting error and face recognition rate.
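The generation of variation-specific basis vectors from the model tensor can be pictured as contracting the model tensor with the estimated pose, expression and illumination coefficients. The sketch below is an illustration with assumed dimensions, not the authors' implementation:

import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def variation_specific_basis(model_tensor, pose_coef, expr_coef, illum_coef):
    """Contract the model tensor with estimated variation coefficients to get
    a basis tailored to the current pose/expression/illumination state.
    model_tensor: basis x pose x expression x illumination x features."""
    B = mode_product(model_tensor, pose_coef[None, :], 1)
    B = mode_product(B, expr_coef[None, :], 2)
    B = mode_product(B, illum_coef[None, :], 3)
    return B.reshape(model_tensor.shape[0], -1)   # one basis vector per row

# Toy usage with assumed sizes: 12 basis vectors, 5 poses, 3 expressions,
# 4 illuminations and 200-dimensional appearance features.
rng = np.random.default_rng(8)
model_tensor = rng.standard_normal((12, 5, 3, 4, 200))
basis = variation_specific_basis(model_tensor,
                                 rng.standard_normal(5),
                                 rng.standard_normal(3),
                                 rng.standard_normal(4))
print(basis.shape)   # (12, 200)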
Conference Paper
Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes. Our multilinear modeling technique employs a tensor extension of the conventional matrix singular value decomposition (SVD), known as the N-mode SVD. As a concrete example, we consider the multilinear analysis of ensembles of facial images that combine several modes, including different facial geometries (people), expressions, head poses, and lighting conditions. Our resulting "TensorFaces" representation has several advantages over conventional eigenfaces. More generally, multilinear analysis shows promise as a unifying framework for a variety of computer vision problems.
Brett W. Bader, Tamara G. Kolda, et al. Matlab Tensor Toolbox Version 2.5. Available online, January 2012.