Robust Facial Feature Tracking Using Shape-Constrained Multiresolution-Selected Linear Predictors

Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
IEEE Transactions on Pattern Analysis and Machine Intelligence 10/2011; 33(9):1844–1859. DOI: 10.1109/TPAMI.2010.205


This paper proposes a learned, data-driven approach for accurate, real-time tracking of facial features using only intensity information. Automatic facial feature tracking is nontrivial because the face is a highly deformable object with large textural variation and motion in certain regions. Existing works attempt to address these problems either by limiting themselves to tracking feature points with strong and unique visual cues (e.g., mouth and eye corners) or by incorporating a priori information that must be designed manually (e.g., selecting points for a shape model). The framework proposed here largely avoids such restrictions by automatically identifying the optimal visual support required for tracking a single facial feature point. This automatic identification of the visual context required for tracking allows the proposed method to potentially track any point on the face. Tracking is achieved via linear predictors (LPs), which provide a fast and effective mapping from pixel intensities to tracked feature position displacements. Building upon the simplicity and strengths of linear predictors, a more robust biased linear predictor is introduced. Multiple linear predictors are then grouped into a rigid flock to further increase robustness. To improve tracking accuracy, a novel probabilistic selection method identifies the relevant visual areas for tracking a feature point. The selected flocks are then combined into a hierarchical multiresolution LP model. Finally, we exploit a simple shape constraint to correct the occasional tracking failure of a minority of feature points. Experimental results show that the method performs more robustly and accurately than Active Appearance Models (AAMs), with minimal training examples, on sequences ranging from SD quality to YouTube quality. Additionally, an analysis of the consistency of the visual support across different subjects is provided.
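To make the core linear predictor concrete, here is a minimal sketch of training and applying a single LP, assuming grayscale frames stored as 2D NumPy arrays. The support-pixel layout, synthetic displacement range, and ridge regularisation are illustrative choices and do not reproduce the paper's biased LP formulation, flock grouping, probabilistic support selection, or shape constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_support(frame, centre, offsets):
    """Sample grayscale intensities at support-pixel offsets around a centre (x, y)."""
    xs = np.clip(centre[0] + offsets[:, 0], 0, frame.shape[1] - 1).astype(int)
    ys = np.clip(centre[1] + offsets[:, 1], 0, frame.shape[0] - 1).astype(int)
    return frame[ys, xs].astype(float)

def train_lp(frame, feature_pos, offsets, n_samples=400, max_disp=10.0, lam=1e-3):
    """Learn H mapping intensity differences to displacements: delta ~= H @ (c - c0)."""
    c0 = sample_support(frame, feature_pos, offsets)            # reference support vector
    T = rng.uniform(-max_disp, max_disp, size=(n_samples, 2))   # synthetic displacements
    # Sampling at (feature_pos - t) mimics the feature having moved by +t
    # relative to where the tracker samples at run time.
    D = np.stack([sample_support(frame, feature_pos - t, offsets) - c0 for t in T])
    # Ridge-regularised least squares: H = T^T D (D^T D + lam I)^{-1}
    H = T.T @ D @ np.linalg.inv(D.T @ D + lam * np.eye(D.shape[1]))
    return H, c0

def predict_displacement(H, c0, frame, current_pos, offsets):
    """Predict the feature's displacement from current_pos in a new frame."""
    c = sample_support(frame, current_pos, offsets)
    return H @ (c - c0)

# Hypothetical usage: track one point from frame0 to frame1.
# offsets = rng.integers(-15, 16, size=(80, 2))        # random support-pixel layout
# H, c0 = train_lp(frame0, np.array([120.0, 90.0]), offsets)
# pos = np.array([120.0, 90.0])
# pos = pos + predict_displacement(H, c0, frame1, pos, offsets)
```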

    • "Several methods, such as [6], [16] and [20], propose to exploit discriminatively-trained regressors to propose new object states, while a classifier was used in combination to validate the predictions. [15] also used a sequence of linear regressors for facial feature tracking. Furthermore, the successful cascade of linear regressors, popularised by [22] for face alignment, can be traced back to a model-free tracking work [27], in which the authors proposed to learn a sequence of linear regressors (referred to as predictors), each of increased precision but lower robustness. "
    ABSTRACT: This paper proposes a novel approach to part-based tracking that replaces local matching of an appearance model with direct prediction of the displacement between local image patches and part locations. We propose to use cascaded regression with incremental learning to track generic objects without any prior knowledge of an object's structure or appearance. We exploit the spatial constraints between parts by implicitly learning the shape and deformation parameters of the object in an online fashion. We integrate a multiple-temporal-scale motion model to initialise our cascaded regression search close to the target and to allow it to cope with occlusions. Experimental results show that our tracker ranks first on the CVPR 2013 Benchmark.
    International Conference on Computer Vision, Chile; 12/2015
    • "Consequently, a growing number of face image-based applications have been developed and investigated. These include face detection (Zhang and Zhang 2010), alignment (Liu 2009), tracking (Ong and Bowden 2011), modeling (Tao et al. 2008), and recognition (Chellappa et al. 1995; Zhao et al. 2003) for security control, surveillance monitoring, authentication, biometrics, digital entertainment and rendered services for a legitimate user only, and age synthesis and estimation (Fu et al. 2010) for explosively emerging real-world applications such as forensic art, electronic customer relationship management , and cosmetology. "
    ABSTRACT: This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g. high-resolution face image, face sketch and face photo) from a corresponding source input (e.g. low-resolution face image, face photo and face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, a combination of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. Despite this progress, the success of FH remains limited by complex application conditions such as varying illumination, pose, and viewpoint. This paper provides a holistic understanding and deep insight into FH, and presents a comparative analysis of representative methods and promising future directions.
    International Journal of Computer Vision 09/2013; 106(1). DOI:10.1007/s11263-013-0645-9
    • "This FBT is accurate for simultaneous head and facial feature tracking but inherits the drawbacks of stereo vision and optical flow computation; Namely, this system is restricted to controlled illumination, it requires pre-calibration and it is sensitive to large variations in head pose and facial feature position. Instead, [3] proposes a statistical method based on a set of linear predictors modelling intensity information for accurate and real-time tracking of facial features. Active Shape Models (ASM) [4] are an alternative to FBT. "
    ABSTRACT: In this paper, we propose an On-line Appearance-Based Tracker (OABT) for simultaneous tracking of 3D head pose, lips, eyebrows, eyelids and irises in monocular video sequences. In contrast to previously proposed tracking approaches, which deal with face and gaze tracking separately, our OABT also handles eyelid and iris tracking, as well as the tracking of 3D head pose and of lip and eyebrow facial actions. Furthermore, our approach applies on-line learning of changes in the appearance of the tracked target. Hence, the prior training of appearance models, which usually requires a large number of labeled facial images, is avoided. Moreover, the proposed method is built upon a hierarchical combination of three OABTs, which are optimized using a Levenberg–Marquardt Algorithm (LMA) enhanced with line-search procedures. This, in turn, makes the proposed method robust to changes in lighting conditions, occlusions and translucent textures, as evidenced by our experiments. Finally, the proposed method achieves head and facial action tracking in real time. (A generic sketch of one damped Gauss-Newton/LM step with a backtracking line search appears at the end of this page.)
    Image and Vision Computing 04/2013; 31(4):322–340. DOI:10.1016/j.imavis.2013.02.001
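The excerpt quoted in the first citation above describes a cascade of linear regressors, each of increased precision but lower robustness. The following is a minimal, self-contained sketch of that coarse-to-fine idea under simplifying assumptions: each level is an independent ridge regressor trained on synthetic perturbations of decreasing range, and `feat` is a hypothetical, caller-supplied feature function (for example, local intensity sampling around the current estimate). It illustrates the general cascade scheme only and does not reproduce any of the cited methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, Y, lam=1e-3):
    """Least-squares fit of Y ~= X @ W with ridge regularisation."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def train_cascade(feat, true_pos, ranges=(16.0, 8.0, 2.0), n_samples=300):
    """Train one linear regressor per level; later levels see smaller perturbations."""
    cascade = []
    for r in ranges:
        # Perturb the ground-truth position to simulate imperfect initialisations.
        P = true_pos + rng.uniform(-r, r, size=(n_samples, true_pos.size))
        X = np.stack([feat(p) for p in P])   # features observed at each perturbed guess
        Y = true_pos - P                     # correction that maps each guess back to the truth
        cascade.append(ridge_fit(X, Y))
    return cascade

def apply_cascade(cascade, feat, pos):
    """Refine an initial estimate by applying each regressor in turn (coarse to fine)."""
    for W in cascade:
        pos = pos + feat(pos) @ W
    return pos
```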
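The last citation above mentions optimising a hierarchical appearance-based tracker with a Levenberg–Marquardt Algorithm enhanced with line-search procedures. The sketch below shows one generic damped Gauss-Newton (LM-style) step with a backtracking line search on an arbitrary least-squares residual; `residual` and `jacobian` are hypothetical, caller-supplied functions, and this is not the cited tracker's actual objective or implementation.

```python
import numpy as np

def lm_step(residual, jacobian, p, lam=1e-2, shrink=0.5, max_backtracks=8):
    """One damped Gauss-Newton (LM-style) step with a backtracking line search.

    residual(p) returns an (n,) residual vector; jacobian(p) returns its (n, len(p)) Jacobian.
    """
    r = residual(p)
    J = jacobian(p)
    A = J.T @ J + lam * np.eye(p.size)       # damped normal equations
    delta = np.linalg.solve(A, -J.T @ r)     # LM update direction
    cost = 0.5 * r @ r
    step = 1.0
    for _ in range(max_backtracks):          # shrink the step until the cost decreases
        p_new = p + step * delta
        r_new = residual(p_new)
        if 0.5 * r_new @ r_new < cost:
            return p_new
        step *= shrink
    return p                                  # keep the old estimate if no step improved the cost
```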