Conference Paper

Using 3D computer graphics for perception: the role of local and global information in face processing.

DOI: 10.1145/1272582.1272586 Conference: Proceedings of the 4th Symposium on Applied Perception in Graphics and Visualization, APGV 2007, At Tübingen, Germany
Source: DBLP

ABSTRACT Everyday life requires us to recognize faces under transient changes in pose, expression and lighting conditions. Despite this, humans are adept at recognizing familiar faces. In this study, we focused on determining the types of information human observers use to recognize faces across variations in viewpoint. Of specific interest was whether holistic information is used exclusively, or whether the local information contained in facial parts (featural or component information), as well as their spatial relationships (configural information), is also encoded. A rigorous study investigating this question has not previously been possible, as the generation of a suitable set of stimuli using standard image manipulation techniques was not feasible. A 3D database of faces that have been processed to extract morphable models (Blanz & Vetter, 1999) allows us to generate such stimuli efficiently and with a high degree of control over display parameters. Three experiments were conducted, modeled after the inter-extra-ortho experiments of Bülthoff & Edelman (1992). The first experiment served as a baseline for the subsequent two experiments. Ten face stimuli were presented from a frontal view and from a 45-degree side view. At test, they had to be recognized among ten distractor faces shown from different viewpoints. We found systematic effects of viewpoint, in that recognition performance increased as the angle between the learned view and the tested view decreased. This finding is consistent with face processing models based on 2D-view interpolation. Experiments 2 and 3 were the same as Experiment 1 except that, in the testing phase, the faces were presented scrambled or blurred. Scrambling was used to isolate featural from configural information. Blurring was used to provide stimuli in which local featural information was reduced.
The results demonstrated that human observers are capable of recognizing faces across different viewpoints on the sole basis of isolated featural information and of isolated configural information.

Available from: Adrian Schwaninger, May 28, 2015
  • ABSTRACT: This paper presents a framework for component-based face alignment and representation that demonstrates improvements in matching performance over the more common holistic approach to face alignment and representation. This work is motivated by recent evidence from the cognitive science community demonstrating the efficacy of component-based facial representations. The component-based framework presented in this paper consists of the following major steps: 1) landmark extraction using Active Shape Models (ASM), 2) alignment and cropping of components using Procrustes Analysis, 3) representation of components with Multiscale Local Binary Patterns (MLBP), 4) per-component measurement of facial similarity, and 5) fusion of per-component similarities. We demonstrate, on three public datasets and an operational dataset consisting of face images of 8000 subjects, that the proposed component-based representation provides higher recognition accuracies than holistic-based representations. Additionally, we show that the proposed component-based representations: 1) are more robust to changes in facial pose, and 2) improve recognition accuracy on occluded face images in forensic scenarios.
    IEEE Transactions on Information Forensics and Security 01/2013; 8(1):239-253. DOI:10.1109/TIFS.2012.2226580 · 2.07 Impact Factor
  • ABSTRACT: In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has remained an important yet unsolved research problem. In this paper we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations generated by various data-driven techniques. Its essential idea is to construct a phoneme-based Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through carefully designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first quantitative quality model of its kind for data-driven speech animation. We believe it is an important first step toward removing a critical technical barrier to applying data-driven speech animation techniques in numerous online or interactive talking avatar applications.
    02/2012; DOI:10.1109/TVCG.2012.67
  • ABSTRACT: The hypothesis of the present study is that features of abstract face-like patterns can be perceived in the architectural design of selected house façades and trigger emotional responses of observers. In order to simulate this phenomenon, which is a form of pareidolia, a software system for pattern recognition based on statistical learning was applied. One-class classification was used for face detection and an eight-class classifier was employed for facial expression analysis. The system was trained by means of a database consisting of 280 frontal images of human faces that were normalised to the inner eye corners. A separate set of test images contained human facial expressions and selected house façades. The experiments demonstrated how facial expression patterns associated with emotional states such as surprise, fear, happiness, sadness, anger, disgust, contempt or neutrality could be identified in both types of test images, and how the results depended on preprocessing and parameter selection for the classifiers.