Article

Parametric Generation of Facial Expressions Based on FACS

Wiley
Computer Graphics Forum
Authors: A. Wojdel, L. J. M. Rothkrantz

Abstract

This paper presents a parametric performance-based model for facial animation inspired by the Facial Action Coding System (FACS) developed by P. Ekman and W. V. Friesen. FACS consists of 44 Action Units (AUs) corresponding to visual changes on the face. Additionally, predefined co-occurrence rules describe how different AUs influence each other. In our model, each facial animation parameter corresponds to one of the AUs as defined in FACS. The implementation of the model is completed with methods for accumulating the displacements of separate AUs and with a fuzzy-logical adaptation of the co-occurrence rules from FACS. We also describe a method for adapting our model to a specific person.
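The abstract only outlines the model, so here is a minimal Python sketch of how per-AU vertex displacements might be accumulated over a shared mesh; the names and the simple additive blending are assumptions for illustration, not the authors' implementation (which also applies the fuzzy co-occurrence rules to the intensities).

```python
import numpy as np

def blend_aus(neutral_vertices, au_displacements, intensities):
    """Accumulate per-AU vertex displacements onto a neutral face.

    neutral_vertices : (V, 3) array, the neutral face mesh.
    au_displacements : dict mapping AU id -> (V, 3) displacement at full activation.
    intensities      : dict mapping AU id -> activation in [0, 1].
    Simple additive blending is assumed here; the paper additionally
    modulates these intensities via fuzzy co-occurrence rules.
    """
    result = neutral_vertices.copy()
    for au, disp in au_displacements.items():
        result += intensities.get(au, 0.0) * disp
    return result
```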


... Motivated by this, we have developed a system that allows average users to generate facial animations using a synthetic 3D face based on the Facial Action Coding System (FACS – [13]) in a simple manner [31]. We have also built a dictionary of facial expressions (FED – [19]) that stores the facial expressions that naturally occur in face-to-face communication. ...
... The research field of ECA [5] attempts to create agents that include emotion, personality and convention properties as humans do in face-to-face conversation (Fig. 2 shows a snapshot of the facial expression modulator [31]). Directly animating human faces with speech is challenging because there are so many parameters to be controlled for realistic facial expressions. To alleviate such difficulties for animators, the BEAT system takes annotated text to be spoken by an animated figure as input, and outputs appropriate synchronized nonverbal behaviors and synthesized speech in a form that can be sent to a number of different animation systems [6]. ...
... By the employment of virtual human actor(s), many researchers showed that imitating human face-to-face conversation could facilitate robust and natural human-machine interaction [24][30]. Motivated by this, we have developed a system that allows average users to generate facial animations using a synthetic 3D face based on the Facial Action Coding System (FACS – [13]) in a simple manner [31]. We have also built a dictionary of facial expressions (FED – [19]) that stores the facial expressions that naturally occur in face-to-face communication. ...
Article
Full-text available
Web lectures have many positive aspects, e.g. they enable learners to easily control their learning experience. Developing high-quality online learning materials takes a great deal of time and human effort [2]. An alternative is to develop a digital teacher. We have developed a prototype of a synthetic 3D face that shows emotion associated with text-based speech in an automated way. As a first step, we studied how humans express emotions in face-to-face communication. Based on this study, we have developed a 2D affective lexicon and a set of rules that describes dependencies between linguistic content and emotions.
... A computer-rendered virtual head – also the first occurrence of a "talking head" – has a wide range of freedom in terms of aesthetics and functional design and allows extending exciting areas already explored with avatars, such as lifelikeness and human-like behaviours; see (Prendinger & Ishizuka, 2005) and more recently (Pelachaud, 2005) and (Peters & Qureshi, 2010). Early on, CGI researchers (see (Wojdel & Rothkrantz, 2005) for modelling) and vision researchers (Pantic & Rothkrantz, 2000) based their work on Ekman's facial action coding system (FACS), which has been refined over the years and yielded its newest version in 2002. Briefly, FACS divides the face into 44 basic Action Units (AU) that are involved in facial expressions. ...
... It is possible (and often the case) that some vertices belong to more than one AU, and conflicts can arise. However, AU normalization allows precise blending of AUs together; additionally, rules similar to Wojdel's (Wojdel & Rothkrantz, 2005) can be applied, keeping facial expressions consistent and scalable across 3D models. ...
Thesis
Full-text available
As people respond strongly to faces and facial features, both consciously and subconsciously, faces are an essential aspect of social robots. Robotic faces and heads until recently belonged to one of the following categories: virtual, mechatronic or animatronic. As an original contribution to the field of human-robot interaction, I present the R-PAF technology (Retro-Projected Animated Faces): a novel robotic head displaying a real-time, computer-rendered face, retro-projected from within the head volume onto a mask, as well as its driving software designed with openness and portability to other hybrid robotic platforms in mind. The work constitutes the first implementation of a non-planar mask suitable for social human-robot interaction, comprising key elements of social interaction such as precise gaze direction control, facial expressions and blushing, and the first demonstration of an interactive video-animated facial mask mounted on a 5-axis robotic arm. The LightHead robot, an R-PAF demonstrator and experimental platform, has demonstrated robustness both in extended controlled and uncontrolled settings. The iterative hardware and facial design, details of the three-layered software architecture and tools, the implementation of life-like facial behaviours, as well as improvements in social-emotional robotic communication are reported. Furthermore, a series of evaluations present the first study on human performance in reading robotic gaze and another first on users' ethnic preference towards a robot face.
... As reported by psychologists, human communication relies heavily on facial expressions to convey emotion [19]. Ekman's facial action coding system (FACS) has been refined over the years and yielded its newest version in 2002, upon which computer graphics/vision researchers based their work ([20] for modeling, [21] for recognition). Briefly, FACS divides the face into 44 basic Action Units (AU) that are involved in facial expressions. ...
... The original 3D model was defined with a neutral face, so this is equivalent to all AUs set at an intensity of 0. The effect of each AU is then modeled on the face with a maximum activation defined as 1. This normalization allows us to precisely blend AUs together (similar to Wojdel [20]) and to keep consistent values across the AUs involved in a facial expression, thus differing from the FACS intensity classification. Consequently, we use a weighted-AU approach to define a facial expression E: ...
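The formula for E is elided in this excerpt; a plausible reading of a weighted-AU expression, under the normalization described above, is the linear combination below (the exact form used by the cited authors may differ):

```latex
E = \sum_{i} w_i \, \mathrm{AU}_i , \qquad 0 \le w_i \le 1 ,
```

where each AU_i denotes the displacement field of action unit i at full activation and w_i is its normalized weight.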
Conference Paper
Full-text available
This paper presents a new implementation of a robot face using retro-projection of a video stream onto a semitransparent facial mask. The technology is contrasted against mechatronic robot faces, of which Kismet is a typical example, and android robot faces, as used on the Ishiguro robots. The paper highlights the strengths of Retro-projected Animated Faces (RAF) technology (with cost, flexibility and robustness being notably strong) and discusses potential developments.
... For a greater range of facial expressions there is active research into facial visualisation techniques. This includes 2D graphics [13,14] as well as physical 3D models [15]. These techniques seek realism in the displayed face and lose the expressiveness of the exaggerated emojis. ...
Preprint
Full-text available
The key challenge of affective computing is to translate subjective emotional experiences into measurable data. Most recent advances in the field have relied on facial expressions as indicators of inner emotional states. Our current understanding of these expressions is categorical, i.e. some subjective feelings are understood to be indicated by specific faces. There is no agreed measure of their distance and no clear rule for how emotions combine or exclude one another. The computational processing of emotions therefore mostly relies on compositional approaches using independent dimensions, like valence and arousal. However, these methods lack a consistent mapping to a facial expression, which has turned out to be crucial for the identification of emotional states. This paper seeks to solve this problem by introducing a consistent facial visualisation for a five-dimensional model. A graphical representation as a comic-style facial expression is provided in examples, explanations and code. It visualises emotions in a comically exaggerated style, similar to the successful emojis used in electronic communication. All graphical parameters depend linearly on the input dimensions (valence, arousal, dominance, contempt and control). This might be a rough and crude method. It might not even be a good one, but it is at least something linear in the space of emotions. Empirical results from crowd workers confirm that the encoded emotional information can be recognized intuitively and without further training.
... To reveal affective responses to stimuli, facial expressions were measured while the participants were looking at the packaging designs. Facial expressions are defined as facial muscle motions, which can be linked to specific emotional states (Wojdel and Rothkrantz 2005; Lewinski, den Uyl, and Butler 2014; Ekman 1992). Automated emotion measurements provide more accurate results than self-report measures, being less cognitively biased and hence less subjective (Poels and Dewitte 2006). ...
Article
This paper aims to find out the differences in Northern European and Northeast Asian consumers' perception and preference-based choice of product packaging designs, focusing on the effect of two important visual design elements – color and picture location – on attention, emotions and preference-based choice. Data were obtained from two different culture groups via two different studies, with a methodological contribution of combining three different methods: eye tracking, facial expression-based emotion measurement and conjoint analysis. Study 1 (n = 57) showed that some level of adaptation strategy is needed for the Northern European and Northeast Asian markets. Even if the differences were not revealed in customers' explicit preferences (Study 2, n = 258), they can be revealed in implicit, subconscious processes (emotions, attention), which also indicates the need for a multimethod approach to fully understand the different processes of consumer behavior.
... In addition to eye tracking, we used automated facial expression recognition to detect the positive emotions people had. Facial expressions are the motions in facial muscles that are linked to emotional states (Wojdel and Rothkrantz, 2005; Lewinski et al., 2014). Automatic facial expression measurement makes it possible to see objective responses to the sales flyers, in addition to asking for subjective feedback and opinions. ...
Article
Purpose: The purpose of this paper is to show how analysing sales flyers with a combination of eye tracking, measurement of emotions, interviews and content analysis can give an in-depth understanding of how different design aspects influence sales flyers' effectiveness as a communication tool. The paper shows the relationship between different sales flyer design principles and a person's preference towards them, as well as the intent to read them. Design/methodology/approach: The paper opted for a pilot study using eye tracking and emotion measurement to analyse retail sales flyers. In addition, interviews and content analysis were conducted to fully understand which aspects of sales flyer design influenced consumers. Findings: The paper's main findings are that sales flyers that evoke more positive emotions are more likely to be chosen, and that the attention to and viewing time of content pages are related to the number of elements on the page, page coherence and the location of the offers. Research limitations/implications: This research uses eye tracking where sales flyers are shown on screen, which is not a natural way to read sales flyers. Future research should aim to test this methodology and these propositions in a natural environment. Practical implications: The paper includes implications for designing better sales flyers. Originality/value: To the authors' knowledge, sales flyers have never been studied with a research design combining eye tracking, measurement of emotions, interviews, content analysis and preferences.
... Further, there are some studies in this area in which the Facial Action Coding System (FACS), based on facial muscle movements, is utilised, e.g. [6,15,16]. ...
Chapter
In this paper we discuss a novel method of mathematically modelling facial action units for accurate representation of human facial expressions in three dimensions. Our method utilizes the approach of the Facial Action Coding System (FACS). It is based on a boundary-value approach, which utilizes a solution to a fourth-order elliptic Partial Differential Equation (PDE) subject to a suitable set of boundary conditions. Here the PDE surface generation method for human facial expressions is utilized in order to generate a wide variety of facial expressions in an efficient and realistic way. For this purpose, we identify a set of boundary curves corresponding to the key features of the face, which in turn define a given facial expression in three dimensions. The action units (AUs) relating to FACS are then efficiently represented in terms of Fourier coefficients relating to the boundary curves, which enables us to store both the face and the facial expressions in an efficient way.
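As an illustration of the storage scheme described above (boundary curves encoded as Fourier coefficients), a minimal Python sketch of reconstructing a closed boundary curve from a truncated Fourier series follows; the coefficient layout and the naming are assumptions, not the chapter's actual representation.

```python
import numpy as np

def curve_from_fourier(a0, coeffs, n_samples=200):
    """Reconstruct a closed 3D boundary curve from truncated Fourier coefficients.

    a0     : (3,) mean position of the curve.
    coeffs : list of (a_k, b_k) pairs, each a (3,) array, for k = 1..K.
    Returns an (n_samples, 3) array of points sampled over one period.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    pts = np.tile(np.asarray(a0, dtype=float), (n_samples, 1))
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        pts += np.outer(np.cos(k * t), a_k) + np.outer(np.sin(k * t), b_k)
    return pts
```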
... At the MMI Group, TU Delft, there is a project running on natural human-computer interaction. We have developed a synthetic 3D face (Wojdel and Rothkrantz 2005) based on the Facial Action Coding System (FACS) (Ekman and Friesen 1975). The system allows average users to generate facial animations in a simple manner. ...
Article
Full-text available
The human face in particular serves not only communicative functions, but is also the primary channel to express emotion. We developed a prototype of a synthetic 3D face that shows emotion associated with text-based speech in an automated way. As a first step, we studied how many and what kinds of emotional expressions are produced by humans during conversations. Next, we studied the correlation between the displayed facial expressions and text. Based on these results, we developed a set of rules that describes dependencies between text and emotions through the employment of an ontology. For this purpose, a 2D affective lexicon database has been built using the WordNet database, and the specific facial expressions are stored in a nonverbal dictionary. The results described in this paper enable affective-based multimodal fission.
... FACS-based face models have been used to control facial animation (e.g. Wojdel & Rothkrantz, 2005). Currently, state-of-the-art methods for realistic facial animation used in video games and feature films use FACS to drive models derived from motion capture data (Parag, 2006). ...
Chapter
Full-text available
In this chapter we showed that it is possible to train a deep belief net with multiple layers of features to function as an animation system capable of converting high-level descriptions of facial attributes into realistic face images. By specifying particular labels to the DBN, we were able to generate realistic faces displaying specific identities and facial actions. In addition, the DBN could generalize from the associations it learned during training to synthesize novel combinations of identities and facial actions. Thus, like the human brain,
... Meanwhile, the analysis of facial features has been one of the challenging problems in vision-based face modeling and animation. In particular, facial expression retargeting is considered critical for human-centered interface design and even facial expression cloning [3,8,9]. Many studies have been done on recovering face motion from image sequences [4,5,6,7]. ...
Conference Paper
Full-text available
This paper introduces a novel approach for vision-based head motion tracking and facial expression cloning to create realistic facial animation of a 3D avatar in real time. Exact head pose estimation and facial expression tracking are critical problems to be solved in developing vision-based computer animation. The proposed method consists of dynamic head pose estimation and facial expression cloning. The proposed head pose estimation technique can robustly estimate the 3D head pose from a sequence of input video images. Given an initial reference template of the head image and the corresponding 3D head pose, the full head motion is recovered by projecting a cylindrical head model onto the face image. By updating the template dynamically, it is possible to recover the head pose robustly regardless of light variation and self-occlusion. In addition, to produce realistic 3D face animation, the variation of the major facial feature points is tracked by use of optical flow and retargeted to the 3D avatar. We exploit Gaussian RBFs to deform the local region of the 3D face model around the major feature points. During the model deformation, the clusters of regional feature points around the major facial features are estimated and the positions of the clusters are changed according to the variation of the major feature points. The experiments show that the proposed vision-based animation technique efficiently estimates the 3D head pose and produces more realistic 3D facial animation than a feature-based tracking method.
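A minimal sketch of the Gaussian-RBF style deformation described here (spreading tracked feature-point displacements onto nearby mesh vertices) might look like the following; the kernel width and the direct distance-weighting scheme are simplifying assumptions rather than the paper's exact formulation.

```python
import numpy as np

def rbf_deform(vertices, feature_points, feature_displacements, sigma=0.05):
    """Deform mesh vertices by spreading feature-point displacements
    with a Gaussian radial basis function.

    vertices              : (V, 3) mesh vertex positions.
    feature_points        : (F, 3) positions of tracked feature points.
    feature_displacements : (F, 3) measured displacements of those points.
    sigma                 : kernel width controlling the influence radius.
    """
    deformed = vertices.copy()
    for p, d in zip(feature_points, feature_displacements):
        dist2 = np.sum((vertices - p) ** 2, axis=1)
        w = np.exp(-dist2 / (2.0 * sigma ** 2))
        deformed += w[:, None] * d
    return deformed
```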
... Meanwhile, the analysis of facial features has been one of the challenging problems in the computer vision field. In particular, facial expression retargeting is considered critical for human-centered interface design and even facial expression cloning [9,11,12]. Many studies have been done on recovering face motion from image sequences [2,3,4,5]. ...
Conference Paper
Full-text available
This paper presents a new approach to estimate 3D head pose from a sequence of input images and retarget facial expression to a 3D face model using RBFs (Radial Basis Functions) for vision-based animation. Exact head pose estimation and facial motion tracking are critical problems to be solved in developing vision-based human-computer interaction or animation. Given an initial reference template of the head image and the corresponding 3D head pose, the full head motion is recovered by projecting a cylindrical head model onto the face image. By updating the template dynamically, it is possible to recover the head pose robustly regardless of light variation and self-occlusion. Moreover, to produce a realistic 3D face model, we utilize Gaussian RBFs to deform the 3D face model according to the facial feature points detected in the input images. During the model deformation, the clusters of minor feature points around the major facial features are estimated and the positions of the clusters are changed according to the variation of the major feature points. The experiments show that the proposed method can efficiently estimate and track the 3D head pose and create a realistic 3D facial animation model.
Conference Paper
Emotions play an important role in distance learning. Negative emotions are one of the main reasons for high drop-out. Real-time assessment of the emotional state of the learner is complex, which complicates real-time adaptation of the learning context to boost the motivation of learners. In this paper we introduce the use of avatars with facial expressions corresponding to the emotional state of the learner. Facial expressions of learners are recorded and analysed. This enables the use of personalised avatars and emotional facial expressions. The system and the results of experiments are described in this paper.
Chapter
Full-text available
1. INTRODUCTION, OR CALL THE HORSE A DONKEY! In 1900, Herr von Osten bought a horse in Berlin, Germany. When von Osten began training his horse, Hans, to count by tapping with one of his front hooves, he did not imagine that Hans would become the most famous horse in history (Pfungst 2000 [1907]). Hans quickly learned to count and was soon able to add, subtract, multiply, divide, and even solve problems with fractions. As if this were not enough, von Osten exhibited Hans in public sessions, in which the horse counted the number of people present or simply the number of people wearing glasses. He always answered by pawing at the ground and tapping with one of his front hooves. In this way Hans could tell what time it was, use the calendar, remember the pitch of a piece of music, and perform many other feats. After von Osten taught Hans an alphabet that could be encoded with hoof taps, the horse could answer practically any question, spoken or written. He soon earned the nickname "Clever Hans". Because of the repercussions in several fields of science, and because many people suspected some kind of trick, an investigative committee was formed to decide whether there was any deception in Hans's performances. This commission of horse experts included a professor of psychology and philology, a circus director, veterinarians and cavalry officers. An experiment conducted with Hans, from which von Osten was excluded, showed no change in Hans's apparent intelligence; the commission declared that there was no fraud. The convening of a second commission was the beginning of the end for Clever Hans. Von Osten was asked to whisper a number into the horse's left ear, while another experimenter did the same in the right ear.
Conference Paper
This paper presents an improved method for facial expression animation based on the radial basis functions. The existing facial action coding system is adopted for expression parameterization. A set of parameters that are able to generate proper expressions is carefully selected through a series of analyses and comparisons. A deformation algorithm combining compactly supported radial basis functions with a geodesic distance metric is employed to address the difficulties in generating facial expressions, such as localized deformation and hole handling. In this work, the complex manipulation of a 3D face mesh is transformed into simple linear parameter adjustments, which is intuitive and efficient. Implementation results have demonstrated that real-time interactive mesh modification for facial expression animation is achieved.
Article
With the growing number of researchers interested in modeling the inner workings of affective social intelligence, the need for tools to easily model its associated expressions has emerged. The goal of this article is two-fold: 1) we describe HapFACS, a free software and API that we developed to provide the affective computing community with a resource that produces static and dynamic facial expressions for three-dimensional speaking characters; and 2) we discuss results of multiple experiments that we conducted in order to scientifically validate our facial expressions and head animations in terms of the widely accepted Facial Action Coding System (FACS) standard, and its Action Units (AU). The result is that users, without any 3D-modeling nor computer graphics expertise, can animate speaking virtual characters with FACS-based realistic facial expression animations, and embed these expressive characters in their own application(s). The HapFACS software and API can also be used for generating repertoires of realistic FACS-validated facial expressions, useful for testing emotion expression generation theories.
Article
Facial expressions play an important role in human communication. Accurate recognition of facial expressions is important to understand nonverbal communication. Many tools have been developed, but recognizing facial expressions in real-life communication independent of lighting conditions, posture, occlusion and different intensities is still an unsolved problem. In this paper we researched facial movements in daily interaction by tracking painted markers on the face to assess the displayed facial expressions. We analysed recordings of text-based communication and computed a dictionary of emotional facial expressions and nonverbal grammatical rules.
Article
This paper presents an approach for reproducing optimal 3-D facial expressions based on blendshape regression. It aims to improve fidelity of facial expressions but maintain the efficiency of the blendshape method, which is necessary for applications such as human–machine interaction and avatars. The method intends to optimize the given facial expression using action units (AUs) based on the facial action coding system recorded from human faces. To help capture facial movements for the target face, an intermediate model space is generated, where both the target and source AUs have the same mesh topology and vertex number. The optimization is conducted interactively in the intermediate model space through adjusting the regulating parameter. The optimized facial expression model is transferred back to the target facial model to produce the final facial expression. We demonstrate that given a sketched facial expression with rough vertex positions indicating the intended facial expression, the proposed method approaches the sketched facial expression through automatically selecting blendshapes with corresponding weights. The sketched expression model is finally approximated through AUs representing true muscle movements, which improves the fidelity of facial expressions.
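The regression step described above (selecting AU blendshapes and weights that approach a sketched expression) can be illustrated with a simple regularized least-squares sketch; the intermediate model space and the interactive regulating parameter are simplified away, and all names here are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def fit_blendshape_weights(target, neutral, au_blendshapes, reg=1e-3):
    """Solve for AU weights w minimizing ||neutral + B w - target||^2 + reg * ||w||^2.

    target, neutral : (3V,) flattened vertex positions.
    au_blendshapes  : (3V, K) matrix whose columns are per-AU displacement fields.
    Returns the (K,) weight vector, clipped to [0, 1].
    """
    B = au_blendshapes
    A = B.T @ B + reg * np.eye(B.shape[1])
    b = B.T @ (target - neutral)
    w = np.linalg.solve(A, b)
    return np.clip(w, 0.0, 1.0)
```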
Article
This paper presents a vision-based 3D facial expression animation technique and system which provide robust 3D head pose estimation and real-time facial expression control. Much research on 3D face animation has addressed facial expression control itself rather than focusing on 3D head motion tracking. However, head motion tracking is one of the critical issues to be solved for developing realistic facial animation. In this research, we developed an integrated animation system that includes 3D head motion tracking and facial expression control at the same time. The proposed system consists of three major phases: face detection, 3D head motion tracking, and facial expression control. For face detection, with the non-parametric HT skin color model and template matching, we can detect the facial region efficiently from the video frame. For 3D head motion tracking, we exploit a cylindrical head model that is projected onto the initial head motion template. Given an initial reference template of the face image and the corresponding head motion, the cylindrical head model is created and the full head motion is tracked based on the optical flow method. For the facial expression cloning we utilize a feature-based method. The major facial feature points are detected from the geometric information of the face with template matching and tracked by optical flow. Since the locations of the varying feature points are composed of head motion and facial expression information, the animation parameters which describe the variation of the facial features are acquired from the geometrically transformed frontal head pose image. Finally, the facial expression cloning is done by a two-step fitting process. The control points of the 3D model are varied by applying the animation parameters to the face model, and the non-feature points around the control points are changed by use of Radial Basis Functions (RBF). The experiments show that the developed vision-based animation system can create realistic facial animation with robust head pose estimation and facial variation from the input video images.
Conference Paper
Emotion influences the choice of facial expression. In a dialogue the emotional state is co-determined by the events that happen during the dialogue. To enable rich, human-like expressivity of a dialogue agent, the facial displays should show a correct expression of the state of the agent in the dialogue. This paper reports on our study in building knowledge on how to appropriately express emotions in face-to-face communication. We have analyzed the appearance of facial expressions and the corresponding dialogue text (in balloons) of characters in selected cartoon illustrations. From the facial expressions and dialogue text, we have independently extracted the emotional state and the communicative function. We also collected emotion words from the dialogue text. The emotional states (labels) and the emotion words are represented along two dimensions, "arousal" and "valence". Here, the relationship between facial expressions and text was explored. The final goal of this research is to develop emotional-display rules for a text-based dialogue agent.
Article
Full-text available
This paper presents a novel approach for facial motion tracking and facial expression cloning to create a realistic facial animation of a 3D avatar. The exact head pose estimation and facial expression tracking are critical issues that must be solved when developing vision-based computer animation. In this paper, we deal with these two problems. The proposed approach consists of two phases: dynamic head pose estimation and facial expression cloning. The dynamic head pose estimation can robustly estimate a 3D head pose from input video images. Given an initial reference template of a face image and the corresponding 3D head pose, the full head motion is recovered by projecting a cylindrical head model onto the face image. It is possible to recover the head pose regardless of light variations and self-occlusion by updating the template dynamically. In the phase of synthesizing the facial expression, the variations of the major facial feature points of the face images are tracked by using optical flow and the variations are retargeted to the 3D face model. At the same time, we exploit the RBF (Radial Basis Function) to deform the local area of the face model around the major feature points. Consequently, facial expression synthesis is done by directly tracking the variations of the major feature points and indirectly estimating the variations of the regional feature points. From the experiments, we can prove that the proposed vision-based facial expression cloning method automatically estimates the 3D head pose and produces realistic 3D facial expressions in real time.
Article
Full-text available
Recognition and simulation of actions performable on rigidly-jointed actors such as human bodies have been the subject of our research for some time. One part of an ongoing effort towards a total human movement simulator is to develop a system to perform the actions of American Sign Language (ASL). However, one of the “channels” of ASL communication, the face, presents problems which are not well handled by a rigid model. An integrated system for an internal representation and simulation of the face is presented, along with a proposed image analysis model. Results from an implementation of the internal model and simulation modules are presented, as well as comments on the future of computer controlled recognition of facial actions. We conclude with a discussion on extensions of the system, covering relations between flexible masses and rigid (jointed) ones. Applications of this theory into constrained actions, such as across rigid nonmoving sheets of bone (forehead, eyes) are also discussed.
Article
Full-text available
We propose a prototype of a facial surgery simulation system for surgical planning and the prediction of facial deformation. We use a physics-based human head model. Our head model has a 3D hierarchical structure that consists of soft tissue and the skull, constructed from exact 3D CT patient data. Anatomic points measured on X-ray images from both frontal and side views are used to fit the model to the patient's head. The purpose of this research is to analyze the relationship between changes of mandibular position and facial morphology after orthognathic surgery, and to simulate the exact postoperative 3D facial shape. In the experiment, we used our model to predict the facial shape after surgery for patients with mandibular prognathism. Comparing the simulation results and the actual facial images after the surgery shows that the proposed method is practical.
Article
Full-text available
This article reports results from a program that produces high-quality animation of facial expressions and head movements as automatically as possible in conjunction with meaning-based speech synthesis, including spoken intonation. The goal of the research is as much to test and define our theories of the formal semantics for such gestures, as to produce convincing animation. Towards this end, we have produced a high-level programming language for three-dimensional (3-D) animation of facial expressions. We have been concerned primarily with expressions conveying information correlated with the intonation of the voice: This includes the differences of timing, pitch, and emphasis that are related to such semantic distinctions of discourse as “focus,” “topic,” and “comment,” “theme” and “rheme,” or “given” and “new” information. We are also interested in the relation of affect or emotion to facial expression. Until now, systems have not embodied such rule-governed translation from spoken utterance meaning to facial expressions. Our system embodies rules that describe and coordinate these relations: intonation/information, intonation/affect, and facial expressions/affect. A meaning representation includes discourse information: What is contrastive/background information in the given context, and what is the “topic” or “theme” of the discourse? The system maps the meaning representation into how accents and their placement are chosen, how they are conveyed over facial expression, and how speech and facial expressions are coordinated. This determines a sequence of functional groups: lip shapes, conversational signals, punctuators, regulators, and manipulators. Our algorithms then impose synchrony, create coarticulation effects, and determine affectual signals, eye and head movements. The lowest level representation is the Facial Action Coding System (FACS), which makes the generation system portable to other facial models.
Conference Paper
Full-text available
We present ongoing work on a project for automatic recognition of spontaneous facial actions. Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3D tracking and warping followed by machine learning techniques directly applied to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
Conference Paper
Full-text available
Human faces are attractive and effective in every-day communication. In human-computer interaction, because of the lack of sufficient knowledge and appropriate tools to model and animate realistic 3D faces, 2D cartoon faces are feasible alternatives with the extra appeal of 'beyond realism' features. We discuss CharToon, an interactive system to design and animate 2D cartoon faces. We give illustrations (also movies on CD) of the expressive and artistic effects which can be produced. CharToon is fully implemented in Java, allows real-time animation on PCs and through the Web. It has been used with success by different types of users.
Conference Paper
Full-text available
In this paper we present how to implement the co-occurrence rules defined by psychologist Paul Ekman in a computer animated face. The rules describe the dependencies between the atomic observable movements of the human face (so-called Action Units). They are defined in a form suitable for a human observer who needs to produce a consistent binary scoring of visible occurrences on the human face. They are not directly applicable to automated animation systems that must deal with facial geometry, smooth changes in the occurrence intensities, etc. In order to be able to utilize the knowledge about human faces which is present in the work of Ekman, we chose a fuzzy-logical approach by defining the co-occurrence rules as specific fuzzy-logical operators.
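As a toy illustration of treating a co-occurrence rule as a fuzzy-logical operator on AU intensities, a dominance-style rule ("one AU masks another") could be modelled as below; the specific operator, the AU numbers and the rule itself are illustrative assumptions, not the rules actually derived in the paper.

```python
def apply_dominance_rule(intensities, dominant_au, masked_au):
    """Fuzzy co-occurrence rule: the visible intensity of `masked_au`
    is suppressed in proportion to the activation of `dominant_au`.

    intensities : dict mapping AU id -> activation in [0, 1].
    Returns a new dict with the rule applied (a product-style fuzzy operator).
    """
    out = dict(intensities)
    d = out.get(dominant_au, 0.0)
    out[masked_au] = out.get(masked_au, 0.0) * (1.0 - d)
    return out

# Example (hypothetical rule): AU 4 partially masking AU 1
print(apply_dominance_rule({1: 0.8, 4: 0.5}, dominant_au=4, masked_au=1))  # {1: 0.4, 4: 0.5}
```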
Conference Paper
Full-text available
We develop an automatic system to analyze subtle changes in upper face expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal image sequence. Our system recognizes fine-grained changes in facial expression based on Facial Action Coding System (FACS) action units (AUs). Multi-state facial component models are proposed for tracking and modeling different facial features, including eyes, brows, cheeks, and furrows. Then we convert the results of tracking to detailed parametric descriptions of the facial features. These feature parameters are fed to a neural network which recognizes 7 upper face action units. A recognition rate of 95% is obtained for test data that include both single action units and AU combinations.
Conference Paper
Full-text available
Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions (e.g., happiness and anger). Such prototypic expressions, however, occur infrequently. Human emotions and intentions are communicated more often by changes in one or two discrete facial features. We develop an automatic system to analyze subtle changes in facial expressions based on both permanent (e.g., mouth, eye, and brow) and transient (e.g., furrows and wrinkles) facial features in a nearly frontal image sequence. Multi-state facial component models are proposed for tracking and modeling different facial features. Based on these multi-state models, and without artificial enhancement, we detect and track the facial features, including mouth, eyes, brow, cheeks, and their related wrinkles and facial furrows. Moreover we recover detailed parametric descriptions of the facial features. With these features as the inputs, 11 individual action units or action unit combinations are recognized by a neural network algorithm. A recognition rate of 96.7% is obtained. The recognition results indicate that our system can identify action units regardless of whether they occur singly or in combinations
Article
Full-text available
Bimodal perception leads to better speech understanding than auditory perception alone. We evaluated the overall benefit of lip-reading on natural utterances of French produced by a single speaker. Eighteen French subjects with good audition and vision were administered a closed set identification test of VCVCV nonsense words consisting of three vowels [i, a, y] and six consonants [b, v, z, 3, R, l]. Stimuli were presented under both auditory and audio-visual conditions with white noise added at various signal-to-noise ratios. Identification scores were higher in the bimodal condition than in the auditory-alone condition, especially in situations where acoustic information was reduced. The auditory and audio-visual intelligibility of the three vowels [i, a, y] averaged over the six consonantal contexts was evaluated as well. Two different hierarchies of intelligibility were found. Auditorily, [a] was most intelligible, followed by [i] and then by [y]; whereas visually [y] was most intelligible, followed by [a] and [i]. We also quantified the contextual effects of the three vowels on the auditory and audio-visual intelligibility of the consonants. Both the auditory and the audio-visual intelligibility of surrounding consonants was highest in the [a] context, followed by the [i] context and lastly the [y] context.
Conference Paper
Full-text available
We describe how to create with machine learning techniques a generative, videorealistic, and speech animation module. A human subject is first recorded using a videocamera as he/she utters a pre-determined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence, which contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned.
Conference Paper
Full-text available
The authors present a parametric model for facial animation and a method for adapting it to a specific person. Every facial expression can be described as a contraction or relaxation of the facial muscles. P. Ekman and W. V. Friesen (1975) selected 44 Action Units corresponding to visual changes on the face, which cannot be decomposed into smaller ones and whose combinations universally represent all facial expressions. Our model for facial animation is built on the basis of that Facial Action Coding System (FACS). The model adaptation is based on performance measurements of the subject's facial movements. Our model combines the advantages of parametric animation models, such as wireframe model independence, with the accuracy of face movement reproduction (cloning) obtained with performance-driven models. The described model forms part of the facial animation system that is currently under development at Delft University of Technology. A brief description of this system is also given.
Article
Full-text available
Automatic recognition of facial gestures (i.e., facial muscle activity) is rapidly becoming an area of intense interest in the research field of machine vision. In this paper, we present an automated system that we developed to recognize facial gestures in static, frontal- and/or profile-view color face images. A multidetector approach to facial feature localization is utilized to spatially sample the profile contour and the contours of the facial components such as the eyes and the mouth. From the extracted contours of the facial features, we extract ten profile-contour fiducial points and 19 fiducial points of the contours of the facial components. Based on these, 32 individual facial muscle actions (AUs) occurring alone or in combination are recognized using rule-based reasoning. With each scored AU, the utilized algorithm associates a factor denoting the certainty with which the pertinent AU has been scored. A recognition rate of 86% is achieved.
Article
Full-text available
Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions, such as happiness, anger, surprise, and fear. Such prototypic expressions, however, occur rather infrequently. Human emotions and intentions are more often communicated by changes in one or a few discrete facial features. In this paper, we develop an automatic face analysis (AFA) system to analyze facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal-view face image sequence. The AFA system recognizes fine-grained changes in facial expression into action units (AU) of the Facial Action Coding System (FACS), instead of a few prototypic expressions. Multistate face and facial component models are proposed for tracking and modeling the various facial features, including lips, eyes, brows, cheeks, and furrows. During tracking, detailed parametric descriptions of the facial features are extracted. With these parameters as the inputs, a group of action units (neutral expression, six upper face AU and 10 lower face AU) are recognized whether they occur alone or in combinations. The system has achieved average recognition rates of 96.4 percent (95.4 percent if neutral expressions are excluded) for upper face AU and 96.7 percent (95.6 percent with neutral expressions excluded) for lower face AU. The generalizability of the system has been tested by using independent image databases collected and FACS-coded for ground-truth by different research teams
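The recognition stage described above (detailed parametric feature descriptions fed to a network that scores AUs singly or in combination) can be sketched as a set of per-AU binary classifiers; the feature-vector layout and the use of scikit-learn here are assumptions for illustration only, not the AFA system's actual architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_au_detectors(features, au_labels, au_ids):
    """Train one small neural network per AU.

    features  : (N, D) array of parametric facial feature descriptions.
    au_labels : (N, K) binary matrix; column k indicates presence of au_ids[k].
    Returns a dict mapping AU id -> fitted classifier.
    """
    detectors = {}
    for k, au in enumerate(au_ids):
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)
        clf.fit(features, au_labels[:, k])
        detectors[au] = clf
    return detectors

def detect_aus(detectors, feature_vector, threshold=0.5):
    """Return the set of AUs whose predicted probability exceeds the threshold
    (assumes each detector was trained with both 0 and 1 labels present)."""
    x = np.asarray(feature_vector).reshape(1, -1)
    return {au for au, clf in detectors.items()
            if clf.predict_proba(x)[0, 1] >= threshold}
```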
Article
Full-text available
The first virtual humans appeared in the early 1980s in such films as Dreamflight (1982) and The Juggler (1982). Pioneering work in the ensuing period focused on realistic appearance in the simulation of virtual humans. In the 1990s, the emphasis has shifted to real-time animation and interaction in virtual worlds. Virtual humans have begun to inhabit virtual worlds and so have we. To prepare our place in the virtual world we first develop techniques for the automatic representation of a human face capable of being animated in real time using both video and audio input. The objective is for one's representative to look, talk, and behave like oneself in the virtual world. Furthermore, the virtual inhabitants of this world should be able to see our avatars and to react to what we say and to the emotions we convey. We sketch an overview of the problems related to the analysis and synthesis of face-to-virtual-face communication in a virtual world. We describe different components of our system for real-time interaction and communication between a cloned face representing a real person and an autonomous virtual face. It provides an insight into the various problems and gives particular solutions adopted in reconstructing a virtual clone capable of reproducing the shape and movements of the real person's face. It includes the analysis of the facial expression and speech of the cloned face, which can be used to elicit a response from the autonomous virtual human with both verbal and nonverbal facial movements synchronized with the audio voice
Article
Full-text available
The long term goal of our work is to predict visual confusion matrices from physical measurements. In this paper, four talkers were chosen to record 69 American-English Consonant-Vowel syllables with audio, video, and facial movements captured. During the recording, 20 markers were put on the face and an optical Qualisys system was used to track three-dimensional facial movements. The videotapes (with markers on the face and without sound) were presented to normal hearing viewers with average or above average lipreading ability, and visual confusion matrices were obtained. Results showed that the facial measurements were correlated with visual perception data by about 0.79 and account for about 63% of the variance.
Article
Full-text available
We describe how to create with machine learning techniques a generative, videorealistic, speech animation module. A human subject is first recorded using a videocamera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence which contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned.
Article
We present a new set of techniques for modeling and animating realistic faces from photographs and videos. Given a set of face photographs taken simultaneously, our modeling technique allows the interactive recovery of a textured 3D face model. By repeating this process for several facial expressions, we acquire a set of face models that can be linearly combined to express a wide range of expressions. Given a video sequence, this linear face model can be used to estimate the face position, orientation, and facial expression at each frame. We illustrate these techniques on several datasets and demonstrate robust estimations of detailed face geometry and motion.
Article
English and Italian encoders were asked to communicate two-dimensional shapes to decoders of their own culture, with and without the use of hand gestures, for materials of high and low verbal codability. The decoders drew what they thought the shapes were and these were rated by English and Italian judges, for similarity to the originals. Higher accuracy scores were obtained by both the English and the Italians, when gestures were allowed, for materials of both high and low codability; but the effect of using gestures was greater for materials of low codability. Improvement in performance when gestures were allowed was greater for the Italians than for the English for both levels of codability. An analysis of the recorded verbal utterances has shown that the detriment in communication accuracy with the elimination of gestures cannot be attributed to disruption of speech performance; rather, changes in speech content occur indicating an increased reliance on verbal means of conveying spatial information. Nevertheless, gestures convey this kind of semantic information more accurately and evidence is provided for the gestures of the Italians communicating this information more effectively than those of the English.
Article
This paper describes the representation, animation and data collection techniques that have been used to produce "realistic" computer generated half-tone animated sequences of a human face changing expression. It was determined that approximating the surface of a face with a polygonal skin containing approximately 250 polygons defined by about 400 vertices is sufficient to achieve a realistic face. Animation was accomplished using a cosine interpolation scheme to fill in the intermediate frames between expressions. This approach is good enough to produce realistic facial motion. The three-dimensional data used to describe the expressions of the face was obtained photogrammetrically using pairs of photographs.
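The cosine interpolation scheme mentioned above (easing between two key expressions to fill in intermediate frames) can be written in a few lines; the vertex-array representation and the names are assumptions for illustration.

```python
import numpy as np

def cosine_interpolate(expr_a, expr_b, t):
    """Blend two key expressions with cosine easing.

    expr_a, expr_b : (V, 3) vertex arrays for the two key expressions.
    t              : normalized time in [0, 1] between the key frames.
    The cosine term turns a linear ramp into an ease-in/ease-out curve.
    """
    s = (1.0 - np.cos(np.pi * t)) / 2.0
    return (1.0 - s) * expr_a + s * expr_b
```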
Article
Realistic facial animation is achieved through geometric and image manipulations. Geometric deformations usually account for the shape and deformations unique to the physiology and expressions of a person. Image manipulations model the reflectance properties of the facial skin and hair to achieve small-scale detail that is difficult to model by geometric manipulation alone. Modeling and animation methods often exhibit elements of each realm. This paper summarizes the theoretical approaches used in published work and describes their strengths, weaknesses, and relative performance. A taxonomy groups the methods into classes that highlight their similarities and differences.
Article
We develop a new 3D hierarchical model of the human face. The model incorporates a physically-based approximation to facial tissue and a set of anatomically-motivated facial muscle actuators. Despite its sophistication, the model is efficient enough to produce facial animation at interactive rates on a high-end graphics workstation. A second contribution of this paper is a technique for estimating muscle contractions from video sequences of human faces performing expressive articulations. These estimates may be input as dynamic control parameters to the face model in order to produce realistic animation. Using an example, we demonstrate that our technique yields sufficiently accurate muscle contraction estimates for the model to reconstruct expressions from dynamic images of faces.
Article
A fuzzy set is a class of objects with a continuum of grades of membership. Such a set is characterized by a membership (characteristic) function which assigns to each object a grade of membership ranging between zero and one. The notions of inclusion, union, intersection, complement, relation, convexity, etc., are extended to such sets, and various properties of these notions in the context of fuzzy sets are established. In particular, a separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
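For reference, the standard extensions of the set operations mentioned in this abstract, expressed on membership functions, are as follows (this is the textbook formulation rather than a quotation from the article):

```latex
\mu_{A \cup B}(x) = \max\bigl(\mu_A(x), \mu_B(x)\bigr), \qquad
\mu_{A \cap B}(x) = \min\bigl(\mu_A(x), \mu_B(x)\bigr), \qquad
\mu_{\bar{A}}(x) = 1 - \mu_A(x),
```

with $\mu_A(x) \in [0,1]$ the grade of membership of $x$ in $A$.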
Conference Paper
The development of a parameterized facial muscle process that incorporates the use of a model to create realistic facial animation is described. Existing methods of facial parameterization have the inherent problem of hard-wiring performable actions. The development of a muscle process that is controllable by a limited number of parameters and is non-specific to facial topology allows a richer vocabulary and a more general approach to the modelling of the primary facial expressions. A brief discussion of facial structure is given, from which a method for simple modelling of a muscle process that is suitable for the animation of a number of divergent facial types is described.
Article
This paper describes interactive facilities for simulating abstract muscle actions using rational free form deformations (RFFD). The particular muscle action is simulated as the displacement of the control points of the control-unit for an RFFD defined on a region of interest. One or several simulated muscle actions constitute a minimum perceptible action (MPA), which is defined as the atomic action unit, similar to action unit (AU) of the facial action coding system (FACS), to build an expression
Article
A new way of controlling human face animation and synchronizing speech is proposed. It is based on the concept of abstract muscle action procedure (AMA procedure). An AMA procedure is a specialized procedure which simulates the specific action of a face muscle. The paper describes the new technique and presents a methodology for animating the face of synthetic actors based on three levels: the AMA-procedure level, the expression level and the script level. The role of multiple tracks is also emphasized. Practical examples are also explained in detail, based on the film Rendez-vous a Montreal with the synthetic actors Marilyn Monroe and Humphrey Bogart
Article
Presents a computer model for the representation of human faces. This three-dimensional, parametric model produces shaded facial images. The face, constructed of polygonal surfaces, is manipulated through the use of parameters which control interpolation, translation, rotation and scaling of the various features. Thesis (Ph.D.), University of Utah, 1974.
Conference Paper
The paper presents a physically-based 3D facial model based on anatomical knowledge for facial expression animation. The facial model incorporates a physically-based approximation to facial skin and a set of anatomically-motivated facial muscles. The skin model is established through the use of a mass-spring system with nonlinear springs which simulate the elastic dynamics of real facial skin. Muscle models are developed to emulate facial muscle contraction. Lagrangian mechanics governs the dynamics, dictating the deformation of the facial surface in response to muscular forces. We show that when surface regions are influenced by large muscular forces, the local deformation becomes inaccurate. The conventional method to deal with this problem is to use a finer network, but this also increases the cost of computation. We therefore present an approach to adaptively refine the mass-spring facial model to a required accuracy. It generates more pleasing results at low computational expense.
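A minimal sketch of one integration step for the kind of mass-spring skin model described above is given below; it uses linear springs and a simple semi-implicit Euler step, whereas the paper uses nonlinear springs, Lagrangian dynamics and adaptive refinement, so this is an assumption-laden simplification, not the paper's method.

```python
import numpy as np

def mass_spring_step(pos, vel, springs, masses, forces_ext, dt=1e-3, damping=0.1):
    """One semi-implicit Euler step of a mass-spring skin model.

    pos, vel   : (N, 3) vertex positions and velocities.
    springs    : list of (i, j, rest_length, stiffness) tuples.
    masses     : (N,) vertex masses.
    forces_ext : (N, 3) external forces, e.g. from muscle actuators.
    """
    forces = forces_ext - damping * vel
    for i, j, rest, k in springs:
        d = pos[j] - pos[i]
        length = np.linalg.norm(d)
        if length > 1e-9:
            f = k * (length - rest) * (d / length)
            forces[i] += f
            forces[j] -= f
    vel = vel + dt * forces / masses[:, None]
    pos = pos + dt * vel
    return pos, vel
```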
Article
Parameterized models can produce realistic, manipulable images of human faces—with a surprisingly small number of parameters.
Article
We present ongoing work on a project for automatic recognition of spontaneous facial actions. Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general-purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3D tracking and warping, followed by machine learning techniques applied directly to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
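The classification stage of such a pipeline can be sketched with an off-the-shelf SVM; the feature vectors and labels below are random stand-ins for features extracted from the warped images, and scikit-learn is assumed to be available. This is an illustration of the technique, not the cited system.

```python
# Minimal sketch of the SVM classification stage only: feature vectors
# extracted from (already warped) face images, labelled by whether a target
# FACS action unit is present. Data here are random placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))      # e.g. filter-bank responses per image
labels = rng.integers(0, 2, size=200)      # 1 = target AU present, 0 = absent

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(features[:150], labels[:150])      # train on the first 150 samples
print("held-out accuracy:", clf.score(features[150:], labels[150:]))
```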
Article
This paper describes the prototype of a facial expression editor. In contrast to existing systems, the presented editor takes advantage of both medical data for the simulation and the consideration of facial anatomy during the definition of muscle groups. The C1-continuous geometry and the high degree of abstraction for the expression editing set this system apart from others. Using finite elements, we achieve better precision in comparison to particle systems. Furthermore, precomputing facial action units enables us to compose facial expressions by a superposition of facial action geometries in real time. The presented model is based on a generic facial model using a thin-plate and membrane approach for the surface and elastic springs for facial tissue modeling. It has been used successfully for performing facial surgery simulation. We illustrate features of our system with examples from the Visible Human Dataset™.
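The real-time superposition of precomputed action-unit geometries amounts to adding weighted displacement fields onto a neutral mesh, as in the minimal sketch below; the AU names, displacement fields, and intensities are illustrative assumptions.

```python
import numpy as np

def compose_expression(neutral, au_displacements, intensities):
    """
    Compose an expression by superposing precomputed action-unit displacement
    fields on a neutral mesh. AU names and intensities are illustrative only.
    """
    result = neutral.copy()
    for name, weight in intensities.items():
        result += weight * au_displacements[name]
    return result

# Neutral geometry (N x 3) and two hypothetical precomputed AU displacements
neutral = np.zeros((4, 3))
au_disp = {
    "AU1_inner_brow_raiser": np.array([[0, 0.1, 0]] * 4, dtype=float),
    "AU12_lip_corner_puller": np.array([[0.05, 0, 0]] * 4, dtype=float),
}
expression = compose_expression(neutral, au_disp,
                                {"AU1_inner_brow_raiser": 0.7,
                                 "AU12_lip_corner_puller": 0.4})
print(expression)
```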
Article
This dissertation addresses the problems of modeling and animating realistic faces. The general approach followed is to extract information from face photographs and videos by applying image-based modeling and rendering techniques. Given a set of face photographs taken simultaneously, our modeling technique permits us to interactively recover a 3D face model. We present animation techniques based on morphing between face models corresponding to different expressions. We demonstrate that a wide range of expressions can be generated by forming linear combinations of a small set of initial expressions. Given a video sequence, this linear face model can be used to estimate the face position, orientation, and facial expression at each frame. The thesis also explores different applications of face tracking, such as performance...
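Because the expressions form a linear model, the expression weights at a given frame can be estimated by solving a least-squares problem, as in this minimal sketch; the basis shapes and the observed geometry are toy stand-ins, not the thesis data.

```python
import numpy as np

def estimate_blend_weights(observed, expression_basis):
    """
    Fit a linear expression model to observed geometry: solve, in the
    least-squares sense, for the combination weights of a small set of basis
    expressions that best reproduces the observed vertices.
    """
    # Stack each basis expression (N x 3), flattened, as one column.
    A = np.stack([b.ravel() for b in expression_basis], axis=1)
    b = observed.ravel()
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights

# Two hypothetical basis expressions on a 3-vertex "face"
smile = np.array([[0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.1, 0.0, 0.0]])
frown = np.array([[0.0, -0.1, 0.0], [0.0, -0.1, 0.0], [0.0, -0.1, 0.0]])
observed = 0.6 * smile + 0.2 * frown
print(estimate_blend_weights(observed, [smile, frown]))   # approximately [0.6, 0.2]
```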
Article
A major unsolved problem in computer graphics is the construction and animation of realistic human facial models. Traditionally, facial models have been built painstakingly by manual digitization and animated by ad hoc parametrically controlled facial mesh deformations or kinematic approximation of muscle actions. Fortunately, animators are now able to digitize facial geometries through the use of scanning range sensors and animate them through the dynamic simulation of facial tissues and muscles. However, these techniques require considerable user input to construct facial models of individuals suitable for animation. In this paper, we present a methodology for automating this challenging task. Starting with a structured facial mesh, we develop algorithms that automatically construct functional models of the heads of human subjects from laser-scanned range and reflectance data. These algorithms automatically insert contractile muscles at anatomically correct positions within a dynamic skin model and root them in an estimated skull structure with a hinged jaw. They also synthesize functional eyes, eyelids, teeth, and a neck, and fit them to the final model. The constructed face may be animated via muscle actuations. In this way, we create the most authentic and functional facial models of individuals available to date and demonstrate their use in facial animation. CR Categories: I.3.5 [Computer Graphics]: Physically based modeling; I.3.7 [Computer Graphics]: Animation. Additional Keywords: Physics-based Facial Modeling, Facial Animation, RGB/Range Scanners, Feature-Based Facial Adaptation, Texture Mapping, Discrete Deformable Models.
Face to virtual face
  • N. Magnenat-Thalmann
  • P. Kalra
  • M. Escher
N. Magnenat-Thalmann, P. Kalra, and M. Escher. Face to virtual face. Proceedings of the IEEE, 86(5), 870–883, May 1998.
Abstract muscle action procedures for human face animation
  • N. Magnenat-Thalmann
  • E. Primeau
  • D. Thalmann
N. Magnenat-Thalmann, E. Primeau, and D. Thalmann. Abstract muscle action procedures for human face animation. The Visual Computer, 3(5), 290–297, 1988.
Audio-visual intelligibility of French speech in noise
  • C. Benoît
  • T. K. S. Mohamadi
C. Benoît and T. K. S. Mohamadi. Audio-visual intelligibility of French speech in noise. Journal of Speech and Hearing Research, 37, 1195–1203, 1994.
Animating facial expressions
  • S. M. Platt
  • N. I. Badler
S. M. Platt and N. I. Badler. Animating facial expressions. Computer Graphics (SIGGRAPH'81), 15(3), 245–252, August 1981.
A fast, efficient, accurate way to represent the human face
  • J. Kleiser
J. Kleiser. A fast, efficient, accurate way to represent the human face. In State of the Art in Facial Animation, SIGGRAPH'89 Tutorials (New York, 1989), Vol. 22, ACM, pp. 37–40, 1989.
The MPEG-4 video standard and its potential for future multimedia applications
  • T. Sikora
T. Sikora. The MPEG-4 video standard and its potential for future multimedia applications. In Proceedings of the IEEE ISCAS Conference (Hong Kong, June 1997).
Teach Yourself Body Language
  • G. Wainright
G. Wainright. Teach Yourself Body Language, 2nd ed. McGraw-Hill, January 2003.
The faces of suicidal depression (Les visages de la dépression suicidaire)
  • M. Heller
  • V. Haynal
M. Heller and V. Haynal. The faces of suicidal depression (Les visages de la dépression suicidaire). Cahiers Psychiatriques Genevois (Médecine et Hygiène Editors), 11, 107–117, 1994.
Computer Facial Animation
  • F. I. Parke
  • K. Waters
F. I. Parke and K. Waters. Computer Facial Animation. A. K. Peters, Ltd., Wellesley, MA, USA, 1996.