Conference Paper

An Evaluation of Other-Avatar Facial Animation Methods for Social VR

... The role that perceptions of behavioural realism of avatars play in user experience during VR-based social interactions has received considerable research interest in recent years [1-10]. The behavioural realism of an avatar can be defined by the quality of the avatar's movements, including gestures and facial gestures [11-13]. ...
... Studies that involve VR social interactions often lack full-body motion capture avatars, thus limiting their behavioural realism [6,8,35,39,54-58]. Furthermore, many of the studies that include user appraisal of full-body motion capture avatars are not conducted within a social interaction context that resembles a personable face-to-face conversation [2,24,28,30,37,53,57]. ...
... Furthermore, many of the studies that include user appraisal of full-body motion capture avatars are not conducted within a social interaction context that resembles a personable face-to-face conversation [2,24,28,30,37,53,57]. Instead, prior studies typically have people rate video play-back of avatars rather than have the user interact with the avatar [3,6,8,11,17,21,51,59,60]. ...
Article
Full-text available
This study investigates how the behavioural realism of avatars can enhance virtual reality (VR) social interactions involving self-disclosure. First, we review how factors such as trust, enjoyment, and nonverbal communication could be influenced by motion capture technology by enhancing behavioural realism. We also address a gap in the prior literature by comparing different motion capture systems and how these differences affect perceptions of realism, enjoyment, and eye contact. Specifically, this study compared two types of avatars: an iClone UNREAL avatar with full-body and facial motion capture and a Vive Sync avatar with limited motion capture for self-disclosure. Our participants rated the iClone UNREAL avatar higher for realism, enjoyment, and eye contact duration. However, as shown in our post-experiment survey, some participants reported that they preferred the avatar with less behavioural realism. We conclude that a higher level of behavioural realism achieved through more advanced motion capture can improve the experience of VR social interactions. We also conclude that despite the general advantages of more advanced motion capture, the simpler avatar was still acceptable and preferred by some participants. This has important implications for improving the accessibility of avatars for different contexts, such as therapy, where simpler avatars may be sufficient.
... The notion of believability corresponds also to the perception of credibility, for instance in [5] to explore how credible the IVA is in its role of recruiter. In [26], the believability of the IVA's behaviour is evaluated through the notion of plausibility and naturalness focusing on both the behaviour and the appearance. ...
... Several articles in our literature review have shown the influence of IVAs' behaviours on the users' perception of believability. First of all, concerning the animations, [39] and [26] emphasise that an animated IVA, regardless of the animation mode, is perceived as more natural and more plausible than a static one, indicating that the movements are a key factor in the perception of believability. ...
... Moreover, [39] shows that a more varied and nuanced behaviour leads to a better perception of the realism. [26] also focuses on the naturalness and plausibility of facial animation behaviour, showing that synthesised expressions (i.e. facial expressions generated from data such as audio, head movements, and tagged gaze targets to correspond with expected facial expressions in specific situations) are evaluated as more natural and plausible than tracked expressions (i.e. ...
Conference Paper
Full-text available
The multimodal behaviour of IVAs may convey different socio-affective dimensions, such as emotions, personality, or social capabilities. Several research works show that factors may impact the perception of the IVA’s behaviour. This paper proposes a systematic review, based on the PRISMA method, to investigate how the multimodal behaviour of IVAs is perceived with respect to socio-affective dimensions. To compare the results of different research works, a socio-emotional framework is proposed, considering the dimensions commonly employed in the studies. The conducted analysis of a wide array of studies ensures a comprehensive and transparent review, providing guidelines on the design of socio-affective IVAs.
... They found that increasing face animation levels increased ratings for embodiment, enfacement, and self-identification. Kullmann et al. [45] let participants rate an observed avatar's naturalness and plausibility. Animated faces were rated as more natural and plausible than static faces, interestingly more so for synthesized than for tracked animations. ...
... Overall, ratings of embodiment and self-identification were highest for whole face animation (AU-AL) compared to other conditions (AU-SL, SU-AL, SU-SL). This is in line with previous work that found increases in measures related to self-perception when using avatars with more face animation [29,31,45] and our hypothesis H1. However, the data did not show H1's hypothesized benefits of partial face animation (SU-AL, AU-SL) over no face animation (SU-SL) for embodiment and self-identification factors. ...
Article
Full-text available
Facial expressions are crucial for many eXtended Reality (XR) use cases, from mirrored self exposures to social XR, where users interact via their avatars as digital alter egos. However, current XR devices differ in sensor coverage of the face region. Hence, a faithful reconstruction of facial expressions either has to exclude these areas or synthesize missing animation data with model-based approaches, potentially leading to perceivable mismatches between executed and perceived expression. This paper investigates potential effects of the coverage of facial animations (none, partial, or whole) on important factors of self-perception. We exposed 83 participants to their mirrored personalized avatar. They were shown their mirrored avatar face with upper and lower face animation, upper face animation only, lower face animation only, or no face animation. Whole animations were rated higher in virtual embodiment and slightly lower in uncanniness. Missing animations did not differ from partial ones in terms of virtual embodiment. Contrasts showed significantly lower humanness, lower eeriness, and lower attractiveness for the partial conditions. For questions related to self-identification, effects were mixed. We discuss participants' shift in body part attention across conditions. Qualitative results show participants perceived their virtual representation as fascinating yet uncanny.
... Comic-like (stylized) avatars personalized with upper body features, gender, and names based on prior research by Kullmann et al. (2023) were chosen as virtual representations of the participants. To work interactively on the task solution, group workstations with virtual 3D models including numbered labels along with plus and minus symbols were provided for visualization purposes. ...
Book
Full-text available
This book is a collection of 11 papers that were presented at SITE’s 2024 annual conference in Las Vegas, NV. It also includes a Foreword from SITE President Jake Cohen and an Introduction from the Editors, Drs. Todd Cherner and Rebecca Blankenship.
... Additional tools were used to evaluate the realism of the avatars and the behavior of the two configurations, ensuring that issues like the uncanny valley phenomenon [80] were avoided. The perception of the other participant's verbal and non-verbal cues was assessed using the behavioral naturalness section of the questionnaire by Kullmann et al. [81], which helped to spot discrepancies between expected and actual behavior in the immersive SVR application (1-to-7 scale). Additionally, the Godspeed questionnaire [40], widely used in virtual agent and robot development, was employed to evaluate the realism of the avatars' behavior (1-to-5 scale). ...
Article
The growing availability of affordable Virtual Reality (VR) hardware and the increasing interest in the Metaverse are driving the expansion of Social VR (SVR) platforms. These platforms allow users to embody avatars in immersive social virtual environments, enabling real-time interactions using consumer devices. Beyond merely replicating real-life social dynamics, SVR platforms offer opportunities to surpass real-world constraints by augmenting these interactions. One example of such augmentation is Artificial Facial Mimicry (AFM), which holds significant potential to enhance social experiences. Mimicry, the unconscious imitation of verbal and non-verbal behaviors, has been shown to positively affect human-agent interactions, yet its role in avatar-mediated human-to-human communication remains under-explored. AFM presents various possibilities, such as amplifying emotional expressions, or substituting one emotion for another to better align with the context. Furthermore, AFM can address the limitations of current facial tracking technologies in fully capturing users' emotions. To investigate the potential benefits of AFM in SVR, an automated AM system was developed. This system provides AFM, along with other kinds of head mimicry (nodding and eye contact), and it is compatible with consumer VR devices equipped with facial tracking. This system was deployed within a test-bench immersive SVR application. A between-dyads user study was conducted to assess the potential benefits of AFM for interpersonal communication while maintaining avatar behavioral naturalness, comparing the experiences of pairs of participants communicating with AFM enabled against a baseline condition. Subjective measures revealed that AFM improved interpersonal closeness, aspects of social attraction, interpersonal trust, social presence, and naturalness compared to the baseline condition. These findings demonstrate AFM's positive impact on key aspects of social interaction and highlight its potential applications across various SVR domains.
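The abstract above describes artificial facial mimicry only at a high level; the sketch below is a purely illustrative approximation (not the authors' system), blending a local avatar's tracked blendshape weights toward a delayed, attenuated copy of the partner's weights. All class and parameter names are hypothetical.

```python
import numpy as np
from collections import deque

class FacialMimicry:
    """Illustrative artificial-facial-mimicry filter (hypothetical, not the paper's system).

    Blends the local avatar's tracked blendshape weights toward a delayed,
    attenuated copy of the interlocutor's weights, so the avatar subtly
    mirrors its partner while keeping its own expression dominant.
    """

    def __init__(self, n_blendshapes: int, delay_frames: int = 30, strength: float = 0.3):
        self.strength = strength                      # 0 = no mimicry, 1 = full copy
        self.buffer = deque([np.zeros(n_blendshapes)] * delay_frames,
                            maxlen=delay_frames)      # roughly 1 s delay at 30 fps

    def step(self, own_weights: np.ndarray, partner_weights: np.ndarray) -> np.ndarray:
        self.buffer.append(partner_weights.copy())
        delayed_partner = self.buffer[0]              # oldest buffered partner frame
        mimicked = (1.0 - self.strength) * own_weights + self.strength * delayed_partner
        return np.clip(mimicked, 0.0, 1.0)            # keep valid blendshape range

# Example: 52 ARKit-style blendshape channels, one animation frame
mimicry = FacialMimicry(n_blendshapes=52)
own = np.random.rand(52) * 0.2
partner = np.zeros(52)
partner[20] = 0.8                                     # partner smiles strongly
print(mimicry.step(own, partner)[:5])
```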
... To evaluate the level of co-presence experienced during interactions with the remote therapist, the Networked Minds Social Presence Questionnaire (NMMSQ) [9] was used. Additionally, the Behavior Naturalness [12] scale assessed how natural the verbal and non-verbal behaviors of the therapist's representation appeared to the subjects. The Social Presence [16] scale measured the subjects' perception of the therapist's presence and their feelings of engagement and support. ...
Conference Paper
Recent advancements in technology have improved rehabilitation through tele-rehabilitation, offering flexible and personalized care with remote monitoring. Interactive "exergames" using eXtended Reality (XR) enhance treatment by combining traditional methods with digital gaming. However, there is limited research on virtual therapist representations. This study evaluates three techniques for therapist representation in Mixed Reality (MR) tele-rehabilitation: audio-only, video, and 3D avatar. Using a collaborative exergame for upper limb rehabilitation in Multiple Sclerosis (MS) patients, the study assesses these methods based on peer acceptance, user experience, social and co-presence, and naturalness, aiming to optimize therapist representation in MR-enabled tele-rehabilitation contexts.
... Wei et al. [19] found that adding positive facial expressions to a virtual coach in a VR phobia treatment significantly improved user connection with that coach. Furthermore, animated faces of VR characters have been found to be perceived as more natural and believable than static faces in VR social experiences [16,20]. ...
Article
Full-text available
Virtual reality (VR) is increasingly used in the study and treatment of paranoia. This is based on the finding that people who mistakenly perceive hostile intent from other people also perceive similar threat from virtual characters. However, there has been no study of the programming characteristics of virtual characters that may influence their interpretation. We set out to investigate how the animation and expressions of virtual humans may affect paranoia. In a two-by-two factor, between-groups, randomized design, 122 individuals with elevated paranoia rated their perceptions of virtual humans, set in an eye-tracking enabled VR lift scenario, that varied in facial animation (static or animated) and expression (neutral or positive). Both facial animation (group difference = 102.328 [51.783, 152.872], p < 0.001, ηp² = 0.125) and positive expressions (group difference = 53.016 [0.054, 105.979], p = 0.049, ηp² = 0.033) led to less triggering of paranoid thoughts about the virtual humans. Facial animation (group difference = 2.442 [−4.161, −0.724], p = 0.006, ηp² = 0.063) but not positive expressions (group difference = 0.344 [−1.429, 2.110], p = 0.681, ηp² = 0.001) significantly increased the likelihood of neutral thoughts about the characters. Our study shows that the detailed programming of virtual humans can impact the occurrence of paranoid thoughts in VR. The programming of virtual humans needs careful consideration depending on the purpose of their use.
... The work of Skarbez et al. (2017b) investigated a virtual agent's behavior coherence and its relative importance as a contributing factor for the overall plausibility of a VR experience. Further research focused on the congruence of spatial and behavioral cues of agents (Kim et al., 2017), gaze behavior, and auditory features of virtual groups (Bergström et al., 2017), facial animation methods (Kullmann et al., 2023), different virtual body animation features for avatars and agents (Debarba et al., 2022), and renderings of single virtual humans and their (in)congruence with the device-related presentation of the respective environment (Wolf et al., 2022c). However, we are unaware of related studies investigating the congruence of styles within a group of virtual humans. ...
Article
Full-text available
Virtual humans play a pivotal role in social virtual environments, shaping users’ VR experiences. The diversity in available options and users’ individual preferences can result in a heterogeneous mix of appearances among a group of virtual humans. The resulting variety in higher-order anthropomorphic and realistic cues introduces multiple (in)congruencies, eventually impacting the plausibility of the experience. However, related work investigating the effects of being co-located with multiple virtual humans of different appearances remains limited. In this work, we consider the impact of (in)congruencies in the realism of a group of virtual humans, including co-located others (agents) and one’s self-representation (self-avatar), on users’ individual VR experiences. In a 2 × 3 mixed design, participants embodied either (1) a personalized realistic or (2) a customized stylized self-avatar across three consecutive VR exposures in which they were accompanied by a group of virtual others being either (1) all realistic, (2) all stylized, or (3) mixed between stylized and realistic. Our results indicate groups of virtual others of higher realism, i.e., potentially more congruent with participants’ real-world experiences and expectations, were considered more human-like, increasing the feeling of co-presence and the impression of interaction possibilities. (In)congruencies concerning the homogeneity of the group did not cause considerable effects. Furthermore, our results indicate that a self-avatar’s congruence with the participant’s real-world experiences concerning their own physical body yielded notable benefits for virtual body ownership and self-identification for realistic personalized avatars. Notably, the incongruence between a stylized self-avatar and a group of realistic virtual others resulted in diminished ratings of self-location and self-identification. This suggests that higher-order (in)congruent visual cues that are not within the ego-central referential frame of one’s (virtual) body, can have an (adverse) effect on the relationship between one’s self and body. We conclude on the implications of our findings and discuss our results within current theories of VR experiences, considering (in)congruent visual cues and their impact on the perception of virtual others, self-representation, and spatial presence.
Conference Paper
Virtual humans significantly contribute to users’ plausible XR experiences. However, it may be not only the congruent rendering of the virtual human but also the degree of immersion having an impact on virtual humans’ plausibility. In a low-immersive desktop-based and a high-immersive VR condition, participants rated realistic and abstract animated virtual humans regarding plausibility, affective appraisal, and social judgments. First, our results confirmed the factor structure of a preliminary virtual human plausibility questionnaire in VR. Further, the appearance and behavior of realistic virtual humans were overall perceived as more plausible compared to abstract virtual humans, an effect that increased with high immersion. Moreover, only for high immersion, realistic virtual humans were rated as more trustworthy and sympathetic than abstract virtual humans. Interestingly, we observed a potential uncanny valley effect for low but not for high immersion. We discuss the impact of a natural perception of anthropomorphic and realistic cues in VR and highlight the potential of immersive technology to elicit distinct effects in virtual humans.
Article
In Mixed Reality (MR), users' heads are largely (if not completely) occluded by the MR Head-Mounted Display (HMD) they are wearing. As a consequence, one cannot see their facial expressions and other communication cues when interacting locally. In this paper, we investigate how displaying virtual avatars' heads on top of the (HMD-occluded) heads of participants in a Video See-Through (VST) Mixed Reality local collaborative task could improve their collaboration as well as social presence. We hypothesized that virtual heads would convey more communicative cues (such as eye direction or facial expressions) hidden by the MR HMDs and lead to better collaboration and social presence. To do so, we conducted a between-subject study (n=88) with two independent variables: the type of avatar (CartoonAvatar/RealisticAvatar/NoAvatar) and the level of facial expressions provided (HighExpr/LowExpr). The experiment involved two dyadic communication tasks: (i) the “20-question” game where one participant asks questions to guess a hidden word known by the other participant and (ii) an urban planning problem where participants have to solve a puzzle by collaborating. Each pair of participants performed both tasks using a specific type of avatar and facial animation. Our results indicate that while adding an avatar's head does not necessarily improve social presence, the amount of facial expressions provided through the social interaction does have an impact. Moreover, participants rated their performance higher when observing a realistic avatar but rated the cartoon avatars as less uncanny. Taken together, our results contribute to a better understanding of the role of partial avatars in local MR collaboration and pave the way for further research exploring collaboration in different scenarios, with different avatar types or MR setups.
Article
Full-text available
We present Voice2Face: a Deep Learning model that generates face and tongue animations directly from recorded speech. Our approach consists of two steps: a conditional Variational Autoencoder generates mesh animations from speech, while a separate module maps the animations to rig controller space. Our contributions include an automated method for speech style control, a method to train a model with data from multiple quality levels, and a method for animating the tongue. Unlike previous works, our model generates animations without speaker‐dependent characteristics while allowing speech style control. We demonstrate through a user study that Voice2Face significantly outperforms a comparative state‐of‐the‐art model in terms of perceived animation quality, and our quantitative evaluation suggests that Voice2Face yields more accurate lip closure in speech with bilabials through our speech style optimization. Both evaluations also show that our data quality conditioning scheme outperforms both an unconditioned model and a model trained with a smaller high‐quality dataset. Finally, the user study shows a preference for animations including tongue. Results from our model can be seen at https://go.ea.com/voice2face.
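As a rough illustration of the two-step pipeline the abstract describes (speech to mesh animation, then mesh animation to rig controller space), here is a minimal sketch in PyTorch. It is not EA's Voice2Face model; the layer sizes, feature dimensions, and module names are assumptions.

```python
import torch
import torch.nn as nn

class SpeechToMeshDecoder(nn.Module):
    """Stage 1 (sketch): decode audio features plus a style latent into per-frame vertex offsets."""
    def __init__(self, audio_dim=80, latent_dim=16, n_vertices=5000):
        super().__init__()
        self.n_vertices = n_vertices
        self.net = nn.Sequential(
            nn.Linear(audio_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_vertices * 3),
        )

    def forward(self, audio_feats, z):
        # audio_feats: (T, audio_dim); z: (latent_dim,) broadcast over time
        z = z.expand(audio_feats.shape[0], -1)
        out = self.net(torch.cat([audio_feats, z], dim=-1))
        return out.view(-1, self.n_vertices, 3)

class MeshToRig(nn.Module):
    """Stage 2 (sketch): map mesh animation to rig controller space with a linear layer."""
    def __init__(self, n_vertices=5000, n_controllers=60):
        super().__init__()
        self.proj = nn.Linear(n_vertices * 3, n_controllers)

    def forward(self, vertex_offsets):
        return self.proj(vertex_offsets.flatten(start_dim=1))

# Example: 100 frames of 80-dim audio features and a sampled style latent
decoder, rig_mapper = SpeechToMeshDecoder(), MeshToRig()
audio = torch.randn(100, 80)
style = torch.randn(16)            # in a CVAE this would come from the encoder or prior
rig_curves = rig_mapper(decoder(audio, style))
print(rig_curves.shape)            # (100, 60) controller values per frame
```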
Article
Full-text available
We review the concept of presence in virtual reality, normally thought of as the sense of “being there” in the virtual world. We argued in a 2009 paper that presence consists of two orthogonal illusions that we refer to as Place Illusion (PI, the illusion of being in the place depicted by the VR) and Plausibility (Psi, the illusion that the virtual situations and events are really happening). Both are with the proviso that the participant in the virtual reality knows for sure that these are illusions. Presence (PI and Psi) together with the illusion of ownership over the virtual body that self-represents the participant, are the three key illusions of virtual reality. Copresence, togetherness with others in the virtual world, can be a consequence in the context of interaction between remotely located participants in the same shared virtual environments, or between participants and virtual humans. We then review several different methods of measuring presence: questionnaires, physiological and behavioural measures, breaks in presence, and a psychophysics method based on transitions between different system configurations. Presence is not the only way to assess the responses of people to virtual reality experiences, and we present methods that rely solely on participant preferences, including the use of sentiment analysis that allows participants to express their experience in their own words rather than be required to adopt the terminology and concepts of researchers. We discuss several open questions and controversies that exist in this field, providing an update to the 2009 paper, in particular with respect to models of Plausibility. We argue that Plausibility is the most interesting and complex illusion to understand and is worthy of significantly more research. Regarding measurement we conclude that the ideal method would be a combination of a psychophysical method and qualitative methods including sentiment analysis.
Article
Full-text available
Presence is often considered the most important quale describing the subjective feeling of being in a computer-generated and/or computer-mediated virtual environment. The identification and separation of orthogonal presence components, i.e., the place illusion and the plausibility illusion, has been an accepted theoretical model describing Virtual Reality (VR) experiences for some time. This perspective article challenges this presence-oriented VR theory. First, we argue that a place illusion cannot be the major construct to describe the much wider scope of virtual, augmented, and mixed reality (VR, AR, MR: or XR for short). Second, we argue that there is no plausibility illusion but merely plausibility, and we derive the place illusion caused by the congruent and plausible generation of spatial cues and similarly for all the current model’s so-defined illusions. Finally, we propose congruence and plausibility to become the central essential conditions in a novel theoretical model describing XR experiences and effects.
Article
Full-text available
Realistic and lifelike 3D-reconstruction of virtual humans has various exciting and important use cases. Our and others’ appearances have notable effects on ourselves and our interaction partners in virtual environments, e.g., on acceptance, preference, trust, believability, behavior (the Proteus effect), and more. Today, multiple approaches for the 3D-reconstruction of virtual humans exist. They significantly vary in terms of the degree of achievable realism, the technical complexities, and finally, the overall reconstruction costs involved. This article compares two 3D-reconstruction approaches with very different hardware requirements. The high-cost solution uses a typical complex and elaborated camera rig consisting of 94 digital single-lens reflex (DSLR) cameras. The recently developed low-cost solution uses a smartphone camera to create videos that capture multiple views of a person. Both methods use photogrammetric reconstruction and template fitting with the same template model and differ in their adaptation to the method-specific input material. Each method generates high-quality virtual humans ready to be processed, animated, and rendered by standard XR simulation and game engines such as Unreal or Unity. We compare the results of the two 3D-reconstruction methods in an immersive virtual environment against each other in a user study. Our results indicate that the virtual humans from the low-cost approach are perceived similarly to those from the high-cost approach regarding the perceived similarity to the original, human-likeness, beauty, and uncanniness, despite significant differences in the objectively measured quality. The perceived feeling of change of the own body was higher for the low-cost virtual humans. Quality differences were perceived more strongly for one’s own body than for other virtual humans.
Article
Full-text available
Latency in video-mediated interaction can frustrate smooth turn-taking: it may cause participants to perceive silence at points where talk should occur, it may cause them to talk in overlap, and it impedes their ability to return to one-speaker-at-a-time. Whilst potentially frustrating for participants, this makes video-mediated interaction a perspicuous setting for the study of social interaction: it is an environment that nurtures the occurrence of turn-taking problems. For this paper, we conducted secondary analysis of 25 video consultations recorded for heart failure, (antenatal) diabetes, and cancer services in the UK. By comparing video recordings of the patient's and clinician's side of the call, we provide a detailed analysis of how latency interferes with the turn-taking system, how participants understand problems, and how they address them. We conclude that in our data latency goes unnoticed until it becomes problematic: participants act as if they share the same reality.
Article
Full-text available
This study focuses on the individual and joint contributions of two nonverbal channels (i.e., face and upper body) in avatar mediated-virtual environments. 140 dyads were randomly assigned to communicate with each other via platforms that differentially activated or deactivated facial and bodily nonverbal cues. The availability of facial expressions had a positive effect on interpersonal outcomes. More specifically, dyads that were able to see their partner’s facial movements mapped onto their avatars liked each other more, formed more accurate impressions about their partners, and described their interaction experiences more positively compared to those unable to see facial movements. However, the latter was only true when their partner’s bodily gestures were also available and not when only facial movements were available. Dyads showed greater nonverbal synchrony when they could see their partner’s bodily and facial movements. This study also employed machine learning to explore whether nonverbal cues could predict interpersonal attraction. These classifiers predicted high and low interpersonal attraction at an accuracy rate of 65%. These findings highlight the relative significance of facial cues compared to bodily cues on interpersonal outcomes in virtual environments and lend insight into the potential of automatically tracked nonverbal cues to predict interpersonal attitudes.
Conference Paper
Full-text available
Technologies for Virtual, Mixed, and Augmented Reality (VR, MR, and AR) make it possible to artificially augment social interactions and thus to go beyond what is possible in real life. Motivations for the use of social augmentations are manifold, for example, to synthesize behavior when sensory input is missing, to provide additional affordances in shared environments, or to support inclusion and training of individuals with social communication disorders. We review and categorize augmentation approaches and propose a software architecture based on four data layers. Three components further handle the status analysis, the modification, and the blending of behaviors. We present a prototype (injectX) that supports behavior tracking (body motion, eye gaze, and facial expressions from the lower face), status analysis, decision-making, augmentation, and behavior blending in immersive interactions. Along with a critical reflection, we consider further technical and ethical aspects.
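The abstract outlines a tracking, analysis, modification, and blending architecture; the following schematic sketch (not the injectX codebase, all names hypothetical) illustrates how such a behavior-augmentation pipeline could be staged.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorFrame:
    """One frame of tracked nonverbal behavior (names are illustrative, not injectX's schema)."""
    gaze_target: tuple = (0.0, 0.0, 1.0)
    blendshapes: dict = field(default_factory=dict)
    body_pose: dict = field(default_factory=dict)

def analyze_status(frame: BehaviorFrame) -> dict:
    """Derive a simple interaction status, e.g. whether the user is smiling."""
    return {"smiling": frame.blendshapes.get("mouthSmile", 0.0) > 0.5}

def modify(frame: BehaviorFrame, status: dict) -> BehaviorFrame:
    """Synthesize or adjust behavior, e.g. amplify a detected smile."""
    if status["smiling"]:
        frame.blendshapes["mouthSmile"] = min(1.0, frame.blendshapes["mouthSmile"] * 1.5)
    return frame

def blend(tracked: BehaviorFrame, augmented: BehaviorFrame, alpha: float = 0.5) -> BehaviorFrame:
    """Blend tracked and augmented behavior before rendering on the avatar."""
    out = BehaviorFrame(gaze_target=tracked.gaze_target, body_pose=tracked.body_pose)
    keys = set(tracked.blendshapes) | set(augmented.blendshapes)
    out.blendshapes = {k: (1 - alpha) * tracked.blendshapes.get(k, 0.0)
                          + alpha * augmented.blendshapes.get(k, 0.0) for k in keys}
    return out

# Example frame flowing through the three stages
frame = BehaviorFrame(blendshapes={"mouthSmile": 0.6})
status = analyze_status(frame)
augmented = modify(BehaviorFrame(blendshapes=dict(frame.blendshapes)), status)
print(blend(frame, augmented).blendshapes)
```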
Article
Full-text available
The few previous studies testing whether or not microexpressions are indicators of deception have produced equivocal findings, which may have resulted from restrictive operationalizations of microexpression duration. In this study, facial expressions of emotion produced by community participants in an initial screening interview in a mock crime experiment were coded for occurrence and duration. Various expression durations were tested to determine whether they differentiated between truthtellers and liars regarding their intent to commit a malicious act in the future. We operationalized microexpressions as expressions occurring less than the duration of spontaneously occurring, non-concealed, non-repressed facial expressions of emotion based on empirically documented findings, that is ≤0.50 s, and then more systematically ≤0.40, ≤0.30, and ≤0.20 s. We also compared expressions occurring between 0.50 and 6.00 s and all expressions ≤6.00 s. Microexpressions of negative emotions occurring ≤0.40 and ≤0.50 s differentiated truthtellers and liars. Expressions of negative emotions occurring ≤6.00 s also differentiated truthtellers from liars but this finding did not survive when expressions ≤1.00 s were filtered from the data. These findings provided the first systematic evidence for the existence of microexpressions at various durations and their possible ability to differentiate truthtellers from liars about their intent to commit an act of malfeasance in the future.
Conference Paper
Full-text available
Human gaze is a crucial element in social interactions and therefore an important topic for social Augmented, Mixed, and Virtual Reality (AR,MR,VR) applications. In this paper we systematically compare four modes of gaze transmission: (1) natural gaze, (2) hybrid gaze, which combines natural gaze transmission with a social gaze model, (3) synthesized gaze, which combines a random gaze transmission with a social gaze model, and (4) purely random gaze. Investigating dyadic interactions, results show a linear trend for the perception of virtual rapport, trust, and interpersonal attraction, suggesting that these measures increase with higher naturalness and social adequateness of the transmission mode. We further investigated the perception of realism as well as the resulting gaze behavior of the avatars and the human participants. We discuss these results and their implications.
Conference Paper
Full-text available
Nonverbal expressions of emotions play an important role in social interactions. Regarding virtual environments (VEs) and the transmission of nonverbal cues in avatar-mediated communication, knowledge of the contribution of nonverbal channels to emotion recognition is essential. This study analyzed the impact of emotional expressions in faces and body motion on emotion recognition. Motion capture data of expressive body movements from actors portraying either anger or happiness were animated using avatars with congruent and incongruent facial expressions. Participants viewed the resulting animations and rated the perceived emotion. During stimulus presentation, gaze behavior was recorded. The analysis of the rating results and visual attention patterns indicates that humans predominantly judge emotions based on the facial expression and pay higher attention to the head region as an information source to recognize emotions. This implies that the transmission of facial expressions is important for the design of social VEs.
Conference Paper
Full-text available
There are currently no solutions for enabling direct face-to-face interaction between virtual reality (VR) users wearing head-mounted displays (HMDs). The main challenge is that the headset obstructs a significant portion of a user's face, preventing effective facial capture with traditional techniques. To advance virtual reality as a next-generation communication platform, we develop a novel HMD that enables 3D facial performance-driven animation in real-time. Our wearable system uses ultra-thin flexible electronic materials that are mounted on the foam liner of the headset to measure surface strain signals corresponding to upper face expressions. These strain signals are combined with a head-mounted RGB-D camera to enhance the tracking in the mouth region and to account for inaccurate HMD placement. To map the input signals to a 3D face model, we perform a single-instance offline training session for each person. For reusable and accurate online operation, we propose a short calibration step to readjust the Gaussian mixture distribution of the mapping before each use. The resulting animations are visually on par with cutting-edge depth sensor-driven facial performance capture systems and hence, are suitable for social interactions in virtual worlds.
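The abstract mentions mapping sensor signals to a 3D face model through a Gaussian mixture distribution that is recalibrated per session. A minimal sketch of that idea, assuming a joint GMM over (strain signal, blendshape) vectors and prediction via the mixture's conditional mean, could look as follows; the dimensions and calibration data are invented for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(signals, blendshapes, n_components=8, seed=0):
    """Fit a GMM over concatenated (strain signal, blendshape weight) vectors."""
    joint = np.hstack([signals, blendshapes])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(joint)
    return gmm, signals.shape[1]

def predict_blendshapes(gmm, d_x, x):
    """Conditional mean E[y | x] of the joint mixture: Gaussian conditioning per component."""
    resp, cond_means = [], []
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :d_x], gmm.means_[k, d_x:]
        S_xx = gmm.covariances_[k][:d_x, :d_x]
        S_yx = gmm.covariances_[k][d_x:, :d_x]
        resp.append(gmm.weights_[k] *
                    multivariate_normal.pdf(x, mu_x, S_xx, allow_singular=True))
        cond_means.append(mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    resp = np.asarray(resp)
    resp /= resp.sum()
    return np.einsum("k,kd->d", resp, np.asarray(cond_means))

# Example with synthetic calibration data: 8 strain channels -> 20 blendshape weights
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
Y = np.clip(X @ rng.normal(size=(8, 20)) * 0.1 + 0.5, 0, 1)
gmm, d_x = fit_joint_gmm(X, Y)
print(predict_blendshapes(gmm, d_x, X[0])[:5])
```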
Article
Full-text available
Gestures and speech interact. They are linked in language production and perception, with their interaction contributing to felicitous communication. The multifaceted nature of these interactions has attracted considerable attention from the speech and gesture community. This article provides an overview of our current understanding of manual and head gesture form and function, of the principal functional interactions between gesture and speech aiding communication, transporting meaning, and producing speech. Furthermore, we present an overview of research on temporal speech-gesture synchrony, including the special role of prosody in speech-gesture alignment. In addition, we provide a summary of tools and data available for gesture analysis, and describe speech-gesture interaction models and simulations in technical systems. This overview also serves as an introduction to a Special Issue covering a wide range of articles on these topics. We provide links to the Special Issue throughout this paper.
Article
Full-text available
We present a novel technique for animating self-avatar eye movements in an immersive virtual environment without the use of eye-tracking hardware, and evaluate our technique via a two-alternative, forced-choice-with-confidence experiment that compares this simulated-eye-tracking condition to a no-eye-tracking condition and a real-eye-tracking condition in which the avatar's eyes were rotated with an eye tracker. Viewing the reflection of a tracked self-avatar is often used in virtual-embodiment scenarios to induce in the participant the illusion that the virtual body of the self-avatar belongs to them; however, current tracking methods do not account for the movements of the participant's eyes, potentially lessening this body-ownership illusion. The results of our experiment indicate that, although blind to the experimental conditions, participants noticed differences between eye behaviors, and found that the real and simulated conditions represented their behavior better than the no-eye-tracking condition. Additionally, no statistical difference was found when choosing between the real and simulated conditions. These results suggest that adding eye movements to self-avatars produces a subjective increase in self-identification with the avatar due to a more complete representation of the participant's behavior, which may be beneficial for inducing virtual embodiment, and that effective results can be obtained without the need for any specialized eye-tracking hardware.
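The paper's simulated-eye-tracking condition animates eyes without an eye tracker; the toy fixation/saccade generator below is only a hedged illustration of such a procedural model, not the authors' implementation (their timing distributions and gaze targets are not reproduced here).

```python
import random

class ProceduralGaze:
    """Toy fixation/saccade generator for an avatar without eye tracking (illustrative only)."""

    def __init__(self, fixation_range=(0.2, 1.5)):
        self.fixation_range = fixation_range   # seconds between saccades
        self.time_left = 0.0
        self.target = (0.0, 0.0)               # eye yaw/pitch offsets in degrees

    def update(self, dt, head_forward=(0.0, 0.0)):
        """Advance the model by dt seconds; return eye yaw/pitch relative to the head."""
        self.time_left -= dt
        if self.time_left <= 0.0:
            # pick a new fixation point near the head's forward direction
            self.target = (head_forward[0] + random.uniform(-10, 10),
                           head_forward[1] + random.uniform(-5, 5))
            self.time_left = random.uniform(*self.fixation_range)
        return self.target

# Example: a few frames at 60 fps (saccades here are instantaneous jumps;
# real saccade dynamics, blinks, and vergence are omitted)
gaze = ProceduralGaze()
for frame in range(5):
    print(gaze.update(dt=1 / 60))
```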
Article
Full-text available
A meta-analysis was conducted on the accuracy of predictions of various objective outcomes in the areas of clinical and social psychology from short observations of expressive behavior (under 5 min). The overall effect size for the accuracy of predictions for 38 different results was .39. Studies using longer periods of behavioral observation did not yield greater predictive accuracy; predictions based on observations under 0.5 min in length did not differ significantly from predictions based on 4- and 5-min observations. The type of behavioral channel (such as the face, speech, the body, tone of voice) on which the ratings were based was not related to the accuracy of predictions. Accuracy did not vary significantly between behaviors manipulated in a laboratory and more naturally occurring behavior. Last, effect sizes did not differ significantly for predictions in the areas of clinical psychology, social psychology, and the accuracy of detecting deception. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Reviewers of research reports frequently criticize the choice of statistical methods. While some of these criticisms are well-founded, frequently the use of various parametric methods such as analysis of variance, regression, and correlation is faulted because: (a) the sample size is too small, (b) the data may not be normally distributed, or (c) the data are from Likert scales, which are ordinal, so parametric statistics cannot be used. In this paper, I dissect these arguments, and show that many studies, dating back to the 1930s, consistently show that parametric statistics are robust with respect to violations of these assumptions. Hence, challenges like those above are unfounded, and parametric methods can be utilized without concern for "getting the wrong answer".
Article
Full-text available
Research on gaze and eye contact was organized within the framework of Patterson's (1982) sequential functional model of nonverbal exchange. Studies were reviewed showing how gaze functions to (a) provide information, (b) regulate interaction, (c) express intimacy, (d) exercise social control, and (e) facilitate service and task goals. Research was also summarized that describes personal, experiential, relational, and situational antecedents of gaze and reactions to gaze. Directions were given for a functional analysis of the relation between gaze and physiological responses. Attribution theories were integrated into the sequential model for making predictions about people's perceptions of their own gazing behavior and the gazing behavior of others. Data on people's accuracy in reporting their own and others' gaze were presented and integrated with related findings in attribution research. The sequential model was used to analyze research studies measuring the interaction between gaze and personal and contextual variables. Methodological and measurement issues were discussed and directions were outlined for future research.
Article
Full-text available
In order to clarify the morphological uniqueness of the human eye and to obtain cues to understanding its adaptive significance, we compared the external morphology of the primate eye by measuring nearly half of all extant primate species. The results clearly showed exceptional features of the human eye: (1) the exposed white sclera is void of any pigmentation, (2) humans possess the largest ratio of exposed sclera in the eye outline, and (3) the eye outline is extraordinarily elongated in the horizontal direction. The close correlation of the parameters reflecting (2) and (3) with habitat type or body size of the species examined suggested that these two features are adaptations for extending the visual field by eyeball movement, especially in the horizontal direction. Comparison of eye coloration and facial coloration around the eye suggested that the dark coloration of exposed sclera of nonhuman primates is an adaptation to camouflage the gaze direction against other individuals and/or predators, and that the white sclera of the human eye is an adaptation to enhance the gaze signal. The uniqueness of human eye morphology among primates illustrates the remarkable difference between human and other primates in the ability to communicate using gaze signals.
Article
Full-text available
During social interactions, people's eyes convey a wealth of information about their direction of attention and their emotional and mental states. This review aims to provide a comprehensive overview of past and current research into the perception of gaze behavior and its effect on the observer. This encompasses the perception of gaze direction and its influence on perception of the other person, as well as gaze-following behavior such as joint attention, in infant, adult, and clinical populations. Particular focus is given to the gaze-cueing paradigm that has been used to investigate the mechanisms of joint attention. The contribution of this paradigm has been significant and will likely continue to advance knowledge across diverse fields within psychology and neuroscience.
Article
Full-text available
The use of remote communication technologies to carry out daily work is becoming increasingly common, and their use in certain settings is already commonplace. Yet, in spite of the fact that significant sums are being spent on the acquisition of technologies to support distributed work, we are only beginning to understand the intricacies of these interactions. This paper identifies and analyzes one particular limitation of video-based teleconferencing, the impact of an audio and video delay on distributed communication. It offers a detailed microanalysis of one distributed team's use of videoconferencing to support remote teamwork. We explore through this analysis the impact which technology-generated delays may have on shared meaning-making between remote participants. We draw conclusions about the significance of our findings for understanding talk, interaction and collaboration across remote links, and conclude with recommendations for designers, users and implementers. Keywords: CSCW, remote collaboration, telework, videoconferencing, audio, conversation analysis, interaction analysis.
Chapter
For many years the Handbook of Methods in Nonverbal Behavior Research (Scherer & Ekman, 1982) has been an invaluable text for researchers looking for methods to study nonverbal behavior and the expression of affect. A successor to this essential text, The New Handbook of Methods in Nonverbal Behavior Research is a substantially updated volume with 90% new material. It includes chapters on coding and methodological issues for a variety of areas in nonverbal behavior: facial actions, vocal behavior, and body movement. Issues relevant to judgment studies, methodology, reliability, analyses, etc. have also been updated. The topics are broad and include specific information about methodology and coding strategies in education, psychotherapy, deception, nonverbal sensitivity, and marital and group behavior. There is also a chapter detailing specific information on the technical aspects of recording the voice and face, and specifically in relation to deception studies. This volume will be valuable for both new researchers and those already working in the fields of nonverbal behavior, affect expression, and related topics. It will play a central role in further refining research methods and coding strategies, allowing a comparison of results from various laboratories where research on nonverbal behavior is being conducted. This will advance research in the field and help to coordinate results so that a more comprehensive understanding of affect expression can be developed.
Article
Through avatar embodiment in Virtual Reality (VR) we can achieve the illusion that an avatar is substituting our body: the avatar moves as we move and we see it from a first person perspective. However, self-identification, the process of identifying a representation as being oneself, poses new challenges because a key determinant is that we see and have agency in our own face. Providing control over the face is hard with current HMD technologies because face tracking is either cumbersome or error prone. However, limited animation is easily achieved based on speaking. We investigate the level of avatar enfacement, that is believing that a picture of a face is one's own face, with three levels of facial animation: (i) one in which the facial expressions of the avatars are static, (ii) one in which we implement lip-sync motion and (iii) one in which the avatar presents lip-sync plus additional facial animations, with blinks, designed by a professional animator. We measure self-identification using a face morphing tool that morphs from the face of the participant to the face of a gender matched avatar. We find that self-identification on avatars can be increased through pre-baked animations even when these are not photorealistic nor look like the participant.
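The abstract notes that limited facial animation "is easily achieved based on speaking"; a minimal, hypothetical amplitude-driven lip-sync sketch (not the study's implementation) might look like this.

```python
import numpy as np

def lip_sync_from_audio(samples, sample_rate=48000, fps=60, smoothing=0.6, gain=8.0):
    """Map a mono audio buffer to a per-frame jaw-open weight in [0, 1].

    Uses the smoothed RMS envelope of the signal; real systems add viseme
    classification, co-articulation, and blinking on top of this.
    """
    hop = sample_rate // fps
    weights, level = [], 0.0
    for start in range(0, len(samples) - hop, hop):
        rms = float(np.sqrt(np.mean(samples[start:start + hop] ** 2)))
        level = smoothing * level + (1.0 - smoothing) * rms   # low-pass the envelope
        weights.append(min(1.0, level * gain))
    return np.asarray(weights)

# Example: one second of synthetic "speech" (an amplitude-modulated tone)
t = np.linspace(0, 1, 48000, endpoint=False)
audio = 0.3 * np.sin(2 * np.pi * 180 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
print(lip_sync_from_audio(audio)[:10])
```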
Article
Randomization is an established process to assign participants to treatment groups to reduce selection bias. Minimization is a method of dynamic or adaptive randomization to minimize the imbalance between treatment groups with respect to the number of participants over the participant’s predefined covariate factors. The algorithms for minimization randomization with equal allocation ratio have been well studied in the literature. With the growing demand for unequal allocation in clinical trials, an allocation ratio preserving biased coin minimization (ARP BCM) was proposed to preserve the allocation ratio at every allocation step, using measure of imbalance by the range. In this article, we expand the ARP BCM to unequal allocation which preserves the allocation ratio at every allocation, using more measures of imbalance by the standard deviation and variance. Simulations have been conducted to evaluate the performance of these methods. Furthermore, these algorithms have been implemented in a newly developed R package ‘Minirand’.
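The article extends allocation-ratio-preserving biased coin minimization and provides the R package 'Minirand'; neither is reproduced here. The sketch below shows plain Pocock-Simon-style minimization with equal allocation and the range as the imbalance measure, purely to illustrate the underlying idea; all names and probabilities are illustrative.

```python
import random
from collections import defaultdict

def minimization_assign(new_covariates, assignments, covariates,
                        groups=("A", "B"), p_best=0.8):
    """Assign one participant by minimizing covariate imbalance (Pocock-Simon-style sketch).

    new_covariates: dict of factor -> level for the incoming participant
    assignments/covariates: parallel lists for already-randomized participants
    """
    imbalance = {}
    for g in groups:
        total = 0
        for factor, level in new_covariates.items():
            counts = defaultdict(int)
            for grp, cov in zip(assignments, covariates):
                if cov[factor] == level:
                    counts[grp] += 1
            counts[g] += 1                       # hypothetically assign to group g
            values = [counts[x] for x in groups]
            total += max(values) - min(values)   # range as the imbalance measure
        imbalance[g] = total
    best = min(imbalance, key=imbalance.get)
    # biased coin: favor the group that minimizes imbalance
    if random.random() < p_best:
        return best
    return random.choice([g for g in groups if g != best])

# Example: third participant, after two prior assignments
prior_assign = ["A", "B"]
prior_cov = [{"sex": "f", "age": "old"}, {"sex": "f", "age": "young"}]
print(minimization_assign({"sex": "f", "age": "old"}, prior_assign, prior_cov))
```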
Conference Paper
In this paper we present a complete pipeline to create ready-to-animate virtual humans by fitting a template character to a point set obtained by scanning a real person using multi-view stereo reconstruction. Our virtual humans are built upon a holistic character model and feature a detailed skeleton, fingers, eyes, teeth, and a rich set of facial blendshapes. Furthermore, due to the careful selection of techniques and technology, our reconstructed humans are quite realistic in terms of both geometry and texture. Since we represent our models as single-layer triangle meshes and animate them through standard skeleton-based skinning and facial blendshapes, our characters can be used in standard VR engines out of the box. By optimizing for computation time and minimizing manual intervention, our reconstruction pipeline is capable of processing whole characters in less than ten minutes.
Article
We report on the design and results of an experiment investigating factors influencing Slater’s Plausibility Illusion (Psi) in virtual environments (VEs). Slater proposed Psi and Place Illusion (PI) as orthogonal components of virtual experience which contribute to realistic response in a VE. PI corresponds to the traditional conception of presence as “being there,” so there exists a substantial body of previous research relating to PI, but very little relating to Psi. We developed this experiment to investigate the components of plausibility illusion using subjective matching techniques similar to those used in color science. Twenty-one participants each experienced a scenario with the highest level of coherence (the extent to which a scenario matches user expectations and is internally consistent), then in eight different trials chose transitions from lower-coherence to higher-coherence scenarios with the goal of matching the level of Psi they felt in the highest-coherence scenario. At each transition, participants could change one of the following coherence characteristics: the behavior of the other virtual humans in the environment, the behavior of their own body, the physical behavior of objects, or the appearance of the environment. Participants tended to choose improvements to the virtual body before any other improvements. This indicates that having an accurate and well-behaved representation of oneself in the virtual environment is the most important contributing factor to Psi. This study is the first to our knowledge to focus specifically on coherence factors in virtual environments.
Article
Significant challenges currently prohibit expressive interaction in virtual reality (VR). Occlusions introduced by head-mounted displays (HMDs) make existing facial tracking techniques intractable, and even state-of-the-art techniques used for real-time facial tracking in unconstrained environments fail to capture subtle details of the user's facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to an HMD, we record multiple subjects performing various facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user's mouth region to the parameters that control a digital avatar. To make training this system more tractable, we use audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that this approach is also feasible for tracking the expressions around the user's eye region with an internal infrared (IR) camera, thereby enabling full facial tracking. This system requires no user-specific calibration, uses easily obtainable consumer hardware, and produces high-quality animations of speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.
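The abstract describes regressing mouth-region images to avatar control parameters with a CNN; the following toy regression network is a hedged sketch with invented layer sizes and parameter counts, not the authors' architecture or training pipeline.

```python
import torch
import torch.nn as nn

class MouthToParams(nn.Module):
    """Tiny CNN regressing a grayscale mouth crop to blendshape-style parameters (sketch only)."""
    def __init__(self, n_params=30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, n_params), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, 1, 64, 64) mouth crops; output: (batch, n_params) in [0, 1]
        return self.head(self.features(x))

# Example forward pass and a regression loss against artist-labeled targets
model = MouthToParams()
params = model(torch.randn(4, 1, 64, 64))
loss = nn.functional.mse_loss(params, torch.rand(4, 30))
print(params.shape, float(loss))
```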
Article
Facial action is the most commanding and complicated of all nonverbal behavior. The face, even in repose, may provide information about some emotion or mood state. The face is commanding because it is the location for the senses of smell, taste, sight, and hearing. The face commands attention because it is the symbol of the self. This chapter reviews measurement techniques for only one type of signal - rapid. Only one kind of rapid signal - visible movement - is considered. Most of the studies that have used one or another technique to measure visible movement were concerned with only one of the many messages rapid signs may convey - information about emotion. Due to its descriptive power, the technique of Ekman and Friesen has encouraged a wide range of research on facial movement. Automated analysis using computer vision produces both action unit recognition and quantitative measures of feature trajectories.
Article
Simulator sickness (SS) in high-fidelity visual simulators is a byproduct of modern simulation technology. Although it involves symptoms similar to those of motion-induced sickness (MS), SS tends to be less severe, to be of lower incidence, and to originate from elements of visual display and visuo-vestibular interaction atypical of conditions that induce MS. Most studies of SS to date index severity with some variant of the Pensacola Motion Sickness Questionnaire (MSQ). The MSQ has several deficiencies as an instrument for measuring SS. Some symptoms included in the scoring of MS are irrelevant for SS, and several are misleading. Also, the configural approach of the MSQ is not readily adaptable to computer administration and scoring. This article describes the development of a Simulator Sickness Questionnaire (SSQ), derived from the MSQ using a series of factor analyses, and illustrates its use in monitoring simulator performance with data from a computerized SSQ survey of 3,691 simulator hops. The database used for development included more than 1,100 MSQs, representing data from 10 Navy simulators. The SSQ provides straightforward computer or manual scoring, increased power to identify "problem" simulators, and improved diagnostic capability.
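SSQ scoring combines raw symptom sums with fixed weighting constants. The sketch below uses the commonly cited Kennedy et al. weights (9.54, 7.58, 13.92, and 3.74 for the total) as an assumption and leaves the item-to-cluster assignment to the caller, since several items load on more than one factor.

```python
def ssq_scores(nausea_sum, oculomotor_sum, disorientation_sum):
    """Compute SSQ subscale and total scores from raw symptom sums (items rated 0-3).

    Weights follow the commonly cited Kennedy et al. (1993) scoring constants
    (an assumption here, not taken from this page); assigning the 16 items to
    the three symptom clusters is left to the caller.
    """
    nausea = nausea_sum * 9.54
    oculomotor = oculomotor_sum * 7.58
    disorientation = disorientation_sum * 13.92
    total = (nausea_sum + oculomotor_sum + disorientation_sum) * 3.74
    return {"N": nausea, "O": oculomotor, "D": disorientation, "TS": total}

# Example: raw sums of 4, 3, and 2 over the respective item clusters
print(ssq_scores(4, 3, 2))
```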
Article
Most verbal communication occurs in contexts where the listener can see the speaker as well as hear him. However, speech perception is normally regarded as a purely auditory process. The study reported here demonstrates a previously unrecognised influence of vision upon speech perception. It stems from an observation that, on being shown a film of a young woman's talking head, in which repeated utterances of the syllable [ba] had been dubbed on to lip movements for [ga], normal adults reported hearing [da]. With the reverse dubbing process, a majority reported hearing [bagba] or [gaba]. When these subjects listened to the soundtrack from the film, without visual input, or when they watched untreated film, they reported the syllables accurately as repetitions of [ba] or [ga]. Subsequent replications confirm the reliability of these findings; they have important implications for the understanding of speech perception.
Article
Multimedia synchronization comprises both the definition and the establishment of temporal relationships among media types. The presentation of `in sync' data streams is essential to achieve a natural impression; data that is `out of sync' is perceived as being somewhat artificial, strange, or even annoying. Therefore, the goal of any multimedia system is to enable an application to present data with no or only minor synchronization errors. The achievement of this goal requires a detailed knowledge of the synchronization requirements at the user interface. The paper presents the results of a series of experiments about human media perception that may be used as `quality of service' guidelines. The results show that some skews between related data streams may still give the effect that the data is `in sync', and they provide constraints under which jitter may be tolerated. The author uses the findings to develop a scheme for the processing of nontrivial synchronization skew between more than two data streams.
Article
The Media and Acoustics Perception Lab (MAPL) designed a study to determine the minimum amount of audio-visual synchronization (a/v sync) errors that can be detected by end-users. Lip synchronization is the most noticeable a/v sync error, and was used as the testing stimuli to determine the perceptual threshold of audio leading errors. The results of the experiment determined that the average audio leading threshold for a/v sync detection was 185.19 ms, with a standard deviation of 42.32 ms. This threshold determination of lip sync error (with audio leading) will be widely used for validation and verification infrastructures across the industry. By implementing an objective pass/fail value into software, the system or network under test is held against criteria which were derived from a scientific subjective test.
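Given the reported mean detection threshold of 185.19 ms for audio leading video, a simple pass/fail check against such a criterion (illustrative only; the study's actual validation tooling is not described in the abstract) could be implemented as follows.

```python
def av_sync_check(audio_lead_ms, threshold_ms=185.19, margin_ms=0.0):
    """Flag audio-leading skew that exceeds a perceptual detection threshold.

    audio_lead_ms > 0 means audio arrives before the matching video frame.
    threshold_ms defaults to the mean detection threshold reported in the study;
    a negative margin can be used for a stricter criterion.
    """
    limit = threshold_ms + margin_ms
    return {"skew_ms": audio_lead_ms, "limit_ms": limit, "pass": audio_lead_ms <= limit}

# Example: checking three measured audio-lead values against the threshold
for skew in (40.0, 150.0, 210.0):
    print(av_sync_check(skew))
```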
José Pinheiro, Douglas Bates, and R Core Team. 2022. nlme: Linear and Nonlinear Mixed Effects Models.

Andy P. Field, Jeremy Miles, and Zoë Field. Discovering Statistics Using R. Sage, London; Thousand Oaks.