Preprint

Facial Feature Enhancement for Immersive Real-Time Avatar-Based Sign Language Communication using Personalized CNNs


Abstract

Facial expression recognition is crucial in sign language communication. In virtual reality and avatar-based communication in particular, enhanced facial features have the potential to better integrate the deaf and hard-of-hearing community by improving speech comprehension and empathy. However, current methods lack the precision to capture nuanced expressions. To address this, we present a real-time solution that uses personalized Convolutional Neural Networks (CNNs) to capture intricate facial details, such as tongue movement and individually puffed cheeks. Our system's classification models can be easily extended and integrated into existing facial recognition systems via UDP network broadcasting.
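The abstract gives no implementation details, so the following is a minimal, hypothetical sketch (assuming PyTorch) of the kind of pipeline it describes: a small per-user CNN that classifies preprocessed face crops into fine-grained expression classes, with each prediction broadcast as a UDP datagram for other components, such as an avatar renderer, to consume. The class labels, input size, JSON payload, and port number are illustrative assumptions, not the authors' specification.

import json
import socket

import torch
import torch.nn as nn

# Assumed fine-grained expression classes; the paper names tongue movement
# and individually puffed cheeks as examples of such details.
CLASSES = ["neutral", "tongue_out", "cheek_puff_left", "cheek_puff_right"]

class PersonalizedExpressionCNN(nn.Module):
    """Small CNN meant to be trained or fine-tuned per user on 64x64 grayscale face crops."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

def broadcast_prediction(sock: socket.socket, label: str, confidence: float, port: int = 9000) -> None:
    """Send one prediction as a JSON datagram to the local broadcast address (port is assumed)."""
    payload = json.dumps({"label": label, "confidence": round(confidence, 3)}).encode("utf-8")
    sock.sendto(payload, ("255.255.255.255", port))

if __name__ == "__main__":
    model = PersonalizedExpressionCNN().eval()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    frame = torch.rand(1, 1, 64, 64)  # stand-in for a preprocessed face crop
    with torch.no_grad():
        probs = torch.softmax(model(frame), dim=1)[0]
    conf, idx = probs.max(dim=0)
    broadcast_prediction(sock, CLASSES[idx.item()], conf.item())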

References

Conference Paper
Full-text available
As the world becomes more interconnected, physical separation between people increases. Existing collaborative Virtual Reality (VR) applications, designed to bridge this distance, are not yet sufficient in providing a sense of social connection comparable to face-to-face interactions. Possible reasons are the limited multimodality of VR systems and the lack of non-verbal cues in VR avatars. We systematically investigated how facial expressions influence Social Presence in two collaborative VR tasks. We explored four types of facial expressions: eyes and mouth movements, their combination, and no expressions, for two types of explanations: verbal and graphical. To examine how these expressions influence Social Presence, we conducted a controlled VR experiment (N = 48), in which participants had to explain a specific term to their counterpart. Our results demonstrate that eye and mouth movements positively influence Social Presence in VR. Particularly, combining verbal explanations and eye movements induces the highest feeling of co-presence.
Article
Full-text available
Previous studies demonstrated the positive effects of smiling on interpersonal outcomes. The present research examined whether enhancing one’s smile in a virtual environment could lead to a more positive communication experience. In the current study, participants’ facial expressions were tracked and mapped onto a digital avatar during a real-time dyadic conversation. The avatar’s smile was rendered such that it was either a slightly enhanced version or a veridical version of the participant’s actual smile. Linguistic analyses using the Linguistic Inquiry and Word Count (LIWC) revealed that participants who communicated with each other via avatars that exhibited enhanced smiles used more positive words to describe their interaction experience compared to those who communicated via avatars that displayed smiling behavior reflecting the participants’ actual smiles. In addition, self-report measures showed that participants in the ‘enhanced smile’ condition felt more positive affect after the conversation and experienced stronger social presence compared to the ‘normal smile’ condition. These results are particularly striking considering that most participants (>90%) were unable to detect the smiling manipulation. This is the first study to demonstrate the positive effects of transforming unacquainted individuals’ actual smiling behavior during a real-time avatar-networked conversation.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
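For reference, a minimal PyTorch sketch of the network as this abstract describes it: five convolutional layers (three of them followed by max-pooling), three fully-connected layers with dropout, and a 1000-way output. The two-GPU grouped convolutions and local response normalization of the original model are omitted, so this is an approximation, not the published implementation.

import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),  # logits; the 1000-way softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    logits = AlexNetSketch()(torch.rand(1, 3, 227, 227))
    print(logits.shape)  # torch.Size([1, 1000])
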
Conference Paper
Although facial features are considered to be essential for humans to understand sign language, no prior research work has yet examined their significance for automatic sign language recognition or presented evaluation results. This paper describes a vision-based recognition system that employs both manual and facial features, extracted from the same input image. For facial feature extraction, an active appearance model is applied to identify areas of interest such as the eyes and mouth region. Afterwards, a numerical description of facial expression and lip outline is computed. An extensive evaluation was performed on a new sign language corpus, which contains continuous articulations of 25 native signers. The obtained results proved the importance of integrating facial expressions into the classification process. The recognition rates for isolated and continuous signing increased in signer-dependent as well as in signer-independent operation mode. Interestingly, roughly two of ten signs were recognized just from the facial features.
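The extraction-plus-description pipeline summarized above can be illustrated with off-the-shelf tools. The sketch below substitutes dlib's 68-point facial landmark detector for the active appearance model used in the paper, and the specific descriptor components and the external model file are illustrative assumptions rather than that paper's feature set.

import numpy as np
import dlib

# dlib's detector and 68-point landmark predictor stand in for the active
# appearance model; the .dat model file must be downloaded separately.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def facial_descriptor(gray_image):
    """Return a small feature vector (mouth openness, mouth width, eye openness),
    or None if no face is found. Expects an 8-bit grayscale numpy array."""
    faces = detector(gray_image)
    if not faces:
        return None
    landmarks = predictor(gray_image, faces[0])
    pts = np.array([[p.x, p.y] for p in landmarks.parts()])
    mouth, right_eye, left_eye = pts[48:68], pts[36:42], pts[42:48]

    def openness(region):
        # height-to-width ratio of the region's bounding box
        return np.ptp(region[:, 1]) / (np.ptp(region[:, 0]) + 1e-6)

    mouth_width = np.linalg.norm(pts[54] - pts[48])  # distance between the lip corners
    return np.array([openness(mouth), mouth_width, openness(right_eye), openness(left_eye)])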