Mohammed E. Hoque

Massachusetts Institute of Technology, Cambridge, Massachusetts, United States

Publications (14)

  • M. Hoque, R.W. Picard
    ABSTRACT: We present a real-time system, including a 3D character, that can converse with a user and capture, analyze, and interpret subtle, multidimensional human nonverbal behaviors for applications such as job interviews, public speaking, or even automated speech therapy. The system runs on a personal computer and senses nonverbal data from video (facial expressions) and audio (speech recognition and prosody analysis) using a standard webcam. We contextualized the development and evaluation of the system as a training scenario for job interviews. Through user-centered design and iteration, we determined how the nonverbal data could be presented to the user in an intuitive and educational manner. We tested the efficacy of the system in the context of job interviews with 90 MIT undergraduate students. Our results suggest that participants who used our system to improve their interview skills were perceived as better candidates by human judges. Participants reported that the most useful feature was the feedback on their speaking rate, and overall they strongly agreed that they would consider using the system again for self-reflection.
    Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on; 01/2013
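    The abstract above highlights automated speaking-rate feedback as the feature participants valued most. The paper's implementation is not included here, so the following is only a minimal sketch of how a speaking-rate proxy could be computed from audio, counting peaks of a smoothed energy envelope as rough syllable nuclei; the function name, parameters, and thresholds are illustrative assumptions, not the authors' method.
```python
import numpy as np

def estimate_speaking_rate(samples, sr=16000, frame_ms=25, hop_ms=10):
    """Rough syllables-per-minute estimate from a mono waveform.

    Illustrative only: peaks in a smoothed short-time RMS energy curve
    are counted as a crude proxy for syllable nuclei.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Short-time RMS energy, one value per hop.
    energy = np.array([
        np.sqrt(np.mean(samples[i:i + frame] ** 2))
        for i in range(0, max(len(samples) - frame, 1), hop)
    ])
    # Smooth the envelope so tiny fluctuations are not counted as peaks.
    kernel = np.hanning(9)
    smooth = np.convolve(energy, kernel / kernel.sum(), mode="same")
    threshold = smooth.mean()
    # Count local maxima above the threshold.
    peaks = sum(
        1 for i in range(1, len(smooth) - 1)
        if smooth[i] > threshold and smooth[i - 1] < smooth[i] >= smooth[i + 1]
    )
    minutes = len(samples) / sr / 60.0
    return peaks / minutes if minutes > 0 else 0.0
```
    In a practice setting, the returned estimate could simply be compared against a target range and reported back to the user after each recorded answer.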
  • Javier Hernandez, Mohammed E. Hoque, Rosalind W. Picard
    ABSTRACT: Have you ever wondered whether it's possible to quantitatively measure how friendly or welcoming a community is? Or imagined which parts of the community are happier than others? In this work, we introduce a new technology that begins to address these questions.
    ACM SIGGRAPH 2012 Emerging Technologies; 08/2012
  • M.E. Hoque, D.J. McDuff, R.W. Picard
    ABSTRACT: We create two experimental situations to elicit two affective states: frustration and delight. In the first experiment, participants were asked to recall situations while expressing either delight or frustration; the second experiment tried to elicit these states naturally through a frustrating experience and a delightful video. There were two significant differences between the acted and natural occurrences of expressions. First, the acted instances were much easier for the computer to classify. Second, in 90 percent of the acted cases, participants did not smile when frustrated, whereas in 90 percent of the natural cases, participants smiled during the frustrating interaction, despite self-reporting significant frustration with the experience. As a follow-up study, we developed an automated system to distinguish naturally occurring spontaneous smiles under frustrating and delightful stimuli by exploring their temporal patterns in video of both. We extracted local and global features related to human smile dynamics, then evaluated and compared two variants of Support Vector Machines (SVM), Hidden Markov Models (HMM), and Hidden-state Conditional Random Fields (HCRF) for binary classification. While human classification of the smile videos under frustrating stimuli was below chance, an accuracy of 92 percent in distinguishing smiles under frustrating and delightful stimuli was obtained using a dynamic SVM classifier.
    IEEE Transactions on Affective Computing 01/2012; 3(3):323-334.
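    The comparison above mentions local and global smile-dynamics features and several classifiers but, as an abstract, gives no implementation detail. The sketch below is a hedged illustration of a static, global-feature variant only: summary statistics of a per-frame smile-intensity track fed to a scikit-learn SVM. The specific features and labeling scheme are assumptions for illustration, not the published pipeline.
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def smile_features(track):
    """Global descriptors of one clip's smile-intensity time series.

    `track` is a 1-D array of per-frame smile intensities in [0, 1];
    the statistics are illustrative stand-ins for the paper's features.
    """
    diffs = np.diff(track) if len(track) > 1 else np.zeros(1)
    return np.array([
        track.mean(), track.std(), track.max(),
        (track > 0.5).mean(),   # fraction of frames with a strong smile
        diffs.max(),            # fastest onset (rise per frame)
        diffs.min(),            # fastest decay (fall per frame)
    ])

def train_smile_classifier(tracks, labels):
    """labels: 1 = smile under delightful stimulus, 0 = under frustrating one."""
    X = np.vstack([smile_features(np.asarray(t)) for t in tracks])
    y = np.asarray(labels)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)   # rough accuracy estimate
    return clf.fit(X, y), scores.mean()
```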
  • M. Hoque, R.W. Picard
    ABSTRACT: This work is part of research to build a system to combine facial and prosodic information to recognize commonly occurring user states such as delight and frustration. We create two experimental situations to elicit two emotional states: the first involves recalling situations while expressing either delight or frustration; the second experiment tries to elicit these states directly through a frustrating experience and through a delightful video. We find two significant differences in the nature of the acted vs. natural occurrences of expressions. First, the acted ones are much easier for the computer to recognize. Second, in 90% of the acted cases, participants did not smile when frustrated, whereas in 90% of the natural cases, participants smiled during the frustrating interaction, despite self-reporting significant frustration with the experience. This paper begins to explore the differences in the patterns of smiling that are seen under natural frustration and delight conditions, to see if there might be something measurably different about the smiles in these two cases, which could ultimately improve the performance of classifiers applied to natural expressions.
    Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on; 04/2011
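    The journal version of this work, listed above, also compares temporal models such as HMMs against static classifiers. As a rough, hedged illustration of the sequence-model alternative to the static SVM sketched earlier, the snippet below trains one Gaussian HMM per class with the third-party hmmlearn package and labels a clip by comparing log-likelihoods; it is an assumption-laden stand-in, not the published model.
```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party package: hmmlearn

def fit_class_hmm(sequences, n_states=3):
    """Fit one HMM to all smile-intensity sequences of a single class.

    `sequences` is a list of 1-D arrays, one per clip of that class.
    """
    X = np.concatenate(sequences).reshape(-1, 1)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify_clip(track, delight_hmm, frustration_hmm):
    """Label one smile-intensity track by which class HMM explains it better."""
    obs = np.asarray(track, dtype=float).reshape(-1, 1)
    return ("delight"
            if delight_hmm.score(obs) > frustration_hmm.score(obs)
            else "frustration")
```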
  • Affective Computing and Intelligent Interaction - 4th International Conference, ACII 2011, Memphis, TN, USA, October 9-12, 2011, Proceedings, Part I; 01/2011
  • ABSTRACT: Affective computing (AC) is a unique discipline that models affect using one or more modalities, drawing on techniques from many different fields. AC often deals with problems that are known to be complex and multidimensional, involving different kinds of data (numeric, symbolic, visual, etc.). With the advancement of machine learning techniques, however, many of those problems are becoming more tractable. The purpose of this workshop was to engage the machine learning and affective computing communities in solving problems related to understanding and modeling social affective behaviors. We welcomed the participation of researchers from diverse fields, including signal processing and pattern recognition, statistical machine learning, human-computer interaction, human-robot interaction, robotics, conversational agents, experimental psychology, and decision making. There is a need for a set of high standards for recognizing and understanding affect. At the same time, these standards need to take into account that expectations and validation in this area may differ from traditional machine learning research, and this should be reflected in the design of the machine learning techniques used to tackle these problems. For example, affective data sets are known to be noisy, high dimensional, and incomplete; classes may overlap; and affective behaviors are often person specific and require temporal modeling with real-time performance. This first edition of the ACII Workshop on Machine Learning for Affective Computing is intended as a venue for such discussions and for engaging the community in the design and validation of learning techniques for affective computing.
    Affective Computing and Intelligent Interaction - Fourth International Conference, ACII 2011, Memphis, TN, USA, October 9-12, 2011, Proceedings, Part II; 01/2011
  • ABSTRACT: Background: Approximately one third to one half of individuals diagnosed with an ASD have significant difficulty using speech and language as an effective means of communication. While conventional speech-language therapy can help address these issues, it can be tedious, time consuming, and minimally engaging. Objectives: We aimed to create and evaluate a suite of engaging, customized, interactive computer games to help children improve speech difficulties relating to loudness and speech rate (both of which have a direct impact on intelligibility). The games were easily customizable to suit the needs and interests of individuals with diverse levels of ability, and they were free and open source, making them accessible to anyone with an Internet connection. Our objective was to supplement regular speech therapy with entertaining, customizable tools that a speech-language therapist can use when working with individuals on the autism spectrum. Methods: Eight children (two females, six males) on the autism spectrum (ranging in age from 7 to 20 years) who had difficulties with loudness and/or speech rate participated in this study. Participants were matched and assigned to two groups of four (A and B) based on their speech difficulties, age, and gender. After an initial baseline assessment of loudness and speech rate, Group A engaged in two weeks of computerized speech therapy while Group B engaged in two weeks of conventional speech therapy. In the following two weeks, the interventions were switched: Group A received conventional speech therapy and Group B received computerized speech therapy. Speech was recorded during all sessions, and loudness and speech rate were calculated and summarized at the end of each intervention period. Results: Of our eight participants, six had speaking more loudly as a target. Five of the six showed a statistically significant increase in loudness at the end of the study compared to baseline. Three of those five showed either comparable or significantly greater improvement following the computerized speech therapy compared to conventional speech therapy. Only one participant showed a significant decrease in loudness following the computerized sessions compared to conventional therapy. Five participants had speaking more slowly and two had speaking more quickly as targets. Two participants showed a statistically significant desirable change in speech rate after intervention compared to baseline: one spoke more slowly after conventional therapy and the other spoke more slowly after the computerized therapy. Conclusions: Our computerized intervention appeared engaging and effective for the majority of our participants. Some participants demonstrated statistically significant changes in speech following the computerized therapy relative to conventional therapy, suggesting that a subset of individuals on the autism spectrum may especially benefit from our intervention. The finding that some participants made statistically significant gains in both the computerized and the conventional therapy sessions is also promising, since computer therapy may be a useful option for individuals who cannot afford, or do not otherwise have regular access to or interest in, speech-language therapy.
    International Meeting for Autism Research 2010; 05/2010
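    Loudness and speech rate are the two targets measured in the study above; the abstract does not specify how loudness was computed, so the snippet below is only a hedged sketch of one common way to summarize session loudness, as mean RMS level in decibels relative to full scale over non-silent frames. The frame size and silence floor are illustrative assumptions, not the study's analysis code.
```python
import numpy as np

def session_loudness_dbfs(samples, sr=16000, frame_ms=30, silence_rms=1e-4):
    """Mean RMS level (dBFS) over non-silent frames of one recorded session.

    `samples` is a mono waveform scaled to [-1, 1]. Near-silent frames
    are dropped so that pauses do not drag the average down.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame:(i + 1) * frame] ** 2))
        for i in range(n_frames)
    ])
    voiced = rms[rms > silence_rms]
    if voiced.size == 0:
        return float("-inf")                       # nothing but silence
    return float(np.mean(20 * np.log10(voiced)))   # 0 dBFS = full-scale signal
```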
  • ABSTRACT: Background: Many people on the autism spectrum understand the semantics involved in social interaction; however, embodied information such as facial expressions, gestures, and voice often prove elusive. First-hand accounts from people with autism highlight the challenges inherent in processing these complex and unpredictable social cues. These challenges can be debilitating, complicating social interaction and making integration with society difficult. While many intervention methods have been developed to provide help, the majority fail to include rich, real-world social interactions in their methodology. Objectives: Our goal is to develop a technology-based intervention that helps individuals on the autism spectrum capture, analyze, systemize, and reflect on social-emotional signals communicated by facial and head movements in natural, everyday social interactions. Our approach utilizes an ultra-mobile computer customized with a video camera and pattern analysis algorithms that can automatically identify facial expressions using facial feature tracking. In an effort to make our system robust to real-world conditions and usable by individuals with cognitive, motor, and sensory impairments, we have engaged in a number of user-centered design sessions with people on the autism spectrum and their caregivers. Methods: We conducted five usability sessions with seven verbal adolescents on the autism spectrum and their teachers to address various hardware and software functionality issues related to our system. Results: Our initial interface design, using facial expression graphs and points superimposed on the video to indicate features on the face, was confusing and not engaging enough for the participants. Based on iterative feedback, interactive affective tagging components were added and the interface was made customizable to suit each participant's interests and difficulties in recognizing particular facial expressions. For example, some participants were good at recognizing happiness, sadness, and anger. For those participants, we were able to instantly customize the interface to handle a more challenging set of affect labels, such as confusion and excitement. In terms of form factor, many participants found the mobile computer's keyboard and track pad distracting. To overcome this, we made custom covers that shield exterior input controls and utilized the ultra-mobile computer's touch screen to input data. We also adjusted the placement and size of touch screen buttons to allow participants to use their thumbs for interaction. Finally, some participants had difficulty reading the text labels describing identified facial expressions. We are currently exploring the use of images instead of text to accommodate reading difficulties. Conclusions: The user-centered design sessions provided insights into the usability of the system and were critical to the development of our technology, underscoring the importance of including people on the autism spectrum and their caregivers in the design process of new technologies. For these technologies to be effective, they need to accommodate the perceptual, motor, and cognitive disabilities of their users. An experimental evaluation of our redesigned system is forthcoming to determine if just-in-time, in-situ assistance can help facilitate learning of facial expressions and underlying emotions for persons on the autism spectrum.
    International Meeting for Autism Research 2009; 05/2009
  • ABSTRACT: Individuals on the autism spectrum often have difficulties producing intelligible speech, with either high or low speech rate and atypical affect in pitch and/or amplitude. In this study, we present a novel intervention based on customizing speech-enabled games to help them produce intelligible speech. In this approach, we clinically and computationally identify the areas of speech production difficulty for each participant and provide an interactive, customized interface through which participants can meaningfully manipulate the prosodic aspects of their speech. Over the course of 12 months, we conducted several pilots to set up the experimental design and developed a suite of games and audio-processing algorithms for prosodic analysis of speech. Preliminary results suggest that our intervention is engaging and effective for our participants.
    INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009
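    The entry above mentions audio-processing algorithms for prosodic analysis without detailing them. One standard building block for such analysis is per-frame fundamental-frequency (pitch) estimation; the autocorrelation-based sketch below is a generic, hedged illustration of that step, not the system's actual algorithm, and its thresholds are arbitrary.
```python
import numpy as np

def frame_pitch_hz(frame, sr=16000, fmin=75.0, fmax=400.0):
    """Estimate F0 of one audio frame via autocorrelation peak picking.

    Returns 0.0 for low-energy (likely unvoiced) frames. Speech F0 is
    assumed to lie between `fmin` and `fmax` Hz.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    if np.sqrt(np.mean(frame ** 2)) < 1e-3:      # crude voicing gate
        return 0.0
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)                          # shortest plausible period
    hi = min(int(sr / fmin), len(ac) - 1)        # longest plausible period
    if hi <= lo:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0 else 0.0
```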
  • ABSTRACT: Participatory user interface design with adolescent users on the autism spectrum presents a number of unique challenges and opportunities. Through our work developing a system to help autistic adolescents learn to recognize facial expressions, we have learned valuable lessons about software and hardware design issues for this population. These lessons may also be helpful in assimilating iterative user input to customize technology for other populations with special needs.
    Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009, Extended Abstracts Volume, Boston, MA, USA, April 4-9, 2009; 01/2009
  • Mohammed E. Hoque, Rana El Kaliouby, Rosalind W. Picard
    ABSTRACT: This paper describes the challenges of getting ground-truth affective labels for spontaneous video and presents implications for systems, such as virtual agents, that have automated facial analysis capabilities. We first present a dataset from an intelligent tutoring application and describe the most prevalent approach to labeling such data. We then present an alternative labeling approach, which closely models how the majority of automated facial analysis systems are designed. We show that while participants, peers, and trained judges report high inter-rater agreement on expressions of delight, confusion, flow, frustration, boredom, surprise, and neutral when shown the entire 30 minutes of video for each participant, inter-rater agreement drops below chance when human coders are asked to watch and label short 8-second clips for the same set of labels. We also perform discriminative analysis of facial action units for each affective state represented in the clips. The results emphasize that human coders rely heavily on factors such as familiarity with the person and the context of the interaction to correctly infer a person's affective state; without this information, the reliability of humans, as well as machines, in attributing affective labels to spontaneous facial-head movements drops significantly.
    Intelligent Virtual Agents, 9th International Conference, IVA 2009, Amsterdam, The Netherlands, September 14-16, 2009, Proceedings; 01/2009
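    The central quantitative claim above is that inter-rater agreement collapses when coders see only short clips. The abstract does not name its agreement statistic, so the snippet below is a hedged sketch using mean pairwise Cohen's kappa from scikit-learn, where values near zero indicate chance-level agreement and negative values indicate below-chance agreement; the coder names and labels are hypothetical.
```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def mean_pairwise_kappa(ratings):
    """Average pairwise Cohen's kappa across coders.

    `ratings` maps coder name -> list of labels, one per clip, with all
    coders labeling the same clips in the same order.
    """
    pairs = list(combinations(ratings, 2))
    kappas = [cohen_kappa_score(ratings[a], ratings[b]) for a, b in pairs]
    return sum(kappas) / len(kappas)

# Hypothetical example: three coders labeling four 8-second clips.
ratings = {
    "coder_1": ["delight", "confusion", "neutral", "frustration"],
    "coder_2": ["delight", "neutral", "neutral", "boredom"],
    "coder_3": ["surprise", "confusion", "flow", "frustration"],
}
print(mean_pairwise_kappa(ratings))
```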
  • ABSTRACT: Social communication in autism is significantly hindered by difficulties processing affective cues in real-time face-to-face interaction. The interactive Social-Emotional Toolkit (iSET) allows its users to record and annotate video with emotion labels in real time, then review and edit the labels later to bolster understanding of the affective information present in interpersonal interactions. The iSET demo will let the ACII audience experience the augmentation of interpersonal interactions by using the iSET system.
    01/2009;
  • Philipp Robbel, Mohammed E. Hoque, Cynthia Breazeal
    ABSTRACT: This paper describes an integrated approach to recognizing and generating affect on a humanoid robot as it interacts with a human user. We describe a method for detecting basic affect signals in the user's speech input and generating appropriately chosen responses on our robot platform. Responses are selected both in terms of content and the emotional quality of the voice. Additionally, we synthesize gestures and facial expressions on the robot that magnify the effect of the robot's conveyed emotional state. The guiding principle of our work is that adding the ability to detect and display emotion to physical agents allows their effective use in novel application areas such as child and elderly care, healthcare, education, and beyond.
    01/2009;
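    The abstract above describes selecting responses by both content and vocal emotional quality, with accompanying gestures and facial expressions, but gives no implementation detail. The table-driven mapping below is a toy, hedged sketch of just the selection step; every label, utterance, and gesture name is invented for illustration.
```python
from dataclasses import dataclass

@dataclass
class RobotResponse:
    utterance: str    # what the robot says
    gesture: str      # body/face animation to play (hypothetical names)
    voice_style: str  # emotional quality of the synthesized voice

# Illustrative mapping from a detected affect label to a response.
RESPONSES = {
    "positive": RobotResponse("That sounds wonderful!", "nod_and_smile", "cheerful"),
    "negative": RobotResponse("I'm sorry to hear that.", "lean_in", "soft"),
    "neutral": RobotResponse("Tell me more.", "idle_attentive", "calm"),
}

def select_response(detected_affect: str) -> RobotResponse:
    """Pick utterance, gesture, and voice quality for the detected affect."""
    return RESPONSES.get(detected_affect, RESPONSES["neutral"])
```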
  • Mohammed E. Hoque
    ABSTRACT: Many individuals diagnosed with autism or Down syndrome have difficulties producing intelligible speech. Systematic analysis of their voice parameters could lead to a better understanding of the specific challenges they face in achieving proper speech production. In this study, 100 minutes of speech data from natural conversations between neurotypical individuals and individuals diagnosed with autism or Down syndrome were used. Analysis of their voice parameters yielded new findings across a variety of speech parameters. An immediate extension of this work would be to customize this technology to allow participants to visualize and control their speech parameters in real time and receive live feedback.
    Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2008, Halifax, Nova Scotia, Canada, October 13-15, 2008; 01/2008
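    The study above summarizes voice parameters from conversational speech across speaker groups; the exact parameter set is not listed in the abstract, so the sketch below only illustrates the kind of per-group descriptive summary involved, with hypothetical field names for pitch, level, and rate.
```python
import numpy as np

def summarize_voice_parameters(utterances):
    """Mean and standard deviation of voice parameters per speaker group.

    `utterances` is a list of dicts such as
      {"group": "autism_down_syndrome", "mean_f0_hz": 180.0,
       "rms_db": -22.5, "syllables_per_sec": 3.9}
    (all field names are illustrative). Returns {group: {param: (mean, std)}}.
    """
    params = ("mean_f0_hz", "rms_db", "syllables_per_sec")
    by_group = {}
    for u in utterances:
        by_group.setdefault(u["group"], []).append(u)
    return {
        group: {
            p: (float(np.mean([r[p] for r in rows])),
                float(np.std([r[p] for r in rows])))
            for p in params
        }
        for group, rows in by_group.items()
    }
```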