Article

Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features

Authors:
  • Matthew P. Black, Athanasios Katsamanis, Brian R. Baucom, Chi-Chun Lee, Adam C. Lammert, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Abstract

Observational methods are fundamental to the study of human behavior in the behavioral sciences. For example, in the context of research on intimate relationships, psychologists’ hypotheses are often empirically tested by video recording interactions of couples and manually coding relevant behaviors using standardized coding systems. This coding process can be time-consuming, and the resulting coded data may have a high degree of variability because of a number of factors (e.g., inter-evaluator differences). These challenges provide an opportunity to employ engineering methods to aid in automatically coding human behavioral data. In this work, we analyzed a large corpus of married couples’ problem-solving interactions. Each spouse was manually coded with multiple session-level behavioral observations (e.g., level of blame toward other spouse), and we used acoustic speech features to automatically classify extreme instances for six selected codes (e.g., “low” vs. “high” blame). Specifically, we extracted prosodic, spectral, and voice quality features to capture global acoustic properties for each spouse and trained gender-specific and gender-independent classifiers. The best overall automatic system correctly classified 74.1% of the instances, an improvement of 3.95% absolute (5.63% relative) over our previously reported best results. We compare performance for the various factors: across codes, gender, classifier type, and feature type.
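As a concrete illustration of the pipeline the abstract describes (global acoustic statistics per spouse feeding binary "low" vs. "high" classifiers), here is a minimal sketch in Python. It is not the authors' released code; the specific features, functionals, file layout, and SVM back-end are assumptions for illustration only.

```python
# Hedged sketch: session-level acoustic functionals feeding a binary classifier.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def session_features(wav_path):
    """Global acoustic statistics for one spouse's speech in a session."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # prosody: pitch track
    rms = librosa.feature.rms(y=y)[0]                   # prosody: energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope
    feats = []
    for track in [f0, rms] + list(mfcc):
        feats += [np.nanmean(track), np.nanstd(track)]  # simple functionals
    return np.array(feats)

# X: one feature vector per spouse-session; y: 0 = "low", 1 = "high" blame.
# Gender-specific modeling amounts to fitting one such classifier per gender subset.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
# clf.fit(X_train, y_train); clf.score(X_test, y_test)
```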


... In the context of couple therapy analysis, interactions are manually annotated by a psychologist, which is an expensive and time-consuming process. In [11], manually coded behavior patterns, such as the level of blame toward the other spouse, are predicted automatically using vocal interaction analysis. ...
... Both prosodic and spectral features were used. Prosodic features reflect vocal characteristics relating to various behavioral aspects [11,14,18], and spectral features are informative in various tasks related to emotion recognition [46,47] and behavioral signal processing [11,14]. ...
Article
Full-text available
Analysis of couple interactions using speech processing techniques is an increasingly active multi-disciplinary field that poses challenges such as automatic relationship quality assessment and behavioral coding. Here, we focused on the prediction of individuals' attachment style using interactions of recently married (1–15 months) couples. For low-level acoustic feature extraction, in addition to frame-based acoustic features such as mel-frequency cepstral coefficients (MFCCs) and pitch, we used turn-based i-vector features that are commonly used in speaker verification systems. Sentiments of the dialog turns, positive and negative, were also automatically generated from transcribed text and used as features. Feature and score fusion algorithms were applied to the low-level acoustic features and text features. Even though score and feature fusion performed similarly, predictions with score fusion were more consistent for couples who had known each other for a longer period of time.
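A minimal sketch of the score-fusion step mentioned above, assuming two independently trained per-modality classifiers whose posteriors are averaged; the classifier type and fusion weight are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of score-level fusion across an acoustic and a text modality.
import numpy as np
from sklearn.linear_model import LogisticRegression

acoustic_clf = LogisticRegression(max_iter=1000)  # would be trained on MFCC/pitch/i-vector features
text_clf = LogisticRegression(max_iter=1000)      # would be trained on turn-sentiment features

def score_fusion(X_acoustic, X_text, w=0.5):
    """Weighted average of per-class posteriors from both modalities."""
    p_a = acoustic_clf.predict_proba(X_acoustic)
    p_t = text_clf.predict_proba(X_text)
    return np.argmax(w * p_a + (1 - w) * p_t, axis=1)
```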
... Due to the likelihood of having a large number of features, various feature selection methods, such as forward feature selection, have been used [6]. Additionally, various approaches have been used to remove speaker, microphone, and environmental variability from the audio signal by mean-normalizing the LLDs over the whole session audio (e.g., [8]). ...
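The two preprocessing ideas in this excerpt, per-session mean normalization of LLDs and greedy forward feature selection, can be sketched as follows; the array shapes and base estimator are assumptions.

```python
# Hedged sketch of session-level LLD normalization and forward feature selection.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import LinearSVC

def session_mean_normalize(lld):
    """lld: (n_frames, n_dims) low-level descriptors for one whole session.
    Subtracting the session mean reduces speaker/microphone/channel offsets."""
    return lld - lld.mean(axis=0, keepdims=True)

# Forward selection greedily adds the feature that most improves CV accuracy.
selector = SequentialFeatureSelector(
    LinearSVC(), direction="forward", n_features_to_select=10)
# selector.fit(X, y); X_reduced = selector.transform(X)
```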
... Here are the algorithms used by various works: SVM [5,6,8,11,13,38,39,52,58,61,94,95,101,102], linear discriminant analysis (LDA) [6,101], Markov models [21,56,57,70,101], multiple instance learning (diverse density [39,59,60], diverse density SVM [38,52]), maximum likelihood [20,21,37,70], sequential probability ratio test [60], logistic regression [8], perceptron [101], Gaussian mixture models (GMMs) [102], deep neural networks [22,61,62,96], LSTMs [93-96], GRUs [20,63], random forests [11,26], CNNs [63]. ...
Preprint
Full-text available
Couples' relationships affect the physical health and emotional well-being of partners. Automatically recognizing each partner's emotions could give a better understanding of their individual emotional well-being, enable interventions, and provide clinical benefits. In this paper, we summarize and synthesize works that have focused on developing and evaluating systems to automatically recognize the emotions of each partner based on couples' interaction or conversation contexts. We identified 28 articles from IEEE, ACM, Web of Science, and Google Scholar that were published between 2010 and 2021. We detail the datasets, features, algorithms, evaluation, and results of each work and present the main themes. We also discuss current challenges and research gaps, and propose future research directions. In summary, most works have used audio data collected in the lab with annotations done by external experts, and have used supervised machine learning approaches for binary classification of positive and negative affect. Performance results leave room for improvement, and significant research gaps remain, such as the absence of recognition using data from daily life. This survey will enable new researchers to get an overview of this field and eventually enable the development of emotion recognition systems to inform interventions that improve the emotional well-being of couples.
... Computationally modeling humans' observable behaviors and hidden internal states has gained tremendous interest in the engineering community. Recent works in various emerging fields, such as affective computing [1], social signal processing (SSP) [2], and behavioral signal processing (BSP) [3], have all made advances in deriving novel computational algorithms to objectively quantify and automatically recognize human emotions (e.g., [4,5,6]), social behaviors (e.g., [7,8]), and various domain-specific behavioral attributes (e.g., [9,10,11]) through the use of signal processing and machine learning techniques. Specifically, the interdisciplinary field of BSP focuses on developing computational methods in close collaboration with domain experts so that the research outcomes can provide domain-sensitive decision-making tools for those experts. ...
... Specifically, the interdisciplinary field of BSP focuses on developing computational methods in close collaboration with domain experts so that the research outcomes can provide domain-sensitive decision-making tools for the experts. Exemplary BSP works in mental health, e.g., couple therapy [9,12], addiction [13], and autism spectrum disorder [10,14], in professional acting, e.g., [15], and in education, e.g., literacy assessment [16], have all demonstrated that applying BSP techniques to modeling human behaviors results not only in new signal processing algorithms but also in promising novel scientific insights. ...
... Binarizing labels is a common practice in several previously published related works that train machine learning systems to recognize subjective attributes [9,24]. The key idea is to allow the system to learn from ground-truth labels that are more reliable and consistent (extreme behaviors are easier for humans to rate consistently). ...
... Traditional supervised behavior recognition systems mainly depend on two aspects: one is a representative feature set for the target behavior, and the other is the choice of classification model. To capture the vocal cues for behavior recognition, traditional computational approaches (Schuller, Steidl and Batliner, 2009; Black, Katsamanis, Baucom, Lee, Lammert, Christensen, Georgiou and Narayanan, 2013; Xia, Gibson, Xiao, Baucom and Georgiou, 2015; Li, Baucom and Georgiou, 2016; Nasir, Baucom, Georgiou and Narayanan, 2017) use a range of hand-crafted low-level descriptors (LLDs) (e.g., f0, intensity, MFCCs (mel-frequency cepstral coefficients)) with statistical functionals (e.g., mean, median, standard deviation) to represent segment- or utterance-level features. Based on these raw acoustic LLDs and their functionals, classifiers such as support vector machines (SVMs), k-nearest neighbors (kNN), and hidden Markov models (HMMs) have been employed (Zeng, Pantic, Roisman and Huang, 2009; Hu, Xu and Wu, 2007; Schuller, Rigoll and Lang, 2004; El Ayadi, Kamel and Karray, 2011; Xia et al., 2015). ...
... The original 31 behavior codes were rated on a scale of 1-9, where 1 indicates the absence of the given behavior and 9 indicates a strong presence. Similar to a previous study (Black et al., 2013), we utilize five of the behaviors by binarizing the top and bottom 20% of the original rating scores. A brief description of the behavior codes used in this work is listed in Table 1. ...
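The binarization described in this excerpt can be sketched as follows; the 20% fraction comes from the text, everything else is illustrative.

```python
# Minimal sketch: keep only sessions in the bottom/top 20% of a 1-9 rating.
import numpy as np

def binarize_extremes(ratings, fraction=0.2):
    lo, hi = np.quantile(ratings, [fraction, 1 - fraction])
    keep = (ratings <= lo) | (ratings >= hi)     # drop the ambiguous middle
    labels = (ratings[keep] >= hi).astype(int)   # 1 = "high" presence
    return keep, labels
```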
... For the couples therapy data, since each session consists of a dyadic conversation and the behavior ratings are provided for each spouse individually, we need to diarize the interactions to obtain the speech regions for each person. We employ the pre-processing procedures described in Black et al. (2013). In short, we select sessions with a signal-to-noise ratio (SNR) above 5 dB, and conduct voice activity detection (VAD) and speaker diarization. ...
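A rough sketch of the session-selection and VAD steps named above, assuming a crude percentile-based SNR estimate and an energy-threshold VAD as stand-ins for the actual tools used in the cited work.

```python
# Hedged sketch: SNR-based session filtering plus a simple energy VAD.
import numpy as np
import librosa

def estimate_snr_db(y, frame_length=1024, hop_length=512):
    e = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0] ** 2
    signal = np.percentile(e, 90)         # loud frames approximate speech
    noise = np.percentile(e, 10) + 1e-12  # quiet frames approximate background
    return 10 * np.log10(signal / noise)

def energy_vad(y, thresh_ratio=0.5, hop_length=512):
    e = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    return e > thresh_ratio * e.mean()    # boolean speech/non-speech per frame

# y, sr = librosa.load("session.wav", sr=16000)
# if estimate_snr_db(y) > 5.0: speech_frames = energy_vad(y)
```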
Article
Full-text available
Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging, in part due to inadequate training data resources stemming from the often low occurrence frequencies of specific behavioral patterns. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and corresponding manually annotated data, making generalization across domains challenging. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework to capture the behavioral context and derive a manifold representation in which speech frames with similar behaviors are closer while frames of different behaviors maintain larger distances. The models are trained on movie audio data and validated on diverse domains, including a couples therapy corpus and other publicly collected data (e.g., stand-up comedy). With encouraging results, our proposed framework shows the feasibility of unsupervised learning within cross-domain behavioral modeling.
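The triplet idea in the TE-DCN description can be sketched as follows; the encoder architecture and feature dimensionality are assumptions, and only the loss mirrors the description (nearby segments pulled together, distant segments pushed apart).

```python
# Hedged sketch of triplet training for behavioral speech embeddings.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=1.0)

def triplet_step(anchor_x, positive_x, negative_x):
    """anchor/positive: temporally adjacent segments (assumed same behavioral
    context); negative: a segment drawn from a distant part of the audio."""
    a, p, n = encoder(anchor_x), encoder(positive_x), encoder(negative_x)
    return triplet(a, p, n)
```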
... Behavior quantification from speech: behavioral signal processing (BSP) (Georgiou et al., 2011b) can play a central role in informing human assessment and decision making, especially in assisting domain specialists to observe, evaluate, and identify domain-specific human behaviors exhibited over longer time scales. For example, in couples therapy (Black et al., 2013; Nasir et al., 2017b), depression (Gupta et al., 2014; Nasir et al., 2016; Stasak et al., 2016; Tanaka et al., 2017), and suicide risk assessment (Cummins et al., 2015; Venek et al., 2017; Nasir et al., 2017a, 2018), behavior analysis systems help psychologists observe and evaluate domain-specific behaviors during interactions. Li et al. (2016) proposed sparsely connected and disjointly trained deep neural networks to deal with the low-resource data issue in behavior understanding. ...
... Behavioral dataset pre-processing: for preprocessing the couples therapy corpus, we employ the procedure described in Black et al. (2013). The main steps are speech activity detection (SAD) and diarization. ...
... During the testing phase, a leave-test-couples-out process is employed to ensure separation of speakers, dyads, and interaction topics. More details of the preprocessing steps can be found in Black et al. (2013). ...
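The leave-test-couples-out protocol mentioned in this excerpt amounts to group-aware cross-validation, so that no couple appears in both training and test data; a sketch using scikit-learn's GroupKFold as a stand-in for the exact protocol.

```python
# Hedged sketch: couple-disjoint cross-validation folds.
from sklearn.model_selection import GroupKFold

# groups[i] = couple ID for sample i; guarantees couple-disjoint folds
gkf = GroupKFold(n_splits=5)
# for train_idx, test_idx in gkf.split(X, y, groups=couple_ids):
#     fit on train_idx, evaluate on test_idx
```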
Preprint
Human behavior refers to the way humans act and interact. Understanding human behavior is a cornerstone of observational practice, especially in psychotherapy. An important cue for behavior analysis is the dynamic change of emotions during a conversation. Domain experts integrate emotional information in a highly nonlinear manner; thus, it is challenging to explicitly quantify the relationship between emotions and behaviors. In this work, we employ deep transfer learning to analyze their inferential capacity and contextual importance. We first train a network to quantify emotions from acoustic signals and then use information from the emotion recognition network as features for behavior recognition. We treat this emotion-related information as behavioral primitives and further train higher-level layers toward behavior quantification. Through our analysis, we find that emotion-related information is an important cue for behavior recognition. Further, we investigate the importance of emotional context in the expression of behavior by constraining (or not) the neural networks' contextual view of the data. This demonstrates that the sequence of emotions is critical in behavior expression. To achieve these frameworks, we employ hybrid architectures of convolutional and recurrent networks to extract emotion-related behavior primitives and facilitate automatic behavior recognition from speech.
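A minimal sketch of the transfer-learning recipe this abstract describes: reuse a trained emotion network's hidden activations as behavioral primitives and train a recurrent head for behavior codes. All layer sizes and the GRU head are assumptions for illustration.

```python
# Hedged sketch: emotion-network activations as behavioral primitives.
import torch
import torch.nn as nn

emotion_net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                            nn.Linear(128, 64), nn.ReLU(),
                            nn.Linear(64, 4))        # emotion head (weights assumed trained elsewhere)

primitives = nn.Sequential(*list(emotion_net.children())[:-1])  # drop emotion head
for p in primitives.parameters():
    p.requires_grad = False                          # freeze emotion features

behavior_head = nn.GRU(input_size=64, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 2)                        # e.g., low vs. high blame

def predict_behavior(frame_feats):                   # (batch, time, 40)
    z = primitives(frame_feats)                      # emotion-related primitives
    out, _ = behavior_head(z)                        # model the emotion sequence
    return classifier(out[:, -1])                    # decision from final state
```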
... Recent technological advances have given rise to a digital healthcare era with numerous applications focusing on mental health [10]. Automatic behavioral coding, in particular, has drawn a lot of interest over the last few years (e.g., [11-15]) and holds promise for more efficient training, more effective supervision, and more positive clinical outcomes. However, despite being one of the most dominant psychotherapy interventions, the literature focusing on computational analysis and evaluation of CBT sessions is relatively scarce, partly because of limited available data. ...
... There has been an increasing interest in developing systems for automatic psychotherapy evaluation over the last few years, focusing on both acoustic (e.g., [11,57]) and textual information. Depending on the domain, coding procedures may be applied at different resolutions, i.e., at the utterance (e.g., [58,59]) or at the session level. ...
Article
Full-text available
During a psychotherapy session, the counselor typically adopts techniques which are codified along specific dimensions (e.g., ‘displays warmth and confidence’, or ‘attempts to set up collaboration’) to facilitate the evaluation of the session. Those constructs, traditionally scored by trained human raters, reflect the complex nature of psychotherapy and highly depend on the context of the interaction. Recent advances in deep contextualized language models offer an avenue for accurate in-domain linguistic representations which can lead to robust recognition and scoring of such psychotherapy-relevant behavioral constructs, and support quality assurance and supervision. In this work, we propose a BERT-based model for automatic behavioral scoring of a specific type of psychotherapy, called Cognitive Behavioral Therapy (CBT), where prior work is limited to frequency-based language features and/or short text excerpts which do not capture the unique elements involved in a spontaneous long conversational interaction. The model focuses on the classification of therapy sessions with respect to the overall score achieved on the widely-used Cognitive Therapy Rating Scale (CTRS), but is trained in a multi-task manner in order to achieve higher interpretability. BERT-based representations are further augmented with available therapy metadata, providing relevant non-linguistic context and leading to consistent performance improvements. We train and evaluate our models on a set of 1,118 real-world therapy sessions, recorded and automatically transcribed. Our best model achieves an F1 score equal to 72.61% on the binary classification task of low vs. high total CTRS.
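A hedged sketch of the multi-task setup described above: one shared BERT encoder with one head per CTRS dimension plus a binary total-score head. The number of heads (the CTRS has 11 items) and all other details are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch: multi-task BERT scorer for CTRS dimensions plus total score.
import torch.nn as nn
from transformers import AutoModel

class CTRSScorer(nn.Module):
    def __init__(self, n_dimensions=11):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.dim_heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_dimensions))
        self.total_head = nn.Linear(hidden, 2)   # low vs. high total CTRS

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return [h(pooled) for h in self.dim_heads], self.total_head(pooled)
```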
... Such efforts span a wide range of psychotherapeutic approaches including couples therapy (Black et al., 2013), MI (Xiao, Can, et al., 2016) and cognitive behavioral therapy (Flemotomos, Martinez, et al., 2018), used to treat a variety of conditions such as addiction (Xiao, Can, et al., 2016) and post-traumatic stress disorder (Shiner et al., 2012). Both text-based (Imel, Steyvers, & Atkins, 2015;Xiao, Can, Georgiou, Atkins, & Narayanan, 2012) and audio-based (Black et al., 2013;Xiao et al., 2014) behavioral descriptors have been explored in the literature and have been used either unimodally or in combination with each other (Singla et al., 2018). ...
Article
Full-text available
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called “motivational interviewing”, our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist’s vs. client’s talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients’ perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts’ capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
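One of the session-dynamics measures named above (therapist vs. client talking time) is straightforward to compute from diarized output; a minimal sketch, assuming segments arrive as (speaker, start_sec, end_sec) tuples.

```python
# Illustrative computation of talk-time dynamics from diarized segments.
def talk_time_ratio(segments):
    totals = {}
    for speaker, start, end in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals.get("therapist", 0.0) / max(totals.get("client", 0.0), 1e-9)

# talk_time_ratio([("therapist", 0.0, 12.5), ("client", 12.5, 30.1)])
```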
... In recent years, methodologies including machine learning (ML) techniques have begun to be seen as potentially useful for the field of psychology. Some researchers have already discussed the possible applications of ML in clinical psychology [16] and other more general branches of psychology [17]; the aspect of greatest interest lies in identifying behavioral descriptors useful for predicting or monitoring therapy in specific situations [18-21]. Shatte et al. [22] identified more than 190 studies that have applied ML for detecting and diagnosing mental disorders, and more than 60 aimed at predicting their progression over time, as well as exploring computerized support for their management [11]. ...
Article
Full-text available
Despite their diverse assumptions, clinical psychology approaches share the goal of mental health promotion. The literature highlights their usefulness, but also some issues related to their effectiveness, such as difficulties in monitoring psychological change. The elective strategy for activating and managing psychological change is the clinical question. But how do different types of questions foster psychological change? This work tries to answer this question by studying therapist–patient interactions with an ML model for text analysis. The goal was to investigate how psychological change occurs in response to different types of questions, and to see whether the ML model recognized this difference when analyzing patients' answers to therapists' clinical questions. The experimental dataset of 14,567 texts was divided based on two different question purposes, splitting answers into two categories: those elicited by questions asking patients to start describing their clinical situation, and those elicited by questions asking them to detail how they evaluate their situation and mental health condition. The hypothesis that these categories are distinguishable by the model was confirmed by the results, which corroborate the different valences of the questions. These results foreshadow the possibility of training ML and AI models to suggest clinical questions to therapists based on patients' answers, increasing clinicians' knowledge, techniques, and skills.
... These models' ability to meaningfully interact with novel user input suggests that computerized tools for psychotherapy training may well be on the horizon (Aafjes-van Doorn et al., 2021;Creed et al., 2022;Imel et al., 2019;Kasneci et al., 2023). Over the last several years, machine learning methods have been successfully used for extracting meaningful features of text relevant to various populations and settings, including couple therapy (Black et al., 2013), university counseling (Kuo et al., 2024), motivational interviewing (Xiao et al., 2016), cognitive behavioral therapy (Flemotomos et al., 2018), and posttraumatic stress disorder (Shiner et al., 2012). ...
Article
Full-text available
The value of skillfully adopting a multicultural orientation (MCO) in psychotherapy has been increasingly recognized. Deliberate practice methods may be helpful in developing this capacity, but limited opportunities for practice and feedback exist. The present study provided an initial test of the feasibility, usability, and acceptability of a self-guided, web-based deliberate practice tool designed to support the development of therapists’ MCO: MCO Deliberate Practice Online (MCO-O). This tool included brief didactic instructions along with opportunities to practice responding to video vignettes of actors portraying clients discussing cultural topics in psychotherapy. A sample of therapists and trainees (n = 287) visited the MCO-O website and consented to the study. Recruitment through emails to listservs and a webinar was highly feasible. Quantitative ratings of usability were modest. Quantitative metrics of acceptability were also modest, with a minority of participants (18.8%) visiting the MCO-O website more than once and 51.2% of participants viewing two or more of the video vignettes. Younger participants found the MCO-O website more usable, and having MCO-O assigned was associated with watching more videos, when controlling for participant demographics. Qualitative themes included a mixture of positive feedback along with critiques and confusion regarding the MCO-O website. Taken together, results highlight the potential of this approach along with important limitations. Ultimately, it may prove difficult for therapists and trainees to engage in self-guided MCO training, particularly if using software tools that have not undergone extensive (and potentially resource intensive) user experience testing and development.
... For instance, many studies in this review used behavioral coding schemes to assess verbal content. Such schemes require the development of detailed manuals, reliability training and testing, and often many hours of manual coding (Black et al., 2013). Given the considerable effort to collect and make sense of language data, it is unsurprising that research has been slow-going. ...
Article
Full-text available
Background: Language is a fundamental aspect of human social behavior that is linked to many rewarding social experiences, such as social bonding. Potential effects of alcohol on affiliative language may therefore be an essential feature of alcohol reward and may elucidate pathways through which alcohol is linked to social facilitation. Examinations of alcohol's impact on language content, however, are sparse. Accordingly, this investigation represents the first systematic review and meta-analysis of alcohol's effects on affiliative language. We test the hypothesis that alcohol increases affiliative verbal approach behaviors and discuss future research directions. Methods: PsycInfo and Web of Science were systematically searched in March 2023 according to our preregistered plan. Eligible studies included social alcohol administration experiments in which affiliative verbal language was assessed. We present a random-effects meta-analysis that examines the effect of alcohol compared to control on measures of affiliative verbal behavior. Results: Our search identified 16 distinct investigations (comprising 961 participants) that examined the effect of alcohol on affiliative verbal behavior. Studies varied greatly in methods and measures. Meta-analytic results demonstrated that alcohol is modestly associated with increases in affiliative verbal behavior (Hedges' g = 0.164, 95% CI [0.027, 0.301], p = 0.019). Study quality was rated using an adapted version of the Quality Assessment Tool for Quantitative Studies and did not significantly moderate alcohol's effects. Conclusions: This study provides preliminary evidence that alcohol can increase affiliative verbal behaviors. This effect may be an important feature of alcohol reward. Given heterogeneity in study features, low study quality ratings, and limited reporting of effect size data, results simultaneously highlight the promise of this research area and the need for more work. Advances in language processing methodologies that could allow future work to systematically expand upon this finding are discussed.
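For reference, the effect size reported above, Hedges' g, is Cohen's d with a small-sample bias correction; a minimal sketch computing it from group summary statistics.

```python
# Hedges' g from two groups' means, SDs, and sample sizes.
import math

def hedges_g(m1, m2, sd1, sd2, n1, n2):
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                       # Cohen's d with pooled SD
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    return j * d
```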
... We aim to build upon our current system to incorporate these measures. Studies link acoustic [17,18] and visual [19,20] cues, among others, to human behavior, and one can incorporate such cues to supplement the system. Also, we aim to further investigate other aspects of laughter (e.g. ...
... Beyond informing designs of intuitive and natural user interfaces, computational tools and models of human interactions can inform research and practice across a variety of behavior-centered domains such as mental health. The emerging field of behavioral signal processing [4] offers encouraging results in this direction: e.g., developing predictive models of affective behaviors in distressed married couples' interactions using multimodal signals including acoustic [5], lexical [6], visual [7], and vocal entrainment cues [8], as well as jointly modeling both the child and the psychologist in interactive diagnostic settings for autism [9]. The approach relies on integrating domain knowledge and engineering; e.g., feature design and machine learning methods are guided by domain knowledge, and experimental results in turn validate the effects of these multimodal features and algorithms on real datasets, hence offering new insights about the interaction mechanisms. ...
... Coding systems differ in how they resolve such ambiguities. On the one hand, a system might restrict attention to readily observable emotion cues, as in automated analysis of affect based on word valence (Baek, Cappella, & Findman, 2011), facial expressions (Cohn & Sayette, 2010), or acoustic features of speech (Black et al., 2013). Alternatively, coders might identify emotions from context, based on their own implicit cultural knowledge and experience. ...
... Compared with male speech, female speech is marked by a higher pitch. A study found that gender-specific models were more accurate in detecting sadness and positive and negative affect than gender-independent models [34]. Depressed speech is marked by reduced pitch, whereas anxious speech is marked by increased pitch [10]. ...
Article
Full-text available
Background: Artificial intelligence tools have the potential to objectively identify youth in need of mental health care. Speech signals have shown promise as a source for predicting various psychiatric conditions and transdiagnostic symptoms. Objective: We designed a study testing the association of obsessive-compulsive disorder (OCD) diagnosis and symptom severity with vocal features in children and adolescents. Here, we present an analysis plan and statistical report for the study to document our a priori hypotheses and increase the robustness of the findings of our planned study. Methods: Audio recordings of clinical interviews of 47 children and adolescents with OCD and 17 children and adolescents without a psychiatric diagnosis will be analyzed. Youths were between 8 and 17 years old. We will test the effect of OCD diagnosis on computationally derived scores of vocal activation using ANOVA. To test the effect of OCD severity classifications on the same computationally derived vocal scores, we will perform a logistic regression. Finally, we will attempt to create an improved indicator of OCD severity by refining the model with more relevant labels. Models will be adjusted for age and gender. Model validation strategies are outlined. Results: Simulated results are presented; the actual results using real data will be presented in future publications. Conclusions: A major strength of this study is that we will include age and gender in our models to increase classification accuracy. A major challenge is the suboptimal quality of the audio recordings, which are representative of in-the-wild data and a large body of recordings collected during other clinical trials. This preregistered analysis plan and statistical report will increase the validity of the interpretations of the upcoming results. International Registered Report Identifier (IRRID): DERR1-10.2196/39613
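A sketch of the planned severity analysis described above, assuming a data frame with illustrative column names; the formula mirrors a logistic regression adjusted for age and gender.

```python
# Hedged sketch: logistic regression of severity class on a vocal score.
import pandas as pd
import statsmodels.formula.api as smf

def fit_severity_model(df: pd.DataFrame):
    """df columns (assumed): severity_high (0/1), vocal_activation, age, gender."""
    return smf.logit("severity_high ~ vocal_activation + age + C(gender)",
                     data=df).fit()
# model = fit_severity_model(df); print(model.summary())
```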
... To overcome such problems related to manual annotation and coding, computational approaches for modeling and assessing the quality of conversation-based behavioral signals have recently been developed and used in multiple clinical domains, such as autism diagnosis (Bone et al., 2016), understanding oncology communication (Alam et al., 2018; Chen et al., 2020b), and supporting primary care (Park et al., 2019). A great amount of work has focused in particular on psychotherapy interactions, including addiction counseling (Can et al., 2015; Xiao et al., 2016; Gibson et al., 2017; Singla et al., 2018; Tavabi et al., 2020) and couple therapy sessions (Black et al., 2013; Tseng et al., 2017). Those methods have focused on predicting both utterance-level and globally coded, session-level behaviors. ...
Article
Full-text available
Text-based computational approaches for assessing the quality of psychotherapy are being developed to support quality assurance and clinical training. However, due to the long durations of typical conversation based therapy sessions, and due to limited annotated modeling resources, computational methods largely rely on frequency-based lexical features or dialogue acts to assess the overall session level characteristics. In this work, we propose a hierarchical framework to automatically evaluate the quality of transcribed Cognitive Behavioral Therapy (CBT) interactions. Given the richly dynamic nature of the spoken dialog within a talk therapy session, to evaluate the overall session level quality, we propose to consider modeling it as a function of local variations across the interaction. To implement that empirically, we divide each psychotherapy session into conversation segments and initialize the segment-level qualities with the session-level scores. First, we produce segment embeddings by fine-tuning a BERT-based model, and predict segment-level (local) quality scores. These embeddings are used as the lower-level input to a Bidirectional LSTM-based neural network to predict the session-level (global) quality estimates. In particular, we model the global quality as a linear function of the local quality scores, which allows us to update the segment-level quality estimates based on the session-level quality prediction. These newly estimated segment-level scores benefit the BERT fine-tuning process, which in turn results in better segment embeddings. We evaluate the proposed framework on automatically derived transcriptions from real-world CBT clinical recordings to predict session-level behavior codes. The results indicate that our approach leads to improved evaluation accuracy for most codes when used for both regression and classification tasks.
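The hierarchy described above can be sketched as follows: segment embeddings feed a BiLSTM, per-segment (local) quality scores are predicted, and the session (global) score is a linear function of the local scores (here, their mean). Dimensions are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: local (segment) scores aggregated into a global (session) score.
import torch
import torch.nn as nn

class SessionScorer(nn.Module):
    def __init__(self, emb_dim=768, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.local_head = nn.Linear(2 * hidden, 1)   # per-segment quality

    def forward(self, segment_embs):                 # (batch, n_segments, emb_dim)
        h, _ = self.lstm(segment_embs)
        local = self.local_head(h).squeeze(-1)       # (batch, n_segments)
        return local, local.mean(dim=1)              # global = linear fn of locals
```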
... Various works have used linguistic features (i.e., what has been said) and paralinguistic features (i.e., how it was said) to predict the emotions of each partner in couples' interactions more broadly [3,4,8,9,19,20,22,28-30,32] and in conflict interactions in particular [5,10]. Most of these works have used observer ratings (perceived emotions) as labels rather than self-reports (one's actual emotions). ...
Conference Paper
Full-text available
How romantic partners interact with each other during a conflict influences how they feel at the end of the interaction and is predictive of whether the partners stay together in the long term. Hence, understanding the emotions of each partner is important. Yet current approaches include self-reports, which are burdensome and hence limit the frequency of this data collection. Automatic emotion prediction could address this challenge. Insights from psychology research indicate that partners' behaviors influence each other's emotions in conflict interactions; hence, the behavior of both partners could be considered to better predict each partner's emotion. However, it has yet to be investigated how doing so compares to using only each partner's own behavior in terms of emotion prediction performance. In this work, we used BERT to extract linguistic features (i.e., what partners said) and openSMILE to extract paralinguistic features (i.e., how they said it) from a dataset of 368 German-speaking Swiss couples (N = 736 individuals) who were videotaped during an 8-minute conflict interaction in the laboratory. Based on those features, we trained machine learning models to predict whether partners feel positive or negative after the conflict interaction. Our results show that including the behavior of the other partner improves the prediction performance. Furthermore, for men, considering how their female partner spoke is most important, and for women, considering what their male partner said is most important in getting better prediction performance. This work is a step towards automatically recognizing each partner's emotion based on the behavior of both, which would enable a better understanding of couples in research, therapy, and the real world.
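A sketch of the two-stream feature extraction described above, pairing openSMILE functionals (paralinguistic) with a BERT [CLS] embedding (linguistic) before a standard classifier; the specific feature set and German BERT checkpoint are assumptions.

```python
# Hedged sketch: openSMILE + BERT features for post-conflict valence prediction.
import numpy as np
import opensmile
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

smile = opensmile.Smile(feature_set=opensmile.FeatureSet.eGeMAPSv02,
                        feature_level=opensmile.FeatureLevel.Functionals)
tok = AutoTokenizer.from_pretrained("bert-base-german-cased")
bert = AutoModel.from_pretrained("bert-base-german-cased")

def partner_features(wav_path, transcript):
    para = smile.process_file(wav_path).to_numpy().ravel()   # paralinguistic
    with torch.no_grad():
        ids = tok(transcript, return_tensors="pt", truncation=True)
        ling = bert(**ids).last_hidden_state[:, 0].squeeze(0).numpy()  # [CLS]
    return np.concatenate([para, ling])

# X = np.stack([partner_features(w, t) for w, t in data])
# SVC().fit(X, labels)   # labels: positive vs. negative after the conflict
```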
... In contrast, the current task does not use absolute duration features and has different task goals, and thus requires different methodology. Finally, this analysis complements other studies of human behavior [11,12,13]; like some such studies [14], we plan to examine cognitive processes of children with autism using collected RAN data, now that methods to establish normal patterns have begun to be explored. ...
... BSP models high-level human behavioral constructs using novel computational frameworks in order to support and supplement a domain expert's decision-making process. Applications in mental health include autism spectrum disorder [7], addiction [8], and couples therapy [9]. ...
... Given the inherent complexity of behavior, thin-slice methods ease the burden of behavioral measurement because measuring behavior is an arduous task. Various researchers' descriptions of dynamic behavioral coding include "time-consuming," "labor-intensive," "tedious," "costly," "complex," "challenging," "painstaking," "mentally straining," "inefficient," "serious commitment," and "daunting," among many other unfavorable terms (Gosling et al., 1998; Murphy, 2005; Black et al., 2013; Fujiwara and Daibo, 2014; Carcone et al., 2015). One way researchers deal with the time-consuming nature of behavioral coding is to ask coders or raters to watch or listen for several behaviors at the same time; for example, to simultaneously count smiles and head tilts, or to simultaneously rate anger, anxiety, and sadness (e.g., Wang et al., 2020). ...
Article
Full-text available
Thin slices are used across a wide array of research domains to observe, measure, and predict human behavior. This article reviews the thin-slice method as a measurement technique and summarizes current comparative thin-slice research regarding the reliability and validity of thin slices to represent behavior or social constructs. We outline decision factors in using thin-slice behavioral coding and detail three avenues of thin-slice comparative research: (1) assessing whether thin slices can adequately approximate the total of the recorded behavior or be interchangeable with each other (representativeness); (2) assessing how well thin slices can predict variables that are different from the behavior measured in the slice (predictive validity), and (3) assessing how interpersonal judgment accuracy can depend on the length of the slice (accuracy-length validity). The aim of the review is to provide information researchers may use when designing and evaluating thin-slice behavioral measurement.
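One representativeness check discussed in this line of work is simply correlating thin-slice scores with full-recording scores across subjects; a minimal sketch.

```python
# Does a thin slice approximate the full recording? Pearson r across subjects.
import numpy as np

def slice_representativeness(slice_scores, full_scores):
    """Correlation between per-subject thin-slice and full-session scores."""
    return np.corrcoef(slice_scores, full_scores)[0, 1]
```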
... Some works leveraged interaction dynamics among the partners (e.g., entrainment, i.e., synchrony between partners) [2,28,29] and salient instances [16,17,26] to perform recognition. These works tend to use emotion labels from external raters rather than from the couples themselves and hence do not reflect the couples' subjective emotions. ...
Conference Paper
Full-text available
Extensive couples’ literature shows that how couples feel after a conflict is predicted by certain emotional aspects of that conversation. Understanding the emotions of couples leads to a better understanding of partners’ mental well-being and consequently their relationships. Hence, automatic emotion recognition among couples could potentially guide interventions to help couples improve their emotional well-being and their relationships. It has been shown that people’s global emotional judgment after an experience is strongly influenced by the emotional extremes and ending of that experience, known as the peak-end rule. In this work, we leveraged this theory and used machine learning to investigate which audio segments can be used to best predict the end-of-conversation emotions of couples. We used speech data collected from 101 Dutch-speaking couples in Belgium who engaged in 10-minute-long conversations in the lab. We extracted acoustic features from (1) the audio segments with the most extreme positive and negative ratings, and (2) the ending of the audio. We used transfer learning, in which we extracted these acoustic features with a pre-trained convolutional neural network (YAMNet). We then used these features to train machine learning models (support vector machines) to predict the end-of-conversation valence ratings (positive vs. negative) of each partner. The results of this work could inform how to best recognize the emotions of couples after conversation sessions and, eventually, lead to a better understanding of couples’ relationships either in therapy or in everyday life.
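A hedged sketch of the transfer-learning step above: embed the selected peak and end audio segments with pretrained YAMNet and train an SVM on the pooled embeddings. Segment selection (peak/end) is assumed to happen upstream, and the pooling choice is an assumption.

```python
# Hedged sketch: YAMNet embeddings of peak/end segments feeding an SVM.
import numpy as np
import tensorflow_hub as hub
from sklearn.svm import SVC

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def embed_segment(waveform_16k_mono):
    """waveform: float32 numpy array at 16 kHz mono, as YAMNet expects."""
    _, embeddings, _ = yamnet(waveform_16k_mono)
    return np.mean(embeddings.numpy(), axis=0)   # average-pool frame embeddings

# X = np.stack([embed_segment(w) for w in peak_and_end_segments])
# SVC().fit(X, valence_labels)                   # positive vs. negative
```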
... Compared to the existing literature on behavioral code prediction using linguistic content, the multimodal domain remains relatively less explored. Black et al. [2] use speech prosody features to measure different emotional cues within sessions of married couples partaking in problem-solving interactions. They use prosodic, spectral, and voice quality features to capture global acoustic properties for each spouse and train gender-specific and gender-independent classifiers to classify extreme instances of six selected codes (e.g., "low" versus "high" blame). ...
Conference Paper
Motivational Interviewing (MI) is defined as a collaborative conversation style that evokes the client's own intrinsic reasons for behavioral change. In MI research, the client's attitude (willingness or resistance) toward change, as expressed through language, has been identified as an important indicator of subsequent behavior change. Automated coding of these indicators provides systematic and efficient means for the analysis and assessment of MI therapy sessions. In this paper, we study and analyze behavioral cues in client language and speech that bear indications of the client's attitude toward change during a therapy session, using a database of dyadic motivational interviews between therapists and clients with alcohol-related problems. Deep language and voice encoders, i.e., BERT and VGGish, trained on large amounts of data, are used to extract features from each utterance. We develop a neural network to automatically detect the MI codes using both the clients' and therapists' language and the clients' voice, and demonstrate the importance of semantic context in such detection. Additionally, we develop machine learning models for predicting clients' alcohol-use behavioral outcomes through language and voice analysis. Our analysis demonstrates that we are able to estimate MI codes using clients' textual utterances along with preceding textual context from both the therapist and client, reaching an F1-score of 0.72 for speaker-independent three-class classification. We also report initial results for using the clients' data to predict behavioral outcomes, which outlines the direction for future work.
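The role of preceding context in MI-code estimation can be sketched as follows, assuming per-utterance embeddings (e.g., from BERT or VGGish) have already been computed; the window size and classifier are assumptions.

```python
# Hedged sketch: classify an utterance using its embedding plus preceding turns.
import numpy as np
from sklearn.neural_network import MLPClassifier

def with_context(utt_embs, i, window=2):
    """Concatenate utterance i's embedding with the preceding `window` turns
    (therapist or client); earlier-than-first indices clamp to the start."""
    ctx = [utt_embs[max(0, i - k)] for k in range(window, 0, -1)]
    return np.concatenate(ctx + [utt_embs[i]])

# X = np.stack([with_context(embs, i) for i in range(len(embs))])
# MLPClassifier(hidden_layer_sizes=(128,)).fit(X, mi_code_labels)  # 3 classes
```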
... Facial expressions were informed by the Facial Action Coding System, such that the SUPER Scale includes descriptions of relevant facial action units from the Facial Action Coding System [41-43]. Body posture cues were informed by existing coding schemes of emotion and affect in the context of social interactions [44,45], and by observational coding systems from the pediatric pain literature [2,32,46]. Thus, for each emotive response of fear, warmth, disengagement, and humor, several nonverbal behaviors were generated and grouped into facial, vocal, and body/posture cues indicative of each emotive response. ...
Article
Full-text available
Aim: Fully illuminating the mechanisms relating parent behaviors to child pain requires examining both verbal and nonverbal communication. We conducted a multimethod investigation into parent nonverbal communication and physiology, and investigated the psychometric properties of the Scheme for Understanding Parent Emotive Responses Scale to assess parent nonverbal behaviors accompanying reassurance and distraction. Materials & methods: 23 children (7–12 years) completed the cold pressor task with their parent (predominantly mothers). Parent heart rate and heart rate variability were monitored and assessed. The Scheme for Understanding Parent Emotive Responses Scale coding of parent nonverbal behaviors (i.e., vocal cues, facial expressions, posture) was used to detect levels of fear, warmth, disengagement and humor. Results & conclusion: Preliminary evidence for the psychometric properties of the scale is offered. Parent reassurance was associated with more fear, less warmth and less humor compared with distraction.
... They have been well analyzed using ML-based computational approaches that have been found to be useful across a variety of behavioral and health domains [17,18]. For instance, in Couples Therapy [15], multiple works have effectively quantified behaviors related to speakers' mental states such as Blame, Positive and Sadness using the speaker's language [19] and vocal traits [20]. Similarly, in Cancer Care [21] interactions, lexical and acoustic cues have been found to be useful in predicting Hostile and Positive behaviors [22]. ...
... Couples' communication efficiency may be a product of their motivation to coordinate their behaviors to achieve a short-term goal (e.g., where to go for a destination vacation). Previous studies have identified the positive influence of coordinative behaviors on relationship outcomes, including collaborative dialogue (e.g., "we" pronoun use; Biesen, Schooler, & Smith, 2016;Rentscher, Rohrbaugh, Shoham, & Mehl, 2013), similarities in speech rate (Aguilar et al., 2016;Black et al., 2013;Cannava & Bodie, 2017;Manson, Bryant, Gervais, & Kline, 2013), behaviors (Aguilar et al., 2016), and language styles (Duff et al., 2011;Ireland & Henderson, 2014;Ireland & Pennebaker, 2010;Ireland et al., 2011;Kovacs & Kleinbaum, 2020). Taken together, findings suggest that these behaviors may maximize both goal and relationship outcomes through their positive impact on partner perceptions of their levels of compassion, perspective-taking, and responsivity (Reis & Shaver, 1988;Schramm et al., 2017;Wallace Goddard, Olson, Galovan, Schramm, & Marshall, 2016). ...
Article
The speed, or efficiency, with which people communicate is linked to positive interpersonal outcomes. However, no studies of communication efficiency have examined romantic partners, making it unclear whether efficient communication is linked to relationship satisfaction above and beyond previously identified communication skills (e.g., problem-solving). We recruited dating couples (N = 56) to attend a laboratory session to complete survey measures and a collaborative communication task. Multilevel models demonstrated that both task efficiency (β = −.36, p = .04) and self-reported problem-solving communication skills (β = .28, p = .002) were associated with relationship satisfaction. Results suggest that communication task efficiency can be meaningfully applied to the study of romantic relationships and couple communication skills.
... Extant studies have typically relied on teams of trained coders to identify key behaviors while maintaining standards of interrater reliability (i.e., agreement between two coders; Kerig & Baucom, 2004). Yet researchers are continually seeking to reduce bias associated with subjective evaluation through incorporating sophisticated methods like artificial intelligence as well as individual and dyadic eye-tracking (e.g., Black et al., 2013;Campbell et al., 2014). ...
Chapter
Humans are a social species, wired for relationships. The presence or absence of another has salient effects on human responding. This chapter discusses important considerations for dyadic research, including key concepts and theories, common designs and measures, recent innovations, and unique challenges. Attachment theory is foundational for understanding a range of dyadic relationships, including child-caregiver, peer, and romantic couples. Social baseline theory provides further context regarding how humans utilize relationships to enhance survival potential. Numerous dyadic methods and measures exist, though fewer are designed for peer relationships. Recent innovations have focused on automated coding methods, vocal pitch analysis, and cutting-edge statistics. Common obstacles for dyadic research include participant scheduling, ethical concerns, complex research paradigms, and data set configuration.
... They have been well analyzed using ML-based computational approaches that have been found to be useful across a variety of behavioral and health domains [17,18]. For instance, in Couples Therapy [15], multiple works have effectively quantified behaviors related to speakers' mental states such as Blame, Positive and Sadness using the speaker's language [19] and vocal traits [20]. Similarly, in Cancer Care [21] interactions, lexical and acoustic cues have been found to be useful in predicting Hostile and Positive behaviors [22]. ...
Preprint
Full-text available
Suicide is a major societal challenge globally, with a wide range of risk factors, from individual health, psychological and behavioral elements to socio-economic aspects. Military personnel, in particular, are at especially high risk. Crisis resources, while helpful, are often constrained by access to clinical visits or therapist availability, especially when needed in a timely manner. There have hence been efforts to identify whether communication patterns between couples at home can provide preliminary information about potential suicidal behaviors, prior to intervention. In this work, we investigate whether acoustic, lexical, behavior and turn-taking cues from military couples' conversations can provide meaningful markers of suicidal risk. We test their effectiveness in real-world noisy conditions by extracting these cues through an automatic diarization and speech recognition front-end. Evaluation is performed by classifying 3 degrees of suicidal risk: none, ideation, attempt. Our automatic system performs significantly better than chance in all classification scenarios, and we find that behavior and turn-taking cues are the most informative ones. We also observe that conditioning on factors such as speaker gender and topic of discussion tends to improve classification performance.
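As an illustration of what turn-taking cues can look like, the sketch below computes a few simple statistics from a diarized conversation; the actual cue set in the paper is richer, and all names and thresholds here are assumptions.

```python
# Illustrative sketch: basic turn-taking statistics from diarization output
# represented as (speaker, start_s, end_s) tuples.
import numpy as np

def turn_taking_features(turns):
    turns = sorted(turns, key=lambda t: t[1])
    durations = [end - start for _, start, end in turns]
    gaps, overlaps, switches = [], [], 0
    for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
        latency = start_b - end_a          # positive: gap, negative: overlap
        (gaps if latency >= 0 else overlaps).append(abs(latency))
        switches += spk_a != spk_b
    n_transitions = max(len(turns) - 1, 1)
    return {
        "mean_turn_dur": float(np.mean(durations)),
        "mean_gap": float(np.mean(gaps)) if gaps else 0.0,
        "overlap_rate": len(overlaps) / n_transitions,
        "switch_rate": switches / n_transitions,
    }

# turn_taking_features([("wife", 0.0, 2.1), ("husband", 2.4, 5.0)])
```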
... Consistent with previous work [5, 7, 53-55], for each participant and behavior, we take the average of the annotators' ratings as the true rating in that session. Therefore, each speaker's data sample consists of the manual transcription of their utterances and their behavior ratings in that session. ...
Preprint
Full-text available
Automatic quantification of human interaction behaviors based on language information has been shown to be effective in psychotherapy research domains such as marital therapy and cancer care. Existing systems typically use a moving-window approach where the target behavior construct is first quantified based on observations inside a window, such as a fixed number of words or turns, and then integrated over all the windows in that interaction. Given a behavior of interest, it is important to employ the appropriate length of observation, since too short a window might not contain sufficient information. Unfortunately, the link between behavior and observation length for lexical cues has not been well studied and it is not clear how these requirements relate to the characteristics of the target behavior construct. Therefore, in this paper, we investigate how the choice of window length affects the efficacy of language-based behavior quantification, by analyzing (a) the similarity between system predictions and human expert assessments for the same behavior construct and (b) the consistency in relations between predictions of related behavior constructs. We apply our analysis to a large and diverse set of behavior codes that are used to annotate real-life interactions and find that behaviors related to negative affect can be quantified from just a few words whereas those related to positive traits and problem solving require much longer observation windows. On the other hand, constructs that describe dysphoric affect do not appear to be quantifiable from language information alone, regardless of how long they are observed. We compare our findings with related work on behavior quantification based on acoustic vocal cues as well as with prior work on thin slices and human personality predictions and find that, in general, they are in agreement.
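A minimal sketch of the moving-window scheme described above; `score_window` stands in for any window-level behavior model and is an assumed callable, as are the default lengths.

```python
# Sketch: score a behavior inside fixed-length word windows, then average
# the window scores over the whole session transcript.
def windowed_behavior_score(words, score_window, window_len=25, hop=10):
    """Average a window-level behavior score over a transcript."""
    scores = []
    for start in range(0, max(len(words) - window_len + 1, 1), hop):
        window = words[start:start + window_len]
        scores.append(score_window(" ".join(window)))
    return sum(scores) / len(scores)

# session_score = windowed_behavior_score(transcript.split(), model.predict_one)
```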
... There are various existing empirically supported approaches to assessing emotion regulation as a dyadic process that could be used in future studies replicating the APIMeM from the current study to supplement the DERS as a self-report measure, including combining self-report and partner-report scores about specific dyadic regulation behaviors (Horn & Maercker, 2016), using observation and behavior coding of audiovisual recordings with facial expressions and body language as indicators of distress (Jahromi, Meek, Ober-Reynolds, 2012), or even using speech acoustic features to behaviorally code interactions between partners (M. P. Black et al., 2013). ...
Article
According to adult attachment theory, levels of insecure attachment—both anxious and avoidant—are associated with abilities to regulate emotions in a relational context. This study is the first to test emotion dysregulation as a mediator of the association between levels of insecure attachment and psychological aggression using dyadic data. Cross-sectional self-report data were collected from 124 couples presenting for couple or family therapy at an outpatient clinic. Path analysis was used to analyze an actor–partner interdependence mediational model. Results did not support emotion dysregulation mediating the association between level of anxious attachment and psychological aggression, or the association between level of avoidant attachment and psychological aggression. Results indicated a direct actor effect between level of anxious attachment and psychological aggression for women (β = .19, p = .045) and men (β = .19, p = .027). Direct partner effects between people’s own levels of anxious attachment and their partners’ psychological aggression for women (β = .28, p = .001) and men (β = .33, p = .001) were also identified. Results also indicated direct actor effects between anxious attachment and emotion dysregulation in both women (β = .51, p < .001) and men (β = .58, p < .001), whereas direct actor effects between avoidant attachment and emotion dysregulation were only identified among women (β = .32, p < .001). Results suggest that increasing partners’ abilities to effectively manage their own maladaptive attachment-related behaviors may decrease levels of psychological aggression between partners. Limitations and clinical implications for couple therapists are discussed.
... These challenges hint at the potential benefits of involving automatic behavior annotation, in which data-driven machine learning techniques are employed to automatically extract behavioral information directly from data, rather than relying on time-consuming and expensive annotations from human experts. Such behavior analysis work has been shown to be effective at identifying behaviors during interactions in domains such as couple therapy [7,8,9], depression [10,11,12] and suicide risk assessment [13,14,15,16]. However, due to potential domain mismatch, obtaining accurate performance in one domain by utilizing well-trained behavior analysis systems from a different domain is not straightforward. ...
... These challenges hint at the potential benefits of involving automatic behavior annotation, in which data-driven machine learning techniques are employed to automatically extract behavioral information directly from data, rather than relying on time-consuming and expensive annotations from human experts. Such behavior analysis work has been shown to be effective at identifying behaviors during interactions in domains such as couple therapy [7,8,9], depression [10,11,12] and suicide risk assessment [13,14,15,16]. However, due to potential domain mismatch, obtaining accurate performance in one domain by utilizing well-trained behavior analysis systems from a different domain is not straightforward. ...
Preprint
Full-text available
Cancer impacts the quality of life of those diagnosed as well as their spouse caregivers, in addition to potentially influencing their day-to-day behaviors. There is evidence that effective communication between spouses can improve well-being related to cancer but it is difficult to efficiently evaluate the quality of daily life interactions using manual annotation frameworks. Automated recognition of behaviors based on the interaction cues of speakers can help analyze interactions in such couples and identify behaviors which are beneficial for effective communication. In this paper, we present and detail a dataset of dyadic interactions in 85 real-life cancer-afflicted couples and a set of observational behavior codes pertaining to interpersonal communication attributes. We describe and employ neural network-based systems for classifying these behaviors based on turn-level acoustic and lexical speech patterns. Furthermore, we investigate the effect of controlling for factors such as gender, patient/caregiver role and conversation content on behavior classification. Analysis of our preliminary results indicates the challenges in this task due to the nature of the targeted behaviors and suggests that techniques incorporating contextual processing might be better suited to tackle this problem.
Chapter
This indispensable collection provides extensive, yet accessible, coverage of conceptual and practical issues in research design in personality and social psychology. Using numerous examples and clear guidelines, especially for conducting complex statistical analysis, leading experts address specific methods and areas of research to capture a definitive overview of contemporary practice. Updated and expanded, this third edition engages with the most important methodological innovations over the past decade, offering a timely perspective on research practice in the field. To reflect such rapid advances, this volume includes commentary on particularly timely areas of development such as social neuroscience, mobile sensing methods, and innovative statistical applications. Seasoned and early-career researchers alike will find a range of tools, methods, and practices that will help improve their research and develop new conceptual and methodological possibilities. Supplementary online materials are available on Cambridge Core.
Conference Paper
Autism Spectrum Disorder (ASD) is a category of neurodevelopmental disorder that can be associated with several behavioral conditions and has no known cure to date. ASD can be detected from a very early stage in childhood, and upon successful detection its effects can be ameliorated. Existing clinical diagnostic procedures can be error-prone and time-consuming; thus, machine learning-based prediction models for early-stage ASD, as well as for adolescents and adults, have been developed over the years. In our study, several parameters of ASD detection were implemented with open-source ASD datasets and analysed using several machine learning models, such as logistic regression, XGBoost, SVC and naive Bayes, among which XGBoost showed the best performance. The outcomes of such analytical approaches demonstrate that, when suitably optimized, machine learning techniques can offer robust predictions of ASD status. These findings imply that it may be feasible to employ these models for early ASD detection, thereby enhancing the prospects of timely and effective intervention. XGBoost gave the best results across all datasets, including cross-validation, achieving an accuracy of 100%.
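For reference, a generic sketch of the kind of model comparison described (not the study's code), using XGBoost with cross-validation on a tabular screening dataset; the hyperparameters and data variables are assumptions.

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

model = XGBClassifier(n_estimators=200, max_depth=4,
                      learning_rate=0.1, eval_metric="logloss")
# X: tabular screening features; y: binary ASD label (dataset-specific)
# scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
```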
Article
Full-text available
This study aimed to investigate gender and power relations in the context of women's vulnerability to sexually transmitted infections. It is a qualitative study conducted with eight women of reproductive age with a history of sexually transmitted infection. Data produced through structured interviews were submitted to the thematic analysis proposed by Bardin. The women were withdrawn into a position of submission, reflecting the strong influence of gender on marital affective relationships, a situation that hinders dialogue and the negotiation of safe sex, and contributes to women not recognizing themselves as subjects of sexual and reproductive rights. They should therefore be reached through interventions that foster empowerment for safe-sex negotiation, protagonism, and self-recognition as subjects of sexual and reproductive rights.
Article
Social scientists increasingly use video data, but large-scale analysis of its content is often constrained by scarce manual coding resources. Upscaling may be possible with the application of automated coding procedures, which are being developed in the field of computer vision. Here, we introduce computer vision to social scientists, review the state-of-the-art in relevant subfields, and provide a working example of how computer vision can be applied in empirical sociological work. Our application involves defining a ground truth by human coders, developing an algorithm for automated coding, testing the performance of the algorithm against the ground truth, and running the algorithm on a large-scale dataset of CCTV images. The working example concerns monitoring social distancing behavior in public space over more than a year of the COVID-19 pandemic. Finally, we discuss prospects for the use of computer vision in empirical social science research and address technical and ethical challenges.
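A hedged sketch of one building block of such a pipeline: detecting people in a frame with a pretrained detector and flagging pairs closer than a threshold. Real deployments calibrate pixel distances to metres per camera; the threshold and names here are assumptions.

```python
# Sketch: person detection with a pretrained torchvision detector, then
# pairwise centre-point distances in pixel space.
import itertools
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()  # COCO-pretrained

def close_pairs(image_tensor, min_px=120, score_thr=0.8):
    """Return index pairs of detected people closer than min_px pixels."""
    with torch.no_grad():
        out = model([image_tensor])[0]            # image: (3, H, W), 0..1
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)  # class 1 = person
    boxes = out["boxes"][keep]
    centres = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
    return [(i, j) for i, j in itertools.combinations(range(len(centres)), 2)
            if torch.dist(centres[i], centres[j]) < min_px]
```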
Article
This paper investigates the feasibility of automatic behaviour coding of spoken interactions in teamwork settings. We introduce the coding schema used to classify the behaviours of group members and the corpus we collected to assess the coding schema's reliability in real teamwork meetings. The behaviours embedded in spoken utterances are modeled using a discriminative approach based on conditional random fields, as well as state-of-the-art neural network-based models. Moreover, we fine-tune publicly available language models to fit our target domain and task, and demonstrate how this type of knowledge transfer improves the classification models' generalisation capacity. To utilise public resources, the AMI corpus was used for deploying the proposed framework. However, the models were evaluated on both AMI (matched task) and recordings of students solving an engineering challenge (mismatched task). Evaluation results reveal that neural networks are the best-performing models in matched tasks, but that CRF models outperform them in mismatched tasks. Mitigating the effect of noisy data by implementing a lightly supervised approach leads to relative improvements of 32% and 22% in the F1 measures of the CRF and BERT models, respectively. The proposed classifiers are used as part of technological support to a training programme in collaborative skills for undergraduate students.
Conference Paper
Full-text available
Many processes in psychology are complex, such as dyadic interactions between two interacting partners (e.g., patient-therapist, intimate relationship partners). Nevertheless, many basic questions about interactions are difficult to investigate because dyadic processes can be within a person and between partners, are based on multimodal aspects of behavior, and unfold rapidly. Current analyses are mainly based on the behavioral coding method, whereby human coders annotate behavior based on a coding schema. But coding is labor-intensive, expensive, and slow; it focuses on few modalities and produces sparse data, which has forced the field to use behaviors averaged across entire interactions, thereby undermining the ability to study processes on a fine-grained scale. Current approaches in psychology use LIWC for analyzing couples' interactions. However, advances in natural language processing such as BERT could enable the development of systems to potentially automate behavioral coding, which in turn could substantially improve psychological research. In this work, we train machine learning models to automatically predict positive and negative communication behavioral codes of 368 German-speaking Swiss couples during an 8-minute conflict interaction on a fine-grained scale (10-second sequences) using linguistic features and paralinguistic features derived with openSMILE. Our results show that both simpler TF-IDF features and more complex BERT features performed better than LIWC, and that adding paralinguistic features did not improve performance. These results suggest it might be time to consider modern alternatives to LIWC, the de facto linguistic features in psychology, for prediction tasks in couples research. This work is a further step towards the automated coding of couples' behavior, which could enhance couple research and therapy, and be utilized for other dyadic interactions as well.
Preprint
Full-text available
Many processes in psychology are complex, such as dyadic interactions between two interacting partners (e.g. patient-therapist, intimate relationship partners). Nevertheless, many basic questions about interactions are difficult to investigate because dyadic processes can be within a person and between partners, are based on multimodal aspects of behavior, and unfold rapidly. Current analyses are mainly based on the behavioral coding method, whereby human coders annotate behavior based on a coding schema. But coding is labor-intensive, expensive, slow, and focuses on few modalities. Current approaches in psychology use LIWC for analyzing couples' interactions. However, advances in natural language processing such as BERT could enable the development of systems to potentially automate behavioral coding, which in turn could substantially improve psychological research. In this work, we train machine learning models to automatically predict positive and negative communication behavioral codes of 368 German-speaking Swiss couples during an 8-minute conflict interaction on a fine-grained scale (10-second sequences) using linguistic features and paralinguistic features derived with openSMILE. Our results show that both simpler TF-IDF features and more complex BERT features performed better than LIWC, and that adding paralinguistic features did not improve performance. These results suggest it might be time to consider modern alternatives to LIWC, the de facto linguistic features in psychology, for prediction tasks in couples research. This work is a further step towards the automated coding of couples' behavior, which could enhance couple research and therapy, and be utilized for other dyadic interactions as well.
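A minimal sketch of the TF-IDF baseline from the comparison above; BERT sentence embeddings or LIWC category counts would be swapped in at the same place, and all names and settings are illustrative assumptions.

```python
# Sketch: linear classifier on TF-IDF features of 10-second transcript chunks.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tfidf_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000, class_weight="balanced"))

# sequences: list of 10-second transcript chunks
# codes: positive vs. negative communication labels per chunk
# tfidf_clf.fit(train_sequences, train_codes)
```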
Article
Full-text available
Appropriate embedding transformation of sentences can aid in downstream tasks such as NLP and emotion and behavior analysis. Such efforts evolved from word vectors which were trained in an unsupervised manner using large-scale corpora. Recent research, however, has shown that sentence embeddings trained using in-domain data or supervised techniques, often through multitask learning, perform better than unsupervised ones. Representations have also been shown to be applicable in multiple tasks, especially when training incorporates multiple information sources. In this work we aspire to combine the simplicity of using abundant unsupervised data with transfer learning by introducing an online multitask objective. We present a multitask paradigm for unsupervised learning of sentence embeddings which simultaneously addresses domain adaption. We show that embeddings generated through this process increase performance in subsequent domain-relevant tasks. We evaluate on the affective tasks of emotion recognition and behavior analysis and compare our results with state-of-the-art general-purpose supervised sentence embeddings. Our unsupervised sentence embeddings outperform the alternative universal embeddings in both identifying behaviors within couples therapy and in emotion recognition.
Article
The task of quantifying human behavior by observing interaction cues is an important and useful one across a range of domains in psychological research and practice. Machine learning-based approaches typically perform this task by first estimating behavior based on cues within an observation window, such as a fixed number of words, and then aggregating the behavior over all the windows in that interaction. The length of this window directly impacts the accuracy of estimation by controlling the amount of information being used. The exact link between window length and accuracy, however, has not been well studied, especially in spoken language. In this paper, we investigate this link and present an analysis framework that determines appropriate window lengths for the task of behavior estimation. Our proposed framework utilizes a two-pronged evaluation approach: (a) extrinsic similarity between machine predictions and human expert annotations, and (b) intrinsic consistency between intra-machine and intra-human behavior relations. We apply our analysis to real-life conversations that are annotated for a large and diverse set of behavior codes and examine the relation between the nature of a behavior and how long it should be observed. We find that behaviors describing negative and positive affect can be accurately estimated from short to medium-length expressions whereas behaviors related to problem-solving and dysphoria require much longer observations and are difficult to quantify from language alone. These findings are found to be generally consistent across different behavior modeling approaches.
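The extrinsic half of this two-pronged evaluation can be sketched as follows, assuming a scorer that accepts a window length; `predict_at_length` and the length grid are illustrative assumptions.

```python
# Sketch: correlate session-level machine predictions with expert ratings
# at several window lengths to find where accuracy saturates.
from scipy.stats import pearsonr

def extrinsic_curve(predict_at_length, sessions, expert_ratings, lengths):
    """Pearson r between predictions and expert ratings per window length."""
    curve = {}
    for length in lengths:
        preds = [predict_at_length(s, length) for s in sessions]
        curve[length] = pearsonr(preds, expert_ratings)[0]
    return curve

# extrinsic_curve(my_scorer, sessions, ratings, lengths=[5, 10, 25, 50, 100])
```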
Conference Paper
Full-text available
Speech signals may provide important information for measuring and modelling human behaviour, especially for assessing mental health and estimating the emotional state of a person. A speaker's physiological and/or physical state may thus be identified by detecting cognitive decline (CD) or stress levels using signal analysis of the voice. This preliminary study presents a survey of methods introduced for detecting CD and stress in the human voice. It is shown that increases in the signal's fundamental frequency (f0), as well as in its formant frequencies, are the most common effects of CD and stress. Additional voice parameters could be used to distinguish normal from CD and normal from stressed voice, as well as to identify the cognitive state of the elderly. The present study marks the initiation of a project for the development of a mobile application for automated voice detection and analysis on the fly, which will aid in the detection of early signs of CD and stress. Further investigation with application to a large number of voice samples is required to validate the methods.
Article
Full-text available
Human behavior refers to the way humans act and interact. Understanding human behavior is a cornerstone of observational practice, especially in psychotherapy. An important cue of behavior analysis is the dynamical changes of emotions during the conversation. Domain experts integrate emotional information in a highly nonlinear manner; thus, it is challenging to explicitly quantify the relationship between emotions and behaviors. In this work, we employ deep transfer learning to analyze their inferential capacity and contextual importance. We first train a network to quantify emotions from acoustic signals and then use information from the emotion recognition network as features for behavior recognition. We treat this emotion-related information as behavioral primitives and further train higher level layers towards behavior quantification. Through our analysis, we find that emotion-related information is an important cue for behavior recognition. Further, we investigate the importance of emotional-context in the expression of behavior by constraining (or not) the neural networks’ contextual view of the data. This demonstrates that the sequence of emotions is critical in behavior expression. To achieve these frameworks we employ hybrid architectures of convolutional networks and recurrent networks to extract emotion-related behavior primitives and facilitate automatic behavior recognition from speech.
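A minimal PyTorch sketch of this transfer idea: freeze a pre-trained emotion encoder and train a recurrent behavior head on its frame-level outputs. The layer sizes, dimensions, and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BehaviorFromEmotion(nn.Module):
    """Behavior recognition on top of frozen emotion-related primitives."""
    def __init__(self, emotion_encoder, emo_dim=64, n_behaviors=6):
        super().__init__()
        self.encoder = emotion_encoder          # pre-trained, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.gru = nn.GRU(emo_dim, 128, batch_first=True)
        self.head = nn.Linear(128, n_behaviors)

    def forward(self, acoustic_frames):         # (batch, time, feat)
        emo_seq = self.encoder(acoustic_frames) # (batch, time, emo_dim), assumed
        _, h = self.gru(emo_seq)                # final hidden state
        return self.head(h.squeeze(0))          # session-level behavior logits
```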
Article
Full-text available
We propose a methodology for estimating human behaviors in psychotherapy sessions using multi-label and multi-task learning paradigms. We discuss the problem of behavioral coding, in which data of human interactions are annotated with labels to describe relevant human behaviors of interest. We describe two related, yet distinct, corpora consisting of therapist-client interactions in psychotherapy sessions. We experimentally compare the proposed learning approaches for estimating behaviors of interest in these datasets. Specifically, we compare single and multiple label learning approaches, single and multiple task learning approaches, and evaluate the performance of these approaches when incorporating turn context. We demonstrate that the best multi-label, multi-task learning model with turn context achieves 18.9% and 19.5% absolute improvements with respect to a logistic regression classifier (for each behavioral coding task respectively) and 6.4% and 6.1% absolute improvements with respect to the best single-label, single-task deep neural network models. Lastly, we discuss the insights these modeling paradigms provide into these complex interactions, including key commonalities and differences of behaviors within and between the two prevalent psychotherapy approaches considered: Motivational Interviewing and Cognitive Behavioral Therapy.
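A compact sketch of a multi-label, multi-task arrangement of this kind: a shared encoder with one sigmoid head per corpus, trained with binary cross-entropy. Dimensions, head names, and code counts are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskCoder(nn.Module):
    """Shared encoder, one multi-label head per behavioral coding task."""
    def __init__(self, in_dim=768, n_codes_task_a=8, n_codes_task_b=11):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.head_a = nn.Linear(256, n_codes_task_a)  # e.g., MI codes
        self.head_b = nn.Linear(256, n_codes_task_b)  # e.g., CBT codes

    def forward(self, x, task):
        h = self.shared(x)
        return self.head_a(h) if task == "a" else self.head_b(h)

loss_fn = nn.BCEWithLogitsLoss()  # multi-label: independent sigmoid per code
```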
Article
Full-text available
This article evaluates the efficacy, effectiveness, and clinical significance of empirically supported couple and family interventions for treating marital distress and individual adult disorders, including anxiety disorders, depression, sexual dysfunctions, alcoholism and problem drinking, and schizophrenia. In addition to consideration of different theoretical approaches to treating these disorders, different ways of including a partner or family in treatment are highlighted: (a) partner–family-assisted interventions, (b) disorder-specific partner–family interventions, and (c) more general couple–family therapy. Findings across diagnostic groups and issues involved in applying efficacy criteria to these populations are discussed.
Article
Full-text available
Two longitudinal studies of marital interaction were conducted using observational coding of couples attempting to resolve a high-conflict issue. We found that a different pattern of results predicts concurrent marital satisfaction than predicts change in marital satisfaction over 3 years. Results suggest that some marital interaction patterns, such as disagreement and anger exchanges, which have usually been considered harmful to a marriage, may not be harmful in the long run. These patterns were found to relate to unhappiness and negative interaction at home concurrently, but they were predictive of improvement in marital satisfaction longitudinally. However, three interaction patterns were identified as dysfunctional in terms of longitudinal deterioration: defensiveness (which includes whining), stubbornness, and withdrawal from interaction. Hypotheses about gender differences in roles for the maintenance of marital satisfaction are presented.
Article
Full-text available
Although much has been learned from cross-sectional research on marriage, an understanding of how marriages develop, succeed, and fail is best achieved with longitudinal data. In view of growing interest in longitudinal research on marriage, the authors reviewed and evaluated the literature on how the quality and stability of marriages change over time. First, prevailing theoretical perspectives are examined for their ability to explain change in marital quality and stability. Second, the methods and findings of 115 longitudinal studies—representing over 45,000 marriages—are summarized and evaluated, yielding specific suggestions for improving this research. Finally, a model is outlined that integrates the strengths of previous theories of marriage, accounts for established findings, and indicates new directions for research on how marriages change. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Article
Full-text available
Published data on the frequency of the voice fundamental (F0) in speech show its range of variation, often expressed in terms of two standard deviations (SD) of the F0 distribution, to be approximately the same for men and women if expressed in semitones, but the observed SD varies substantially between different investigations. Most of the differences can be attributed to the following factors: SD is increased in tone languages, and it varies with the type of discourse. The more 'lively' the type of discourse, the larger it is. The dependence of SD on the type of discourse tends to be more pronounced in the speech of women than of men. Based on an analysis of various production data, it is shown that speakers normally achieve an increased SD by increasing the excursions of F0 from a 'base value' that lies about 1.5 SD below their mean F0. This is relevant to applications in speech technology as well as to general theories of speech communication such as the 'modulation theory', in which the base value of F0 is seen as a carrier frequency.
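The semitone convention and the reported base value translate directly into code; a small worked example follows (the reference frequency is an arbitrary choice, since SD in semitones does not depend on it).

```python
import numpy as np

def f0_stats_semitones(f0_hz, ref_hz=50.0):
    """Mean and SD of an F0 track in semitones, plus the ~1.5 SD base value."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                          # drop unvoiced frames
    st = 12.0 * np.log2(f0 / ref_hz)         # Hz -> semitones re ref_hz
    mean_st, sd_st = st.mean(), st.std()
    return {"mean_st": mean_st,
            "sd_st": sd_st,
            "base_value_st": mean_st - 1.5 * sd_st}
```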
Article
Full-text available
Individually focused Attribute × Treatment interaction (ATI) research has neglected attributes of couple and family relationships that may moderate response to different treatments. Sixty-three couples with a male alcoholic partner participated in up to 20 sessions of either cognitive–behavioral therapy (CBT) or family-systems therapy (FST). As hypothesized, couples high on pretreatment measures of demand–withdraw interaction (DWI) attended fewer sessions of CBT, whereas DWI made little difference in FST. A specific, alcohol-related wife-demand/husband-withdraw pattern moderated retention more than the opposite husband-demand/wife-withdraw pattern, although the general affective quality of a couple's relationship may have contributed to ATIs as well. Results illustrate the importance of relational moderators in ATI research and suggest possible benefits of matching alcoholics to treatments when the unit of treatment involves more than 1 person. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Conference Paper
Full-text available
The last decade has seen a substantial body of literature on the recognition of emotion from speech. However, in comparison to related speech processing tasks such as automatic speech and speaker recognition, practically no standardised corpora and test conditions exist to compare performances under exactly the same conditions. Instead, a multiplicity of evaluation strategies is employed (such as cross-validation or percentage splits without proper instance definition), which prevents exact reproducibility. Further, in order to face more realistic scenarios, the community is in desperate need of more spontaneous and less prototypical data. This INTERSPEECH 2009 Emotion Challenge aims at bridging such gaps between excellent research on human emotion recognition from speech and low compatibility of results. The FAU Aibo Emotion Corpus [1] serves as the basis, with clearly defined test and training partitions incorporating speaker independence and different room acoustics, as needed in most real-life settings. This paper introduces the challenge, the corpus, the features, and benchmark results of two popular approaches towards emotion recognition from speech.
Conference Paper
Full-text available
We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
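For a sense of typical usage, here is a sketch with the `opensmile` Python wrapper published by audEERING (the classic route is the SMILExtract command-line binary driven by a configuration file); the input file name is a placeholder.

```python
# Sketch: extract utterance-level acoustic functionals with openSMILE's
# Python wrapper; returns a one-row pandas DataFrame of features.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # 88 functionals
    feature_level=opensmile.FeatureLevel.Functionals)

features = smile.process_file("spouse_channel.wav")    # placeholder path
```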
Chapter
Full-text available
In this chapter, we focus on the automatic recognition of emotional states using acoustic and linguistic parameters as features and classifiers as tools to predict the ‘correct’ emotional states. We first sketch history and state of the art in this field; then we describe the process of ‘corpus engineering’, i.e. the design and the recording of databases, the annotation of emotional states, and further processing such as manual or automatic segmentation. Next, we present an overview of acoustic and linguistic features that are extracted automatically or manually. In the section on classifiers, we deal with topics such as the curse of dimensionality and the sparse data problem, classifiers, and evaluation. At the end of each section, we point out important aspects that should be taken into account for the planning or the assessment of studies. The subject area of this chapter is not emotions in some narrow sense but in a wider sense encompassing emotion-related states such as moods, attitudes, or interpersonal stances as well. We do not aim at an in-depth treatise of some specific aspects or algorithms but at an overview of approaches and strategies that have been used or should be used.
Article
Full-text available
During expressive speech, the voice is enriched to convey not only the intended semantic message but also the emotional state of the speaker. The pitch contour is one of the important properties of speech that is affected by this emotional modulation. Although pitch features have been commonly used to recognize emotions, it is not clear what aspects of the pitch contour are the most emotionally salient. This paper presents an analysis of the statistics derived from the pitch contour. First, pitch features derived from emotional speech samples are compared with the ones derived from neutral speech, by using symmetric Kullback-Leibler distance. Then, the emotionally discriminative power of the pitch features is quantified by comparing nested logistic regression models. The results indicate that gross pitch contour statistics such as mean, maximum, minimum, and range are more emotionally prominent than features describing the pitch shape. Also, analyzing the pitch statistics at the utterance level is found to be more accurate and robust than analyzing the pitch statistics for shorter speech regions (e.g., voiced segments). Finally, the best features are selected to build a binary emotion detection system for distinguishing between emotional versus neutral speech. A new two-step approach is proposed. In the first step, reference models for the pitch features are trained with neutral speech, and the input features are contrasted with the neutral model. In the second step, a fitness measure is used to assess whether the input speech is similar to, in the case of neutral speech, or different from, in the case of emotional speech, the reference models. The proposed approach is tested with four acted emotional databases spanning different emotional categories, recording settings, speakers and languages. The results show that the recognition accuracy of the system is over 77% just with the pitch features (baseline 50%). When compared to conventional classification schemes, the proposed approach performs better in terms of both accuracy and robustness.
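A simplified sketch of the two-step idea: fit a Gaussian reference model on utterance-level pitch statistics of neutral speech, then flag inputs whose likelihood under the reference is low. The fitness measure in the paper is more elaborate; the statistics set and threshold here are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pitch_stats(f0_hz):
    """Gross pitch contour statistics: mean, max, min, range."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                       # voiced frames only
    return np.array([f0.mean(), f0.max(), f0.min(), f0.max() - f0.min()])

def fit_neutral_model(neutral_f0_tracks):
    """Step 1: Gaussian reference model trained on neutral speech."""
    X = np.stack([pitch_stats(t) for t in neutral_f0_tracks])
    return multivariate_normal(mean=X.mean(axis=0),
                               cov=np.cov(X.T) + 1e-6 * np.eye(4))

def is_emotional(f0_track, neutral_model, log_lik_threshold=-25.0):
    """Step 2: low likelihood under the neutral reference -> emotional."""
    return neutral_model.logpdf(pitch_stats(f0_track)) < log_lik_threshold
```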
Conference Paper
Full-text available
Analysis of audiovisual human behavior observations is a common practice in the behavioral sciences. It is generally carried out by expert annotators who are asked to evaluate several aspects of the observations along various dimensions. This can be a tedious task. We propose that automatic classification of behavioral patterns in this context can be viewed as a multiple instance learning problem. In this paper, we analyze a corpus of married couples interacting about a problem in their relationship. We extract features from both the audio and the transcriptions and apply the Diverse Density-Support Vector Machine framework. Apart from attaining classification on the expert annotations, this framework also allows us to estimate salient regions of the complex interaction.
Conference Paper
Full-text available
The HUMAINE project is concerned with developing interfaces that will register and respond to emotion, particularly pervasive emotion (forms of feeling, expression and action that colour most of human life). The HUMAINE Database provides naturalistic clips which record that kind of material in multiple modalities, and labelling techniques that are suited to describing it.
Conference Paper
Full-text available
Entrainment has played a crucial role in analyzing interactions between married couples. In this work, we introduce a novel technique for quantifying vocal entrainment based on Principal Component Analysis (PCA). The entrainment measure, as we define it in this work, is the amount of variability in one interlocutor's speaking characteristics that is preserved when projected onto the space representing the other's speaking characteristics. Our analysis of real couples' interactions shows that when a spouse is rated as having positive emotion, he/she has a higher value of vocal entrainment than when rated as having negative emotion. We further performed various statistical analyses on the strength and the directionality of vocal entrainment under different affective interaction conditions to bring quantitative insights into the entrainment phenomenon. These analyses, along with a baseline prediction model, demonstrate the validity and utility of the proposed PCA-based vocal entrainment measure.
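The measure can be sketched as follows, assuming frame-level vocal feature matrices (frames x features) for each spouse; the feature choice and component count are assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.decomposition import PCA

def vocal_entrainment(feats_a, feats_b, n_components=10):
    """Fraction of B's total variance preserved in A's principal subspace."""
    pca = PCA(n_components=n_components).fit(feats_a)
    # Coordinates of B's (centered) frames in A's principal-component basis.
    coords_b = (feats_b - feats_b.mean(axis=0)) @ pca.components_.T
    return coords_b.var(axis=0).sum() / feats_b.var(axis=0).sum()

# entrainment_of_wife_toward_husband = vocal_entrainment(husband_feats, wife_feats)
```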
Conference Paper
Full-text available
In this paper, we report on classification results for emotional user states (4 classes, German database of children interacting with a pet robot). Six sites computed acoustic and linguistic features independently from each other, following in part different strategies. A total of 4244 features were pooled together and grouped into 12 low-level descriptor types and 6 functional types. For each of these groups, classification results using Support Vector Machines and Random Forests are reported for the full set of features, and for the 150 features per group with the highest individual Information Gain Ratio. The performance for the different groups varies mostly between ≈50% and ≈60%. Index Terms: emotional user states, automatic classification, feature types, functionals
Book
This book is an updated text. It has new material on coding and methodological issues for a variety of areas in nonverbal behavior: facial actions, vocal behavior, and body movement. Issues relevant to judgment studies, methodology, reliability, analyses, etc. have also been updated. The topics are broad and include specific information about methodology and coding strategies in education, psychotherapy, deception, nonverbal sensitivity, and marital and group behavior. There is also a chapter detailing specific information on the technical aspects of recording the voice and face, specifically in relation to deception studies. This book provides an overview and hands-on information concerning the many methods and techniques that are available to code or rate affective behavior and emotional expression in different modalities. It aims to help further refine research methods and coding strategies that permit comparison of results from various laboratories where research on nonverbal behavior is being conducted. This will advance research in the field and help to coordinate results so that a more comprehensive understanding of affect expression can be developed.
Article
The present investigation studied couples' resolution of existing marital issues. Videotapes of distressed and nondistressed couples were coded by two groups of coders. One group categorized the content of messages, and the other group categorized the nonverbal delivery of messages by the speaker ("affect") and the nonverbal behaviors of the listener ("context"). An analysis of marital interaction was obtained from a study of content, affect, and context differences as well as from sequential analyses of the data. Findings show that this coding system made it possible to account for most of the variance in the classification of couples as distressed or nondistressed. Specific findings provided tests of many currently untested hypotheses about good communication in marriages that have been the basis of clinical interventions. The hypotheses which were studied in the present investigation involve the function of metacommunication, the expression of feelings, summarizing self versus other, feeling probes, nonverbal behavior during message delivery, context differences, and positive and negative reciprocity. Functions of messages were assessed by sequential analysis procedures.
Conference Paper
Automatically extracting social meaning and intention from spoken dialogue is an important task for dialogue systems and social computing. We describe a system for detecting elements of interactional style: whether a speaker is awkward, friendly, or flirtatious. We create and use a new spoken corpus of 991 4-minute speed-dates. Participants rated their interlocutors for these elements of style. Using rich dialogue, lexical, and prosodic features, we are able to detect flirtatious, awkward, and friendly styles in noisy natural conversational data with up to 75% accuracy, compared to a 50% baseline. We describe simple ways to extract relatively rich dialogue features, and analyze which features performed similarly for men and women and which were gender-specific.
Conference Paper
Automatically detecting human social intentions from spoken conversation is an important task for dialogue understanding. Since the social intentions of the speaker may differ from what is perceived by the hearer, systems that analyze human conversations need to be able to extract both the perceived and the intended social meaning. We investigate this difference between intention and perception by using a spoken corpus of speed-dates in which both the speaker and the listener rated the speaker on flirtatiousness. Our flirtation-detection system uses prosodic, dialogue, and lexical features to detect a speaker's intent to flirt with up to 71.5% accuracy, significantly outperforming the baseline, but also outperforming the human interlocutors. Our system addresses lexical feature sparsity given the small amount of training data by using an autoencoder network to map sparse lexical feature vectors into 30 compressed features. Our analysis shows that humans are very poor perceivers of intended flirtatiousness, instead often projecting their own intended behavior onto their interlocutors.
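The compression step can be sketched as a small autoencoder whose 30-d bottleneck matches the description above; the other layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class LexicalAutoencoder(nn.Module):
    """Maps a sparse lexical count vector to 30 dense features."""
    def __init__(self, vocab_size, bottleneck=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, 256), nn.ReLU(),
            nn.Linear(256, bottleneck))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 256), nn.ReLU(),
            nn.Linear(256, vocab_size))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z   # reconstruction + 30-d compressed features

# Train with reconstruction loss, then feed z into the downstream classifier:
# recon, z = model(sparse_counts); loss = nn.functional.mse_loss(recon, sparse_counts)
```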
Book
With the advent of new technology, the most important impetus for the proliferation of studies on vocal affect expression has been the recent interest in the large-scale application of speech technology in automatic speech and speaker recognition and speech synthesis. This chapter advocates the Brunswikian lens model as a meta-structure for studies in this area, especially because it alerts researchers to important design considerations in studies of vocal affect expression. Vocal expression involves the joint operation of push and pull effects, and the interaction of psychobiological and sociocultural factors, both of which urgently need to be addressed in future studies. So far there is very little cross-language and cross-cultural research in this area, which is surprising because phonetic features of language may constrain the affect signaling potential of voice cues.
Article
Links between pronoun use, relationship satisfaction, and observed behavior were examined during 2 problem-solving interactions in which 134 distressed and 48 nondistressed couples participated. Results supported hypotheses that distressed and nondistressed couples would use pronouns at significantly different rates, and that rates would also differ for partners depending on whose topic was being discussed. Actor–partner interdependence models (APIMs; D. A. Kenny, 1996) revealed actor and partner effects of pronoun use on satisfaction and observed positivity and negativity. Interestingly, I-focus pronouns were found to be linked with satisfaction in distressed partners and dissatisfaction in nondistressed partners. The pattern of findings was otherwise largely consistent across topics and levels of distress. These findings have implications for both future research and clinical interventions.
Article
The marital interaction coding system (MICS) has been developed and used to objectively record verbal and nonverbal behaviors that occur as marriage partners attempt to negotiate, in a laboratory setting, resolutions of their marital problems. Primary emphasis is placed on the accurate coding of every behavior emitted that can be classified, with these responses being recorded sequentially in 30-second blocks. The basic unit is defined as a verbal or nonverbal response which is homogeneous in content, without regard for its duration or its arbitrary syntactical properties, such as division into words and sentences. Homogeneity of content is judged with reference to the 28 categories which have been created.