... Toto et al. [15] showed that machine learning has the potential to aid psychotherapy by improving the effectiveness of mental health screening. They presented Sliding Window Sub-clip Pooling, an audio classification method for shorter datasets, to tackle depression screening from voice. ...
... We chose this corpus because of its extensive use as a benchmark dataset by the research community in depression diagnosis; several recent studies have used it in their work [15, 22-25]. DAIC-WOZ is a subset of the DAIC multimodal depression corpus. ...
... Algorithm excerpt (Fisher score-based FS):
14: Store the sorted Fisher scores in ranked_features.
15: Select the required 'n' features from ranked_features.
16: Extract the subset of the training data corresponding to the selected 'n' features and train the model.
17: Return the 'n' selected features. ...
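Steps 14-17 above rank features by Fisher score and train on the top 'n'. A minimal sketch of that idea follows, assuming a NumPy feature matrix with binary labels; the function names and the synthetic data are illustrative, not the paper's implementation.

```python
# Minimal sketch of Fisher score-based feature selection (illustrative only).
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        numerator += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        denominator += n_c * Xc.var(axis=0)
    return numerator / (denominator + 1e-12)  # guard against zero variance

def select_top_n(X, y, n):
    """Rank features by Fisher score (descending) and return the top-n indices."""
    ranked = np.argsort(fisher_scores(X, y))[::-1]
    return ranked[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 88))       # placeholder: 88 eGeMAPS functionals
    y = rng.integers(0, 2, size=100)     # placeholder binary depression labels
    top15 = select_top_n(X, y, 15)       # n = 15, as in the abstract below
    X_subset = X[:, top15]               # train the model on this subset
    print(top15)
```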
Depression affects over 322 million people and is the most common source of disability worldwide. The speech-processing literature has revealed that speech can be used to detect depression: depressed individuals exhibit different acoustic characteristics from non-depressed individuals. A four-stage machine learning classification system is developed to investigate acoustic parameters for detecting depression. Stage one uses speech recordings from DAIC-WOZ, a publicly available and clinically validated dataset. The baseline acoustic feature vector, eGeMAPS, is extracted from the dataset in stage two, and adaptive synthetic sampling (ADASYN) is performed along with data preprocessing to overcome the class imbalance. In stage three, we conduct feature selection (FS) using three techniques: Boruta FS, recursive feature elimination with a support vector machine (SVM-RFE), and Fisher score-based FS. Stage four experiments with several machine learning base classifiers: Gaussian naïve Bayes (GNB), support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), and random forest (RF). The hyperparameters of the classifiers are tuned with GridSearchCV under 10-fold stratified cross-validation (CV). We then employ multiple dynamic ensemble selection (DES) algorithms with k=3 and k=5, utilizing a pool of the aforementioned base classifiers, to improve accuracy, and present a comparative study of eGeMAPS features across the base classifiers and the experimented DES classifiers. Our results on the DAIC-WOZ benchmark dataset suggest that K-Nearest Oracles Union (KNORA-U) DES with k=3 achieves higher accuracy than the individual base classifiers using a subset of 15 features selected by Fisher score-based FS.
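The abstract outlines a pipeline of ADASYN balancing, GridSearchCV-tuned base classifiers, and KNORA-U dynamic ensemble selection. The sketch below wires those stages together under stated assumptions: the deslib library stands in for the unnamed DES implementation, only two base classifiers are shown, and the data are synthetic placeholders for the extracted eGeMAPS vectors.

```python
# Illustrative sketch of the pipeline; deslib is an assumed DES implementation.
import numpy as np
from imblearn.over_sampling import ADASYN
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from deslib.des.knora_u import KNORAU

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))             # placeholder eGeMAPS feature vectors
y = (rng.random(200) < 0.3).astype(int)    # imbalanced binary depression labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance the training classes with ADASYN (training data only)
X_bal, y_bal = ADASYN(random_state=0).fit_resample(X_tr, y_tr)

# Tune base classifiers with GridSearchCV over 10-fold stratified CV
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
svm = GridSearchCV(SVC(probability=True), {"C": [0.1, 1, 10]}, cv=cv).fit(X_bal, y_bal)
knn = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=cv).fit(X_bal, y_bal)

# Dynamic ensemble selection (K-Nearest Oracles Union) over the tuned pool, k=3.
# For brevity the DSEL set reuses the training data; a held-out split is cleaner.
pool = [svm.best_estimator_, knn.best_estimator_]
des = KNORAU(pool_classifiers=pool, k=3).fit(X_bal, y_bal)
print("KNORA-U accuracy:", des.score(X_te, y_te))
```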
... The Distress Analysis Interview Corpus/Wizard-of-Oz (DAIC-WOZ) dataset (DeVault et al., 2014) comprises voice and text samples from 189 interviewed depressed and control persons, as well as their PHQ-8 depression screening questionnaire scores. This dataset is commonly used in many depression detection research works, including (Gong & Poellabauer, 2017; Sun et al., 2017) for text-based detection, (Dubagunta et al., 2019; Toto et al., 2020; Tlachac et al., 2020) for voice-based detection, and multi-modal architectures such as (Alhanai et al., 2018; Yang et al., 2021). We also used both ...
With the availability of voice-enabled devices such as smartphones, mental health disorders could be detected and treated earlier, particularly post-pandemic. The current methods involve extracting features directly from audio signals. In this paper, two methods are used to enrich voice analysis for depression detection: graph transformation of voice signals, and natural language processing of the transcript based on representational learning, fused together to produce final class labels. The results of experiments with the DAIC-WOZ dataset suggest that integration of text-based voice classification and learning from low level and graph-based voice signal features can improve the detection of mental disorders like depression.
Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD) are highly prevalent and burdensome. To increase mental health screening rates, the digital health research community has been exploring the ability to augment self-report instruments with digital logs. Crowdsourced workers are being increasingly recruited for behavioral health research studies as demographically representative samples are desired for later translational applications. Overshadowed by predictive modeling, descriptive modeling has the ability to expand knowledge and understanding of the clinical generalizability of models trained on data from crowdsourced participants. In this study, we identify mobile communication profiles of a crowdsourced sample. To achieve this, we cluster features derived from time series of call and text logs. The psychiatric, behavioral, and demographic characteristics were notably different across the four identified mobile communication profiles. For example, the profile that had the lowest average depression and anxiety screening scores only shared incoming text logs. This cluster had statistically significantly different depression and anxiety screening scores in comparison to the cluster that shared the most outgoing text logs. These profiles expose important insights regarding the generalizability of crowdsourced samples to more general clinical populations and increase understanding regarding the limitations of crowdsourced samples for translational mental health research.
Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD) are both heterogeneous in their clinical presentations, manifesting with unique symptom profiles. Despite this, prior digital phenotype research has primarily focused on disorder-level detection rather than symptom-level detection. In this research, we predict the existence of individual symptoms of MDD and GAD with SMS log metadata, and ensemble these symptom-level classifiers to screen for depression and anxiety, thus accounting for disorder heterogeneity. Further, we collect an additional dataset of retrospectively harvested SMS logs to augment an existing dataset collected after COVID-19 altered communication patterns, and propose two new types of distribution features: consecutive messages and conversation ratio. Our symptom-level detectors achieved a balanced accuracy of 0.7 in 13 of the 16 MDD and GAD symptoms, with reply latency distribution features achieving a balanced accuracy of 0.78 when detecting anxiety symptom trouble relaxing. When combined into disorder-level ensembles, these symptom-level detectors achieved a balanced accuracy of 0.76 for depression screening and 0.73 for anxiety screening, with tree boosting methods demonstrating particular efficacy. Accounting for disorder heterogeneity, our research provides insight into the value of SMS logs for the assessment of depression and anxiety diagnostic criteria.
With the availability of voice-enabled devices such as smartphones, mental health disorders such as depression could be detected and treated earlier, particularly post-pandemic. The current methods involve extracting features directly from audio signals. In this paper, two methods are used to enrich voice analysis for depression detection: the transformation of voice signals into a visibility graph and the natural language processing of the transcript text based on representational learning. The results of processing text and voice with different features are fused to produce final class labels. Experimental evaluation with the DAIC-WOZ dataset suggests that integrating text-based voice classification and learning from low-level and graph-based voice signal features can improve the detection of mental disorders like depression. Our text-based method achieved a 72.7% F1-score, which is higher than other single-modal scores. The fusion of all prediction models based on voice and text resulted in an 82.4% F1-score, outperforming the other models.
Mental disorders are rapidly increasing each year and have become a major challenge affecting the social and financial well-being of individuals. There is a need for phenotypic characterization of psychiatric disorders with biomarkers to provide a rich signature for Major Depressive Disorder, improving the understanding of the pathophysiological mechanisms underlying these mental disorders. This comprehensive review focuses on depression and relapse detection modalities such as self-questionnaires, audiovisual signals, and EEG, highlighting noteworthy publications in the last ten years. The article concentrates on the literature that applies machine learning to audiovisual and EEG signals. It also outlines preprocessing, feature extraction, and public datasets for depression detection. The review concludes with recommendations that will help improve the reliability of developed models and the determinism of computational intelligence-based systems in psychiatry. To the best of our knowledge, this survey is the first comprehensive review on depression and relapse prediction by self-questionnaires, audiovisual, and EEG-based approaches. The findings of this review will serve as a useful and structured starting point for researchers studying clinical and non-clinical depression recognition and relapse through machine learning-based approaches.
Depression is a common mental health disorder with large social and economic consequences. It can be costly and difficult to detect, traditionally requiring hours of assessment by a trained clinician. Recently, machine learning models have been trained to screen for depression with patient voice recordings collected during an interview with a virtual agent. To engage the patient in a conversation and increase the quantity of responses, the virtual interviewer asks a series of follow-up questions. However, asking fewer questions would reduce the time burden of screening for the participant. We, therefore, assess if these follow-up questions have a tangible impact on the performance of deep learning models for depression classification. Specifically, we study the effect of including the vocal and transcribed replies to one, two, three, four, five, or all follow-up questions in the depression screening models. We notably achieve this using unimodal and multimodal pre-trained transfer learning models. Our findings reveal that follow-up questions can help increase F1 scores for the majority of the interview questions. This research can be leveraged for the design of future mental illness screening applications by providing important information about both question selection and the best number of follow-up questions.
Depression is among the most prevalent mental health disorders with increasing prevalence worldwide. While early detection is critical for the prognosis of depression treatment, detecting depression is challenging. Previous deep learning research has thus begun to detect depression with the transcripts of clinical interview questions. Since approaches using Bidirectional Encoder Representations from Transformers (BERT) have demonstrated particular promise, we hypothesize that ensembles of BERT variants will improve depression detection. Thus, in this research, we compare the depression classification abilities of three BERT variants and four ensembles of BERT variants on the transcripts of responses to 12 clinical interview questions. Specifically, we implement the ensembles with different ensemble strategies, number of model components, and architectural layer combinations. Our results demonstrate that ensembles increase mean F1 scores and robustness across clinical interview data. Clinical relevance: This research highlights the potential of ensembles to detect depression with text, which is important to guide future development of healthcare application ecosystems.
Mental illness screening instruments are increasingly being administered through online patient portals, making it vital to understand how the design of digital screening technologies could alter screening scores. Given the strong cross-cultural belief in the gender depression disparity, digital screening technologies are at particular risk of triggering stereotype threat, the phenomenon where a reminder of a stereotype impacts task performance. To assess this risk, we investigate if a reminder about the gender depression disparity influences the scores of digitally administered mental screening instruments. In a comprehensive study, we collect data from 440 participants with a mobile application that reminds half of the participants of the gender depression disparity prior to administering depression and anxiety screening instruments. Our statistical analysis evaluates differences in screening scores with t-tests, and determines credible values for difference of means, of standard deviations, and effect size using Bayesian estimation. While the gender depression disparity reminder had no statistically significant impact on men, it did alter the depression screening scores of women and nonbinary participants. Further, prior depression treatment increased the impact of stereotype threat on women. Our research demonstrates that digital screening technologies are subject to stereotype threat and should thus be designed to avoid biasing mental illness screening scores.
The rates of mental illness, especially anxiety and depression, have increased greatly since the start of the COVID-19 pandemic. Traditional mental illness screening instruments are too cumbersome and biased to screen an entire population. In contrast, smartphone call and text logs passively capture communication patterns and thus represent a promising screening alternative. To facilitate the advancement of such research, we collect and curate the DepreST Call and Text log (DepreST-CAT) dataset from over 365 crowdsourced participants during the COVID-19 pandemic. The logs are labeled with traditional anxiety and depression screening scores essential for training machine learning models. We construct time series ranging from 2 to 16 weeks in length from the retrospective smartphone logs. To demonstrate the screening capabilities of these time series, we then train a variety of unimodal and multimodal machine and deep learning models. These models provide insights into the relative screening value of the different types of logs, lengths of log time series, and classification methods. The DepreST-CAT dataset is a valuable resource for the research community to model communication patterns during the COVID-19 pandemic and further the development of machine learning algorithms for passive mental illness screening.
The growing prevalence of depression and suicidal ideation among college students, further exacerbated by the Coronavirus pandemic, is alarming, highlighting the need for universal mental illness screening technology. With traditional screening questionnaires too burdensome to achieve universal screening in this population, data collected through mobile applications has the potential to rapidly identify at-risk students. While prior research has mostly focused on collecting passive smartphone modalities from students, smartphone sensors are also capable of capturing active modalities. The general public has demonstrated more willingness to share active than passive modalities through an app, yet no such dataset of active mobile modalities for mental illness screening exists for students. Knowing which active modalities hold strong screening capabilities for student populations is critical for developing targeted mental illness screening technology. Thus, we deployed a mobile application to over 300 students during the COVID-19 pandemic to collect the Student Suicidal Ideation and Depression Detection (StudentSADD) dataset. We report on a rich variety of machine learning models including cutting-edge multimodal pretrained deep learning classifiers on active text and voice replies to screen for depression and suicidal ideation. This unique StudentSADD dataset is a valuable resource for the community for developing mobile mental illness screening tools.
Social networks have developed into a promising medium for people to communicate with friends and share their opinions, photos, and videos. They have also become a growing research field with an established position globally. In this paper, we consider depression problems among various Facebook users. A number of researchers have already studied and applied many techniques to detect depression, but accurate detection from social network data remains a challenge. We therefore investigate the possibility of utilizing Facebook data and applying the KNN (k-nearest neighbors) classification technique to detect depressive emotions. We believe that our investigation and approach may help raise awareness among online social network users.
The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal depression classification system as a part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We investigate a number of audio and video features for classification with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On the other hand, polynomial parameterization of facial landmark features achieves the best performance among all systems and outperforms the best baseline system as well.
We present in this paper a simple, yet efficient convolutional neural network (CNN) architecture for robust audio event recognition. In contrast to deep CNN architectures with multiple convolutional and pooling layers topped by multiple fully connected layers, the proposed network consists of only three layers: convolutional, pooling, and softmax. Two features distinguish it from the deep architectures proposed for the task: varying-size convolutional filters at the convolutional layer and a 1-max pooling scheme at the pooling layer. Intuitively, the network tends to select the most discriminative features from the whole audio signal for recognition. Our proposed CNN not only shows state-of-the-art performance on the standard task of robust audio event recognition but also outperforms other deep architectures by up to 4.5% in recognition accuracy, equivalent to a 76.3% relative error reduction.
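As a concrete illustration of the architecture sketched above, the following is a minimal PyTorch rendering of parallel varying-size convolutional filters followed by 1-max pooling and a linear classification layer (softmax applied in the loss); all layer sizes and the input length are illustrative, not the paper's configuration.

```python
# Illustrative sketch: varying-size conv filters + 1-max pooling, not the
# paper's exact network.
import torch
import torch.nn as nn

class OneMaxCNN(nn.Module):
    def __init__(self, n_classes=10, n_filters=64, filter_sizes=(3, 5, 7)):
        super().__init__()
        # one 1-D convolution per filter size, applied to the raw signal
        self.convs = nn.ModuleList(
            nn.Conv1d(1, n_filters, kernel_size=fs) for fs in filter_sizes
        )
        self.fc = nn.Linear(n_filters * len(filter_sizes), n_classes)

    def forward(self, x):  # x: (batch, 1, time)
        # 1-max pooling: keep the single strongest activation per filter
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # class logits

model = OneMaxCNN()
logits = model(torch.randn(4, 1, 16000))  # e.g. four 1-second audio clips
print(logits.shape)                       # (4, 10)
```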
Current methods for depression assessment depend almost entirely on clinical interview or self-report ratings. Such measures lack systematic and efficient ways of incorporating behavioral observations that are strong indicators of psychological disorder. We compared a clinical interview of depression severity with automatic measurement in 48 participants undergoing treatment for depression. Interviews were obtained at 7-week intervals on up to four occasions. Following standard cutoffs, participants at each session were classified as remitted, intermediate, or depressed. Logistic regression classifiers using leave-one-out validation were compared for facial movement dynamics, head movement dynamics, and vocal prosody individually and in combination. Accuracy (remitted versus depressed) for facial movement dynamics was higher than that for head movement dynamics; and each was substantially higher than that for vocal prosody. Accuracy for all three modalities together reached 88.93%, exceeding that for any single modality or pair of modalities. These findings suggest that automatic detection of depression from behavioral indicators is feasible and that multimodal measures afford most powerful detection.
This paper is the first review into the automatic analysis of speech for use as an objective predictor of depression and suicidality. Both conditions are major public health concerns; depression has long been recognised as a prominent cause of disability and burden worldwide, whilst suicide is a misunderstood and complex cause of death that strongly impacts the quality of life and mental health of the families and communities left behind. Despite this prevalence, the diagnosis of depression and assessment of suicide risk, due to their complex clinical characterisations, are difficult tasks, nominally achieved by the categorical assessment of a set of specific symptoms. However, many of the key symptoms of either condition, such as altered mood and motivation, are not physical in nature; therefore assigning a categorical score to them introduces a range of subjective biases to the diagnostic procedure. Due to these difficulties, research into finding a set of biological, physiological and behavioural markers to aid clinical assessment is gaining in popularity. This review starts by building the case for speech to be considered a key objective marker for both conditions; reviewing current diagnostic and assessment methods for depression and suicidality including key non-speech biological, physiological and behavioural markers and highlighting the expected cognitive and physiological changes associated with both conditions which affect speech production. We then review the key characteristics (size, associated clinical scores, and collection paradigm) of available depressed and suicidal speech databases. The main focus of this paper is on how common paralinguistic speech characteristics are affected by depression and suicidality and the application of this information in classification and prediction systems. The paper concludes with an in-depth discussion on the key challenges – improving the generalisability through greater research collaboration and increased standardisation of data collection, and mitigating unwanted sources of variability – that will shape the future research directions of this rapidly growing field of speech processing research.
Mood disorders are inherently related to emotion. In particular, the behaviour of people suffering from mood disorders such as unipolar depression shows a strong temporal correlation with the affective dimensions valence and arousal. In addition, psychologists and psychiatrists take the observation of expressive facial and vocal cues into account while evaluating a patient's condition. Depression could result in expressive behaviour such as dampened facial expressions, avoiding eye contact, and using short sentences with flat intonation. It is in this context that we present the third Audio-Visual Emotion recognition Challenge (AVEC 2013). The challenge has two goals logically organised as sub-challenges: the first is to predict the continuous values of the affective dimensions valence and arousal at each moment in time. The second sub-challenge is to predict the value of a single depression indicator for each recording in the dataset. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
Major depressive disorders are mental disorders of high prevalence, leading to a high impact on individuals, their families, society and the economy. In order to assist clinicians to better diagnose depression, we investigate an objective diagnostic aid using affective sensing technology with a focus on acoustic features. In this paper, we hypothesise that (1) classifying the general characteristics of clinical depression using spontaneous speech will give better results than using read speech, (2) that there are some acoustic features that are robust and would give good classification results in both spontaneous and read speech, and (3) that a 'thin-slicing' approach using smaller parts of the speech data will perform similarly if not better than using the whole speech data. By examining and comparing recognition results for acoustic features on a real-world clinical dataset of 30 depressed and 30 control subjects using SVM for classification and a leave-one-out cross-validation scheme, we found that spontaneous speech has more variability, which increases the recognition rate of depression. We also found that jitter, shimmer, energy and loudness feature groups are robust in characterising both read and spontaneous depressive speech. Remarkably, thin-slicing the read speech, using either the beginning of each sentence or the first few sentences, performs better than using all reading-task data.
Depression and other mood disorders are common and disabling disorders. We present work towards an objective diagnostic aid supporting clinicians using affective sensing technology with a focus on acoustic and statistical features from spontaneous speech. This work investigates differences in expressing positive and negative emotions in depressed and healthy control subjects, as well as whether initial gender classification increases the recognition rate. To this end, spontaneous speech from interviews of 30 depressed and 30 control subjects was analysed, with a focus on questions eliciting positive and negative emotions. Using HMMs with GMMs for classification with 30-fold cross-validation, we found that MFCC, energy and intensity features gave the highest recognition rates when female and male subjects were analysed together. When the dataset was first split by gender, log energy and shimmer features, respectively, were found to give the highest recognition rates in females, while it was loudness for males. Overall, correct recognition rates from acoustic features for depressed female subjects were higher than for male subjects. Using temporal features, we found that the response time and average syllable duration were longer in depressed subjects, while interaction involvement and articulation rate were higher in control subjects.
Speaker emotion recognition is achieved through processing methods that include isolation of the speech signal and extraction of selected features for the final classification. In terms of acoustics, speech processing techniques offer extremely valuable paralinguistic information derived mainly from prosodic and spectral features. In some cases, the process is assisted by speech recognition systems, which contribute to the classification using linguistic information. Both frameworks deal with a very challenging problem, as emotional states do not have clear-cut boundaries and often differ from person to person. In this article, research papers that investigate emotion recognition from audio channels are surveyed and classified, based mostly on extracted and selected features and their classification methodology. Important topics from different classification techniques, such as databases available for experimentation, appropriate feature extraction and selection methods, classifiers and performance issues are discussed, with emphasis on research published in the last decade. This survey also provides a discussion on open trends, along with directions for future research on this topic.
Current methods of assessing psychopathology depend almost entirely on verbal report (clinical interview or questionnaire) of patients, their family, or caregivers. They lack systematic and efficient ways of incorporating behavioral observations that are strong indicators of psychological disorder, much of which may occur outside the awareness of either individual. We compared clinical diagnosis of major depression with automatically measured facial actions and vocal prosody in patients undergoing treatment for depression. Manual FACS coding, active appearance modeling (AAM) and pitch extraction were used to measure facial and vocal expression. Classifiers using leave-one-out validation were SVM for FACS and for AAM and logistic regression for voice. Both face and voice demonstrated moderate concurrent validity with depression. Accuracy in detecting depression was 88% for manual FACS and 79% for AAM. Accuracy for vocal prosody was 79%. These findings suggest the feasibility of automatic detection of depression, raise new issues in automated facial image analysis and machine learning, and have exciting implications for clinical theory and practice.
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
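For readers who want to experiment with this method, scikit-learn ships an implementation of extremely randomized trees as ExtraTreesClassifier; the brief sketch below is illustrative, with max_features controlling the strength of the attribute randomization discussed above.

```python
# Illustrative use of the Extra-Trees algorithm via scikit-learn;
# the dataset and parameter values are made up for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# max_features tunes how strongly attribute choice is randomized at each split
clf = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```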
The brief Patient Health Questionnaire (PHQ-9) is commonly used to screen for depression with 10 often recommended as the cut-off score. We summarized the psychometric properties of the PHQ-9 across a range of studies and cut-off scores to select the optimal cut-off for detecting depression.
We searched Embase, MEDLINE and PsycINFO from 1999 to August 2010 for studies that reported the diagnostic accuracy of PHQ-9 to diagnose major depressive disorders. We calculated summary sensitivity, specificity, likelihood ratios and diagnostic odds ratios for detecting major depressive disorder at different cut-off scores and in different settings. We used random-effects bivariate meta-analysis at cutoff points between 7 and 15 to produce summary receiver operating characteristic curves.
We identified 18 validation studies (n = 7180) conducted in various clinical settings. Eleven studies provided details about the diagnostic properties of the questionnaire at more than one cut-off score (including 10), four studies reported a cut-off score of 10, and three studies reported cut-off scores other than 10. The pooled specificity results ranged from 0.73 (95% confidence interval [CI] 0.63-0.82) for a cut-off score of 7 to 0.96 (95% CI 0.94-0.97) for a cut-off score of 15. There was major variability in sensitivity for cut-off scores between 7 and 15. There were no substantial differences in the pooled sensitivity and specificity for a range of cut-off scores (8-11).
The PHQ-9 was found to have acceptable diagnostic properties for detecting major depressive disorder for cut-off scores between 8 and 11. Authors of future validation studies should consistently report the outcomes for different cut-off scores.
Diagnostic and treatment delay in depression are due to physician and patient factors. Patients vary in awareness of their depressive symptoms and ability to bring depression-related concerns to medical attention.
To inform interventions to improve recognition and management of depression in primary care by understanding patients' inner experiences prior to and during the process of seeking treatment.
Focus groups, analyzed qualitatively.
One hundred and sixteen adults (79% response) with personal or vicarious history of depression in Rochester NY, Austin TX and Sacramento CA. Neighborhood recruitment strategies achieved sociodemographic diversity.
Open-ended questions developed by a multidisciplinary team and refined in three pilot focus groups explored participants' "lived experiences" of depression, depression-related beliefs, influences of significant others, and facilitators and barriers to care-seeking. Then, 12 focus groups stratified by gender and income were conducted, audio-recorded, and analyzed qualitatively using coding/editing methods.
Participants described three stages leading to engaging in care for depression - "knowing" (recognizing that something was wrong), "naming" (finding words to describe their distress) and "explaining" (seeking meaningful attributions). "Knowing" is influenced by patient personality and social attitudes. "Naming" is affected by incongruity between the personal experience of depression and its narrow clinical conceptualizations, colloquial use of the word depression, and stigma. "Explaining" is influenced by the media, socialization processes and social relations. Physical/medical explanations can appear to facilitate care-seeking, but may also have detrimental consequences. Other explanations (characterological, situational) are common, and can serve to either enhance or reduce blame of oneself or others.
To improve recognition of depression, primary care physicians should be alert to patients' ill-defined distress and heterogeneous symptoms, help patients name their distress, and promote explanations that comport with patients' lived experience, reduce blame and stigma, and facilitate care-seeking.
Depression is a leading cause of disability and is associated with suicide risk. However, a quarter of patients with major depression remain undiagnosed. Prior work has demonstrated that a smartphone user's depression level can be detected by analyzing data gathered from their smartphone's sensors or from their social media posts over a few weeks after enrollment in a user study. These studies typically utilize a prospective study design, which is burdensome as it requires participants' smartphone data to be gathered for prolonged periods before their depression level can be assessed. In contrast, we present a feasibility study of our Mood Assessment Capable Framework (Moodable) that facilitates almost instantaneous mood assessment by analyzing instantaneous voice samples provided by the user as well as historical sensor data harvested (scraped) from their smartphone and recent social media posts. Our retrospective, low-burden approach means that Moodable no longer requires study participants to engage with their phone for weeks before a depression score can be inferred. Moodable has the potential to minimize user data collection burden, increase user compliance, avoid study awareness bias and offer near-instantaneous depression screening. To lay a solid foundation for Moodable, we first surveyed 202 volunteer participants about their willingness to share voice samples and various smartphone and social media data types for mental health assessment. Based on these findings, we then developed the Moodable app. Thereafter, we utilized Moodable to collect short voice samples, and a rich array of retrospectively harvested data from users' smartphones (location, browser history, call logs) and social media accounts (Instagram, Twitter, and Facebook), with appropriate permissions, of 335 volunteer participants who also responded to nine depression-related questions of the Patient Health Questionnaire (PHQ-9). Moodable then used machine learning to build classification models and classify depression and suicidal ideation for users whose scores were unknown to the models. Results of Moodable's screening capability are promising. In particular, for the depression classification task we achieved an F1 score (the harmonic mean of the precision and recall) of 0.766, sensitivity of 0.750, and specificity of 0.792. For the suicidal ideation task we achieved an F1 score of 0.848, sensitivity of 0.864, and specificity of 0.725. This work could significantly increase depression screening at the population level and opens numerous avenues for further research into this newly proposed paradigm of instantaneously screening depression and suicide risk levels from voice samples and retrospective smartphone and social media data.
Depression is the leading cause of disability, often undiagnosed, and one of the most treatable mood disorders. As such, unobtrusively diagnosing depression is important. Many studies are starting to utilize machine learning for depression sensing from social media and Smartphone data to replace the survey instruments currently employed to screen for depression. In this study, we compare the ability of a privately versus a publicly available modality to screen for depression. Specifically, we leverage between two weeks and a year of text messages and tweets to predict scores from the Patient Health Questionnaire-9, a prevalent depression screening instrument. This is the first study to leverage the retrospectively-harvested crowd-sourced texts and tweets within the combined Moodable and EMU datasets. Our approach involves comprehensive feature engineering, feature selection, and machine learning. Our 245 features encompass word category frequencies, part of speech tag frequencies, sentiment, and volume. The best model is Logistic Regression built on the top ten features from two weeks of text data. This model achieves an average F1 score of 0.806, AUC of 0.832, and recall of 0.925. We discuss the implications of the selected features, temporal quantity of data, and modality.
Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) [1], are reported to be one of the most common mental illnesses in recent years. In diagnostic evaluations of outpatients with mood disorders, a large portion of BD patients are initially misdiagnosed as having UD [2]. As most previous research focused on long-term monitoring of mood disorders, short-term detection, which could be used for early detection and intervention, is desirable. This work proposes an approach to short-term detection of mood disorder based on the patterns in emotion of elicited speech responses. To the best of our knowledge, there is currently no database for short-term discrimination between BD and UD. This work collected two databases: an emotional database (MHMC-EM) collected by the Multimedia Human Machine Communication (MHMC) lab and a mood disorder database (CHI-MEI) collected by the CHI-MEI Medical Center, Taiwan. As the collected CHI-MEI mood disorder database is quite small and emotion annotation is difficult, the MHMC-EM emotional database is selected as a reference database for data adaptation. For the CHI-MEI mood disorder data collection, six eliciting emotional videos are selected and used to elicit the participants' emotions. After watching each of the six eliciting emotional video clips, the participants answer the questions raised by the clinician. The speech responses are then used to construct the CHI-MEI mood disorder database. Hierarchical spectral clustering is used to adapt the collected MHMC-EM emotional database to fit the CHI-MEI mood disorder database for dealing with the data bias problem. The adapted MHMC-EM emotional data are then fed to a denoising autoencoder for bottleneck feature extraction. The bottleneck features are used to construct a long short term memory (LSTM)-based emotion detector for generation of emotion profiles from each speech response. The emotion profiles are then clustered into emotion codewords using the K-means algorithm. Finally, a class-specific latent affective structure model (LASM) is proposed to model the structural relationships among the emotion codewords with respect to six emotional videos for mood disorder detection. A leave-one-group-out cross-validation scheme was employed for the evaluation of the proposed class-specific LASM-based approaches. Experimental results show that the proposed class-specific LASM-based method achieved an accuracy of 73.33% for mood disorder detection, outperforming the classifiers based on SVM and LSTM.
Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice because of its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to the critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.
This book reports on an outstanding thesis that has significantly advanced the state-of-the-art in the automated analysis and classification of speech and music. It defines several standard acoustic parameter sets and describes their implementation in a novel, open-source, audio analysis framework called openSMILE, which has been accepted and intensively used worldwide. The book offers extensive descriptions of key methods for the automatic classification of speech and music signals in real-life conditions and reports on the evaluation of the framework developed and the acoustic parameter sets that were selected. It is not only intended as a manual for openSMILE users, but also and primarily as a guide and source of inspiration for students and scientists involved in the design of speech and music analysis methods that can robustly handle real-life conditions.
This paper presents a novel and effective audio based method on depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in literature. For the former one, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression related characteristics in the vocal channel, combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter one, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results achieved clearly demonstrate the effectiveness of the proposed approach.
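A minimal PyTorch sketch of the two ideas highlighted above, a CNN front end feeding an LSTM plus random subsampling of the majority class, is given below; it is a simplified stand-in for DepAudioNet, with all layer sizes illustrative rather than the paper's configuration.

```python
# Simplified stand-in for a DepAudioNet-style model; sizes are illustrative.
import random
import torch
import torch.nn as nn

class AudioCNNLSTM(nn.Module):
    def __init__(self, n_mels=40, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(          # CNN front end over mel frames
            nn.Conv1d(n_mels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # binary depression logit

    def forward(self, x):                   # x: (batch, n_mels, time)
        h = self.conv(x).transpose(1, 2)    # (batch, time/2, 64)
        _, (hn, _) = self.lstm(h)
        return self.head(hn[-1])

def balanced_subsample(labels):
    """Randomly keep as many majority-class indices as minority-class ones."""
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    return minority + random.sample(majority, len(minority))

model = AudioCNNLSTM()
print(model(torch.randn(2, 40, 200)).shape)  # (2, 1)
print(balanced_subsample([0, 0, 0, 1, 0, 1]))
```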
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.
We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 now unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries), such as moments, peaks, regression parameters, etc. Postprocessing of the features includes statistical classifiers such as support vector machine models or file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music and video features including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory model based loudness, voice quality, local binary pattern, color, and optical flow histograms. Besides, voice activity detection, pitch tracking and face detection are supported. openSMILE is implemented in C++, using standard open source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
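For reference, the toolkit's later Python wrapper exposes the same extraction pipeline in a few lines; the sketch below pulls the eGeMAPS functionals used in the study's stage two from a single recording, with 'interview.wav' as a placeholder path.

```python
# Illustrative extraction of eGeMAPS functionals with the opensmile
# Python wrapper; 'interview.wav' is a placeholder file path.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # 88 acoustic functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("interview.wav")        # pandas DataFrame, one row
print(features.shape)
```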
OBJECTIVE: While considerable attention has focused on improving the detection of depression, assessment of severity is also important in guiding treatment decisions. Therefore, we examined the validity of a brief, new measure of depression severity.
MEASUREMENTS: The Patient Health Questionnaire (PHQ) is a self-administered version of the PRIME-MD diagnostic instrument for common mental disorders. The PHQ-9 is the depression module, which scores each of the 9 DSM-IV criteria as “0” (not at all) to “3” (nearly every day). The PHQ-9 was completed by 6,000 patients in 8 primary care clinics and 7 obstetrics-gynecology clinics. Construct validity was assessed using the 20-item Short-Form General Health Survey, self-reported sick days and clinic visits, and symptom-related difficulty. Criterion validity was assessed against an independent structured mental health professional (MHP) interview in a sample of 580 patients.
RESULTS: As PHQ-9 depression severity increased, there was a substantial decrease in functional status on all 6 SF-20 subscales. Also, symptom-related difficulty, sick days, and health care utilization increased. Using the MHP reinterview as the criterion standard, a PHQ-9 score ≥10 had a sensitivity of 88% and a specificity of 88% for major depression. PHQ-9 scores of 5, 10, 15, and 20 represented mild, moderate, moderately severe, and severe depression, respectively. Results were similar in the primary care and obstetrics-gynecology samples.
CONCLUSION: In addition to making criteria-based diagnoses of depressive disorders, the PHQ-9 is also a reliable and valid measure of depression severity. These characteristics plus its brevity make the PHQ-9 a useful clinical and research tool.
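The scoring rules described above are simple to operationalize; the following sketch applies the 0-3 item scoring, the common screening cutoff of 10, and the severity bands at 5, 10, 15, and 20 (the item values are made up for illustration).

```python
# Illustrative PHQ-9 scoring: nine items scored 0-3, cutoff >= 10,
# severity bands at 5/10/15/20. Example responses are made up.
def phq9_score(items):
    assert len(items) == 9 and all(0 <= i <= 3 for i in items)
    return sum(items)

def phq9_severity(score):
    for cutoff, label in [(20, "severe"), (15, "moderately severe"),
                          (10, "moderate"), (5, "mild")]:
        if score >= cutoff:
            return label
    return "none-minimal"

score = phq9_score([1, 2, 1, 0, 2, 1, 1, 2, 0])
screen = "screen positive" if score >= 10 else "screen negative"
print(score, phq9_severity(score), screen)
```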
Jitter and shimmer are measures of the cycle-to-cycle variations of fundamental frequency and amplitude, respectively, which have been largely used for the description of pathological voice quality. Since they characterise some aspects concerning particular voices, it is a priori expected to find differences in the values of jitter and shimmer among speakers. In this paper, several types of jitter and shimmer measurements have been analysed. Experiments performed with the Switchboard-I conversational speech database show that jitter and shimmer measurements give excellent results in speaker verification as complementary features of spectral and prosodic parameters. Index Terms: speaker recognition, jitter, shimmer, prosody, voice spectrum, fusion
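A minimal sketch of the local variants of these measures follows, assuming per-cycle period and peak-amplitude sequences already produced by a pitch tracker; the input values are illustrative.

```python
# Illustrative local jitter and shimmer: mean absolute cycle-to-cycle change
# of period (jitter) or peak amplitude (shimmer), normalized by the mean.
import numpy as np

def local_jitter(periods):
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

periods = [0.0100, 0.0102, 0.0099, 0.0101]  # seconds per glottal cycle
amps = [0.81, 0.79, 0.83, 0.80]             # peak amplitude per cycle
print(local_jitter(periods), local_shimmer(amps))
```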
This article discusses the importance of screening students in schools for emotional/behavioral problems.
Elements relevant to planning and implementing effective mental health screening in schools are considered. Screening in schools is linked to a broader national agenda to improve the mental health of children and adolescents. Strategies for systematic planning for mental health screening in schools are presented.
Mental health screening in schools is a very important, yet sensitive, agenda that is in its very early stages. Careful planning and implementation of mental health screening in schools offers a number of benefits including enhancing outreach and help to youth in need, and mobilizing school and community efforts to promote student mental health while reducing barriers to their learning.
When implemented with appropriate family, school, and community involvement, mental health screening in schools has the potential to be a cornerstone of a transformed mental health system. Screening, as part of a coordinated and comprehensive school mental health program, complements the mission of schools, identifies youth in need, links them to effective services, and contributes to positive educational outcomes valued by families, schools, and communities.
Gratch, J., Artstein, R., Lucas, G. M., Stratou, G., Scherer, S., & Nazarian, A. The distress analysis interview corpus of human and computer interviews.
Sinha. (2018). Interweaving convolutions: An application to audio classification.
Toto, E., Foley, B., & Rundensteiner, E. A. Improving emotion detection with sub-clip boosting.
Dogrucu, A., Peruic, A., Ball, D., Toto, E., Rundensteiner, E., Agu, E., Boudreax, E., & Davis, R. Instantaneous depression assessment using machine learning on voice samples and retrospectively harvested smartphone and social media data.