Preprint

Multimodal Depression Severity Prediction from Medical Biomarkers Using Machine Learning Tools and Technologies

Authors:
  • Edvora Inc.

Abstract

Depression is a leading cause of mental illness worldwide. While the loss of life due to unmanaged depression demands attention, so do the lack of diagnostic tests and the subjectivity involved in diagnosis. The use of behavioural cues to automate depression diagnosis and stage prediction has increased in recent years. However, the absence of labelled behavioural datasets and the vast number of possible variations remain major challenges. This paper proposes a novel Custom CM Ensemble approach and presents the paradigm of a cross-platform smartphone application that takes multimodal inputs from a user through a series of pre-defined questions, sends them to a cloud ML architecture, and conveys back a depression quotient representative of severity. Our app estimates the severity of depression with a multi-class classification model that utilizes the language, audio, and visual modalities. The approach attempts to detect, emphasize, and classify the features of a depressed person based on low-level descriptors for the verbal and visual features, and on the context of the language features, when the user is prompted with a question. The model achieved a precision of 0.88 and an accuracy of 91.56%. Further optimization reveals intramodality and intermodality relevance through the selection of the most influential features within each modality for decision making.
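The abstract does not specify the internals of the Custom CM Ensemble, so the following Python sketch only illustrates the general late-fusion pattern it describes: one classifier per modality, with per-modality class probabilities combined into a single severity decision. The feature matrices, model choices, and fusion weights are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def fit_modality_models(X_text, X_audio, X_visual, y):
    """Fit one classifier per modality on its own feature matrix (choices assumed)."""
    return {
        "text": LogisticRegression(max_iter=1000).fit(X_text, y),
        "audio": SVC(probability=True).fit(X_audio, y),
        "visual": RandomForestClassifier(n_estimators=200).fit(X_visual, y),
    }

def predict_severity(models, X_text, X_audio, X_visual, weights=(0.4, 0.3, 0.3)):
    """Weighted average of per-modality class probabilities (late fusion)."""
    probs = (
        weights[0] * models["text"].predict_proba(X_text)
        + weights[1] * models["audio"].predict_proba(X_audio)
        + weights[2] * models["visual"].predict_proba(X_visual)
    )
    return probs.argmax(axis=1)  # one severity class per sample
```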
References
Article
The advent of social media has presented a promising new opportunity for the early detection of depression. To do so effectively, two challenges must be overcome. The first is that textual and visual information must be jointly considered to make accurate inferences about depression. The second is that, due to the variety of content types posted by users, it is difficult to extract many of the relevant indicator texts and images. In this work, we propose a novel cooperative multi-agent model to address these challenges. From the historical posts of users, the proposed method can automatically select related indicator texts and images. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods by a large margin (over 30% error reduction). In several experiments and examples, we also verify that the selected posts successfully indicate user depression, and that our model obtains robust performance in realistic scenarios.
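As a toy illustration of the post-selection idea, the sketch below scores each historical post in both modalities and keeps the strongest joint indicators. The paper's actual selector is a cooperative multi-agent (policy-learning) model; the simple averaging heuristic and the scorer interface here are assumptions.

```python
import numpy as np

def select_indicator_posts(text_feats, image_feats, text_scorer, image_scorer, k=10):
    """text_feats, image_feats: (n_posts, d) arrays; scorers expose predict_proba."""
    t_scores = text_scorer.predict_proba(text_feats)[:, 1]    # P(depression | text)
    i_scores = image_scorer.predict_proba(image_feats)[:, 1]  # P(depression | image)
    joint = 0.5 * (t_scores + i_scores)  # the two "agents" cooperate by averaging
    top = np.argsort(joint)[::-1][:k]    # indices of the k strongest indicator posts
    return top, joint[top]
```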
Conference Paper
This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, then the learned features are fed to a DNN to predict the PHQ-8 score. For multi-modal fusion, the estimated PHQ-8 scores from the three modalities are integrated in a DNN to obtain the final PHQ-8 score. Moreover, in this work, we propose new feature descriptors for text and video. For the text descriptors, we select the participant's answers to the questions associated with psychoanalytic aspects of depression, such as sleep disorder, and make use of the Paragraph Vector (PV) to learn distributed representations of these sentences. For the video descriptors, we propose a new global descriptor, the Histogram of Displacement Range (HDR), calculated directly from the facial landmarks to measure their displacements and speed. Experiments have been carried out on the AVEC 2017 depression sub-challenge dataset. The results show that the proposed depression recognition framework achieves very promising accuracy, with a root mean square error (RMSE) of 4.653 and mean absolute error (MAE) of 3.980 on the development set, and an RMSE of 5.974 and MAE of 5.163 on the test set.
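A minimal PyTorch sketch of the fusion stage described above: the three per-modality PHQ-8 estimates are integrated by a small DNN that outputs the final score. The layer sizes and the hypothetical FusionDNN name are assumptions; the paper does not publish its exact architecture.

```python
import torch
import torch.nn as nn

class FusionDNN(nn.Module):
    """Fuses per-modality PHQ-8 estimates into a final PHQ-8 prediction."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 16), nn.ReLU(),  # inputs: audio, video, text estimates
            nn.Linear(16, 1),             # output: fused PHQ-8 score
        )

    def forward(self, phq8_audio, phq8_video, phq8_text):
        x = torch.stack([phq8_audio, phq8_video, phq8_text], dim=-1)
        return self.net(x).squeeze(-1)

# Example: fuse one participant's three modality-level estimates.
fused = FusionDNN()(torch.tensor([7.0]), torch.tensor([9.5]), torch.tensor([8.0]))
```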
Article
This paper is the first review of the automatic analysis of speech for use as an objective predictor of depression and suicidality. Both conditions are major public health concerns; depression has long been recognised as a prominent cause of disability and burden worldwide, whilst suicide is a misunderstood and complex cause of death that strongly impacts the quality of life and mental health of the families and communities left behind. Despite this prevalence, the diagnosis of depression and the assessment of suicide risk, due to their complex clinical characterisations, are difficult tasks, nominally achieved by the categorical assessment of a set of specific symptoms. However, many of the key symptoms of either condition, such as altered mood and motivation, are not physical in nature; assigning a categorical score to them therefore introduces a range of subjective biases into the diagnostic procedure. Due to these difficulties, research into finding a set of biological, physiological and behavioural markers to aid clinical assessment is gaining in popularity. This review starts by building the case for speech as a key objective marker for both conditions: it reviews current diagnostic and assessment methods for depression and suicidality, including key non-speech biological, physiological and behavioural markers, and highlights the expected cognitive and physiological changes associated with both conditions that affect speech production. We then review the key characteristics (size, associated clinical scores and collection paradigm) of existing depressed and suicidal speech databases. The main focus of this paper is on how common paralinguistic speech characteristics are affected by depression and suicidality, and on the application of this information in classification and prediction systems. The paper concludes with an in-depth discussion of the key challenges that will shape the future research directions of this rapidly growing field of speech processing research: improving generalisability through greater research collaboration and increased standardisation of data collection, and mitigating unwanted sources of variability.
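To make the paralinguistic characteristics discussed in the review concrete, here is a small librosa-based sketch that extracts a few of the commonly used descriptors (pitch contour, frame energy, MFCCs). The file path, sampling rate, and pitch range are placeholder assumptions.

```python
import numpy as np
import librosa

def speech_descriptors(path="speech.wav"):
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65.0, fmax=300.0, sr=sr)  # pitch contour
    energy = librosa.feature.rms(y=y)[0]                # per-frame energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral shape
    return {
        "f0_mean": np.nanmean(f0), "f0_std": np.nanstd(f0),  # NaNs mark unvoiced frames
        "energy_mean": float(energy.mean()),
        "mfcc_means": mfcc.mean(axis=1),
    }
```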
Article
Background: Many studies have explored associations between depression and facial emotion recognition (ER). However, these studies have used various paradigms and multiple stimulus sets, rendering comparisons difficult. Few studies have attempted to determine the magnitude of any effect and whether studies are properly powered to detect it. We conducted a meta-analysis to synthesize the findings across studies on ER in depressed individuals compared to controls. Method: Studies of ER that included depressed and control samples and published before June 2013 were identified in PubMed and Web of Science. Studies using schematic faces, neuroimaging studies and drug treatment studies were excluded. Results: Meta-analysis of k = 22 independent samples indicated impaired recognition of emotion [k = 22, g = -0.16, 95% confidence interval (CI) -0.25 to -0.07, p < 0.001]. Critically, this was observed for anger, disgust, fear, happiness and surprise (k's = 7-22, g's = -0.42 to -0.17, p's < 0.08), but not sadness (k = 21, g = -0.09, 95% CI -0.23 to +0.06, p = 0.23). Study-level characteristics did not appear to be associated with the observed effect. Power analysis indicated that a sample of approximately 615 cases and 615 controls would be required to detect this association with 80% power at an alpha level of 0.05. Conclusions: These findings suggest that the ER impairment reported in the depression literature exists across all basic emotions except sadness. The effect size, however, is small, and previous studies have been underpowered.
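For readers unfamiliar with the effect size used above, Hedges' g is Cohen's d with a small-sample correction. The sketch below computes it from group summary statistics; the numbers in the example are invented for illustration, not data from any study in the meta-analysis.

```python
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g for two independent groups from summary statistics."""
    pooled_sd = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd        # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return j * d

# Depressed group slightly less accurate than controls -> small negative g.
print(hedges_g(m1=0.78, sd1=0.12, n1=40, m2=0.80, sd2=0.11, n2=40))
```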
Conference Paper
Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross-modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data, and vice versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating the best published visual speech classification on AVLetters and effective shared representation learning.
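The shared-representation idea can be sketched as a bimodal autoencoder: each modality is encoded into a common latent space, and both modalities are decoded from whatever encodings are available, so a code learned from one modality can stand in for the other at test time. The dimensions and architecture below are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, d_audio=100, d_video=300, d_shared=64):
        super().__init__()
        self.enc_audio = nn.Sequential(nn.Linear(d_audio, d_shared), nn.ReLU())
        self.enc_video = nn.Sequential(nn.Linear(d_video, d_shared), nn.ReLU())
        self.dec_audio = nn.Linear(d_shared, d_audio)
        self.dec_video = nn.Linear(d_shared, d_video)

    def forward(self, audio=None, video=None):
        codes = []
        if audio is not None:
            codes.append(self.enc_audio(audio))  # audio -> shared space
        if video is not None:
            codes.append(self.enc_video(video))  # video -> shared space
        z = torch.stack(codes).mean(dim=0)       # shared representation
        return self.dec_audio(z), self.dec_video(z)  # reconstruct both modalities
```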
Article
Depression is becoming a serious global health problem, with an increasing number of patients suffering from anxiety and other disorders. Our work aims to provide appropriate social support (SS) for the prevention of depression in highly anxious undergraduates. We surveyed 1425 undergraduates from 18 universities in China, selected via a cluster random sampling method, using the self-rating anxiety scale, the self-rating depression scale, and the SS scale for anxiety and depression. Based on the collected questionnaire data, we first show that the distributions of both the anxiety data and the depression data follow a Gaussian distribution. A Gaussian mixture model is then adopted to cluster these data in terms of anxiety index and depression index. Using the observations extracted from the clusters, the correlation among anxiety, depression, and SS is investigated with a correlation analysis method. Finally, the corresponding moderating effect of SS between anxiety and depression is determined via hierarchical multiple regression analysis. The detailed analysis indicates that high-level SS, such as help and support from an individual's friends or family members, can reduce the risk of depression among highly anxious undergraduates.
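A compact sketch of that analysis pipeline, using scikit-learn and SciPy: fit a Gaussian mixture over (anxiety index, depression index) pairs, then correlate the two indices. The synthetic data below only mimics the described setup; it is not the paper's survey data.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
anxiety = rng.normal(50, 10, 1425)                    # synthetic anxiety indices
depression = 0.6 * anxiety + rng.normal(20, 8, 1425)  # correlated by construction
X = np.column_stack([anxiety, depression])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)              # cluster membership per student
r, p = pearsonr(anxiety, depression) # anxiety-depression correlation
print(f"cluster sizes: {np.bincount(labels)}, r={r:.2f}, p={p:.3g}")
```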
Article
Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression impacts not only the emotional but also the physical and psychological state of the person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight, and even feelings of self-hatred. This report describes our work for the Audio/Visual Emotion Challenge (AVEC) 2017, carried out during our second-year BTech summer internship. With the increasing demand to detect depression automatically with the help of machine learning algorithms, we present our multimodal feature extraction and decision-level fusion approach. Features are extracted from the provided Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database. Gaussian Mixture Model (GMM) clustering and the Fisher vector approach were applied to the visual data; statistical descriptors of gaze and pose, low-level audio features, head-pose features, and text features were also extracted. Classification is performed on fused as well as independent features using Support Vector Machines (SVM) and neural networks. The results surpassed the provided baseline on the validation data set by 17% for audio features and 24.5% for video features.
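Decision-level fusion of this kind can be sketched as follows: train one SVM per modality, then average the signed decision scores on the test set. The dictionary interface and the simple averaging rule are assumptions; the report's exact fusion scheme may differ.

```python
import numpy as np
from sklearn.svm import SVC

def decision_level_fusion(train_feats, y_train, test_feats):
    """train_feats / test_feats: dicts mapping modality name -> feature matrix."""
    scores = []
    for name, X in train_feats.items():
        clf = SVC().fit(X, y_train)
        scores.append(clf.decision_function(test_feats[name]))  # signed margins
    fused = np.mean(scores, axis=0)  # average the per-modality decisions
    return (fused > 0).astype(int)   # binary depressed / not-depressed label
```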
Conference Paper
Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically-motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
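A loose sketch of the Gaussian staircase idea: train one Gaussian (here, quadratic discriminant) classifier per PHQ threshold and sum the class-1 probabilities, so the prediction approximates the number of "stairs" a subject exceeds. This is a simplified reading of the technique, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def staircase_predict(X_train, phq_train, X_test, thresholds=range(1, 24)):
    """Approximate PHQ scores by stacking binary Gaussian classifiers."""
    score = np.zeros(len(X_test))
    for t in thresholds:
        y = (np.asarray(phq_train) >= t).astype(int)  # binary split at this stair
        if y.min() == y.max():                        # skip degenerate splits
            continue
        clf = QuadraticDiscriminantAnalysis().fit(X_train, y)
        score += clf.predict_proba(X_test)[:, 1]      # expected stairs exceeded
    return score
```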
Article
The spectral and energy properties of speech have consistently been observed to change with a speaker's level of clinical depression. As a result, spectral and energy based features are a key component of many speech-based classification and prediction systems. However, there has been no in-depth investigation into how acoustic models of spectral features are affected by depression. This paper investigates the hypothesis that the effects of depression in speech manifest as a reduction in the spread of phonetic events in acoustic space, as modelled by Gaussian Mixture Models (GMM) in combination with Mel Frequency Cepstral Coefficients (MFCC). Our investigation uses three measures of acoustic variability: Average Weighted Variance (AWV), Acoustic Movement (AM) and Acoustic Volume, which model either depression-specific acoustic variations (AWV and Acoustic Volume) or the trajectory of speech in the acoustic space (AM). Within our analysis we present the Probabilistic Acoustic Volume (PAV), a novel method for robustly estimating Acoustic Volume using Monte Carlo sampling of the feature distribution being modelled. We show that using an array of PAV points yields insights into how the concentration of the feature vectors in the feature space changes with depression. Key results, found on two commonly used depression corpora, consistently indicate that as a speaker's level of depression increases there are statistically significant reductions in both AWV (-0.44 ≤ rs ≤ -0.18, p < .05) and AM (-0.26 ≤ rs ≤ -0.19, p < .05) values, indicating a decrease in localised acoustic variance and a smoothing of the acoustic trajectory, respectively. Further, there are also statistically significant reductions (-0.32 ≤ rs ≤ -0.20, p < .05) in Acoustic Volume measures and strong statistical evidence (-0.48 ≤ rs ≤ -0.23, p < .05) that the MFCC feature space becomes more concentrated. Quantifying these effects is expected to be a key step towards building an objective classification or prediction system that is robust to many of the sources of variability modulated into a speech signal that are unwanted in terms of depression analysis.
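The PAV estimate can be sketched with a fitted GMM and importance-style Monte Carlo: for samples drawn from the model itself, the volume of the region where the density exceeds a threshold t equals the expectation of 1{p(x) > t} / p(x). The threshold choice (a density quantile) is an assumption; the paper's exact estimator may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def probabilistic_acoustic_volume(mfcc_frames, n_components=8,
                                  n_samples=100_000, t_quantile=0.05):
    gmm = GaussianMixture(n_components=n_components).fit(mfcc_frames)
    x, _ = gmm.sample(n_samples)         # Monte Carlo draws from the model
    dens = np.exp(gmm.score_samples(x))  # density p(x) at each draw
    t = np.quantile(dens, t_quantile)    # density threshold for the level set
    # Volume of {x : p(x) > t} via E_p[1{p(x) > t} / p(x)].
    return np.mean((dens > t) / dens)
```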
Article
Background: Schizophrenia is associated with impaired face processing. N170 and N250 are two event-related potentials that have been studied in relation to face processing in schizophrenia, but the results have been mixed. The aim of this article was to conduct a meta-analysis of N170 and N250 in schizophrenia to evaluate trends and resolve the inconsistencies. Methods: Twenty-one studies of N170 (n = 438 schizophrenia patients, n = 418 control subjects) and six studies of N250 (n = 149 schizophrenia patients, n = 151 control subjects) were evaluated. Hedges' g was calculated for each study, and the overall weighted mean effect size (ES) was calculated for N170 and N250. Homogeneity of the ES distributions, potential publication bias, and impact of potential moderators were also assessed. Results: The amplitude of both N170 and N250 to face stimuli was smaller in patients than control subjects (N170 ES = .64; N250 ES = .49; ps < .001). The distributions of the ES were homogeneous (ps > .90), and there was no indication of a publication bias. We found no significant effect of task requirements regarding judgments of the face stimuli. Moreover, we found no significant difference between the ES for N170 and N250. Conclusions: Though findings of individual studies have been mixed, the results of the meta-analysis strongly support disruption of N170 and N250 in schizophrenia. The comparable effect sizes across the two waveforms suggest that the well-established behavioral deficit in face emotion processing is mirrored in an underlying neural impairment for processing faces.
Gong, Y. & Poellabauer, C. Topic modeling based multi-modal depression detection. DOI: https://doi.org/10.1145/3133944.3133945 (2017).
Trotzek, M., Koitka, S. & Friedrich, C. M. Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences. DOI: https://doi.org/10.1109/TCSS.2019.2894144 (2020).
Sanh, V., Debut, L., Chaumond, J. & Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv: https://arxiv.org/abs/1910.01108 (2019).