Conference Paper

DeepScreen: Boosting Depression Screening Performance with an Auxiliary Task

Chapter
We explore the architecture of recurrent neural networks (RNNs) by studying the complexity of string sequences that they are able to memorize. Symbolic sequences of different complexity are generated to simulate RNN training and study parameter configurations with a view to the network’s capability of learning and inference. We compare Long Short-Term Memory (LSTM) networks and gated recurrent units (GRUs). We find that an increase in RNN depth does not necessarily result in better memorization capability when the training time is constrained. Our results also indicate that the learning rate and the number of units per layer are among the most important hyper-parameters to be tuned. Generally, GRUs outperform LSTM networks on low-complexity sequences, while on high-complexity sequences LSTMs perform better.
Keywords: Recurrent Neural Network, LSTM, GRU, Sequence Learning
Conference Paper
Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interest in the time series community. Among multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations. In particular, we examine the development of time series Transformers in two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers in order to accommodate the challenges in time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance.
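The seasonal-trend decomposition analysis mentioned above can be illustrated with a minimal classical additive decomposition. This is only a sketch in Python/NumPy (a simple centered moving average with a period-4 toy series, not the survey authors' exact procedure):

```python
import numpy as np

def seasonal_trend_decompose(x, period):
    """Classical additive decomposition: x = trend + seasonal + residual."""
    n = len(x)
    # Trend: moving average over one full period.
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")
    detrended = x - trend
    # Seasonal: average the detrended values at each phase of the period.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, n // period + 1)[:n]
    residual = x - trend - seasonal
    return trend, seasonal, residual

# Toy series: linear trend plus a period-4 seasonal pattern.
t = np.arange(40, dtype=float)
x = 0.5 * t + np.tile([0.0, 2.0, 0.0, -2.0], 10)
trend, seasonal, residual = seasonal_trend_decompose(x, period=4)
```

In practice, a Transformer-based forecaster would be applied to the trend and seasonal components (or their residuals) rather than the raw series.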
Article
Depression is the most prevalent mental disorder in the world. One of the most adopted tools for depression screening is the Beck Depression Inventory-II (BDI-II) questionnaire. Patients may minimize or exaggerate their answers. Thus, to further examine the patient’s mood while filling in the questionnaire, we propose a mobile application that captures the BDI-II patient’s responses together with their images and speech. Deep learning techniques such as Convolutional Neural Networks analyze the patient’s audio and image data. The application displays the correlation between the patient’s emotional scores and BDI-II scores to the clinician at the end of the questionnaire, indicating the relationship between the patient’s emotional state and the depression screening score. We conducted a preliminary evaluation involving clinicians and patients to assess (i) the acceptability of the proposed application for use in clinics and (ii) the patient user experience. The participants were eight clinicians who tried the tool with 21 of their patients. The results seem to confirm the acceptability of the app in clinical practice.
Conference Paper
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To better capture long-range global context, a recent trend is to add a self-attention mechanism on top of the CNN, forming a CNN-attention hybrid model. However, it is unclear whether the reliance on a CNN is necessary, and if neural networks purely based on attention are sufficient to obtain good performance in audio classification. In this paper, we answer the question by introducing the Audio Spectrogram Transformer (AST), the first convolution-free, purely attention-based model for audio classification. We evaluate AST on various audio classification benchmarks, where it achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2.
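AST's first step replaces convolutions with a sequence of flattened spectrogram patches fed to a Transformer. A rough sketch of that patch-embedding stage (the random projection weights stand in for the trained model, and the 128x100 spectrogram and 768-dimensional width are illustrative, not the paper's exact configuration):

```python
import numpy as np

def patchify(spec, patch=16):
    """Split a (freq, time) spectrogram into flattened non-overlapping patches."""
    f, t = spec.shape
    f, t = f - f % patch, t - t % patch   # crop to a multiple of the patch size
    spec = spec[:f, :t]
    patches = (spec.reshape(f // patch, patch, t // patch, patch)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, patch * patch))
    return patches

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 100))            # e.g. 128 mel bins x 100 frames
tokens = patchify(spec)                           # (8 * 6, 256) patch tokens
embed = tokens @ rng.standard_normal((256, 768))  # linear projection to model width
```

The resulting token sequence (plus positional embeddings and a classification token in the full model) is what the attention layers consume.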
Article
Radiographic imaging is routinely used to evaluate treatment response in solid tumors. Current imaging response metrics do not reliably predict the underlying biological response. Here, we present a multi-task deep learning approach that allows simultaneous tumor segmentation and response prediction. We design two Siamese subnetworks that are joined at multiple layers, which enables integration of multi-scale feature representations and in-depth comparison of pre-treatment and post-treatment images. The network is trained using 2568 magnetic resonance imaging scans of 321 rectal cancer patients for predicting pathologic complete response after neoadjuvant chemoradiotherapy. In multi-institution validation, the imaging-based model achieves AUC of 0.95 (95% confidence interval: 0.91–0.98) and 0.92 (0.87–0.96) in two independent cohorts of 160 and 141 patients, respectively. When combined with blood-based tumor markers, the integrated model further improves prediction accuracy with AUC 0.97 (0.93–0.99). Our approach to capturing dynamic information in longitudinal images may be broadly used for screening, treatment response evaluation, disease monitoring, and surveillance.
Article
Recognition of emotional facial expressions is considered to be atypical in autism. This difficulty is thought to be due to the way that facial expressions are visually explored. Evidence for atypical visual exploration of emotional faces in autism is, however, equivocal. We propose that, where observed, atypical visual exploration of emotional facial expressions is due to alexithymia, a distinct but frequently co-occurring condition. In this eye-tracking study we tested the alexithymia hypothesis using a number of recent methodological advances to study eye gaze during several emotion processing tasks (emotion recognition, intensity judgements, free gaze), in 25 adults with, and 45 without, autism. A multilevel polynomial modelling strategy was used to describe the spatiotemporal dynamics of eye gaze to emotional facial expressions. Converging evidence from traditional and novel analysis methods revealed that atypical gaze to the eyes is best predicted by alexithymia in both autistic and non-autistic individuals. Information theoretic metrics also revealed differential effects of task on gaze patterns as a function of alexithymia, but not autism. These findings highlight factors underlying atypical emotion processing in autistic individuals, with wide-ranging implications for emotion research.
Article
Improving the performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the neural networks. Such deeper networks not only increase training times, but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models to prevent vanishing gradients, and we develop a training strategy to enable or disable gradient amplification across several epochs with different learning rates. We perform experiments on VGG-19 and ResNet models (ResNet-18 and ResNet-34), and study the impact of amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing these models to achieve higher performance with reduced training time.
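The core idea, scaling gradients by a factor during selected epochs to counteract vanishing, can be sketched as a modified SGD step. The amplification factor and epoch window below are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np

def amplified_step(params, grads, lr, epoch, amplify_epochs, factor=2.0):
    """One SGD step; gradients are amplified only during the chosen epochs."""
    scale = factor if epoch in amplify_epochs else 1.0
    return [p - lr * scale * g for p, g in zip(params, grads)]

params = [np.array([1.0, -2.0])]
grads = [np.array([0.5, 0.5])]
# Epoch 3 falls in the amplification window, so the effective step doubles.
updated = amplified_step(params, grads, lr=0.1, epoch=3, amplify_epochs=range(2, 5))
```

In the full method the amplification would be applied to selected layers' gradients during backpropagation rather than to the final update, but the scheduling idea is the same.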
Article
The coronavirus disease 2019 (COVID-19) pandemic has been associated with mental health challenges related to the morbidity and mortality caused by the disease and to mitigation activities, including the impact of physical distancing and stay-at-home orders.* Symptoms of anxiety disorder and depressive disorder increased considerably in the United States during April-June of 2020, compared with the same period in 2019 (1,2). To assess mental health, substance use, and suicidal ideation during the pandemic, representative panel surveys were conducted among adults aged ≥18 years across the United States during June 24-30, 2020. Overall, 40.9% of respondents reported at least one adverse mental or behavioral health condition, including symptoms of anxiety disorder or depressive disorder (30.9%), symptoms of a trauma- and stressor-related disorder (TSRD) related to the pandemic† (26.3%), and having started or increased substance use to cope with stress or emotions related to COVID-19 (13.3%). The percentage of respondents who reported having seriously considered suicide in the 30 days before completing the survey (10.7%) was significantly higher among respondents aged 18-24 years (25.5%), minority racial/ethnic groups (Hispanic respondents [18.6%], non-Hispanic black [black] respondents [15.1%]), self-reported unpaid caregivers for adults§ (30.7%), and essential workers¶ (21.7%). Community-level intervention and prevention efforts, including health communication strategies, designed to reach these groups could help address various mental health conditions associated with the COVID-19 pandemic.
Article
Multi-task learning (MTL) aims at boosting the overall performance of each individual task by leveraging useful information contained in multiple related tasks. It has shown great success in natural language processing (NLP). Currently, a number of MTL architectures and learning mechanisms have been proposed for various NLP tasks, including exploring linguistic hierarchies, orthogonality constraints, adversarial learning, gate mechanisms, and label embedding. However, there has been no systematic in-depth exploration and comparison of different MTL architectures and learning mechanisms. In this paper, we conduct a thorough examination of five typical MTL methods with deep learning architectures for a broad range of representative NLP tasks. Our primary goal is to understand the merits and demerits of existing MTL methods in NLP tasks, thus devising new hybrid architectures intended to combine their strengths. Following the empirical evaluation, we offer our insights and conclusions regarding the MTL methods we have considered.
Article
Facial emotion recognition (FER) has been an active research topic in the past several years. One of the difficulties in FER is the effective capture of geometric and temporal information from landmarks. In this paper, we propose a graph convolution neural network that utilizes landmark features for FER, which we call a directed graph neural network (DGNN). Nodes in the graph structure were defined by landmarks, and edges in the directed graph were built by the Delaunay method. By using graph neural networks, we could capture emotional information through faces’ inherent properties, such as geometric and temporal information. Also, in order to prevent the vanishing gradient problem, we further utilized a stable form of a temporal block in the graph framework. Our experimental results demonstrate the effectiveness of the proposed method on datasets such as CK+ (96.02%), MMI (69.4%), and AFEW (32.64%). Also, a fusion network using image information as well as landmarks is presented and investigated for the CK+ (98.47% performance) and AFEW (50.65% performance) datasets.
Article
Recently, new emphasis was put on reducing waiting times in mental health services as there is an ongoing concern that longer waiting time for treatment leads to poorer health outcomes. However, little is known about delays within the mental health service system and its impact on patients. We explore the impact of waiting times on patient outcomes in the context of Early Intervention in Psychosis (EIP) services in England from April 2012 to March 2015. We use the Mental Health Services Data Set and the routine outcome measure the Health of the Nation Outcome Scale. In a generalised linear regression model, we control for baseline outcomes, previous service use and treatment intensity to account for possible endogeneity in waiting time. We find that longer waiting time is significantly associated with a deterioration in patient outcomes 12 months after acceptance for treatment for patients that are still in EIP care. Effects are strongest for waiting times longer than 3 months and effect sizes are small to moderate. Patients with shorter treatment periods are not affected. The results suggest that policies should aim to reduce excessively long waits in order to improve outcomes for patients waiting for treatment for psychosis.
Article
Depression is a common mood disorder that causes severe medical problems and interferes negatively with daily life. Identifying human behavior patterns that are predictive or indicative of depressive disorder is important. Clinical diagnosis of depression relies on costly clinician assessment using survey instruments, which may not objectively reflect the fluctuation of daily behavior. Self-administered surveys, such as the Quick Inventory of Depressive Symptomatology (QIDS) commonly used to monitor depression, may show disparities from clinical decisions. Smartphones provide easy access to many behavioral parameters, and Fitbit wrist bands are becoming another important tool to assess variables such as heart rate and sleep efficiency that are complementary to smartphone sensors. However, data used to identify depression indicators have been limited to a single platform (iPhone, Android, or Fitbit alone) due to the variation in their methods of data collection. The present work represents a large-scale effort to collect and integrate data from mobile phones, wearable devices, and self-reports in depression analysis by designing a new machine learning approach. This approach constructs sparse mappings from sensing variables collected by various tools to two separate targets: self-reported QIDS scores and clinical assessment of depression severity. We propose a heterogeneous multi-task feature learning method that jointly builds inference models for related tasks of different types, including classification and regression tasks. The proposed method was evaluated using data collected from 103 college students and could predict the QIDS score with an R² reaching 0.44 and depression severity with an F1-score as high as 0.77. By imposing appropriate regularizers, our approach identified strong depression indicators such as time spent at home and total time asleep.
Article
We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive rate. Conditions are modeled as tasks in a multi-task learning (MTL) framework, with gender prediction as an additional auxiliary task. We demonstrate the effectiveness of multi-task learning by comparison to a well-tuned single-task baseline with the same number of parameters. Our best MTL model predicts potential suicide attempt, as well as the presence of atypical mental health, with AUC > 0.8. We also find additional large improvements using multi-task learning on mental health tasks with limited training data.
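The multi-task setup described here, a shared encoder with one head per condition plus an auxiliary gender head, might look like the following toy forward pass. All dimensions, weights, task names, and the auxiliary loss weight are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared encoder feeding one head per task (main tasks + auxiliary gender task).
W_shared = rng.standard_normal((10, 8)) * 0.1
heads = {"suicide_risk": rng.standard_normal(8) * 0.1,
         "mental_health": rng.standard_normal(8) * 0.1,
         "gender_aux":    rng.standard_normal(8) * 0.1}

def forward(x):
    h = np.tanh(x @ W_shared)                # shared representation
    return {task: sigmoid(h @ w) for task, w in heads.items()}

def mtl_loss(preds, labels, aux_weight=0.3):
    """Total loss = mean of main-task losses + down-weighted auxiliary loss."""
    bce = lambda p, y: -(y * np.log(p) + (1 - y) * np.log(1 - p))
    main = np.mean([bce(preds[t], labels[t]) for t in ("suicide_risk", "mental_health")])
    return main + aux_weight * bce(preds["gender_aux"], labels["gender_aux"])

x = rng.standard_normal(10)
preds = forward(x)
loss = mtl_loss(preds, {"suicide_risk": 1.0, "mental_health": 1.0, "gender_aux": 0.0})
```

The key design point is that gradients from every head flow into `W_shared`, so the auxiliary task regularizes the representation used for the main predictions.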
Article
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
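The 2-D attention matrix and its regularizer can be written out directly. A small NumPy sketch of the described formulation, with random weights standing in for learned parameters (in practice H would come from a sentence encoder such as a BiLSTM):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d, da, r = 6, 16, 8, 4          # tokens, hidden size, attention size, attention rows
H = rng.standard_normal((n, d))    # token representations
Ws1 = rng.standard_normal((da, d)) * 0.1
Ws2 = rng.standard_normal((r, da)) * 0.1

# A: r attention distributions over the n tokens; M: the 2-D sentence embedding.
A = softmax(Ws2 @ np.tanh(Ws1 @ H.T), axis=1)   # (r, n), each row sums to 1
M = A @ H                                        # (r, d)

# Regularization term pushing attention rows toward distinct parts of the sentence.
penalty = np.linalg.norm(A @ A.T - np.eye(r), "fro") ** 2
```

Because each row of A is a separate distribution over tokens, visualizing the rows shows which sentence parts each embedding row attends to.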
Conference Paper
The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) "Depression, Mood and Emotion" will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.
Article
Many multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. It has been noted that the missing patterns and values are often correlated with the target labels, i.e., missingness is informative, and there is significant interest in exploring methods that model them for time series prediction and other related tasks. In this paper, we develop novel deep learning models based on Gated Recurrent Units (GRU), a state-of-the-art recurrent neural network, to handle missing observations. Our model takes two representations of missing patterns, i.e., masking and time duration, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to improve the prediction results. Experiments on time series classification tasks using real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance on these tasks and provide useful insights for time series with missing values.
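The masking and time-duration representations can be combined in a decay-based imputation step. A simplified sketch of that idea (the decay weight `w` is fixed here but learned per feature in the full model, and all variable names and values are illustrative):

```python
import numpy as np

def decay_impute(x, mask, delta, x_mean, w=0.5):
    """Replace missing entries with a decayed mix of last observation and mean."""
    gamma = np.exp(-np.maximum(0.0, w * delta))   # decay toward the empirical mean
    x_hat = np.empty_like(x)
    last = x_mean.copy()
    for t in range(len(x)):
        obs = mask[t].astype(bool)
        last = np.where(obs, x[t], last)          # carry last observed value forward
        filled = gamma[t] * last + (1 - gamma[t]) * x_mean
        x_hat[t] = np.where(obs, x[t], filled)
    return x_hat

x = np.array([[1.0, 0.0], [0.0, 4.0], [3.0, 0.0]])
mask = np.array([[1, 0], [0, 1], [1, 0]])               # 1 = observed
delta = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # time since last observation
x_hat = decay_impute(x, mask, delta, x_mean=np.array([2.0, 2.0]))
```

The longer a feature has been missing, the smaller gamma becomes and the more the imputed value falls back from the last observation toward the feature mean; the recurrent network then consumes `x_hat` together with the mask.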
Conference Paper
We present SimSensei Kiosk, an implemented virtual human interviewer designed to create an engaging face-to-face interaction where the user feels comfortable talking and sharing information. SimSensei Kiosk is also designed to create interactional situations favorable to the automatic assessment of distress indicators, defined as verbal and nonverbal behaviors correlated with depression, anxiety or post-traumatic stress disorder (PTSD). In this paper, we summarize the design methodology, performed over the past two years, which is based on three main development cycles: (1) analysis of face-to-face human interactions to identify potential distress indicators, dialogue policies and virtual human gestures, (2) development and analysis of a Wizard-of-Oz prototype system where two human operators were deciding the spoken and gestural responses, and (3) development of a fully automatic virtual interviewer able to engage users in 15-25 minute interactions. We show the potential of our fully automatic virtual human interviewer in a user study, and situate its performance in relation to the Wizard-of-Oz prototype. Copyright © 2014, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Article
Depression is prevalent in patients with physical disorders, particularly in those with severe disorders such as cancer, stroke, and acute coronary syndrome. Depression has an adverse impact on the courses of these diseases that includes poor quality of life, more functional impairments, and a higher mortality rate. Patients with physical disorders are at higher risk of depression. This is particularly true for patients with genetic and epigenetic predictors, environmental vulnerabilities such as past depression, higher disability, and stressful life events. Such patients should be monitored closely. To appropriately manage depression in these patients, comprehensive and integrative care that includes antidepressant treatment (with considerations for adverse effects and drug interactions), treatment of the physical disorder, and collaborative care that consists of disease education, cognitive reframing, and modification of coping style should be provided. The objective of the present review was to present and summarize the prevalence, risk factors, clinical correlates, current pathophysiological aspects including genetics, and treatments for depression comorbid with physical disorders. In particular, we tried to focus on severe physical disorders with high mortality rates, such as cancer, stroke, and acute coronary syndrome, which are highly comorbid with depression. This review will enhance our current understanding of the association between depression and serious medical conditions, which will allow clinicians to develop more advanced and personalized treatment options for these patients in routine clinical practice.
Article
This paper is the first review into the automatic analysis of speech for use as an objective predictor of depression and suicidality. Both conditions are major public health concerns; depression has long been recognised as a prominent cause of disability and burden worldwide, whilst suicide is a misunderstood and complex cause of death that strongly impacts the quality of life and mental health of the families and communities left behind. Despite this prevalence, the diagnosis of depression and assessment of suicide risk, due to their complex clinical characterisations, are difficult tasks, nominally achieved by the categorical assessment of a set of specific symptoms. However, many of the key symptoms of either condition, such as altered mood and motivation, are not physical in nature; therefore assigning a categorical score to them introduces a range of subjective biases to the diagnostic procedure. Due to these difficulties, research into finding a set of biological, physiological and behavioural markers to aid clinical assessment is gaining in popularity. This review starts by building the case for speech to be considered a key objective marker for both conditions: it reviews current diagnostic and assessment methods for depression and suicidality, including key non-speech biological, physiological and behavioural markers, and highlights the expected cognitive and physiological changes associated with both conditions which affect speech production. We then review the key characteristics (size, associated clinical scores and collection paradigm) of available depressed and suicidal speech databases. The main focus of this paper is on how common paralinguistic speech characteristics are affected by depression and suicidality and the application of this information in classification and prediction systems.
The paper concludes with an in-depth discussion of the key challenges that will shape the future research directions of this rapidly growing field of speech processing research: improving generalisability through greater research collaboration and increased standardisation of data collection, and mitigating unwanted sources of variability.
Conference Paper
Depression is a common and disabling mental health disorder, which impacts not only on the sufferer but also on their families, friends and the economy overall. Our ultimate aim is to develop an automatic objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. Here, we analyse the performance of head pose and movement features extracted from face videos using a 3D face model projected on a 2D Active Appearance Model (AAM). In a binary classification task (depressed vs. non-depressed), we modelled low-level and statistical functional features for an SVM classifier using real-world clinically validated data. Although head pose and movement would be used as a complementary cue in detecting depression in practice, their recognition rate was impressive on its own, giving 71.2% on average, which illustrates that head pose and movement hold effective cues in diagnosing depression. When expressing positive and negative emotions, recognising depression using positive emotions was more accurate than using negative emotions. We conclude that positive emotions are expressed less in depressed subjects at all times, and that negative emotions have less discriminatory power than positive emotions in detecting depression. Analysing the functional features statistically illustrates several behaviour patterns for depressed subjects: (1) slower head movements, (2) less change of head position, (3) longer duration of looking to the right, (4) longer duration of looking down, which may indicate fatigue and eye contact avoidance. We conclude that head movements are significantly different between depressed patients and healthy subjects, and could be used as a complementary cue.
Conference Paper
Depression is a common and disabling mental health disorder, which impacts not only on the sufferer but also on their families, friends and the economy overall. Despite its high prevalence, current diagnosis relies almost exclusively on patient self-report and clinical opinion, leading to a number of subjective biases. Our aim is to develop an objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. In this paper, we analyse the performance of eye movement features extracted from face videos using Active Appearance Models for a binary classification task (depressed vs. non-depressed). We find that eye movement low-level features gave 70% accuracy using a hybrid classifier of Gaussian Mixture Models and Support Vector Machines, and 75% accuracy when using statistical measures with SVM classifiers over the entire interview. We also investigate differences while expressing positive and negative emotions, as well as the classification performance in gender-dependent versus gender-independent modes. Interestingly, even though the blinking rate was not significantly different between depressed and healthy controls, we find that the average distance between the eyelids ('eye opening') was significantly smaller and the average duration of blinks significantly longer in depressed subjects, which might be an indication of fatigue or eye contact avoidance.
Conference Paper
Alzheimer's disease (AD) is the most common form of dementia that causes progressive impairment of memory and other cognitive functions. Multivariate regression models have been studied in AD for revealing relationships between neuroimaging measures and cognitive scores to understand how structural changes in brain can influence cognitive status. Existing regression methods, however, do not explicitly model dependence relation among multiple scores derived from a single cognitive test. It has been found that such dependence can deteriorate the performance of these methods. To overcome this limitation, we propose an efficient sparse Bayesian multi-task learning algorithm, which adaptively learns and exploits the dependence to achieve improved prediction performance. The proposed algorithm is applied to a real world neuroimaging study in AD to predict cognitive performance using MRI scans. The effectiveness of the proposed algorithm is demonstrated by its superior prediction performance over multiple state-of-the-art competing methods and accurate identification of compact sets of cognition-relevant imaging biomarkers that are consistent with prior knowledge.
Article
There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
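The gradient norm clipping strategy amounts to rescaling any gradient whose L2 norm exceeds a threshold so that its norm equals the threshold; a minimal sketch (the threshold value is illustrative):

```python
import numpy as np

def clip_grad_norm(grad, threshold):
    """Rescale the gradient so its L2 norm never exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])                                     # norm 5
clipped = clip_grad_norm(g, threshold=1.0)                   # rescaled to norm 1
small = clip_grad_norm(np.array([0.1, 0.2]), threshold=1.0)  # left unchanged
```

Clipping preserves the gradient's direction while bounding the step size, which is why it specifically targets the exploding-gradient case; the vanishing case needs the separate soft constraint the paper proposes.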
Article
In this review, we provide an update of recent studies on the age of onset (AOO) of the major mental disorders, with a special focus on the availability and use of services providing prevention and early intervention. The studies reviewed here confirm previous reports on the AOO of the major mental disorders. Although the behaviour disorders and specific anxiety disorders emerge during childhood, most of the high-prevalence disorders (mood, anxiety and substance use) emerge during adolescence and early adulthood, as do the psychotic disorders. Early AOO has been shown to be associated with a longer duration of untreated illness, and poorer clinical and functional outcomes. Although the onset of most mental disorders usually occurs during the first three decades of life, effective treatment is typically not initiated until a number of years later. There is increasing evidence that intervention during the early stages of disorder may help reduce the severity and/or the persistence of the initial or primary disorder, and prevent secondary disorders. However, additional research is needed on effective interventions in early-stage cases, on the long-term effects of early intervention, and on appropriate service design for those with emerging mental disorders. This will mean not only the strengthening and re-engineering of existing systems, but also, crucially, the construction of new streams of care for young people in transition to adulthood.
Conference Paper
Alzheimer's Disease (AD), the most common type of dementia, is a severe neurodegenerative disorder. Identifying markers that can track the progress of the disease has recently received increasing attention in AD research. A definitive diagnosis of AD requires autopsy confirmation, thus many clinical/cognitive measures including Mini Mental State Examination (MMSE) and Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog) have been designed to evaluate the cognitive status of the patients and used as important criteria for clinical diagnosis of probable AD. In this paper, we propose a multi-task learning formulation for predicting the disease progression measured by the cognitive scores and selecting markers predictive of the progression. Specifically, we formulate the prediction problem as a multi-task regression problem by considering the prediction at each time point as a task. We capture the intrinsic relatedness among different tasks by a temporal group Lasso regularizer. The regularizer consists of two components including an L2,1-norm penalty on the regression weight vectors, which ensures that a small subset of features will be selected for the regression models at all time points, and a temporal smoothness term which ensures a small deviation between two regression models at successive time points. We have performed extensive evaluations using various types of data at the baseline from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database for predicting the future MMSE and ADAS-Cog scores. Our experimental studies demonstrate the effectiveness of the proposed algorithm for capturing the progression trend and the cross-sectional group differences of AD severity. Results also show that most markers selected by the proposed algorithm are consistent with findings from existing cross-sectional studies.
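The temporal group Lasso regularizer combines an L2,1 penalty (joint feature selection across time points) with a temporal smoothness term (successive models stay close). A sketch of how such a penalty could be computed, with placeholder weights and a squared-difference smoothness term as one common form:

```python
import numpy as np

def temporal_group_lasso_penalty(W, lam1, lam2):
    """W: (features, time_points) regression weights, one column per time point.

    The L2,1 term sums row-wise L2 norms, encouraging a shared feature subset
    across all time points; the smoothness term penalizes differences between
    regression models at successive time points."""
    l21 = np.linalg.norm(W, axis=1).sum()
    smooth = ((W[:, 1:] - W[:, :-1]) ** 2).sum()
    return lam1 * l21 + lam2 * smooth

W = np.array([[1.0, 1.0, 1.0],    # stable, selected feature: only the L2,1 term fires
              [0.0, 0.0, 0.0]])   # unselected feature contributes nothing
penalty = temporal_group_lasso_penalty(W, lam1=1.0, lam2=1.0)
```

A zero row costs nothing under either term, which is exactly why the regularizer drives whole features out of all time-point models at once.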
Conference Paper
Full-text available
Standard recurrent nets cannot deal with long minimal time lags between relevant signals. Several recent NIPS papers propose alternative methods. We first show: problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms. We then use LSTM, our own recent algorithm, to solve a hard problem that can neither be quickly solved by random search nor by any other recurrent net algorithm we are aware of.
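For readers unfamiliar with the LSTM machinery this abstract refers to, the following is an illustrative single-step LSTM cell in plain numpy. The gate layout and parameter names (`W`, `U`, `b`) are conventional choices, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (illustrative sketch).

    x: input (n_in,); h, c: previous hidden/cell state (n_h,)
    W: (4*n_h, n_in), U: (4*n_h, n_h), b: (4*n_h,), stacked for the
    input (i), forget (f), output (o) gates and candidate update (g).
    """
    n_h = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:n_h])          # input gate
    f = sigmoid(z[n_h:2*n_h])      # forget gate: preserves long-range memory
    o = sigmoid(z[2*n_h:3*n_h])    # output gate
    g = np.tanh(z[3*n_h:4*n_h])    # candidate cell update
    c_new = f * c + i * g          # additive cell update avoids vanishing gradients
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

The additive cell update `f * c + i * g` is what lets gradients flow across long minimal time lags, the problem the abstract says standard recurrent nets cannot handle.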
Chapter
Depression is a common mental health disorder with large social and economic consequences. It can be costly and difficult to detect, traditionally requiring hours of assessment by a trained clinician. Recently, machine learning models have been trained to screen for depression with patient voice recordings collected during an interview with a virtual agent. To engage the patient in a conversation and increase the quantity of responses, the virtual interviewer asks a series of follow-up questions. However, asking fewer questions would reduce the time burden of screening for the participant. We therefore assess whether these follow-up questions have a tangible impact on the performance of deep learning models for depression classification. Specifically, we study the effect of including the vocal and transcribed replies to one, two, three, four, five, or all follow-up questions in the depression screening models. We notably achieve this using unimodal and multimodal pre-trained transfer learning models. Our findings reveal that follow-up questions can help increase F1 scores for the majority of the interview questions. This research can be leveraged for the design of future mental illness screening applications by providing important information about both question selection and the best number of follow-up questions.
Conference Paper
Depression is among the most prevalent mental health disorders, and its prevalence is increasing worldwide. While early detection is critical for the prognosis of depression treatment, detecting depression is challenging. Previous deep learning research has thus begun to detect depression with the transcripts of clinical interview questions. Since approaches using Bidirectional Encoder Representations from Transformers (BERT) have demonstrated particular promise, we hypothesize that ensembles of BERT variants will improve depression detection. Thus, in this research, we compare the depression classification abilities of three BERT variants and four ensembles of BERT variants on the transcripts of responses to 12 clinical interview questions. Specifically, we implement the ensembles with different ensemble strategies, numbers of model components, and architectural layer combinations. Our results demonstrate that ensembles increase mean F1 scores and robustness across clinical interview data. Clinical relevance: This research highlights the potential of ensembles to detect depression from text, which is important to guide future development of healthcare application ecosystems.
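The two most common ensemble strategies such work compares, hard (majority) voting and soft voting over class probabilities, can be sketched in a few lines. This is a generic illustration; the arrays stand in for the per-model BERT outputs.

```python
import numpy as np

def average_probs(prob_list):
    """Soft voting: average per-model class-probability arrays,
    each of shape (n_samples, n_classes)."""
    return np.mean(np.stack(prob_list), axis=0)

def majority_vote(pred_list):
    """Hard voting: most frequent predicted label per sample,
    given per-model integer label arrays of shape (n_samples,)."""
    preds = np.stack(pred_list)   # (n_models, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, preds)
```

Soft voting retains each model's confidence, while hard voting only counts labels; which works better depends on how well-calibrated the component models are.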
Chapter
The prevalence of suicide has been on the rise since the 20th century, causing severe emotional damage to individuals, families, and communities alike. Despite the severity of this suicide epidemic, there is so far no reliable and systematic way to assess suicide intent of a given individual. Through efforts to automate and systematize diagnosis of mental illnesses over the past few years, verbal and acoustic behaviors have received increasing attention as biomarkers, but little has been done to study eyelids, gaze, and head pose in evaluating suicide risk. This study explores statistical analysis, feature selection, and machine learning classification as means of suicide risk evaluation and nonverbal behavioral interpretation. Applying these methods to the eye and head signals extracted from our unique dataset, this study finds that high-risk suicidal individuals experience psycho-motor retardation and symptoms of anxiety and depression, characterized by eye contact avoidance, slower blinks and a downward eye gaze. By comparing results from different methods of classification, we determined that these features are highly capable of automatically classifying different levels of suicide risk consistently and with high accuracy, above 98%. Our conclusion corroborates psychological studies, and shows great potential of a systematic approach in suicide risk evaluation that is adoptable by both healthcare providers and naïve observers.
Keywords: Affective computing, Suicide risk, Nonverbal behaviour, Explainable AI
Article
Artificial intelligence (AI) has incorporated various automatic systems and frameworks to diagnose the severity of depression using hand-crafted features. However, the process of feature selection requires domain knowledge and is still time-consuming and subjective. Deep learning technology has been successfully adopted for depression recognition. Most previous works pre-train the deep models on large databases followed by fine-tuning with depression databases (i.e., AVEC2013, AVEC2014). In the present paper we propose an integrated framework, Deep Local Global Attention Convolutional Neural Network (DLGA-CNN), for depression recognition, which adopts CNN with attention mechanism as well as weighted spatial pyramid pooling (WSPP) to learn a deep and global representation. Two branches are introduced: Local Attention based CNN (LA-CNN) focuses on the local patches, while Global Attention based CNN (GA-CNN) learns the global patterns from the entire facial region. To capture the complementary information between the two branches, Local–Global Attention-based CNN (LGA-CNN) is proposed. After feature aggregation, WSPP is used to learn the depression patterns. Comprehensive experiments on the AVEC2013 and AVEC2014 depression databases have demonstrated that the proposed method is capable of mining the underlying depression patterns of facial videos and outperforms most of the state-of-the-art video-based depression recognition approaches.
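As a rough illustration of the pooling component, here is plain (unweighted) spatial pyramid pooling over a single-channel feature map; the paper's WSPP additionally learns weights for the pyramid levels, which is omitted in this sketch.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool a (H, W) feature map on successively finer grids and
    concatenate the results, yielding a fixed-length descriptor
    regardless of the input's spatial size. (Unweighted sketch; the
    paper's WSPP weights the levels.)"""
    H, W = fmap.shape
    out = []
    for n in levels:                          # n x n pooling grid per level
        row_bins = np.array_split(np.arange(H), n)
        col_bins = np.array_split(np.arange(W), n)
        for r in row_bins:
            for c in col_bins:
                out.append(fmap[np.ix_(r, c)].max())
    return np.array(out)
```

With `levels=(1, 2)` a map of any size is reduced to 1 + 4 = 5 values, which is why pyramid pooling is convenient ahead of fully-connected layers.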
Conference Paper
Depression is a common but serious mental disorder that affects people all over the world. Besides providing an easier way of diagnosing the disorder, a computer-aided automatic depression assessment system is demanded in order to reduce subjective bias in the diagnosis. We propose a multimodal fusion of speech and linguistic representations for depression detection. We train our model to infer the Patient Health Questionnaire (PHQ) score of subjects from the AVEC 2019 DDS Challenge database, the E-DAIC corpus. For the speech modality, we use deep spectrum features extracted from a pretrained VGG-16 network and employ a Gated Convolutional Neural Network (GCNN) followed by an LSTM layer. For the textual embeddings, we extract BERT textual features and employ a Convolutional Neural Network (CNN) followed by an LSTM layer. We achieved CCC scores of 0.497 and 0.608 on the E-DAIC corpus development set using the unimodal speech and linguistic models, respectively. We further combine the two modalities using a feature fusion approach in which we pass the last representation of each single-modality model to a fully-connected layer in order to estimate the PHQ score. With this multimodal approach, we achieve a CCC score of 0.696 on the development set and 0.403 on the testing set of the E-DAIC corpus, an absolute improvement of 0.283 points over the challenge baseline.
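The fusion step described above reduces to concatenation followed by one linear layer. A minimal sketch, assuming `audio_repr` and `text_repr` are the last hidden representations of the two unimodal models and `W`, `b` are illustrative learned parameters:

```python
import numpy as np

def fuse_and_score(audio_repr, text_repr, W, b):
    """Feature-level (late) fusion: concatenate the last representation
    of each unimodal model, then regress a PHQ score with a single
    linear layer. All parameters here are illustrative, not learned."""
    fused = np.concatenate([audio_repr, text_repr])
    return float(W @ fused + b)
```

In a trained system `W` and `b` would be fit jointly with (or on top of) the unimodal models so the fusion layer can weigh the modalities against each other.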
Conference Paper
Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods are either incapable of dealing with temporal information or are multi-stage. This paper proposes an end-to-end generative model E²GAN to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute the incomplete time series with the nearest generated complete time series in one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method also gains better time efficiency than multi-stage methods in the training of neural networks.
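The "nearest generated complete series" idea can be sketched independently of the GAN itself: pick the candidate closest to the observed entries, then copy in its values at the missing positions. Here `candidates` stands in for samples from a trained generator, which is not implemented.

```python
import numpy as np

def impute_with_nearest(x, mask, candidates):
    """Fill missing entries of x (where mask == 0) from the generated
    complete series closest to x on the observed entries (mask == 1).
    `candidates` is a stand-in for generator samples."""
    best, best_err = None, np.inf
    for g in candidates:
        err = ((mask * (x - g)) ** 2).sum()   # distance on observed values only
        if err < best_err:
            best, best_err = g, err
    return mask * x + (1 - mask) * best       # keep observed, fill missing
```

The squared-error term above plays the role of the paper's reconstruction loss; in E²GAN the candidate is produced by optimizing the generator rather than searching a fixed set.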
Article
Missing value imputation is a fundamental problem in modeling spatiotemporal sequences, from motion tracking to the dynamics of physical systems. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NAOMI) for imputing long-range spatiotemporal sequences given arbitrary missing patterns. In particular, NAOMI exploits the multiresolution structure of spatiotemporal data to interpolate recursively from coarse to fine-grained resolutions. We further enhance our model with adversarial training using an imitation learning objective. When trained on billiards and basketball trajectories, NAOMI demonstrates significant improvement in imputation accuracy (reducing average prediction error by 60% compared to autoregressive counterparts) and generalization capability for long range trajectories in systems of both deterministic and stochastic dynamics.
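A toy analogue of NAOMI's coarse-to-fine decoding is recursive midpoint filling between known values; the real model predicts each midpoint with learned decoders conditioned on both endpoints rather than averaging them.

```python
import numpy as np

def multires_impute(x, known):
    """Fill missing entries of a 1-D sequence coarse-to-fine: repeatedly
    estimate the midpoint between two known values, then recurse into
    each half. A toy stand-in for NAOMI's multiresolution decoders."""
    x = x.astype(float).copy()
    known = known.copy()
    anchors = np.where(known)[0]               # originally observed positions
    def fill(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        if not known[mid]:
            x[mid] = 0.5 * (x[lo] + x[hi])     # coarse estimate from endpoints
            known[mid] = True
        fill(lo, mid)                          # recurse to finer resolutions
        fill(mid, hi)
    for a, b in zip(anchors[:-1], anchors[1:]):
        fill(a, b)
    return x
```

Because each fill conditions on values at both ends of the gap, the procedure is non-autoregressive: errors do not accumulate left-to-right as they would when predicting one step at a time.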
Article
Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) [1], are reported to be one of the most common mental illnesses in recent years. In diagnostic evaluation of outpatients with mood disorders, a large portion of BD patients are initially misdiagnosed as having UD [2]. As most previous research focused on long-term monitoring of mood disorders, short-term detection, which could support early detection and intervention, is desirable. This work proposes an approach to short-term detection of mood disorder based on emotion patterns in elicited speech responses. To the best of our knowledge, there is currently no database for short-term discrimination between BD and UD. This work collected two databases: an emotional database (MHMC-EM) collected by the Multimedia Human Machine Communication (MHMC) lab and a mood disorder database (CHI-MEI) collected by the CHI-MEI Medical Center, Taiwan. As the collected CHI-MEI mood disorder database is quite small and emotion annotation is difficult, the MHMC-EM emotional database is selected as a reference database for data adaptation. For the CHI-MEI mood disorder data collection, six eliciting emotional videos are selected and used to elicit the participants' emotions. After watching each of the six eliciting emotional video clips, the participants answer the questions raised by the clinician. The speech responses are then used to construct the CHI-MEI mood disorder database. Hierarchical spectral clustering is used to adapt the collected MHMC-EM emotional database to fit the CHI-MEI mood disorder database for dealing with the data bias problem. The adapted MHMC-EM emotional data are then fed to a denoising autoencoder for bottleneck feature extraction. The bottleneck features are used to construct a long short-term memory (LSTM)-based emotion detector for generation of emotion profiles from each speech response.
The emotion profiles are then clustered into emotion codewords using the K-means algorithm. Finally, a class-specific latent affective structure model (LASM) is proposed to model the structural relationships among the emotion codewords with respect to six emotional videos for mood disorder detection. Leave-one-group-out cross validation scheme was employed for the evaluation of the proposed class-specific LASM-based approaches. Experimental results show that the proposed class-specific LASM-based method achieved an accuracy of 73.33% for mood disorder detection, outperforming the classifiers based on SVM and LSTM.
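The codeword step above is plain k-means over the emotion profiles. A minimal Lloyd's-algorithm sketch (the LASM modelling that follows it is not reproduced here):

```python
import numpy as np

def kmeans_codewords(profiles, k, iters=20, seed=0):
    """Cluster emotion-profile vectors into k codewords with Lloyd's
    k-means. Illustrative sketch; shapes: profiles is (n, d)."""
    rng = np.random.default_rng(seed)
    centers = profiles[rng.choice(len(profiles), k, replace=False)]
    for _ in range(iters):
        # assign each profile to its nearest center
        dists = np.linalg.norm(profiles[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned profiles
        for j in range(k):
            if np.any(labels == j):
                centers[j] = profiles[labels == j].mean(axis=0)
    return centers, labels
```

Each speech response is then represented by the index of its nearest codeword, turning a sequence of continuous emotion profiles into a discrete sequence the structural model can work with.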
Article
Objective: Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different - and often irregular - times. Accurate estimation of those missing measurements is critical for many reasons, including diagnosis, prognosis and treatment. Existing methods address this estimation problem by interpolating within data streams or imputing across data streams (both of which ignore important information) or ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network (M-RNN), that interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements in comparison to 11 state-of-the-art benchmarks (including Spline and Cubic Interpolations, MICE, MissForest, matrix completion and several RNN methods); typical improvements in Root Mean Square Error are between 35% and 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust.
Article
Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.
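The most common MTL method in deep networks, hard parameter sharing, can be sketched as one shared trunk feeding per-task linear heads, e.g. a main depression label plus an auxiliary task. All names and shapes below are illustrative.

```python
import numpy as np

def mtl_forward(x, W_shared, heads):
    """Hard parameter sharing: a single shared trunk produces one
    representation h, and each task applies its own linear head to h.
    `heads` maps task name -> head weight matrix (illustrative)."""
    h = np.tanh(W_shared @ x)                 # shared representation
    return {name: W @ h for name, W in heads.items()}
```

Because every task's loss backpropagates through `W_shared`, a well-chosen auxiliary task regularizes the shared representation, which is the mechanism the overview attributes MTL's gains to.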
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
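The core operation the Transformer replaces recurrence with is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal single-head numpy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (illustrative).
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted mix of values
```

Because every query attends to every key in one matrix product, all positions are processed in parallel, which is the source of the parallelizability the abstract highlights.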
Conference Paper
This paper presents a novel and effective audio based method on depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in the literature. For the former, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression related characteristics in the vocal channel, combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results clearly demonstrate the effectiveness of the proposed approach.
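A random sampling strategy for class imbalance typically amounts to subsampling the majority class down to the minority-class size for each training pass. This is a generic sketch, not DepAudioNet's code.

```python
import numpy as np

def balanced_indices(labels, seed=0):
    """Return training indices with every class subsampled to the size
    of the smallest class, shuffled. Re-running with a new seed each
    epoch exposes the model to different majority-class subsets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    idx = np.concatenate([
        rng.choice(np.where(labels == c)[0], n, replace=False)
        for c in classes])
    rng.shuffle(idx)
    return idx
```

Resampling at the index level, before batching, keeps the model and loss unchanged while removing the prior toward the majority (non-depressed) class.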
Article
In this book, Andrew Harvey sets out to provide a unified and comprehensive theory of structural time series models. Unlike the traditional ARIMA models, structural time series models consist explicitly of unobserved components, such as trends and seasonals, which have a direct interpretation. As a result the model selection methodology associated with structural models is much closer to econometric methodology. The link with econometrics is made even closer by the natural way in which the models can be extended to include explanatory variables and to cope with multivariate time series. From the technical point of view, state space models and the Kalman filter play a key role in the statistical treatment of structural time series models. The book includes a detailed treatment of the Kalman filter. This technique was originally developed in control engineering, but is becoming increasingly important in fields such as economics and operations research. This book is concerned primarily with modelling economic and social time series, and with addressing the special problems which the treatment of such series poses. The properties of the models and the methodological techniques used to select them are illustrated with various applications. These range from the modelling of trends and cycles in US macroeconomic time series to an evaluation of the effects of seat belt legislation in the UK.
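The Kalman filter's role is easiest to see on the simplest structural model, the local level model: a random-walk level mu_t = mu_{t-1} + eta_t observed with noise, y_t = mu_t + eps_t. A minimal filtering sketch with assumed variances `q` (level noise) and `r` (observation noise):

```python
import numpy as np

def local_level_filter(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for the local level model (illustrative).
    q: level (state) noise variance, r: observation noise variance,
    m0/p0: diffuse prior mean and variance. Returns filtered levels."""
    m, p, out = m0, p0, []
    for obs in y:
        p = p + q                  # predict: level uncertainty grows
        k = p / (p + r)            # Kalman gain: trust in the new observation
        m = m + k * (obs - m)      # update with the innovation (obs - m)
        p = (1 - k) * p            # posterior variance shrinks
        out.append(m)
    return np.array(out)
```

The unobserved level `m` is exactly the kind of directly interpretable component (a trend) that the book contrasts with ARIMA's reduced-form parameters.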
Article
OBJECTIVE: While considerable attention has focused on improving the detection of depression, assessment of severity is also important in guiding treatment decisions. Therefore, we examined the validity of a brief, new measure of depression severity. MEASUREMENTS: The Patient Health Questionnaire (PHQ) is a self-administered version of the PRIME-MD diagnostic instrument for common mental disorders. The PHQ-9 is the depression module, which scores each of the 9 DSM-IV criteria as “0” (not at all) to “3” (nearly every day). The PHQ-9 was completed by 6,000 patients in 8 primary care clinics and 7 obstetrics-gynecology clinics. Construct validity was assessed using the 20-item Short-Form General Health Survey, self-reported sick days and clinic visits, and symptom-related difficulty. Criterion validity was assessed against an independent structured mental health professional (MHP) interview in a sample of 580 patients. RESULTS: As PHQ-9 depression severity increased, there was a substantial decrease in functional status on all 6 SF-20 subscales. Also, symptom-related difficulty, sick days, and health care utilization increased. Using the MHP reinterview as the criterion standard, a PHQ-9 score ≥10 had a sensitivity of 88% and a specificity of 88% for major depression. PHQ-9 scores of 5, 10, 15, and 20 represented mild, moderate, moderately severe, and severe depression, respectively. Results were similar in the primary care and obstetrics-gynecology samples. CONCLUSION: In addition to making criteria-based diagnoses of depressive disorders, the PHQ-9 is also a reliable and valid measure of depression severity. These characteristics plus its brevity make the PHQ-9 a useful clinical and research tool.
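The severity bands reported above translate directly into a scoring function. A small sketch using the paper's cut points of 5, 10, 15 and 20, with "minimal" as the conventional label for totals below 5:

```python
def phq9_severity(item_scores):
    """Total the nine PHQ-9 items (each scored 0-3) and map the total to
    the severity bands reported in the validation study."""
    assert len(item_scores) == 9 and all(0 <= s <= 3 for s in item_scores)
    total = sum(item_scores)
    for cut, label in [(20, "severe"), (15, "moderately severe"),
                       (10, "moderate"), (5, "mild")]:
        if total >= cut:
            return total, label
    return total, "minimal"   # conventional label below the 5-point cut
```

Note the paper's screening result is a separate threshold: a total of 10 or more gave 88% sensitivity and 88% specificity for major depression against the MHP interview.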
Article
Brain-computer interfaces (BCIs) are limited in their applicability in everyday settings by the current necessity to record subject-specific calibration data prior to actual use of the BCI for communication. In this paper, we utilize the framework of multitask learning to construct a BCI that can be used without any subject-specific calibration process. We discuss how this out-of-the-box BCI can be further improved in a computationally efficient manner as subject-specific data becomes available. The feasibility of the approach is demonstrated on two sets of experimental EEG data recorded during a standard two-class motor imagery paradigm from a total of 19 healthy subjects. Specifically, we show that satisfactory classification results can be achieved with zero training data, and combining prior recordings with subject-specific calibration data substantially outperforms using subject-specific data only. Our results further show that transfer between recordings under slightly different experimental setups is feasible.