Conference Paper

Research on Depression Recognition Using Machine Learning from Speech


... Consider the values n_estimators = [5, 10, 15, 20], learning_rate = [0.01, 0.1], and max_depth = [5, 15, 2]. ...
Article
Full-text available
Depression is among the most severe and prevalent mental disorders, and it contributes to suicide rates that are rising worldwide. Consequently, effective diagnosis and therapy are needed to reduce its impact. There is often more than one factor at play when determining why someone has been diagnosed with depression: in addition to alcohol and substance abuse, possible causes include physical health problems, adverse reactions to medications, life-changing events, and social circumstances. In this paper, exploratory data analysis is conducted to gain insight into a sensorimotor depression database comprising depressive experiences of unipolar and bipolar individuals. This study proposes a robust tuned extreme gradient boosting model generator to automatically predict the state of depression. Performance is optimized by determining the best combination of hyperparameters for the extreme gradient boosting model. By harnessing advanced machine learning methodologies, this study underscores comparative analysis and the importance of data-driven innovation in mental health and clinical practice. In future work, the robust tuned extreme gradient boosting model could be integrated with longitudinal and multimodal data to track changes in depressive state over time.
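The hyperparameter grid quoted in the citing snippet above can be searched exhaustively. A minimal stdlib sketch of that tuning loop, using the quoted grid values; the `cv_score` function is a made-up stand-in for the cross-validated accuracy an extreme gradient boosting model would actually return:

```python
from itertools import product

# Hypothetical scorer: in the paper's setting this would be cross-validated
# accuracy of an XGBoost model; here it is a toy function for illustration only.
def cv_score(n_estimators, learning_rate, max_depth):
    return 1.0 - abs(n_estimators * learning_rate - 1.5) / 10 - abs(max_depth - 15) / 100

# Grid as quoted in the snippet above
grid = {
    "n_estimators": [5, 10, 15, 20],
    "learning_rate": [0.01, 0.1],
    "max_depth": [5, 15, 2],
}

best_score, best_params = float("-inf"), None
for n, lr, d in product(grid["n_estimators"], grid["learning_rate"], grid["max_depth"]):
    score = cv_score(n, lr, d)
    if score > best_score:
        best_score = score
        best_params = {"n_estimators": n, "learning_rate": lr, "max_depth": d}

print(best_params)
```

In practice the loop would be replaced by a library grid search with k-fold cross-validation, but the exhaustive enumeration above is exactly what such a search performs.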
... From human speech, the author developed a depression dataset. The author obtained the data from Northwest Normal University (NNU) and used the self-reference effect (SRE) as a conventional paradigm (D. Shi et al., 2021). The author achieved an accuracy of 65% using a Support Vector Machine (SVM). ...
Chapter
Full-text available
The research work presented focuses on utilizing social media platforms as a source of data to diagnose depression-related issues. The popularity of social platforms such as LinkedIn, Instagram, Twitter, YouTube, and Facebook gave researchers an opportunity to analyse user experiences and gain insights into depression. Depression is a significant problem that affects individuals' lives, disrupts normal functioning, and impacts their perspectives. The primary objective of this research is to employ machine learning (ML) approaches for classifying tweets. The research also addresses data imbalance by using a sampling technique to normalize the dataset. The study explores four techniques that help extract meaningful information from the tweets, and conducts an empirical study to evaluate the performance of various ML techniques. Based on the experimental results, the AdaBoost classifier with the bag-of-words (BoW) feature extraction technique achieves the best results among all the classifiers tested.
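The bag-of-words step that feeds the AdaBoost classifier above can be sketched with the standard library alone. This is a minimal illustration, not the chapter's actual pipeline; the tokenization rule and example tweets are assumptions:

```python
from collections import Counter
import re

# Minimal bag-of-words featurizer: each tweet becomes a vector of word
# counts over a shared vocabulary, ready for a downstream classifier.
def bow_features(tweets):
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in tweets]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

vocab, X = bow_features(["feeling low today", "feeling better, much better"])
print(vocab)  # shared vocabulary, alphabetically sorted
```

A real system would pair these vectors with TF-IDF weighting or n-grams; plain counts keep the example small.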
... Shi et al. [37] analyzed the audio track of an ad-hoc created dataset starting from 66 subjects. They extracted specific audio features, such as the average of Zero-crossing Rate, Energy, Entropy of Energy, Spectral Centroid, and Spectral Spread. ...
Article
Full-text available
Depression is the most prevalent mental disorder in the world. One of the most adopted tools for depression screening is the Beck Depression Inventory-II (BDI-II) questionnaire. Patients may minimize or exaggerate their answers. Thus, to further examine the patient’s mood while filling in the questionnaire, we propose a mobile application that captures the BDI-II patient’s responses together with their images and speech. Deep learning techniques such as Convolutional Neural Networks analyze the patient’s audio and image data. The application displays the correlation between the patient’s emotional scores and BDI-II scores to the clinician at the end of the questionnaire, indicating the relationship between the patient’s emotional state and the depression screening score. We conducted a preliminary evaluation involving clinicians and patients to assess (i) the acceptability of the proposed application for use in clinics and (ii) the patient user experience. The participants were eight clinicians who tried the tool with 21 of their patients. The results seem to confirm the acceptability of the app in clinical practice.
... In general, short-time and long-time methods are applied as two categories of feature extraction from the speech signal. Two widely accepted short-time feature extraction methods are Short Time Energy (STE) and Zero Crossing Rate (ZCR) [2,3]. In our research, however, we used long-time parameters, such as Fundamental Frequency (F0), Shimmer (%), Jitter (%), Harmonic-to-Noise Ratio (HNR), and MFCC [1,4,5], to evaluate vocal tract health. ...
Article
Full-text available
The recognition of pathological voice is considered a difficult task for speech analysis. Moreover, otolaryngologists need to rely on oral communication with patients to discover traces of voice pathologies like dysphonia, which is caused by alteration of the vocal folds, and their accuracy is between 60% and 70%. To enhance detection accuracy and reduce the processing speed of dysphonia detection, a novel approach is proposed in this paper. We have leveraged Linear Discriminant Analysis (LDA) to train multiple Machine Learning (ML) models for dysphonia detection. Several ML models are utilized, like Support Vector Machine (SVM), Logistic Regression, and K-Nearest Neighbor (K-NN), to predict voice pathologies based on features like Mel-Frequency Cepstral Coefficients (MFCC), Fundamental Frequency (F0), Shimmer (%), Jitter (%), and Harmonic-to-Noise Ratio (HNR). The experiments were performed using the Saarbrucken Voice Database (SVD) and a privately collected dataset. The K-fold cross-validation approach was incorporated to increase the robustness and stability of the ML models. According to the experimental results, our proposed approach achieves a 70% increase in processing speed over Principal Component Analysis (PCA) and performs remarkably well, with a recognition accuracy of 95.24% on the SVD dataset, surpassing the previous best accuracy of 82.37%. On the private dataset, our proposed method achieved an accuracy of 93.37%. It can be an effective non-invasive method to detect dysphonia.
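The short-time features (STE, ZCR) and long-time voice-quality parameters (e.g. jitter) discussed in the two excerpts above can both be sketched in a few lines of plain Python. The signal values and pitch periods below are invented for illustration, and the jitter formula is one common textbook definition, not necessarily the papers' exact implementation:

```python
# Short-time energy (STE) and zero-crossing rate (ZCR), computed per
# fixed-length frame; the frame length of 4 samples is an assumption.
def short_time_features(signal, frame_len=4):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    ste = [sum(s * s for s in f) / len(f) for f in frames]
    zcr = [sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / (len(f) - 1)
           for f in frames]
    return ste, zcr

# Jitter (%): mean absolute difference between consecutive pitch periods,
# relative to the mean period (one common definition).
def jitter_percent(periods):
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

# An alternating frame (high ZCR) followed by a flat one (zero ZCR)
ste, zcr = short_time_features([0.5, -0.5, 0.5, -0.5, 0.1, 0.1, 0.1, 0.1])
jit = jitter_percent([5.0, 5.1, 4.9, 5.0, 5.2])  # pitch periods in ms
```

Real systems compute these over windowed, overlapping frames of sampled audio; the fixed non-overlapping frames here keep the arithmetic easy to verify by hand.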
Article
Aim To synthesise existing evidence concerning the application of AI methods in detecting depression through behavioural cues among adults in healthcare and community settings. Design This is a diagnostic accuracy systematic review. Methods This review included studies examining different AI methods in detecting depression among adults. Two independent reviewers screened, appraised and extracted data. Data were analysed by meta‐analysis, narrative synthesis and subgroup analysis. Data Sources Published studies and grey literature were sought in 11 electronic databases. Hand search was conducted on reference lists and two journals. Results In total, 30 studies were included in this review. Twenty of which demonstrated that AI models had the potential to detect depression. Speech and facial expression showed better sensitivity, reflecting the ability to detect people with depression. Text and movement had better specificity, indicating the ability to rule out non‐depressed individuals. Heterogeneity was initially high. Less heterogeneity was observed within each modality subgroup. Conclusions This is the first systematic review examining AI models in detecting depression using all four behavioural cues: speech, texts, movement and facial expressions. Implications A collaborative effort among healthcare professionals can be initiated to develop an AI‐assisted depression detection system in general healthcare or community settings. Impact It is challenging for general healthcare professionals to detect depressive symptoms among people in non‐psychiatric settings. Our findings suggested the need for objective screening tools, such as an AI‐assisted system, for screening depression. Therefore, people could receive accurate diagnosis and proper treatments for depression. Reporting Method This review followed the PRISMA checklist. Patients or Public Contribution No patients or public contribution.
Article
Full-text available
The early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as a classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-classification, with an accuracy of 0.9481 and an RMSE of 0.3810. This work is the first to use the wav2vec 2.0 model for depression recognition, and it showed strong generalization ability. The method is simple and practical, and can assist doctors in the early screening of depression.
Article
Postpartum depression is a serious mental health problem that affects mothers after giving birth. Symptoms may include trouble bonding with the child, difficulty sleeping, loss of appetite, and extreme irritability. About 50% of mothers experience major changes in their mental health, and approximately 1 in 10 women will seek help. The typical duration of postpartum depression is 3 to 6 months, though this can vary depending on a number of factors. The Edinburgh Postnatal Depression Scale (EPDS) dataset, gathered one week after mothers give birth, is used in this study to train the model using the Random Forest algorithm. The model has two phases. The first phase predicts the mother's status in one of four ranges (Depressed, Most Likely Depressed, Likely Depressed, No Depression) based on the trained data. The second phase predicts the level of depression (0-4 none, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe) based on the PHQ-9 questionnaire, which is suggested to be taken six weeks after delivery. The results of phases one and two are sent to the user via mail when they take the assessments in week one and week six, respectively. A key benefit of this model is that no third party is needed, because assessment takes place directly between the user and the model. Since postpartum depression is poorly understood, many mothers do not receive treatment for their illness, which can lead to terrible circumstances that this approach can help avoid.
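The PHQ-9 severity bands quoted above map directly to a small lookup function. A sketch using only the bands as stated (0-4 none, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe):

```python
# Map a PHQ-9 total score to the severity bands quoted above.
def phq9_severity(score):
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    bands = [(4, "none"), (9, "mild"), (14, "moderate"),
             (19, "moderately severe"), (27, "severe")]
    for upper, label in bands:
        if score <= upper:
            return label

print(phq9_severity(12))  # a score of 12 falls in the 10-14 band
```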
Article
Full-text available
Background The 17-item Hamilton Depression Rating Scale (HDRS17) is used world-wide as an observer-rated measure of depression in randomised controlled trials (RCTs) despite continued uncertainty regarding its factor structure. This study investigated the dimensionality of HDRS17 for patients undergoing treatment in UK mental health settings with moderate to severe persistent major depressive disorder (PMDD). Methods Exploratory Structural Equational Modelling (ESEM) was performed to examine the HDRS17 factor structure for adult PMDD patients with HDRS17 score ≥16. Participants (n = 187) were drawn from a multicentre RCT conducted in UK community mental health settings evaluating the outcomes of a depression service comprising CBT and psychopharmacology within a collaborative care model, against treatment as usual (TAU). The construct stability across a 12-month follow-up was examined through a measurement equivalence/invariance (ME/I) procedure via ESEM. Results ESEM showed HDRS17 had a bi-factor structure for PMDD patients (baseline mean (sd) HDRS17 22.6 (5.2); 87% PMDD >1 year) with an overall depression factor and two group factors: vegetative-worry and retardation-agitation, further complicated by negative item loading. This bi-factor structure was stable over 12 months of follow-up. Analysis of the HDRS6 showed it had a unidimensional structure, with positive item loading also stable over 12 months. Conclusions In this cohort of moderate-severe PMDD the HDRS17 had a bi-factor structure stable across 12 months with negative item loading on domain-specific factors, indicating that it may be more appropriate for multidimensional assessment of settled clinical states, with shorter unidimensional subscales such as the HDRS6 used as measures of change.
Article
Full-text available
Depression is one of the most common mental illnesses in the world. It is estimated that there are 350 million people worldwide who have some form of depression. In the United States, 16 million people had a depressive episode in the past year. A condition affecting one’s mood and actions, depression can affect one’s life substantially. According to the most recent World Health Organization estimates, depression is the leading cause of disability worldwide and is believed to be a major contributor to the overall global burden of disease. The purpose of this case study was to systematically review yoga interventions aimed at improving depressive symptoms. The duration of the intervention period varied greatly, with the majority being 30 days or longer. Despite the limitations, it can be concluded that yoga interventions with acupressure, proper diet, and natural remedies were effective in reducing depression.
Article
Full-text available
We present the results of short-term forecasting of Henry Hub spot natural gas prices based on the performance of classical time series models and machine learning methods, specifically neural networks (NN) and strategic seasonality-adjusted support vector regression machines (SSA-SVR). We introduce several improvements to the SVR-based forecasting method. A procedure for generating model inputs and selecting among them with feature selection (FS) algorithms is suggested. The use of FS algorithms for automatic selection of model inputs, and of the advanced global optimization technique PSwarm for optimizing SVR hyperparameters, reduces subjective input. Our results show that the machine learning results reported in the literature often overstate the success of these models, since in some cases we record only slight improvements over the time series approaches. We emphasize that our findings apply to Henry Hub, a market known among traders as the “widow maker”. We find definite advantages in using FS algorithms to preselect the variables in both NN and SVR. Machine learning models without preselection of variables are often inferior to time-series models in forecasting spot prices, and in this case FS algorithms show their usefulness and strength.
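The abstract above credits much of the gain to feature selection before model fitting. A minimal filter-style FS sketch, ranking candidate inputs by absolute Pearson correlation with the target; the feature names and data are synthetic, and real FS algorithms (wrappers, embedded methods) are considerably more sophisticated:

```python
# Pearson correlation between two equal-length sequences.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Keep the k candidate inputs most correlated (in absolute value) with the target.
def select_features(features, target, k=2):
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

features = {
    "lag_price":   [10, 11, 12, 13, 14],    # strongly related to target
    "temperature": [5, 3, 6, 4, 5],         # weakly related
    "storage":     [100, 98, 96, 94, 92],   # strongly (negatively) related
}
target = [10.5, 11.4, 12.6, 13.5, 14.4]
print(select_features(features, target))
```

Negative correlations matter as much as positive ones for prediction, which is why the ranking uses the absolute value.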
Article
Full-text available
This article presents the issue of Polish emotional speech recognition based on a Polish database prepared by the Medical Electronic Division of the Lodz University of Technology. The main goal of this article is to show the differences in the learning processes of artificial neural networks. The research was conducted on the five most popular variants of the back-propagation algorithm. The neuron activation function was the second issue analyzed. © 2016, Wydawnictwo SIGMA - N O T Sp. z o.o. All rights reserved.
Article
Objective: The purpose of this study was the diagnostic evaluation of the hospital anxiety and depression scale total score, its depression subscale and the Beck depression inventory II in adults with congenital heart disease. Methods: This cross-sectional study evaluated 206 patients with congenital heart disease (mean age 35.3 ± 11.7 years; 58.3% men). Major depressive disorder was diagnosed by a structured clinical interview for the Diagnostic and Statistical Manual of Mental Disorders IV and disease severity with the Montgomery-Åsberg depression rating scale. Receiver operating characteristics provided assessment of diagnostic accuracy. Youden's J statistic identified optimal cut-off points. Results: Fifty-three participants (25.7%) presented with major depressive disorder. Of these, 28 (52.8%) had mild and 25 (47.2%) had moderate to severe symptoms. In the total cohort, the optimal cut-off values were >11 in the Beck depression inventory II, >11 in the hospital anxiety and depression scale and >5 in the depression subscale. Optimal cut-off points for moderate to severe major depressive disorder were similar. The cut-offs for mild major depressive disorder were lower (Beck depression inventory II >4; hospital anxiety and depression scale >8; >2 in its depression subscale). In the total cohort the calculated area under the curve varied between 0.906 (hospital anxiety and depression scale) and 0.93 (Beck depression inventory II). Detection of moderate to severe major depressive disorder (area under the curve 0.965-0.98) was excellent; detection of mild major depressive disorder (area under the curve 0.851-0.885) was limited. Patients with major depressive disorder had a significantly lower quality of life, even when they had mild symptoms. Conclusion: All scales were excellent for detecting moderate to severe major depressive disorder. Classification of mild major depressive disorder, representing 50% of cases, was limited.
A loss of quality of life warranting therapy is already present in major depressive disorder with mild symptoms. Established cut-off points may still be too high to identify patients with major depressive disorder requiring therapy. External validation is needed to confirm our data.
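The cut-off selection described above uses Youden's J statistic, J = sensitivity + specificity − 1, maximized over candidate thresholds. A minimal sketch with synthetic scores and labels (the real study of course used patient questionnaire data):

```python
# Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1.
# A score strictly above the cut-off counts as a positive prediction.
def best_cutoff(scores, labels):
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# Synthetic questionnaire totals and depression labels (0/1)
scores = [3, 5, 8, 12, 13, 15, 18, 20]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, j = best_cutoff(scores, labels)
```

In this toy data the classes separate perfectly at a cut-off of >12, giving J = 1; on real data J trades sensitivity against specificity and rarely reaches 1.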
Chapter
In this paper, we present our exploration of different machine-learning algorithms for detecting depression by analyzing the acoustic features of a person’s voice. We have conducted our study on benchmark datasets, in order to identify the best framework for the task, in anticipation of deploying it in a future application.
Conference Paper
We developed an algorithm to estimate the depression status from a person’s voice signal. In the experiment, we collected voice samples from patients with major depression. In addition, questionnaires concerning the patients’ depressed mood were obtained. The voice signals were collected for the subjects’ vocalizations of three types of long vowels. Next, acoustic features were calculated based on the speech. Subsequently, an algorithm was developed to estimate the severity of depression, judged by the HAM-D score, from the recorded voice samples. The results indicated that the algorithm performed well at estimating the severity of the HAM-D score using the acoustic features of the long vowels. Consequently, the algorithm also performed well at estimating the depressed mood, thus suggesting the utility of the algorithm for estimating depression conditions based on speech.
Article
Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression impacts not only the emotional but also the physical and psychological state of the person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight, and even feelings of self-hatred. This report describes our work for the Audio Visual Emotion Challenge (AVEC) 2017, done during our second-year BTech summer internship. With the growing demand for automatic depression detection using machine learning algorithms, we present our multimodal feature extraction and decision-level fusion approach. Features are extracted from the provided Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database. Gaussian Mixture Model (GMM) clustering and the Fisher vector approach were applied to the visual data; statistical descriptors on gaze and pose, low-level audio features, head pose, and text features were also extracted. Classification is done on fused as well as independent features using Support Vector Machines (SVM) and neural networks. The results exceeded the provided baseline on the validation set by 17% for audio features and 24.5% for video features.
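Decision-level fusion of the kind described above can be reduced to a weighted vote over per-modality scores. A minimal sketch; the probabilities, weights, and 0.5 decision threshold are all invented for illustration, not the paper's trained values:

```python
# Decision-level (late) fusion: each modality classifier emits a
# depression probability; the final decision is a weighted average.
def late_fusion(modality_probs, weights, threshold=0.5):
    assert len(modality_probs) == len(weights)
    total = sum(weights)
    fused = sum(p * w for p, w in zip(modality_probs, weights)) / total
    return fused, fused >= threshold

# Hypothetical audio, video, and text scores; audio weighted double
probs = [0.7, 0.4, 0.6]
fused, is_depressed = late_fusion(probs, weights=[2, 1, 1])
```

The alternative is feature-level (early) fusion, where the modalities' feature vectors are concatenated before a single classifier; the abstract reports results for both fused and independent features.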
Conference Paper
In this paper, we propose a two-level hierarchical ensemble of classifiers for improved recognition of emotion from speech. At the first level, Mel Frequency Cepstral Coefficients (MFCC) of input speech are classified independently by suitably trained Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. From these first-level classifiers, posterior probabilities of the GMM and discriminant function values of the SVM are extracted and given as input to a second-level SVM classifier, which classifies emotion based on these values. Extensive experiments were carried out using the Berlin database Emo-DB for seven emotions (anger, fear, boredom, happiness, neutral, disgust, and sadness). While the SVM and GMM classifiers produced only 67% and 66% accuracy respectively, 75% accuracy was achieved with our fusion approach.
Conference Paper
This study extends the Weighted Ordered Classes-Nearest Neighbors (WOC-NN) method, a class-similarity based method introduced in our previous work [1]. WOC-NN computes similarities between a test instance and a class pattern of each emotion class in the likelihood space. An emotion class pattern is a representation of its ranked neighboring classes, weighted according to their discrimination capability. In this study, the class-rank weights are normalized inside each class pattern. We also study a new distance-pattern model based on double class ranks, introduced to take into account the interaction between the rank variables. The performance of the system based on double class ranks exceeds that based on a single class rank. Furthermore, using the likelihood score ranks of all class models in the decision rule of WOC-NN adds valuable information for data discrimination. Experiments on the FAU AIBO corpus show that the WOC-NN approach improves relative performance by 5.1% compared to the Bayes decision rule. The obtained result also outperforms state-of-the-art ones.
Conference Paper
Incorporating multimodal information and temporal context from speakers during an emotional dialog can contribute to improving performance of automatic emotion recognition systems. Motivated by these issues, we propose a hierarchical framework which models emotional evolution within and between emotional utterances, i.e., at the utterance and dialog level respectively. Our approach can incorporate a variety of generative or discriminative classifiers at each level and provides flexibility and extensibility in terms of multimodal fusion; facial, vocal, head and hand movement cues can be included and fused according to the modality and the emotion classification task. Our results using the multimodal, multi-speaker IEMOCAP database indicate that this framework is well-suited for cases where emotions are expressed multimodally and in context, as in many real-life situations.