
Prediction of postpartum depression using machine learning techniques from social media text


Abstract

Early screening for mental disorders plays a crucial role in diagnosis and treatment. This study explores how data-driven methods can leverage information available on social media platforms to predict postpartum depression (PPD). A generalized approach is proposed in which linguistic features are extracted from user-generated textual posts on social media, and the posts are classified as general, depressive, or PPD-representative using multiple machine learning techniques. We find that the techniques used in our study exhibit strong predictive capability for PPD content. In holdout validation, the multilayer perceptron outperformed the other techniques used in this study, such as support vector machine and logistic regression, with 91.7% accuracy for depressive content identification and up to 86.9% accuracy for PPD content prediction. Because this work adopts a hierarchical approach to predicting PPD, the reported PPD accuracy reflects the model's ability to distinguish PPD content from non-PPD depressive content.
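The abstract's PPD figure depends on the first stage. The sketch below shows how such a two-stage pipeline, and the accuracy of its second stage, could be wired together. It is a minimal illustration, not the authors' implementation. The function names, the label strings, and the keyword rules `toy_is_depressive` and `toy_is_ppd` are hypothetical stand-ins for trained classifiers such as the multilayer perceptron.

```python
from typing import Callable, List, Tuple

# Hypothetical label names for the three content classes.
GENERAL, DEPRESSIVE, PPD = "general", "depressive", "ppd"

Classifier = Callable[[str], bool]


def hierarchical_predict(post: str, is_depressive: Classifier, is_ppd: Classifier) -> str:
    """Stage 1 (D-CC) separates general from depressive content;
    stage 2 (PPD-CC) runs only on posts that stage 1 flagged as depressive."""
    if not is_depressive(post):
        return GENERAL
    return PPD if is_ppd(post) else DEPRESSIVE


def ppd_stage_accuracy(data: List[Tuple[str, str]], is_ppd: Classifier) -> float:
    """Accuracy of stage 2 alone, measured only on posts whose true label is
    depressive or PPD, i.e. PPD vs. non-PPD depressive content."""
    subset = [(text, label) for text, label in data if label in (DEPRESSIVE, PPD)]
    if not subset:
        raise ValueError("no depressive or PPD posts to evaluate")
    correct = sum(is_ppd(text) == (label == PPD) for text, label in subset)
    return correct / len(subset)


# Hypothetical keyword rules standing in for trained models such as an MLP.
def toy_is_depressive(post: str) -> bool:
    return any(w in post.lower() for w in ("hopeless", "empty", "crying"))


def toy_is_ppd(post: str) -> bool:
    return any(w in post.lower() for w in ("baby", "newborn", "postpartum"))


labelled = [
    ("Great hike with friends today", GENERAL),
    ("I feel hopeless and empty every day", DEPRESSIVE),
    ("Crying all night since the baby arrived", PPD),
    ("Baby shower plans, but I feel empty", DEPRESSIVE),
]
for text, _ in labelled:
    print(hierarchical_predict(text, toy_is_depressive, toy_is_ppd))
print(ppd_stage_accuracy(labelled, toy_is_ppd))
```

The general post never reaches stage 2, so it does not affect the stage-2 accuracy. This is why a hierarchical PPD score is not directly comparable with a flat three-class accuracy.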
Received: 18 November 2018 Revised: 20 February 2019 Accepted: 15 March 2019
DOI: 10.1111/exsy.12409
Iram Fatima¹, Burhan Ud Din Abbasi², Sharifullah Khan², Majed Al-Saeed¹, Hafiz Farooq Ahmad¹, Rafia Mumtaz²
¹College of Computer Sciences and Information Technology, King Faisal University, Hofuf, Saudi Arabia
²School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
Correspondence
Iram Fatima, College of Computer Sciences and Information Technology, King Faisal University, Hofuf, Saudi Arabia.
Email: ialrehman@kfu.
Funding information
Deanship of Scientific Research, King Faisal University, Grant/Award Number: 180055
Keywords: machine learning, mental health, moods and emotions, postpartum depression, social media
The transition to parenthood is one of the major phases in people's lives. It affects many aspects of life and can at times have a negative emotional impact (Hudson, Elek, & Campbell-Grossman, 2000). These changes appear to affect both mothers and fathers, because of their inability to reconcile the demands of their personal, social, and professional lives (Genesoni & Tallandini, 2009; Woolhouse, McDonald, & Brown, 2012). Postpartum depression (PPD) is one of the more common disorders diagnosed in parents. The Diagnostic and Statistical Manual of Mental Disorders defines PPD as a major depressive disorder with peripartum onset, with the most recent episode occurring at any time from pregnancy until 4 weeks after childbirth (American Psychiatric Association, 2013). The International Classification of Diseases recognizes this disorder up to 6 weeks after childbirth (World Health Organization, 2004). The percentage of individuals affected by this disorder varies widely around the world and can be as high as 63% (Kalyani, Saeed, Rehman, & Mubbashar, 2001). Although mothers are more susceptible to PPD, an estimated 4% of fathers also experience this disorder (Davé, Petersen, Sherr, & Nazareth, 2010). One study found that 8% of adoptive mothers also experienced depression, possibly due to lifestyle changes (Mott, Schiller, Richards, O'Hara, & Stuart, 2011). On average, 15% of mothers worldwide are expected to suffer from PPD. No biological cause of PPD has been identified, and changes in appetite, sleep patterns, and excessive fatigue are normal for women after childbirth, so diagnosing PPD is a challenge (Pearlstein, Howard, Salisbury, & Zlotnick, 2009).
Researchers have identified various factors as predictors of PPD in individuals who are suffering from it or are at risk (Beck, 1998, 1998, 2001; Reck, Stehle, Reinig, & Mundt, 2009). Some researchers have designed a set of screening questions to simplify the identification of PPD (Cox, Holden, & Sagovsky, 1987). Similarly, others have worked on measuring the severity of depression (Kroenke, Spitzer,
Abbreviations: D-CC, depressive content classification; PPD, postpartum depression; PPD-CC, postpartum depression content classification.
Expert Systems. 2019;36:e12409. © 2019 John Wiley & Sons, Ltd.
... The model utilized SVM and multilayer perceptron (MLP) algorithms for classification. A hierarchical model for postpartum depression prediction, making use of textual posts shared on the Reddit forum is proposed in [13]. The model extracted the features utilizing the LIWC dictionary and the Least Absolute Shrinkage and Selection Operator (LASSO) technique. ...
... The explanation of these processes is elucidated in the subsections.
Ref. | Dataset | Features extracted | Prediction models
[9] | Twitter | N-grams using TF-IDF & LR | LR
[10] | Sina microblog | N-grams with Pearson correlation coefficient | SVM
[11] | Reddit | Combined N-grams + LDA + LIWC | LR, SVM, NN
[12] | Reddit | Bigrams, LIWC, LDA | SVM, MLP
[13] | Reddit | LIWC | SVM, MLP
[14] | COVID-19 tweets | Psycholinguistic features | IChOA-LSTM-CNN
[15] | Reddit | One-hot encoding | LSTM
[Figure 1: The schematic diagram of the proposed intelligent depression detection framework] ...
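The first entry, [9], pairs n-gram features with TF-IDF weighting. Below is a minimal sketch of that kind of feature extraction. It assumes whitespace tokenization, raw term counts as term frequency, and the unsmoothed weight $\mathrm{idf}(g) = \ln(N/\mathrm{df}(g))$; libraries such as scikit-learn use smoothed variants by default. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: List[str], max_n: int = 2) -> List[Dict[str, float]]:
    """TF-IDF weights for all 1..max_n-grams: raw count * ln(N / df)."""
    gram_counts = [
        Counter(g for n in range(1, max_n + 1) for g in ngrams(doc.lower().split(), n))
        for doc in docs
    ]
    # Document frequency: iterating a Counter yields each n-gram once per document.
    df = Counter(g for counts in gram_counts for g in counts)
    n_docs = len(docs)
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in counts.items()}
        for counts in gram_counts
    ]


posts = ["I feel so tired", "so tired of crying", "lovely day outside"]
weights = tfidf(posts)
print(round(weights[0]["so tired"], 3), round(weights[0]["feel"], 3))  # 0.405 1.099
```

The bigram "so tired" appears in two of the three posts, so it receives a lower weight than "feel", which appears in only one. An n-gram present in every post gets a weight of 0.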
... The forget vector $f_t$ multiplies the previous cell state $C_{t-1}$ and discards the values with 0 outcomes. The network then executes elementwise addition on the output of the input vector $i_t$, updating the cell state and creating a new cell state $C_t$ as mentioned in Eq. (13). ...
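This sketch assumes the excerpt follows the standard LSTM formulation; Eq. (13) itself is not reproduced here. Under that assumption, the new cell state is $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $\tilde{C}_t$ is the candidate cell state and $\odot$ is element-wise multiplication. The Python below covers only this update. The gate values are supplied directly rather than computed from learned weights, and the function name is hypothetical.

```python
from typing import List


def update_cell_state(f_t: List[float], c_prev: List[float],
                      i_t: List[float], c_tilde: List[float]) -> List[float]:
    """Standard LSTM cell-state update C_t = f_t * C_{t-1} + i_t * C~_t (element-wise).

    A forget-gate entry of 0 discards the matching entry of C_{t-1};
    an input-gate entry of 0 ignores the matching candidate value.
    """
    if not (len(f_t) == len(c_prev) == len(i_t) == len(c_tilde)):
        raise ValueError("all vectors must have the same length")
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]


c_t = update_cell_state(f_t=[1.0, 0.0, 0.5], c_prev=[2.0, 3.0, 4.0],
                        i_t=[0.0, 1.0, 0.5], c_tilde=[9.0, -1.0, 2.0])
print(c_t)  # [2.0, -1.0, 3.0]
```

The example shows the three cases:
- The first entry keeps the old state and ignores the candidate.
- The second forgets the old state entirely and takes the candidate.
- The third blends the old state and the candidate.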
... The model was able to achieve an accuracy of 81%. Fatima et al. (61) conducted a similar study to predict PPD using social media text. The authors showed that Multilayer Perceptron (MLP) outperformed Support Vector Machine (SVM) and Logistic Regression (LR) in prediction when using a hold-out validation technique by achieving an accuracy of 81%. ...
A significant challenge for hospitals and medical practitioners in low- and middle-income nations is the lack of sufficient health care facilities for timely medical diagnosis of chronic and deadly diseases. Particularly, maternal and neonatal morbidity due to various non-communicable and nutrition related diseases is a serious public health issue that leads to several deaths every year. These diseases affecting either mother or child can be hospital-acquired, contracted during pregnancy or delivery, postpartum and even during child growth and development. Many of these conditions are challenging to detect at their early stages, which puts the patient at risk of developing severe conditions over time. Therefore, there is a need for early screening, detection and diagnosis, which could reduce maternal and neonatal mortality. With the advent of Artificial Intelligence (AI), digital technologies have emerged as practical assistive tools in different healthcare sectors but are still in their nascent stages when applied to maternal and neonatal health. This review article presents an in-depth examination of digital solutions proposed for maternal and neonatal healthcare in low resource settings and discusses the open problems as well as future research directions.
... ML utilizing health care records can predict postpartum depression as well as need for postpartum psychiatric admission with impressive psychometric accuracy [61-63, 64•, 65-67]. Similar predictive capacity with ML for postpartum depression has been demonstrated when utilizing data from social media posting in mothers [68] and fathers [69]. While studies still differ on variables of importance (ranging from well-established risk factors such as prior mental health history and obstetric history to laboratory results) and appropriate algorithms, this remains a promising potential avenue for identification and treatment. ...
Purpose of Review: This review explores advances in the utilization of technology to address perinatal mood and anxiety disorders (PMADs). Specifically, we sought to assess the range of technologies available, their application to PMADs, and evidence supporting use. Recent Findings: We identified a variety of technologies with promising capacity for direct intervention, prevention, and augmentation of clinical care for PMADs. These included wearable technology, electronic consultation, virtual and augmented reality, internet-based cognitive behavioral therapy, and predictive analytics using machine learning. Available evidence for these technologies in PMADs was almost uniformly positive. However, evidence for use in PMADs was limited compared to that in general mental health populations. Summary: Proper attention to PMADs has been severely limited by issues of accessibility, affordability, and patient acceptance. Increased use of technology has the potential to address all three of these barriers by facilitating modes of communication, data collection, and patient experience.
Depression is a mental illness that persistently affects a person's activities, including thinking capacity and physical appearance. Persistent feelings of sadness and apathy can derail career growth. Psychologists face a major problem in detecting depression at early stages, because patients often find it difficult to talk to them and share their thoughts and feelings. Major depressive disorder is characterized by sadness, worthlessness, disturbed sleeping patterns and eating habits, and lethargy in activities that were once enjoyed. Social sites such as Facebook, Reddit, Twitter, and Snapchat have become a helpful outlet where people feel free to express their ideas and negative thoughts. Much research has used such data to detect depression, typically framing it as sentiment analysis and applying machine learning algorithms such as decision trees, random forests, naïve Bayes, ensemble models, KNN, and maximum entropy. In this paper, the author reviews this research to identify the algorithm that achieves the highest accuracy, precision, and recall for depression detection. Keywords: Depression, Machine learning, Social sites
Background: Postpartum depression (PPD) presents a serious health problem among women and their families. Machine learning (ML) is a rapidly advancing field with increasing utility in predicting PPD risk. We aimed to synthesize and evaluate the quality of studies on the application of ML techniques to predicting PPD risk. Methods: We conducted a systematic search of eight databases, identifying English and Chinese studies that applied ML techniques to predict PPD risk and reported performance metrics. The quality of the included studies was evaluated using the Prediction Model Risk of Bias Assessment Tool. Results: Seventeen studies involving 62 prediction models were included. Supervised learning was the main ML technique employed, and the most common ML models were support vector machine, random forest, and logistic regression. Five studies (30%) reported both internal and external validation. Two studies involved model translation, but none were tested clinically. All studies showed a high risk of bias, and more than half showed high application risk. Limitations: Including Chinese articles slightly reduced the reproducibility of the review. Model performance was not quantitatively analyzed owing to inconsistent metrics and the absence of methods for correlation meta-analysis. Conclusions: Researchers have paid more attention to model development than to validation, and few have focused on improvement and innovation. Models for predicting PPD risk continue to emerge; however, few have achieved acceptable quality standards. Therefore, ML techniques that successfully predict PPD risk have yet to be deployed in clinical environments.
As the COVID-19 crisis settles down, people struggle with issues such as anxiety, panic attacks, grief, low mood, and other mental health problems, whether or not they were personally affected by the disease. Mental fitness is a major strength in an individual's development. Social networking sites have become a platform where people feel free to vent their thoughts and easily interact with others. Extracting useful information from these posts is part of sentiment analysis, a machine learning technique that helps reveal an individual's mental condition. In this paper, various machine learning algorithms are applied to the dataset: random forest, naive Bayes, decision tree, multilayer perceptron, maximum entropy, KNN, gradient boosted decision tree, adaptive boosting, bagged logistic regression, tree ensemble model, Liblinear, convolutional neural network, and long short-term memory. Evaluation with accuracy, precision, recall, and F1 score shows that bagged logistic regression gives the best accuracy.
Depression has become a public health issue. Its high prevalence worsens all aspects of life irrespective of age and gender, impairs psychological functioning, and results in lost productivity. Early detection is crucial for extending individuals' lifespan and for more effective mental health interventions. Social networks, where people share personal experiences and feelings, have enabled the automatic identification of specific mental conditions, particularly depression. This review explores the application of sentiment analysis in psychology to detect depressed users in datasets originating from social media. Sentiment analysis involves five research tasks; this study focuses on emotion detection in text data. The paper surveys existing work on the most common machine learning classification approaches for linguistic, behavioral, and emotional features and presents a comparative study of the different approaches.
Absolutist thinking is considered a cognitive distortion by most cognitive therapies for anxiety and depression. Yet, there is little empirical evidence of its prevalence or specificity. Across three studies, we conducted a text analysis of 63 Internet forums (over 6,400 members) using the Linguistic Inquiry and Word Count software to examine absolutism at the linguistic level. We predicted and found that anxiety, depression, and suicidal ideation forums contained more absolutist words than control forums (ds > 3.14). Suicidal ideation forums also contained more absolutist words than anxiety and depression forums (ds > 1.71). We show that these differences are more reflective of absolutist thinking than psychological distress. It is interesting that absolutist words tracked the severity of affective disorder forums more faithfully than negative emotion words. Finally, we found elevated levels of absolutist words in depression recovery forums. This suggests that absolutist thinking may be a vulnerability factor.
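LIWC-style measures express a word category as a percentage of all words in a text. The sketch below applies that idea to absolutist words. The ten-word lexicon is an illustrative assumption, not the dictionary used in the study. The regex tokenizer is a simplification of LIWC's own, and the function name is hypothetical.

```python
import re
from typing import Iterable

# Illustrative subset of absolutist words (an assumption, not the study's dictionary).
ABSOLUTIST = {"always", "never", "nothing", "everything", "completely",
              "totally", "entire", "every", "all", "must"}


def absolutist_percentage(text: str, lexicon: Iterable[str] = ABSOLUTIST) -> float:
    """Percentage of word tokens in `text` that appear in `lexicon`
    (a LIWC-style relative frequency)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    vocab = set(lexicon)
    return 100.0 * sum(w in vocab for w in words) / len(words)


print(absolutist_percentage("I always ruin everything and nothing ever helps"))  # 37.5
```

Three of the eight tokens ("always", "everything", "nothing") are in the assumed lexicon. Comparing such percentages across forums is the kind of contrast the abstract reports.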
The identification of a mental disorder at its early stages is a challenging task because it requires clinical interventions that may not be feasible in many cases. Social media such as online communities and blog posts have shown some promising features to help detect and characterise mental disorder at an early stage. In this work, we make use of user-generated content to identify depression and further characterise its degree of severity. We used the user-generated post contents and its associated mood tag to understand and differentiate the linguistic style and sentiments of the user content. We applied machine learning and statistical analysis methods to discriminate the depressive posts and communities from non-depressive ones. The depression degree of a depressed post is identified using variations of valence values based on the mood tag. The proposed methodology achieved 90%, 95% and 92% accuracy for the classification of depressive posts, depressive communities and depression degree, respectively.
Worldwide, mental illness is a primary cause of disability. It affects millions of people each year, and few of them receive treatment. We found that social networking sites (SNS) can be used as a screening tool for detecting affective mental illness in individuals, because SNS posts reflect a user's current behavior, thinking style, and mood. We consider a set of behavioral attributes relating to socialization, socioeconomics, family, marital status, feelings, language use, and references to antidepressant treatment. We use these behavioral attributes to envision a tool that can give individuals early alerts about Major Depressive Disorder (MDD) based on their SNS data. We propose a method that uses an ensemble learning technique to automatically classify individuals from their Facebook profiles as depression displayers or non-displayers. Our tool can be used for MDD diagnosis in addition to questionnaire instruments such as the Beck Depression Inventory (BDI) and the CESD-R.
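The abstract does not say which ensemble technique was used. Hard majority voting over the labels of several base classifiers is one common choice. The sketch below assumes it. The displayer/non-displayer predictions in the usage example are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard-voting ensemble: predictions[m][k] is base model m's label for individual k.

    Ties go to the tied label encountered first, i.e. from the earliest model,
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one base model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every base model must label every individual")
    return [Counter(p[k] for p in predictions).most_common(1)[0][0]
            for k in range(n_samples)]


# Made-up labels from three hypothetical base classifiers for three users.
model_outputs = [
    ["displayer", "non-displayer", "displayer"],
    ["displayer", "displayer", "non-displayer"],
    ["non-displayer", "non-displayer", "non-displayer"],
]
print(majority_vote(model_outputs))  # ['displayer', 'non-displayer', 'non-displayer']
```

With an odd number of base models and two classes, ties cannot occur. Soft voting, which averages predicted probabilities, is a common alternative when the base models expose them.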
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
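The constrained form described above is usually solved in its equivalent penalized form, $\min_\beta \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$. For each bound $t$ on $\sum_j |\beta_j|$, some $\lambda \ge 0$ gives the same solution. The sketch below uses cyclic coordinate descent with soft-thresholding. This is a widely used later solver, not the procedure in the original paper. The sketch assumes centred data with no intercept, and its function names are hypothetical.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards 0 by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 200) -> List[float]:
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes centred data (no intercept) and no all-zero feature column.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the residual that excludes feature j's own term.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            # Update the residual so it stays equal to y - X @ beta.
            for i in range(n):
                residual[i] -= col[i] * (new_bj - beta[j])
            beta[j] = new_bj
    return beta


# Toy data: feature 0 tracks y strongly, feature 1 only weakly.
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2.0, -2.0, 0.1, -0.1]
print(lasso_coordinate_descent(X, y, lam=0.1))  # approximately [1.8, 0.0]
```

With $\lambda = 0.1$, the strong coefficient is shrunk from its least-squares value of 2.0 to 1.8. The weak coefficient (least-squares value 0.1) is set exactly to 0, which illustrates the sparsity the abstract describes.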
Background: Many women experience moderate-to-severe depression and anxiety in the postpartum period, for which pharmacotherapy is often the first-line treatment. Many breastfeeding mothers are reticent to increase their dose or consider additional medication, despite incomplete response, due to potential adverse effects on their newborn. These mothers are amenable to non-pharmacological intervention for complete symptom remission. The current study evaluated the feasibility of an eight-week mindfulness-based cognitive therapy (MBCT) intervention as an adjunctive treatment for postpartum depression and anxiety. Methods: Women were recruited at an outpatient reproductive mental health clinic based at a maternity hospital. Participants had a diagnosis of postpartum depression/anxiety within the first year following childbirth. They were enrolled in either the MBCT intervention group (n=14) or the treatment-as-usual control group (n=16), and completed the Patient Health Questionnaire-9 (PHQ-9), the Generalized Anxiety Disorder-7 (GAD-7) questionnaire, and the Mindful Attention Awareness Scale (MAAS) at baseline and at 4 weeks, 8 weeks, and 3 months following baseline. Results: Multivariate analyses demonstrated that depression and anxiety levels decreased, and mindfulness levels increased, in the MBCT group, but not in the control group. Many of the between-group and over time comparisons displayed trends towards significance, although these differences were not always statistically significant. Additionally, the effect sizes for anxiety, depression, and mindfulness were frequently large, indicating that the MBCT intervention may have had a clinically significant effect on participants. Limitations: Limitations include small sample size and the non-equivalent control group design. Conclusions: We demonstrated that MBCT has potential as an adjunctive, non-pharmacological treatment for postpartum depression/anxiety that does not wholly remit with pharmacotherapy.
Objective: This study aims to determine the incidence of postnatal depression and its contributing factors in our own setting, in order to highlight the strategies that need to be adopted. About 11 to 12 million women of childbearing age are at risk of developing mental illnesses such as postnatal depression, the most common disorder reported in the literature. Design: This is a two-stage, cross-sectional, prospective study. Place and Duration: The study was carried out in the Obstetrics Unit of Rawalpindi General Hospital from May to July 1998. Subjects/Methods: One hundred and twenty consecutive women admitted to the Obstetrics Unit at Rawalpindi General Hospital with full-term pregnancies were administered the Edinburgh Postnatal Depression Scale (EPDS) during the 2nd postnatal week by trained raters. All women scoring above the cut-off point of 10 (n = 72) were interviewed using the Psychiatric Assessment Schedule (PAS), based on the ICD-10 Diagnostic Criteria for Research (ICD-10 DCR), by the authors, who were blind to the EPDS scores. One tenth of the low scorers were also interviewed in the same way. Results: Of the women, 54.17% were diagnosed with postnatal blues and 37.50% with depressive illness; 22.2% of the 54.17% remitted spontaneously. The mean age of women suffering from postnatal depression was 22.61 years (range 20-35 years), compared with 18.81 years (range 17-25 years) among controls. Unwanted pregnancy, being primiparous, living in an extended family, interpersonal marital difficulties, and early loss of mother were associated with higher rates of postnatal depression, as was a lower level of education. Conclusion: It is safe to conclude that the incidence of postnatal depression is probably higher in Pakistan than in developed countries, as psychosocial and economic factors are crucial.
The study highlights that strategies to improve reproductive health can decrease the incidence of postnatal depression and improve the quality of life of the female population of our country.
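The first stage of this design reduces to a threshold on the EPDS total. The EPDS has 10 items, each scored 0 to 3, so totals range from 0 to 30. The sketch below assumes responses have already been converted to item scores, since some EPDS items are reverse-scored. It applies the study's rule of interviewing women who score above 10. It covers only this score-based referral, not the random sample of low scorers. The function names and example scores are hypothetical.

```python
from typing import Sequence

N_ITEMS = 10         # the EPDS has 10 items
MAX_ITEM_SCORE = 3   # each item is scored 0-3, so totals range from 0 to 30


def epds_total(item_scores: Sequence[int]) -> int:
    """Sum of EPDS item scores. Assumes responses were already converted to 0-3
    item scores, including any reverse-scored items."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("each item score must be between 0 and 3")
    return sum(item_scores)


def needs_interview(item_scores: Sequence[int], cutoff: int = 10) -> bool:
    """Stage 1 referral rule: a total strictly above the cut-off leads to the
    stage 2 diagnostic interview."""
    return epds_total(item_scores) > cutoff


# Hypothetical item scores for two women.
women = {
    "A": [1, 1, 0, 2, 1, 1, 0, 1, 1, 0],
    "B": [2, 2, 1, 2, 2, 1, 1, 2, 1, 0],
}
for woman, scores in women.items():
    print(woman, epds_total(scores), needs_interview(scores))
# A 8 False
# B 14 True
```

A total of exactly 10 does not trigger an interview, which matches the abstract's wording "above the cut off point of 10".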