Facebook language predicts depression in medical records

Johannes C. Eichstaedt, Robert J. Smith, Raina M. Merchant, Lyle H. Ungar, Patrick Crutchley, Daniel Preoţiuc-Pietro, David A. Asch, and H. Andrew Schwartz

Positive Psychology Center, University of Pennsylvania, Philadelphia, PA 19104; Penn Medicine Center for Digital Health, University of Pennsylvania, Philadelphia, PA 19104; Department of Emergency Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104; The Center for Health Equity Research and Promotion, Philadelphia Veterans Affairs Medical Center, Philadelphia, PA 19104; and Computer Science Department, Stony Brook University, Stony Brook, NY 11794
Edited by Susan T. Fiske, Princeton University, Princeton, NJ, and approved September 11, 2018 (received for review February 26, 2018)
Depression, the most prevalent mental illness, is underdiagnosed and undertreated, highlighting the need to extend the scope of current screening methods. Here, we use language from Facebook posts of consenting individuals to predict depression recorded in electronic medical records. We accessed the history of Facebook statuses posted by 683 patients visiting a large urban academic emergency department, 114 of whom had a diagnosis of depression in their medical records. Using only the language preceding their first documentation of a diagnosis of depression, we could identify depressed patients with fair accuracy [area under the curve (AUC) = 0.69], approximately matching the accuracy of screening surveys benchmarked against medical records. Restricting Facebook data to only the 6 months immediately preceding the first documented diagnosis of depression yielded a higher prediction accuracy (AUC = 0.72) for those users who had sufficient Facebook data. Significant prediction of future depression status was possible as far as 3 months before its first documentation. We found that language predictors of depression include emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) processes. Unobtrusive depression assessment through social media of consenting individuals may become feasible as a scalable complement to existing screening and monitoring procedures.
big data | social media
Each year, 7–26% of the US population experiences depression (1, 2), of whom only 13–49% receive minimally
adequate treatment (3). By 2030, unipolar depressive disorders
are predicted to be the leading cause of disability in high-income
countries (4). The US Preventive Services Task Force recom-
mends screening adults for depression in circumstances in which
accurate diagnosis, treatment, and follow-up can be offered (5).
These high rates of underdiagnosis and undertreatment suggest
that existing procedures for screening and identifying depressed
patients are inadequate. Novel methods are needed to identify
and treat patients with depression.
By using Facebook language data from a sample of consenting
patients who presented to a single emergency department, we
built a method to predict the first documentation of a diagnosis
of depression in the electronic medical record (EMR). Previous
research has demonstrated the feasibility of using Twitter (6, 7)
and Facebook language and activity data to predict depres-
sion (8), postpartum depression (9), suicidality (10), and post-
traumatic stress disorder (11), relying on self-report of diagnoses
on Twitter (12, 13) or the participants' responses to screening surveys (6, 7, 9) to establish participants' mental health status. In
contrast to this prior work relying on self-report, we established a
depression diagnosis by using medical codes from an EMR.
As described by Padrez et al. (14), patients in a single urban
academic emergency department (ED) were asked to share access
to their medical records and the statuses from their Facebook
timelines. We used depression-related International Classification
of Diseases (ICD) codes in patients' medical records as a proxy for
the diagnosis of depression, which prior research has shown is fea-
sible with moderate accuracy (15). Of the patients enrolled in the
study, 114 had a diagnosis of depression in their medical records. For
these patients, we determined the date at which the first docu-
mentation of a diagnosis of depression was recorded in the EMR of
the hospital system. We analyzed the Facebook data generated
by each user before this date. We sought to simulate a realistic
screening scenario, and so, for each of these 114 patients, we iden-
tified 5 random control patients without a diagnosis of depression in
the EMR, examining only the Facebook data they created before the
corresponding depressed patient's first date of a recorded diagnosis
of depression. This allowed us to compare depressed and control
patients' data across the same time span and to model the preva-
lence of depression in the larger population (16.7%).
Prediction of Depression. To predict the future diagnosis of de-
pression in the medical record, we built a prediction model by using
the textual content of the Facebook posts, post length, frequency of
posting, temporal posting patterns, and demographics (Materials
and Methods). We then evaluated the performance of this model by
comparing the probability of depression estimated by our algorithm
against the actual presence or absence of depression for each pa-
tient in the medical record (using 10-fold cross-validation to avoid
overfitting).

Significance

Depression is disabling and treatable, but underdiagnosed. In this study, we show that the content shared by consenting users on Facebook can predict a future occurrence of depression in their medical records. Language predictive of depression includes references to typical symptoms, including sadness, loneliness, hostility, rumination, and increased self-reference. This study suggests that an analysis of social media data could be used to screen consenting individuals for depression. Further, social media content may point clinicians to specific symptoms of depression.

Author contributions: J.C.E., R.M.M., L.H.U., and H.A.S. designed research; J.C.E., P.C., D.P.-P., and H.A.S. performed research; J.C.E. and H.A.S. contributed new reagents/analytic tools; J.C.E., P.C., D.P.-P., and H.A.S. analyzed data; and J.C.E., R.J.S., R.M.M., L.H.U., D.A.A., and H.A.S. wrote the paper.

The authors declare no conflict of interest. This article is a PNAS Direct Submission. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND). Data deposition: The data reported in this paper have been deposited in the Open Science Framework. J.C.E. and R.J.S. contributed equally to this work. To whom correspondence should be addressed. This article contains supporting information online.

Varying the threshold of this probability for diagnosis
uniquely determines a combination of true and false positive rates
that form the points of a receiver operating characteristic (ROC)
curve; overall prediction performance can be summarized as the
area under the ROC curve. To yield interpretable and fine-grained
language variables, we extracted 200 topics by using latent
Dirichlet allocation (LDA; ref. 16), a method akin to factor analysis
but appropriate for word frequencies. We trained a predictive
model based on the relative frequencies with which patients
expressed these topics, as well as one-word and two-word phrases,
obtaining an area under the curve (AUC) of 0.69, which falls just
short of the customary threshold for good discrimination (0.70). As
shown in Fig. 1, language features outperform other posting fea-
tures and demographic characteristics, which do not improve pre-
dictive accuracy when added to the language-based model.
How do these prediction performances compare against other
methods of screening for depression? Noyes et al. (17) assessed
the concordance of screening surveys with diagnoses of de-
pression recorded in EMRs as in this study*; the results are
shown in Fig. 2 together with our Facebook model. The results
suggest that the Facebook prediction model yields prediction
accuracies comparable to validated self-report depression scales.
Previous work observed that depressed users are more likely to
tweet during night hours (6). However, patients with and without
a diagnosis of depression in our study differed only modestly in
their temporal posting patterns (diurnally and across days of the
week; AUC = 0.54). Post length and posting frequency (metafeatures) were approximately as predictive of depression in the medical record as demographic characteristics (AUCs of 0.59 and 0.57, respectively), with the median annual word count across posts being 1,424 words higher for users who ultimately had a diagnosis of depression (Wilcoxon W = 26,594, P = 0.002).
Adding temporal pattern features and metafeatures to the language-
based prediction model did not substantially increase prediction
performance, suggesting that the language content captures the
depression-related variance in the other feature groups.
Comparison with Previous Prediction Studies. In our sample, pa-
tients with and without a diagnosis of depression in the medical
record were balanced at a 1:5 ratio to simulate true depression
prevalence. In previous work, this balance has been closer to 1:1
(e.g., 0.94:1 in ref. 7, 1.78:1 in ref. 6). When limiting our sample to balanced classes (1:1), we obtain an AUC of 0.68 and an F1 score (the harmonic mean of precision and recall) of 0.66, which is comparable to the F1 scores of 0.65 reported in ref. 7 and 0.68 reported in ref. 6 based on Twitter data and survey-reported depression. The fact that language content captures the depression-
related variance in the other feature groups is consistent with what
has been seen in previous work (6, 7, 18). However, this work
shows that social media can predict diagnoses in medical records,
rather than self-report surveys.
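For reference, the F1 score quoted above is the harmonic mean of precision and recall; a minimal sketch with invented confusion counts:

```python
# F1 = harmonic mean of precision and recall
#    = 2 * TP / (2 * TP + FP + FN).
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy confusion counts for illustration only.
print(round(f1_score(tp=50, fp=20, fn=30), 3))  # → 0.667
```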
Predicting Depression in Advance of the Medical Record. We sought
to investigate how far in advance Facebook may be able to yield
a prediction of future depression. To that end, we considered
language data for depressed patients from seven 6-mo windows
preceding the first documentation of depression (or its matched
time for controls) for the subset of 307 users who had at least 20
words in all seven windows. The results, shown in Fig. 3, suggest
that the closer in time the Facebook data are to the docu-
mentation of depression, the better their predictive power, with
data from the 6 mo immediately preceding the documentation
of depression yielding an accuracy (i.e., AUC) of 0.72, sur-
passing the customary threshold of good discrimination (0.70).
These results lend plausibility to the estimates of predictive
power because one would expect just such a temporal trend. A
minimal prediction of future depression (AUC = 0.62) above chance (P = 0.002) can be obtained approximately 3 mo in advance (3–9-mo window). Although this prediction accuracy is
relatively modest, it suggests that, perhaps in conjunction with
other forms of unobtrusive digital screening, the potential exists
to develop burdenless indicators of mental illness that precede
the medical documentation of depression (which may often be
delayed) and which, as a result, could reduce the total extent
of functional impairment experienced during the depressive episode.

Language Markers of Depression. To better understand what specific language may serve as markers of future depression and underlie the prediction performances of the aforementioned
machine learning models, we determined how users with and without a diagnosis of depression differed in the expression of the 200 data-driven LDA topics derived from their text. In Fig. 4, we show the 10 topics most strongly associated with future depression status when controlling for age, gender, and race: 7 (of 200) topics were individually significant at P < 0.05 with Benjamini–Hochberg correction for multiple comparisons.

Fig. 1. Prediction performances of future diagnosis of depression in the EMR based on demographics and Facebook posting activity, reported as cross-validated out-of-sample AUCs.

Fig. 2. ROC curve for a Facebook activity-based prediction model (all predictors combined; blue), and points as combinations of true and false positive rates reported by Noyes et al. (17) for different combinations of depression surveys (a and b, 9-item Mini-International Neuropsychiatric Interview–Major Depressive Episode Module; c and d, 15-item Geriatric Depression Scale with a cutoff >6) and time windows in Medicare claims data (a and c, within 6 mo before and after the survey; b and d, within 12 mo).

*Noyes et al. (17) sought to benchmark claims data against self-report depression scales as the criterion variable in a sample of 1,551 elderly adults; we have derived the points given in Fig. 2 from the confusion matrices they published. They included the ICD-9 codes used in this study (296.2 and 311) among their "extended set" of codes.
To complement this data-driven approach, we also exam-
ined the use of 73 prespecified dictionaries (lists of words)
from the Linguistic Inquiry and Word Count (LIWC) soft-
ware (2015; ref. 19) that is widely used in psychological re-
search. Nine LIWC dictionaries predicted future depression
status at Benjamini–Hochberg-corrected significance levels
controlling for demographics (Table 1).
We observed emotional language markers of depressed mood (topic: tears, cry, pain; standardized regression coefficient β = 0.15; P < 0.001), loneliness (topic: miss, much, baby; β = 0.14; P = 0.001), and hostility (topic: hate, ugh, fuckin; β = 0.12; P = 0.012). The LIWC negative emotion (β = 0.14; P = 0.002; most frequent words: smh, fuck, hate) and sadness dictionaries (β = 0.17; P < 0.001; miss, lost, alone) captured similar information.
We observed that users who ultimately had a diagnosis of depression used more first-person singular pronouns (LIWC dictionary: β = 0.19; P < 0.001; I, my, me), suggesting a preoccupation with the self. A recent meta-analysis (20) found first-person singular pronouns to be one of the most robust language markers of cross-sectional depression (meta-analytic r = 0.13), and a preliminary longitudinal study (21) found them to be a marker of future depression, as observed in this study. Although there
is substantial evidence that the use of first-person singular pro-
nouns is associated with depression in private writings (22), this
study extends the evidence for this association into the semi-
public context of social media.
Cognitively, depression is thought to be associated with per-
severation and rumination, specifically on self-relevant infor-
mation (23), which manifests as worry and anxiety when directed
toward the future (21). In line with these conceptualizations,
we observed language markers suggestive of increased rumina-
tion (topic: mind, alot, lot; β = 0.11; P = 0.009) and anxiety (LIWC dictionary: β = 0.08; P = 0.043; scared, upset, worry), albeit not meeting Benjamini–Hochberg-corrected significance thresholds.

Depression often presents itself with somatic complaints in
primary care settings (24, 25). In our sample, we observed that
the text of users who ultimately received a diagnosis of de-
pression contained markers of somatic complaints (topic: hurt, head, bad; β = 0.15; P < 0.001; LIWC dictionary, health: β = 0.11; P = 0.004; life, tired, sick). We also observed increased medical references (topic: hospital, pain, surgery; β = 0.20; P < 0.001),
which is consistent with the finding that individuals with de-
pression are known to visit the ED more frequently than indi-
viduals without depression (26).
Our results show that Facebook language-based prediction
models perform similarly to screening surveys in identifying pa-
tients with depression when using diagnostic codes in the EMR
to identify diagnoses of depression. The profile of depression-
associated language markers is nuanced, covering emotional
(sadness, depressed mood), interpersonal (hostility, loneliness),
and cognitive (self-focus, rumination) processes, which previous
research has established as congruent with the determinants and
consequences of depression. The growth of social media and
continuous improvement of machine-learning algorithms suggest
that social media-based screening methods for depression may
become increasingly feasible and more accurate.
We chose to examine depression because it is prevalent,
disabling, underdiagnosed, and treatable. As a major driver of
medical morbidity and mortality, it is important to more
thoroughly diagnose and treat depression across the pop-
ulation. Patients with depression exhibit poorer medical out-
comes after acute inpatient care, increased utilization of
emergency care resources, and increased all-cause mortality
(2528). Identifying patients at an earlier stage in their mental
Fig. 3. AUC prediction accuracies of future depression status as a function of time before the documentation of depression in the medical record. Shown in
blue are the 6-mo time windows of Facebook data used for the predictions; the blue dots indicate the AUCs obtained for these windows. Error bars indicate
SEs (based on the 10 cross-validation folds). Logarithmic trendline is shown to guide the eye.
A language prediction model using only the 200 LDA topics (and not the relative frequencies of words and phrases) reaches an AUC of 0.65, so the topics capture most of the language variance.
No topic or dictionary is negatively associated with future depression status (controlling
for demographic characteristics) at significance levels corrected for multiple compari-
sons. The 10 LDA topics most negatively associated with depression status are shown
in SI Appendix, Fig. S1. They cover language suggestive of gratitude, faith, school and
work, and fitness and music consumption (SI Appendix, Table S1 includes an extended
set of LIWC associations).
illness through novel means of detection creates opportunities
for patients to be connected more readily with appropriate
care resources. The present analysis suggests that social media-
based prediction of future depression status may be possible as
early as 3 mo before the first documentation of depression in
the medical record.
In the primary care setting, a diagnosis of depression is often
missed (29). The reason for such underdetection is multifac-
torial: depression has a broad array of possible presenting
symptoms, and its severity changes across time. Primary care
providers are also tasked with addressing many facets of health
within a clinical visit that may be as brief as 15 min. Previous
research has recommended improving detection of depression
through the routine use of multistep assessment processes (30).
Initial identification of individuals who may be developing de-
pression via analysis of social media may serve as the first step
in such a process (using a detection threshold favoring high true
positive rates). With the increasing integration of social media
platforms, smartphones, and other technologies into the lives of
patients, novel avenues are becoming available to detect de-
pression unobtrusively. These methods include the algorithmic
analysis of phone sensor, usage, and GPS position data on
smartphones (31), and of facial expressions in images and vid-
eos, such as those shared on social media platforms (32, 33). In
principle, these different screening modalities could be com-
bined in a way that improves overall screening to identify in-
dividuals to complete self-report inventories (34) or be assessed
by a clinician.
In the present study, patients permitted researchers to collect
several years of retroactive social media data. These longitudinal
data may allow clinicians to capture the evolution of depression
severity over time with a richness unavailable to traditional
clinical surveys delivered at discrete time points. The language
exhibited by patients who ultimately developed depression was
nuanced and varied, covering a wide array of depression-related
symptoms. Changes in language patterns about specific symp-
toms could alert clinicians to specific depression symptoms
among their consenting patients.
This study illustrates how social media-based detection tech-
nologies may optimize diagnosis within one facet of health.
These technologies raise important questions related to patient
privacy, informed consent, data protection, and data ownership.
Clear guidelines are needed about access to these data, reflecting
the sensitivity of content, the people accessing it, and their
purpose (35). Developers and policymakers need to address the
challenge that the application of an algorithm may change social
media posts into protected health information, with the corre-
sponding expectation of privacy and the right of patients to re-
main autonomous in their health care decisions. Similarly, those
who interpret the data need to recognize that people may change
what they write based on their perceptions of how that in-
formation might be observed and used.
The key contribution of this study is that it links mental
health diagnoses with social media content, and that it used this
linkage to reveal associations between the content and symp-
toms of a prevalent, underdiagnosed, and treatable condition.
Fig. 4. Ten language topics most positively associated with a future depression diagnosis, controlling for demographics (*P < 0.05, **P < 0.01, and ***P < 0.001; †P < 0.05 after Benjamini–Hochberg correction for multiple comparisons). Font size reflects relative prevalence of words within topics. Color shading is to aid readability and carries no meaning.

Table 1. LIWC Dictionaries Associated with Depression

LIWC dictionary                  β      P value
First-person singular (I, me)    0.19   ***
Feel (perceptual process)        0.15   ***
Negative emotions                0.14   **
Sadness                          0.17   ***
Cognitive processes
  Discrepancy                    0.12   **
Health                           0.11   **

Shown are all pronoun and psychological-process LIWC 2015 dictionaries significantly associated with future depression status controlling for demographics, with strengths of associations given as standardized regression coefficients. All coefficients meet the P < 0.05 significance threshold when corrected for multiple comparisons by the Benjamini–Hochberg method. Significantly correlated superordinate personal pronoun and pronoun dictionaries, which include the first-person singular pronoun dictionary shown here, are not shown. **P < 0.01, ***P < 0.001.

This suggests that, one day, the analysis of social media language may contribute to the identification of depressed individuals. Together with the growing
sophistication, scalability, and efficacy of technology-supported
treatments for depression (36, 37), detection and treatment of
mental illness may soon meet individuals in the digital spaces
they already inhabit.
Materials and Methods
Participant Recruitment and Data Collection. This study was approved by
the institutional review board at the University of Pennsylvania. The flow of
the data collection is described in ref. 14. In total, 11,224 patients were
approached in the ED over a 26-mo period. Patients were excluded if they
were under 18 y old, suffered from severe trauma, were incoherent, or
exhibited evidence of severe illness. Of these, 2,903 patients consented to
share their social media data and their EMRs, which resulted in 2,679 (92%)
unique EMRs. These EMRs were not specific to the ED but covered all patient
encounters across the entire health care system. A total of 1,175 patients
(44%) were able to log in to their Facebook accounts, and our Facebook app
was able to retrieve any Facebook information and posts for as much as 6 y
earlier, ranging from July 2008 through September 2015. These users shared
a total of 949,530 Facebook statuses, which we used to model the 200
LDA topics.
From the health system's EMRs, we retrieved demographic data (age, sex,
and race) and prior diagnoses (by ICD-9 codes). We considered patients as
having a diagnosis of depression if their EMRs included documentation of
ICD codes 296.2 (Major Depression) or 311 (Depressive Disorder, not else-
where classified), resulting in 176 patients with any Facebook language
(base rate 176/1,175 = 15.0%, or 1:5.68). Of the 176 depressed patients, 114
(63%) had at least 500 words in status updates preceding their first docu-
mentation of a diagnosis of depression. A total of 49 patients had no lan-
guage preceding their first documentation, suggesting that, for 28% of the
sample, their first documentation of depression preceded their joining or posting on Facebook. Notably, a depression-related ICD code could reflect
self-report by the patient of a history of depression and did not necessarily
imply clinical assessment or current depressive symptoms, treatment, or
management [Trinh et al. (15) suggest that using ICD codes as a proxy for a
diagnosis of depression is feasible with moderate accuracy].
To model the application in a medical setting and control for annual
patterns in depression, for each patient with depression, we randomly se-
lected another five patients without a history of depression who had at least
500 words in status updates preceding the same day as the first recorded
diagnosis of depression. This yielded a sample of 114 + 5 × 114 = 684 patients
who shared a total of 524,292 Facebook statuses in the included temporal windows. We excluded one patient from the sample for having fewer than 500 words after excluding unicode tokens (such as emojis), for a final sample of 683 patients.
Sample Description. Sample characteristics are shown in Table 2. Among all
683 patients, the mean age was 29.9 y (SD = 8.57); most were female (76.7%) and black (70.1%). Depressed patients posted more words on Facebook (difference between medians = 3,794 words; Wilcoxon W = 27,712; P = 0.014) and were more likely to be female [χ²(1, n = 583) = 7.18; P = 0.007], matching national trends in presentations to urban academic EDs (26, 38, 39).
Word and Phrase Extraction. We determined the relative frequency with
which users used words (unigrams) and two-word phrases (bigrams) by using
our open-source Python-based language analysis infrastructure (dlatk.wwbp.org). We retained as variables the 5,381 words and phrases that were used
by at least 20% of the sample across their 524,292 Facebook statuses.
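A simplified stand-in for this extraction step (the study used the DLATK toolkit; the helper functions, toy statuses, and the 20% support threshold shown here are illustrative):

```python
# Per-user relative frequencies of unigrams and bigrams, keeping only
# features used by at least a given share of users.
from collections import Counter

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def user_rel_freqs(statuses):
    counts = Counter()
    for status in statuses:
        tokens = status.lower().split()
        counts.update(ngrams(tokens, 1) + ngrams(tokens, 2))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def keep_common(users_freqs, min_user_share=0.2):
    n_users = len(users_freqs)
    support = Counter(g for freqs in users_freqs.values() for g in freqs)
    kept = {g for g, c in support.items() if c / n_users >= min_user_share}
    return {u: {g: f for g, f in freqs.items() if g in kept}
            for u, freqs in users_freqs.items()}

users = {
    "u1": ["feeling tired today", "so tired"],
    "u2": ["great day today"],
}
freqs = keep_common({u: user_rel_freqs(s) for u, s in users.items()})
```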
Topic Modeling. As the coherence of topics increases when modeled over a
larger number of statuses, we modeled 200 topics from the 949,530 Facebook
statuses of all patients who agreed to share their Facebook statuses by using
an implementation of LDA provided by the MALLET package (40). Akin to
factor analysis, LDA produces clusters of words that occur in the same con-
text across Facebook posts, yielding semantically coherent topics. It is
appropriate for the highly nonnormal frequency distributions observed in
language use. After modeling, we derived the use of 200 topics (200 values
per user) for every user in the sample, which summarize their language use.
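The topic-modeling step can be sketched as follows, using scikit-learn's LDA implementation as a stand-in for the MALLET package the study actually used (toy corpus, not the study's data):

```python
# Sketch: fit LDA on a bag-of-words matrix and obtain per-document
# topic loadings; per-user topic use would aggregate these loadings
# over each user's posts.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

statuses = [
    "so tired and sad tonight",
    "sad and alone again",
    "great workout at the gym",
    "gym session then music",
]
counts = CountVectorizer().fit_transform(statuses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # rows sum to 1 (topic mixture)
```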
Temporal Feature Extraction. We split the time of the day into six bins of 4 h in
length, and, for every user, calculated the fraction of statuses posted in each
of these bins. Similarly, we determined the fraction of posts made on each day
of the week.
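A minimal sketch of the 4-h binning (helper name and timestamps are illustrative):

```python
# Fraction of a user's statuses falling in each of six 4-h bins of
# the day (bin 0 = 00:00-03:59, ..., bin 5 = 20:00-23:59).
from collections import Counter
from datetime import datetime

def time_bin_fractions(timestamps):
    bins = Counter(t.hour // 4 for t in timestamps)
    n = len(timestamps)
    return [bins.get(b, 0) / n for b in range(6)]

posts = [datetime(2014, 5, 1, 1), datetime(2014, 5, 2, 23),
         datetime(2014, 5, 3, 22), datetime(2014, 5, 4, 9)]
print(time_bin_fractions(posts))  # → [0.25, 0.0, 0.25, 0.0, 0.0, 0.5]
```

The day-of-week fractions follow the same pattern with `t.weekday()` and seven bins.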
Metafeature Extraction. For every user, we determined how many unigrams
were posted per year, the average length of the posts (in unigrams), and the
average length of unigrams.
Dictionary Extraction. LIWC 2015 (41) provides dictionaries (lists of words)
widely used in psychological research. We matched the extracted word
frequencies against these dictionaries to determine the usersrelative fre-
quency of use of the 73 LIWC dictionaries.
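Dictionary matching reduces to counting how often a user's tokens fall in a word list. A sketch with an invented mini-dictionary (the real LIWC 2015 categories are proprietary and also include wildcard stems such as happ*, which this sketch ignores):

```python
# Relative frequency of a user's tokens that match a dictionary
# (word list). The "sadness" list here is a hypothetical stand-in.
def dict_rel_freq(token_counts, dictionary):
    total = sum(token_counts.values())
    hits = sum(c for w, c in token_counts.items() if w in dictionary)
    return hits / total if total else 0.0

sadness = {"miss", "lost", "alone", "sad"}   # invented word list
tokens = {"i": 3, "miss": 2, "you": 2, "alone": 1, "today": 2}
print(dict_rel_freq(tokens, sadness))  # → 0.3
```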
Prediction Models. We used machine learning to train predictive models using
the unigrams, bigrams, and 200 topics, using 10-fold cross-validation to avoid
overfitting (similar to ref. 42). In this cross-validation procedure, the data are
randomly partitioned into 10 stratified folds, keeping depressed users and their five control users within the same fold. Logistic regression models
with a ridge penalty and their hyperparameters were fit within 9 folds and
evaluated across the remaining held-out fold. The procedure was repeated
10 times to estimate an out-of-sample probability of depression for every
patient. Varying the threshold of this probability for depression classification
uniquely determines a combination of true and false positive rates that form
the points of a ROC curve. We summarize overall prediction performance as
the area under this ROC curve (i.e., AUC), which is suitable for describing
classification accuracies over unbalanced classes.
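A minimal sketch of this evaluation loop on synthetic data, using scikit-learn's `GroupKFold` as a stand-in for the stratified grouped folds described (features, group sizes, and effect sizes below are invented):

```python
# Sketch: L2-penalized ("ridge") logistic regression, out-of-fold
# probabilities via grouped 10-fold cross-validation, summarized as
# AUC. Groups keep each case with its matched controls in one fold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)
n_groups, group_size = 30, 6                 # 1 case + 5 controls
y = np.tile([1, 0, 0, 0, 0, 0], n_groups)
X = rng.normal(size=(y.size, 20)) + 0.8 * y[:, None]   # synthetic signal
groups = np.repeat(np.arange(n_groups), group_size)

probs = cross_val_predict(
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    X, y, groups=groups, cv=GroupKFold(n_splits=10),
    method="predict_proba")[:, 1]
auc = roc_auc_score(y, probs)
```

Holding whole matched groups out of the training fold prevents a depressed user's controls from leaking information across the train/test split.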
Prediction in Advance of Documentation. We carried out the prediction as
outlined earlier but truncated the available language data to time windows ranging from 0–6 mo before diagnosis (excluding the 24 h immediately before diagnosis) to 1–7, 3–9, 9–15, 15–21, 21–27, and 27–33 mo before the
first documentation of depression in the medical records. We truncated the
data analogously for control users. For this analysis, we limited the sample
to those with data in each of the seven time windows, specifically thresholding at a total of 20 words in each window. Because this lower threshold results in less stable measures of language use, we employed outlier removal, replacing feature observations that were more than 2 standard deviations from the mean with the feature's mean. This resulted in
307 patients (56 depressed) with the same users represented in each of the
time windows (average word counts for depressed and nondepressed users
in these windows are shown in SI Appendix, Fig. S2). AUCs were tested for
significance against the null distribution through permutation tests with
100,000 permutations.
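The outlier-replacement rule used for the sparse time windows can be sketched as follows (toy values; `clamp_outliers` is our name for it):

```python
# Replace feature values more than 2 SDs from the mean with the mean,
# stabilizing features estimated from little text.
import numpy as np

def clamp_outliers(x, n_sd=2.0):
    x = np.asarray(x, dtype=float)
    mean, sd = x.mean(), x.std()
    out = np.abs(x - mean) > n_sd * sd
    return np.where(out, mean, x)

vals = np.array([0.1, 0.2, 0.15, 0.12, 0.18,
                 0.11, 0.16, 0.14, 0.13, 9.0])   # one extreme value
cleaned = clamp_outliers(vals)                    # 9.0 becomes the mean
```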
Language Associations. To determine if a language feature (topic or LIWC
category) was associated with (future) depression status, we individually
tested it as a predictor in an in-sample linear regression model controlling for
demographic characteristics (binary variables for age quartile, ethnicity, and
gender), and report its standardized regression coefficient (β) with the as-
sociated significance. We explored language correlations separately by
gender but found that we had insufficient power to find language corre-
lations among male users in the sample.
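A sketch of this per-feature test on synthetic data: z-scoring the feature and the outcome makes the fitted coefficient a standardized β. The data and effect sizes are invented, and a full analysis would use age-quartile, race, and gender dummies rather than the single covariate shown:

```python
# Standardized coefficient (beta) of one language feature predicting
# depression status, controlling for covariates, via least squares.
import numpy as np

def standardized_beta(feature, outcome, covariates):
    def z(v):
        return (v - v.mean()) / v.std()
    X = np.column_stack([np.ones(len(outcome)), z(feature), covariates])
    coefs, *_ = np.linalg.lstsq(X, z(outcome), rcond=None)
    return coefs[1]   # beta for the z-scored language feature

rng = np.random.default_rng(1)
n = 500
gender = rng.integers(0, 2, n)                    # covariate
topic_use = rng.normal(size=n) + 0.3 * gender     # language feature
status = (0.2 * topic_use + 0.5 * gender + rng.normal(size=n) > 0.6)
beta = standardized_beta(topic_use, status.astype(float),
                         gender.reshape(-1, 1))   # positive, modest beta
```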
Table 2. Sample descriptives

Sample descriptive       Depressed         Nondepressed      P value
No. of subjects          114               569
Mean age (SD)            30.9 (8.1)        29.7 (8.65)
Female, %                86.8              74.7              **
Black, %                 75.4              69.1
Mean word count (SD)     19,784 (27,736)   14,802 (21,789)   *
Median word count        10,655            6,861             *

Differences in age and mean word count were tested for significance by
using t tests, percent female and black by using χ² tests with continuity
correction, and median word counts by using the Wilcoxon rank-sum test with
continuity correction.
*P < 0.05, **P < 0.01.
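Assuming scipy is available, the three tests named in the table notes map onto standard calls. The word counts below are made-up toy values, and the 2×2 gender table is only an approximate reconstruction from the percentages reported in Table 2:

```python
import numpy as np
from scipy import stats

# Toy word counts (illustrative only, not the study data)
dep_words  = np.array([12000, 8000, 30000, 15000, 22000, 9000])
ctrl_words = np.array([7000, 5000, 14000, 6000, 11000, 4000])

# Means compared with a t test
t_stat, p_t = stats.ttest_ind(dep_words, ctrl_words)

# Percent female compared with a chi-squared test with continuity correction;
# counts approximated from 86.8% of 114 depressed and 74.7% of 569 controls
gender_table = np.array([[99, 15],     # depressed: female, male
                         [425, 144]])  # control:   female, male
chi2, p_chi, dof, expected = stats.chi2_contingency(gender_table,
                                                    correction=True)

# Medians compared with the Wilcoxon rank-sum (Mann-Whitney U) test
u_stat, p_u = stats.mannwhitneyu(dep_words, ctrl_words,
                                 use_continuity=True,
                                 alternative="two-sided")
```

With the reconstructed counts, the chi-squared test reproduces a gender difference significant at the table's ** level.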
We excluded 40 users with any Facebook language from the set of possible controls if
they did not have the aforementioned ICD codes but only depression-like diagnoses that
were not temporally limited, i.e., recurrent Depression (296.3) or Dysthymic Disorders
(300.4), Bipolar Disorders (296.4–296.8), or Adjustment Disorders or Posttraumatic Stress
Disorder (309). We additionally excluded 36 patients from the possible control group if
they had been prescribed any antidepressant medications (i.e., selective serotonin reup-
take inhibitors) without having been given an included depression ICD code.
Eichstaedt et al. PNAS Latest Articles
Controlling for Multiple Comparisons. In addition to the customary signifi-
cance thresholds, we also report whether a given language feature meets a
P < 0.05 significance threshold corrected with the Benjamini–Hochberg
procedure (43) for multiple comparisons.
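The Benjamini–Hochberg step-up procedure can be sketched in a few lines; `bh_significant` is a hypothetical helper name, not code from the study:

```python
def bh_significant(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: sort the m P values, find the largest
    rank r with p_(r) <= (r / m) * alpha, and declare the r smallest
    P values significant. Returns one boolean per input P value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            largest_rank = rank
    flags = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            flags[i] = True
    return flags
```

Unlike a Bonferroni correction, the step-up rule controls the false discovery rate rather than the family-wise error rate, so borderline P values can still pass when enough smaller ones precede them in the sorted list.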
Data Sharing. Medical record outcomes and the linked social media data
are considered Protected Health Information and cannot be shared.
However, for the main language features (200 DLA topics and 73 LIWC
dictionaries), we are able to share mean levels and SDs for depressed
and nondepressed users (deposited in Open Science Framework, https://).

ACKNOWLEDGMENTS. We thank anonymous Reviewer 1 for her or his
insightful suggestions. Support for this research was provided by a
Robert Wood Johnson Foundation Pioneer Award; Templeton Religion
Trust Grant TRT0048.
1. Demyttenaere K, et al.; WHO World Mental Health Survey Consortium (2004) Prev-
alence, severity, and unmet need for treatment of mental disorders in the world
health organization world mental health surveys. JAMA 291:2581–2590.
2. Kessler RC, et al.; National Comorbidity Survey Replication (2003) The epidemiology
of major depressive disorder: Results from the national comorbidity survey replication
(NCS-R). JAMA 289:3095–3105.
3. Wang PS, et al. (2005) Twelve-month use of mental health services in the United
States: Results from the national comorbidity survey replication. Arch Gen Psychiatry
4. Mathers CD, Loncar D (2006) Projections of global mortality and burden of disease
from 2002 to 2030. PLoS Med 3:e442.
5. O'Connor EA, Whitlock EP, Beil TL, Gaynes BN (2009) Screening for depression in adult
patients in primary care settings: A systematic evidence review. Ann Intern Med 151:
6. De Choudhury M, Gamon M, Counts S, Horvitz E (2013) Predicting depression via
social media. ICWSM 13:110.
7. Reece AG, et al. (2016) Forecasting the onset and course of mental illness with Twitter
data. Sci Rep 7:13006.
8. Schwartz HA, et al. (2014) Towards assessing changes in degree of depression through
Facebook. Proceedings of the Workshop on Computational Linguistics and Clinical
Psychology: From Linguistic Signal to Clinical Reality (Association for Computational
Linguistics, Stroudsburg, PA), pp 118–125.
9. De Choudhury M, Counts S, Horvitz EJ, Hoff A (2014) Characterizing and predicting
postpartum depression from shared Facebook data. Proceedings of the 17th ACM
Conference on Computer Supported Cooperative Work & Social Computing (Associ-
ation for Computational Linguistics, Stroudsburg, PA), pp 626–638.
10. Homan CM, et al. (2014) Toward Macro-insights for Suicide Prevention: Analyzing
Fine-grained Distress at Scale (Association for Computational Linguistics, Stroudsburg, PA).
11. Coppersmith G, Dredze M, Harman C (2014) Quantifying mental health signals in
twitter. Proceedings of the Workshop on Computational Linguistics and Clinical
Psychology: From Linguistic Signal to Clinical Reality (Association for Computational
Linguistics, Stroudsburg, PA), pp 51–60.
12. Coppersmith G, Dredze M, Harman C, Hollingshead K, Mitchell M (2015) CLPsych 2015
shared task: Depression and PTSD on Twitter. Proceedings of the 2nd Workshop on
Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical
Reality (Association for Computational Linguistics, Stroudsburg, PA), pp 31–39.
13. Pedersen T (2015) Screening Twitter users for depression and PTSD with lexical de-
cision lists. Proceedings of the 2nd Workshop on Computational Linguistics and
Clinical Psychology: From Linguistic Signal to Clinical Reality (Association for Com-
putational Linguistics, Stroudsburg, PA), pp 46–53.
14. Padrez KA, et al. (2015) Linking social media and medical record data: A study of
adults presenting to an academic, urban emergency department. BMJ Qual Saf 25:
15. Trinh NHT, et al. (2011) Using electronic medical records to determine the diagnosis of
clinical depression. Int J Med Inform 80:533–540.
16. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:
17. Noyes K, Liu H, Lyness JM, Friedman B (2011) Medicare beneficiaries with depression:
Comparing diagnoses in claims data with the results of screening. Psychiatr Serv 62:
18. Preotiuc-Pietro D, et al. (2015) The role of personality, age and gender in tweeting
about mental illnesses. Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology: From Linguistics Signal to Clinical Reality (Associ-
ation for Computational Linguistics, Stroudsburg, PA), pp 21–30.
19. Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psy-
chometric properties of LIWC2015 (University of Texas, Austin).
20. Edwards T, Holtzman NS (2017) A meta-analysis of correlations between depression
and first person singular pronoun use. J Res Pers 68:63–68.
21. Zimmermann J, Brockmeyer T, Hunn M, Schauenburg H, Wolf M (2017) First-person
pronoun use in spoken language as a predictor of future depressive symptoms:
Preliminary evidence from a clinical sample of depressed patients. Clin Psychol
Psychother 24:384–391.
22. Tackman AM, et al. (2018) Depression, negative emotionality, and self-referential
language: A multi-lab, multi-measure, and multi-language-task research synthesis.
J Pers Soc Psychol, 10.1037/pspp0000187.
23. Sorg S, Vögele C, Furka N, Meyer AH (2012) Perseverative thinking in depression and
anxiety. Front Psychol 3:20.
24. Rush AJ; Agency for Health Care Policy and Research (1993) Depression in primary
care: Detection, diagnosis and treatment. Am Fam Physician 47:1776–1788.
25. Simon GE, VonKorff M, Piccinelli M, Fullerton C, Ormel J (1999) An international study
of the relation between somatic symptoms and depression. N Engl J Med 341:
26. Boudreaux ED, Cagande C, Kilgannon H, Kumar A, Camargo CA (2006) A prospective
study of depression among adult patients in an urban emergency department. Prim
Care Companion J Clin Psychiatry 8:66–70.
27. Fan H, et al. (2014) Depression after heart failure and risk of cardiovascular and all-
cause mortality: A meta-analysis. Prev Med 63:36–42.
28. Zheng D, et al. (1997) Major depression and all-cause mortality among white adults in
the United States. Ann Epidemiol 7:213–218.
29. Perruche F, et al. (2010) Anxiety and depression are unrecognized in emergency pa-
tients admitted to the observation care unit. Emerg Med J 28:662–665.
30. Mitchell AJ, Vaze A, Rao S (2009) Clinical diagnosis of depression in primary care: A
meta-analysis. Lancet 374:609–619.
31. Saeb S, et al. (2015) Mobile phone sensor correlates of depressive symptom severity in
daily-life behavior: An exploratory study. J Med Internet Res 17:e175.
32. Levi G, Hassner T (2015) Emotion recognition in the wild via convolutional neural
networks and mapped binary patterns. Proceedings of the 2015 ACM on International
Conference on Multimodal Interaction (Association for Computational Linguistics,
Stroudsburg, PA), pp 503–510.
33. Ringeval F, et al. (2017) AVEC 2017: Real-life depression, and affect recognition
workshop and challenge. Proceedings of the 7th Annual Workshop on Audio/Visual
Emotion Challenge (Association for Computing Machinery, New York), pp 3–9.
34. Gilbody S, Sheldon T, House A (2008) Screening and case-finding instruments for
depression: A meta-analysis. CMAJ 178:997–1003.
35. Grande D, Mitra N, Shah A, Wan F, Asch DA (2013) Public preferences about sec-
ondary uses of electronic health information. JAMA Intern Med 173:1798–1806.
36. Foroushani PS, Schneider J, Assareh N (2011) Meta-review of the effectiveness of
computerised CBT in treating depression. BMC Psychiatry 11:131.
37. Newman MG, Szkodny LE, Llera SJ, Przeworski A (2011) A review of technology-
assisted self-help and minimal contact therapies for anxiety and depression: Is hu-
man contact necessary for therapeutic efficacy? Clin Psychol Rev 31:89–103.
38. Rhodes KV, et al. (2001) Better health while you wait: A controlled trial of a
computer-based intervention for screening and health promotion in the emergency
department. Ann Emerg Med 37:284–291.
39. Kumar A, Clark S, Boudreaux ED, Camargo CA, Jr (2004) A multicenter study of de-
pression among emergency department patients. Acad Emerg Med 11:1284–1289.
40. McCallum AK (2002) Mallet: A Machine Learning for Language Toolkit. mallet.cs.
41. Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psy-
chometric properties of LIWC2015 (University of Texas, Austin).
42. Kosinski M, Stillwell D, Graepel T (2013) Private traits and attributes are predictable
from digital records of human behavior. Proc Natl Acad Sci USA 110:5802–5805.
43. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and
powerful approach to multiple testing. J R Stat Soc B 57:289–300.