About
501 Publications
200,134 Reads
26,152 Citations
Publications (501)
Self-reported rating scales have been central to social science for decades, even though language is our primary form of communication. We used language analysis with machine learning to compare self-reported ratings with language-based responses of experienced well-being (i.e., daily emotions) linked with human traits, states, and behaviors. In a...
People now commonly interact with Artificial Intelligence (AI) agents. How do these interactions shape how humans perceive each other? In two pre-registered studies (total N = 1,261), we show that people evaluate other humans more harshly after interacting with an AI (compared with an unrelated purported human). In Study 1, participants who worked...
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks, such as data annotation, synthetic data creation, and engaging in dialog. However, these tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences. Thus, employing LLMs (which do...
Trust is predictive of civic cooperation and economic growth. Recently, the U.S. public has demonstrated increased partisan division and a surveyed decline in trust in institutions. There is a need to quantify individual and community levels of trust unobtrusively and at scale. Using observations of language across more than 16,000 Facebook users,...
Treatment decisions for patients receiving invasive mechanical ventilation (IMV) are complex and depend simultaneously on the current ventilator settings, the function of multiple interrelated organ systems, and other treatments. An artificial intelligence (AI)-based clinical decision support system (CDSS) offers a promising approach to alleviate u...
The COVID pandemic placed a spotlight on alcohol use and the hardships of working within the food and beverage industry, with millions left jobless. Following previous studies that have found elevated rates of alcohol problems among bartenders and servers, here we studied the alcohol use of bartenders and servers who were employed during COVID. Fro...
Large language models (LLMs) such as OpenAI’s GPT-4 (which powers ChatGPT) and Google’s Gemini, built on artificial intelligence, hold immense potential to support, augment, or even eventually automate psychotherapy. Enthusiasm about such applications is mounting in the field as well as in industry. These developments promise to address insufficient m...
Depression has robust natural language correlates and can increasingly be measured in language using predictive models. However, despite evidence that language use varies as a function of individual demographic features (e.g., age, gender), previous work has not systematically examined whether and how depression’s association with language varies b...
Background
Sensors within smartphones, such as the accelerometer and location sensors, can describe longitudinal markers of behavior as captured through these devices, an approach called digital phenotyping. This study aimed to assess the feasibility of digital phenotyping for patients with alcohol-associated liver disease and alcohol use disorder, determine corre...
Depression has robust natural language correlates, and can increasingly be measured in language using predictive models. However, despite evidence that language use varies as a function of individual demographic features (e.g., age, gender), previous work has not systematically examined whether and how depression’s association with language varies...
Full national coverage below the state level is difficult to attain through survey-based data collection. Even the largest survey-based data collections, such as the CDC’s Behavioral Risk Factor Surveillance System or the Gallup-Healthways Well-being Index (both with more than 300,000 responses p.a.) only allow for the estimation of annual averages...
Background
Prior literature links passively sensed information about a person's location, movement, and communication with social anxiety. These findings hold promise for identifying novel treatment targets, informing clinical care, and personalizing digital mental health interventions. However, social anxiety symptoms are heterogeneous; to identif...
We used natural language processing to analyze a billion words to study cultural differences on Weibo, one of China's largest social media platforms. We compared predictions from two common explanations about cultural differences in China (economic development and urban-rural differences) against the less-obvious legacy of rice versus wheat farming...
There is growing scientific excitement about detecting depression from people’s language use, but this work rarely accounts for anxiety, which overlaps substantially and co-occurs frequently with depression. Using clinical interviews with individuals with varying levels of depression and anxiety, we found that some language patterns are shared by t...
The current classification of anxiety is cumbersome; does not align with evidence that anxiety problems cut across disorder categories; and fails to acknowledge that severity of anxiety matters, even at low levels. We developed a new classification that distills key features of anxiety – intensity, avoidance, pervasiveness, and onset – across disor...
Wellbeing is predominantly measured through self-reports, which are time-consuming and costly. It can also be measured by automatically analysing language expressed on social media platforms, through social media text mining (SMTM). We present a systematic review based on 45 studies, and a meta-analysis of 32 convergent validities from 18 studies re...
We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens: conversational agents express a diversity of both states (short-term factors like emotions) and traits (longer-term factors like personality) just as people do. These interpretable metrics consist of five measures from established psychology constructs...
Wellbeing is predominantly measured through surveys but is increasingly measured by analysing individuals' language on social media platforms using social media text mining (SMTM). To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks derived from survey items and social media languag...
Background:
Relatively little is known about how communication changes as a function of depression severity and interpersonal closeness. We examined the linguistic features of outgoing text messages among individuals with depression and their close- and non-close contacts.
Methods:
419 participants were included in this 16-week-long observationa...
Depression is known to have heterogeneous symptom manifestations. Investigating various symptoms of depression is essential to understanding underlying mechanisms and personalizing treatments. Reddit, an online peer-to-peer social media platform, contains varied communities (subreddits) where individuals discuss their detailed mental health experie...
Large language models (LLMs) such as ChatGPT and GPT-3/4, built on artificial intelligence, hold immense potential to support, augment, or even replace psychotherapy. Enthusiasm about such applications is mounting in the field as well as in industry. These developments promise to address insufficient mental healthcare system capacity and scale individ...
Targeting of location-specific aid for the U.S. opioid epidemic is difficult due to our inability to accurately predict changes in opioid mortality across heterogeneous communities. AI-based language analyses, having recently shown promise in cross-sectional (between-community) well-being assessments, may offer a way to more accurately longitudinal...
How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing the accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment...
Objective
Language patterns may elucidate mechanisms of mental health conditions. To inform underlying theory and risk models, we evaluated prospective associations between in vivo text messaging language and differential symptoms of depression, generalized anxiety, and social anxiety.
Methods
Over 16 weeks, we collected outgoing text messages fro...
We study the language differentially associated with loneliness and depression using 3.4 million Facebook posts from 2,986 individuals, and uncover the statistical associations of survey-based depression and loneliness with both dictionary-based (Linguistic Inquiry and Word Count 2015) and open-vocabulary linguistic features (words, phrases, and topics)...
Context:
Impairment in social functioning is a feature and consequence of depression and anxiety disorders. For example, in depression, anhedonia and negative feelings about the self may impact relationships; in anxiety, fear of negative evaluation may interfere with getting close to others. It is unknown whether social impairment associated with...
How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment on s...
Background: Wellbeing is predominantly measured through surveys. Alternatively, wellbeing can be measured by analysing individuals’ language on social media platforms using social media language text mining (SMTM). Methods: To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks based o...
In a recent longitudinal study of U.S. adolescents, grit predicted rank-order increases in growth mindset and, to a lesser degree, growth mindset predicted rank-order increases in grit. The current investigation replicated and extended these findings in a younger non-Western, educated, industrialized, rich, and democratic (non-WEIRD) population. Tw...
Background: Early indicators of who will remain in – or leave – treatment for substance use disorder (SUD) can drive targeted interventions to support long-term recovery.
Objectives: To conduct a comprehensive study of linguistic markers of SUD treatment outcomes, the current study integrated features produced by machine learning models known to ha...
Mapping individual differences in behavior is fundamental to personalized neuroscience, but quantifying complex behavior in real world settings remains a challenge. While mobility patterns captured by smartphones have increasingly been linked to a range of psychiatric symptoms, existing research has not specifically examined whether individuals hav...
Modelling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. Using Weibo posts from over 65,000 users across 329 counties, we build hierarchical mixed-effects models...
Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population (a "selection bias"). Within the social sciences, such a bias is typically addressed with restratification techniques, where obse...
Empathy is a cognitive and emotional reaction to an observed situation of others. Empathy has recently attracted interest because it has numerous applications in psychology and AI, but it is unclear how different forms of empathy (e.g., self-report vs. counterpart other-report, concern vs. distress) interact with other affective phenomena or demogra...
Wellbeing is an important concept that concerns researchers, policy makers, and the broader general public. The measurement of individuals’ wellbeing levels has predominantly been done through self-reports (e.g., survey questionnaires), which is time-consuming for respondents and costly. Alternatively, wellbeing can be measured in real-time by auto...
Background
Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking.
Methods
Using data collected on 3664 r...
Background
Digital technology, the internet and social media are increasingly investigated as a promising means for monitoring symptoms and delivering mental health treatment. These apps and interventions have demonstrated preliminary acceptability and feasibility, but previous reports suggest that access to technology may still be limited among i...
Background
The quality of care in labor and delivery is traditionally measured through the Hospital Consumer Assessment of Healthcare Providers and Systems but less is known about the experiences of care reported by patients and caregivers on online sites that are more easily accessed by the public.
Objective
The aim of this study was to generate...
An understanding of healthcare super-utilizers’ online behaviors could better identify experiences to inform interventions. In this retrospective case-control study, we analyzed patients’ social media posts to better understand their day-to-day behaviors and emotions expressed online. Patients included those receiving care in an urban academic emer...
Objective
Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings.
Materials and Methods
We trained separate embeddings on all full-text manuscripts in the PubMed Central (PMC) Open Access subset, case reports therein, the English Wikipedia corpus, the Me...
Background
Personal sensing has shown promise for detecting behavioral correlates of depression, but there is little work examining personal sensing of cognitive and affective states. Digital language, particularly through personal text messages, is one source that can measure these markers.
Methods
We correlated privacy-preserving sentiment analy...
Supplementary analyses mentioned in Atanasov et al. (2020).
Modeling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. This paper studies linguistic differences in the experiences and expressions of stress in urban-rural Chi...
Significance
On May 25, 2020, George Floyd, an unarmed Black American male, was murdered by a White police officer in Minneapolis. Footage of his death was widely shared and caused widespread protests. Using data from Gallup before and after his death, we found an unprecedented level of anger and sadness in the population, particularly among Black...
Background
Online platforms are used to manage aspects of our lives including health outside clinical settings. Little is known about the effectiveness of using online platforms to manage hypertension. We assessed effects of tweeting/retweeting cardiovascular health content by individuals with poorly controlled hypertension on systolic blood pressu...
Background
The assessment of behaviors related to mental health typically relies on self-report data. Networked sensors embedded in smartphones can measure some behaviors objectively and continuously, with no ongoing effort.
Objective
This study aims to evaluate whether changes in phone sensor–derived behavioral features were associated with subseq...
Objective
We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment.
Method
We applied a...
The content of images users post to their social media is driven in part by personality. In this study, we analyze how Twitter profile images vary with the personality of the users posting them. In our main analysis, we use profile images from over 66,000 users whose personality we estimate based on their tweets. To facilitate interpretability, we...
Background
Machine learning (ML) has garnered increasing attention as a means to quantitatively analyze the growing and complex medical data to improve individualized patient care. We herein aim to critically examine the current state of ML in predicting surgical outcomes, evaluate the quality of currently available research, and propose areas of i...
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but thes...
Background
Recommendations for promoting mental health during the COVID-19 pandemic include maintaining social contact, through virtual rather than physical contact, moderating substance/alcohol use, and limiting news and media exposure. We seek to understand if these pandemic-related behaviors impact subsequent mental health.
Methods
Daily online...
Background
Digital technology, the internet and social media are increasingly investigated as a promising means for monitoring symptoms and delivering mental health treatment. These apps and interventions have demonstrated preliminary acceptability and feasibility, but previous reports suggest that access to technology may still be limited among i...
Religion and spirituality are multidimensional constructs including practices, rituals, and experiences, though they are often treated solely in terms of belief. In this study (N = 2,389), we investigate dimensions examined in previous linguistic analysis studies—religious affiliation and experiences of unity—and new dimensions: religious services,...
Psychological research has shown that subjective well-being is sensitive to social comparison effects; individuals report decreased happiness when their neighbors earn more than they do. In this work, we use Twitter language to estimate the well-being of users, and model both individual and neighborhood income using hierarchical modeling across cou...
Background
The quality of care in labor and delivery is traditionally measured through the Hospital Consumer Assessment of Healthcare Providers and Systems but less is known about the experiences of care reported by patients and caregivers on online sites that are more easily accessed by the public.
Objective
The aim of this study was to generate...
Estimating causality from observational data is essential in many data science questions but can be a challenging task. Here we review approaches to causality that are popular in econometrics and that exploit (quasi) random variation in existing data, called quasi-experiments, and show how they can be combined with machine learning to answer causal...
Unstructured
As of December 2020, the SARS-CoV-2 virus has been responsible for over 78 million cases of COVID-19 worldwide, resulting in over 1.7 million deaths. In the United States in particular, protective measures against the COVID-19 pandemic have been hampered by political polarization and discrepancies among federal, state, and local polici...
Unstructured
By March 2021, the SARS-CoV-2 virus had been responsible for over 115 million cases of COVID-19 worldwide, resulting in over 2.5 million deaths. As the virus grew exponentially, so did its media coverage, resulting in a proliferation of conflicting information on social media platforms, a so-called "infodemic." In this mixed scoping...
Users’ information-seeking and information-sharing behaviors provide socioeconomic and psychological insights that are useful to understand regional trends in health. We study the spatial variations in aggregate Google Search and Twitter trends across 208 Designated Market Areas (DMAs) in the United States and their association with regional health....
Psychologists routinely assess people's emotions and traits, such as their personality, by collecting their responses to survey questionnaires. Such assessments can be costly in terms of both time and money, and often lack generalizability, as existing data cannot be used to predict responses for new survey questions or participants. In this study,...
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but thes...
In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COV...
Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across American English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the United States (US) and 5,300 Sina Weibo posts from China fo...
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limi