Lyle H. Ungar

University of Pennsylvania | UP · Department of Computer and Information Science

About

478
Publications
159,320
Reads
22,363
Citations
Citations since 2017
188 Research Items
13406 Citations
[Chart: citations per year, 2017 to 2023; vertical axis from 0 to 2,500]

Publications (478)
Article
Full-text available
Wellbeing is predominantly measured through self-reports, which is time-consuming and costly. It can also be measured by automatically analysing language expressed on social media platforms, through social media text mining (SMTM). We present a systematic review based on 45 studies, and a meta-analysis of 32 convergent validities from 18 studies re...
Preprint
Full-text available
We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens: conversational agents express a diversity of both states (short-term factors like emotions) and traits (longer-term factors like personality) just as people do. These interpretable metrics consist of five measures from established psychology constructs...
Article
Full-text available
Wellbeing is predominantly measured through surveys but is increasingly measured by analysing individuals' language on social media platforms using social media text mining (SMTM). To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks derived from survey items and social media languag...
Article
Background: Relatively little is known about how communication changes as a function of depression severity and interpersonal closeness. We examined the linguistic features of outgoing text messages among individuals with depression and their close- and non-close contacts. Methods: 419 participants were included in this 16-week-long observationa...
Conference Paper
Full-text available
Depression is known to have heterogeneous symptom manifestations. Investigating various symptoms of depression is essential to understanding underlying mechanisms and personalizing treatments. Reddit, an online peer-to-peer social media platform, contains varied communities (subreddits) where individuals discuss their detailed mental health experie...
Preprint
Large language models (LLMs) such as ChatGPT and GPT-3/4, built on artificial intelligence, hold immense potential to support, augment, or even replace psychotherapy. Enthusiasm about such applications is mounting in the field as well as industry. These developments promise to address insufficient mental healthcare system capacity and scale individ...
Article
Full-text available
Targeting of location-specific aid for the U.S. opioid epidemic is difficult due to our inability to accurately predict changes in opioid mortality across heterogeneous communities. AI-based language analyses, having recently shown promise in cross-sectional (between-community) well-being assessments, may offer a way to more accurately longitudinal...
Article
Full-text available
How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing the accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment...
Article
Full-text available
Objective Language patterns may elucidate mechanisms of mental health conditions. To inform underlying theory and risk models, we evaluated prospective associations between in vivo text messaging language and differential symptoms of depression, generalized anxiety, and social anxiety. Methods Over 16 weeks, we collected outgoing text messages fro...
Article
Full-text available
We study the language differentially associated with loneliness and depression using 3.4-million Facebook posts from 2986 individuals, and uncover the statistical associations of survey-based depression and loneliness with both dictionary-based (Linguistic Inquiry Word Count 2015) and open-vocabulary linguistic features (words, phrases, and topics)...
Article
Context: Impairment in social functioning is a feature and consequence of depression and anxiety disorders. For example, in depression, anhedonia and negative feelings about the self may impact relationships; in anxiety, fear of negative evaluation may interfere with getting close to others. It is unknown whether social impairment associated with...
Preprint
Full-text available
How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment on s...
Preprint
Background: Wellbeing is predominantly measured through surveys. Alternatively, wellbeing can be measured by analysing individuals’ language on social media platforms using social media language text mining (SMTM). Methods: To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks based o...
Article
In a recent longitudinal study of U.S. adolescents, grit predicted rank-order increases in growth mindset and, to a lesser degree, growth mindset predicted rank-order increases in grit. The current investigation replicated and extended these findings in a younger non-Western, educated, industrialized, rich, and democratic (non-WEIRD) population. Tw...
Article
Full-text available
Background: Early indicators of who will remain in – or leave – treatment for substance use disorder (SUD) can drive targeted interventions to support long-term recovery. Objectives: To conduct a comprehensive study of linguistic markers of SUD treatment outcomes, the current study integrated features produced by machine learning models known to ha...
Article
Mapping individual differences in behavior is fundamental to personalized neuroscience, but quantifying complex behavior in real world settings remains a challenge. While mobility patterns captured by smartphones have increasingly been linked to a range of psychiatric symptoms, existing research has not specifically examined whether individuals hav...
Article
Full-text available
Modelling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. Using Weibo posts from over 65,000 users across 329 counties, we build hierarchical mixed-effects models...
Article
Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population, a "selection bias". Within the social sciences, such a bias is typically addressed with restratification techniques, where obse...
Preprint
Full-text available
Empathy is a cognitive and emotional reaction to an observed situation of others. Empathy has recently attracted interest because it has numerous applications in psychology and AI, but it is unclear how different forms of empathy (e.g., self-report vs counterpart other-report, concern vs. distress) interact with other affective phenomena or demogra...
Preprint
Full-text available
Wellbeing is an important concept that concerns researchers, policy makers, and the broader general public. The measurement of individuals’ wellbeing levels has predominantly been done through self-reports (e.g., survey questionnaires), which is time-consuming for respondents and costly. Alternatively, wellbeing can be measured in real-time by auto...
Article
Background: Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking. Methods: Using data collected on 36...
Preprint
Full-text available
Background Digital technology, the internet and social media are increasingly investigated as a promising means for monitoring symptoms and delivering mental health treatment. These apps and interventions have demonstrated preliminary acceptability and feasibility, but previous reports suggest that access to technology may still be limited among i...
Article
Full-text available
Background The quality of care in labor and delivery is traditionally measured through the Hospital Consumer Assessment of Healthcare Providers and Systems but less is known about the experiences of care reported by patients and caregivers on online sites that are more easily accessed by the public. Objective The aim of this study was to generate...
Article
Full-text available
An understanding of healthcare super-utilizers’ online behaviors could better identify experiences to inform interventions. In this retrospective case-control study, we analyzed patients’ social media posts to better understand their day-to-day behaviors and emotions expressed online. Patients included those receiving care in an urban academic emer...
Article
Objective Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings. Materials and Methods We trained separate embeddings on all full-text manuscripts in the Pubmed Central (PMC) Open Access subset, case reports therein, the English Wikipedia corpus, the Me...
Article
Background Personal sensing has shown promise for detecting behavioral correlates of depression, but there is little work examining personal sensing of cognitive and affective states. Digital language, particularly through personal text messages, is one source that can measure these markers. Methods We correlated privacy-preserving sentiment analy...
Data
Supplementary analyses mentioned in Atanasov et al. (2020).
Preprint
Full-text available
Modeling differential stress expressions in urban and rural regions in China can provide a better understanding of the effects of urbanization on psychological well-being in a country that has rapidly grown economically in the last two decades. This paper studies linguistic differences in the experiences and expressions of stress in urban-rural Chi...
Article
Full-text available
Significance On May 25, 2020, George Floyd, an unarmed Black American male, was murdered by a White police officer in Minneapolis. Footage of his death was widely shared and caused widespread protests. Using data from Gallup before and after his death, we found an unprecedented level of anger and sadness in the population, particularly among Black...
Article
Full-text available
Background Online platforms are used to manage aspects of our lives including health outside clinical settings. Little is known about the effectiveness of using online platforms to manage hypertension. We assessed effects of tweeting/retweeting cardiovascular health content by individuals with poorly controlled hypertension on systolic blood pressu...
Article
Full-text available
Background: The assessment of behaviors related to mental health typically relies on self-report data. Networked sensors embedded in smartphones can measure some behaviors objectively and continuously, with no ongoing effort. Objective: This study aims to evaluate whether changes in phone sensor-derived behavioral features were associated with s...
Article
Full-text available
Objective We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. Method We applied a...
Article
The content of images users post to their social media is driven in part by personality. In this study, we analyze how Twitter profile images vary with the personality of the users posting them. In our main analysis, we use profile images from over 66,000 users whose personality we estimate based on their tweets. To facilitate interpretability, we...
Article
Background Machine learning (ML) has garnered increasing attention as a means to quantitatively analyze the growing and complex medical data to improve individualized patient care. We herein aim to critically examine the current state of ML in predicting surgical outcomes, evaluate the quality of currently available research, and propose areas of i...
Article
Full-text available
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but thes...
Article
Background Recommendations for promoting mental health during the COVID-19 pandemic include maintaining social contact, through virtual rather than physical contact, moderating substance/alcohol use, and limiting news and media exposure. We seek to understand if these pandemic-related behaviors impact subsequent mental health. Methods Daily online...
Preprint
BACKGROUND Digital technology, the internet and social media are increasingly investigated as a promising means for monitoring symptoms and delivering mental health treatment. These apps and interventions have demonstrated preliminary acceptability and feasibility, but previous reports suggest that access to technology may still be limited among i...
Article
Full-text available
Psychological research has shown that subjective well-being is sensitive to social comparison effects; individuals report decreased happiness when their neighbors earn more than they do. In this work, we use Twitter language to estimate the well-being of users, and model both individual and neighborhood income using hierarchical modeling across cou...
Preprint
BACKGROUND The quality of care in labor and delivery is traditionally measured through the Hospital Consumer Assessment of Healthcare Providers and Systems but less is known about the experiences of care reported by patients and caregivers on online sites that are more easily accessed by the public. OBJECTIVE The aim of this study was to generate...
Article
Estimating causality from observational data is essential in many data science questions but can be a challenging task. Here we review approaches to causality that are popular in econometrics and that exploit (quasi) random variation in existing data, called quasi-experiments, and show how they can be combined with machine learning to answer causal...
Preprint
UNSTRUCTURED As of December 2020, the SARS-CoV-2 virus has been responsible for over 78 million cases of COVID-19 worldwide, resulting in over 1.7 million deaths. In the United States in particular, protective measures against the COVID-19 pandemic have been hampered by political polarization and discrepancies among federal, state, and local polici...
Article
Full-text available
Unstructured: By March 2021, the SARS-CoV-2 virus has been responsible for over 115 million cases of COVID-19 worldwide, resulting in over 2.5 million deaths. As the virus grew exponentially, so did its media coverage, resulting in a proliferation of conflicting information on social media platforms - a so-called "infodemic." In this mixed scoping...
Article
Users’ information-seeking and information-sharing behavior provide socioeconomic and psychological insights that are useful to understand regional trends in health. We study the spatial variations in aggregate Google Search and Twitter trends across 208 Designated Market Areas (DMAs) in the United States and their association with regional health....
Conference Paper
Full-text available
Psychologists routinely assess people's emotions and traits, such as their personality, by collecting their responses to survey questionnaires. Such assessments can be costly in terms of both time and money, and often lack generalizability, as existing data cannot be used to predict responses for new survey questions or participants. In this study,...
Preprint
Full-text available
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but thes...
Preprint
Full-text available
In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COV...
Article
Full-text available
Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across American English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the United States (US) and 5,300 Sina Weibo posts from China fo...
Preprint
Full-text available
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties. Inspired by geolocation research, we propose the novel task of Micro-Dialect Identification (MDI) and introduce MARBERT, a new language model with striking abilities to predic...
Article
Full-text available
We sought to evaluate whether there was variability in language used on social media across different time points of pregnancy (before, during, and after pregnancy, as well as by trimester and parity). Consenting patients shared access to their individual Facebook posts and electronic medical records. Random forest models trained on Facebook posts...
Preprint
BACKGROUND Current atherosclerotic cardiovascular disease (ASCVD) predictive models have limitations; efforts are underway to improve the discriminatory power of ASCVD models. OBJECTIVE We sought to evaluate the discriminatory power of using social media posts to predict 10-year risk for ASCVD as compared to the pooled cohort risk equations (PCEs)...
Article
Full-text available
Background Current atherosclerotic cardiovascular disease (ASCVD) predictive models have limitations; thus, efforts are underway to improve the discriminatory power of ASCVD models. Objective We sought to evaluate the discriminatory power of social media posts to predict the 10-year risk for ASCVD as compared to that of pooled cohort risk equation...
Article
Full-text available
Objectives: This study aimed to determine whether words used in medical school admissions essays can predict physician empathy. Methods: A computational form of linguistic analysis was used for the content analysis of medical school admissions essays. Words in medical school admissions essays were computationally grouped into 20 'topics' which w...
Preprint
Full-text available
Laboratory research has shown that both underreaction and overreaction to new information pose threats to forecasting accuracy. This article explores how real-world forecasters who vary in skill attempt to balance these threats. We distinguish among three aspects of updating: frequency, magnitude, and confirmation propensity. Drawing on data from a...
Article
Full-text available
Laboratory research has shown that both underreaction and overreaction to new information pose threats to forecasting accuracy. This article explores how real-world forecasters who vary in skill attempt to balance these threats. We distinguish among three aspects of updating: frequency, magnitude, and confirmation propensity. Drawing on data from a...
Article
To understand rural–urban differences in stressors, this study compared the cognitive and emotional language in geolocated Twitter posts in the United States against survey-reported county-level trends from the Gallup-Sharecare Well-Being Index. Mentions of stress on Twitter can predict population-level trends in stress in both rural (R²=31.6%) and...
Preprint
Full-text available
Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across US English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the US and 5,300 Sina Weibo posts from China for politeness scores....
Conference Paper
Full-text available
This overview describes the official results of the CL-Aff Shared Task 2020 – #OffMyChest. The dataset comprised a semi-supervised classification task, and an open-ended knowledge modeling task on a dataset of Reddit comments with annotations crowdsourced from Amazon Mechanical Turk. The Shared Task was organized as a part of the 3rd Workshop on A...
Preprint
BACKGROUND The assessment of behaviors related to mental health typically relies on self-report data. Networked sensors embedded in smartphones can measure some behaviors objectively and continuously, with no ongoing effort. OBJECTIVE This study aims to evaluate whether changes in phone sensor–derived behavioral features were associated with subse...
Article
A rapidly growing literature has attempted to explain Donald Trump's success in the 2016 U.S. presidential election as a result of a wide variety of differences in individual characteristics, attitudes, and social processes. We propose that the economic and psychological processes previously established have in common that they generated or elector...