Johan Bollen’s research while affiliated with Indiana University Bloomington and other places


Publications (175)


Figure 1: Histogram of classifier results on GPT-regenerated text.
Figure 2: Confusion matrix for the classifier.
Figure 3: The distribution difference of PELT thresholds for the segmented dataset.
Figure 4: The original data. Observe length as a confounding variable.
Figure 5: After normalization with Z-score.
Summary statistics of classification scores by group.


GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints
  • Preprint
  • File available

May 2025 · 5 Reads

Soren DeHaan · Yuanze Liu · Johan Bollen · Sa'ul A. Blanco

The proliferation of Large Language Models (LLMs) in late 2022 has impacted academic writing, threatening credibility, and causing institutional uncertainty. We seek to determine the degree to which LLMs are used to generate critical text as opposed to being used for editing, such as checking for grammar errors or inappropriate phrasing. In our study, we analyze arXiv papers for stylistic segmentation, which we measure by varying a PELT threshold against a Bayesian classifier trained on GPT-regenerated text. We find that LLM-attributed language is not predictive of stylistic segmentation, suggesting that when authors use LLMs, they do so uniformly, reducing the risk of hallucinations being introduced into academic preprints.


Social media use and transdiagnostic dimensions of psychopathology: Relative effects of frequency and problematic social media use

May 2025 · 10 Reads

Background: Social media has been identified as a possible risk factor for depression in young adults, though it is unclear if it is associated with other dimensions of psychopathology. Moreover, it is unclear what aspects of social media use beyond frequency are associated with symptoms. Methods: A large sample of young adults (N = 7453) participated in a screening of transdiagnostic symptoms of psychopathology (i.e., depression, panic, generalized anxiety, social anxiety, alcohol use, drug use, insomnia, and pain). We correlated depression scores with time spent on social media and problematic social media use. We used a principal components analysis to identify shared dimensions of psychopathology and explored the correlations between symptom dimensions and social media use. Results: Participants who reported higher social media usage exhibited significantly higher self-reported depression. Our PCA suggested two dimensions of psychopathology consistent with the HiTOP model, which we dubbed “emotional” and “externalizing” symptoms. Time spent on social media was correlated with emotional symptoms and with externalizing symptoms, albeit to a lesser extent. However, these associations were not significant after controlling for problematic social media use, which had differential relations to dimensions of psychopathology. E-value analysis suggested that unmeasured confounds could potentially explain these associations. Discussion: The association between time spent on social media and psychopathology could be accounted for by the way in which social media is used. However, unmeasured confounding remains a threat to these inferences.
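
The abstract above reports an E-value analysis without giving details. As a rough illustration only, the sketch below computes the standard E-value for a point estimate expressed as a risk ratio, using the commonly cited formula E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, with protective estimates inverted first. The function name `e_value` and the choice of a risk-ratio scale are assumptions made for this example; the study's own effect measures and procedure are not stated here.

```python
import math


def e_value(risk_ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Hypothetical helper for illustration; not taken from the study.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Estimates below 1 (protective) are inverted before applying the formula.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    # Minimum strength of association (risk-ratio scale) that an unmeasured
    # confounder would need with both exposure and outcome to explain away rr.
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    for rr in (1.0, 1.2, 2.0, 0.5):
        print(f"RR = {rr:.2f} -> E-value = {e_value(rr):.3f}")
```

An E-value close to 1 means that even weak unmeasured confounding could account for the observed association, which matches the cautious reading given in the abstract.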


One shot intervention reduces online engagement with distorted content

March 2025 · 5 Reads

PNAS Nexus

Lorenzo Lorenzo-Luaces · [...] · Jennifer Sue Trueblood

Depression is one of the leading causes of disability worldwide. Individuals with depression often experience unrealistic and overly negative thoughts, i.e. cognitive distortions, that cause maladaptive behaviors and feelings. Now that a majority of the US population uses social media platforms, concerns have been raised that they may serve as a vector for the spread of distorted ideas and thinking amid a global mental health epidemic. Here, we study how individuals (N=838) interact with distorted content on social media platforms using a simulated environment similar to Twitter (now X). We find that individuals with higher depression symptoms tend to prefer distorted content more than those with fewer symptoms. However, a simple one-shot intervention can teach individuals to recognize and drastically reduce interactions with distorted content across the entire depression scale. This suggests that distorted thinking on social media may disproportionally affect individuals with depression, but simple awareness training can mitigate this effect. Our findings have important implications for understanding the role of social media in propagating distorted thinking and potential paths to reduce the societal cost of mental health disorders.


Bootstrapped aggregate CDS prevalence distributions for each GAD-10 severity class. Note. Each bar is annotated above with the sample size of the corresponding severity class. The colored box represents the interquartile range, while the horizontal lines correspond to the 95% CI. Our results show a trend of increasing CDS prevalence as severity increases, with pairwise significant differences denoted by braces.
Bootstrapped aggregate CDS prevalence distributions for each PHQ-9 severity class. Note. Each bar is annotated above with the median and 95% CI bounds in brackets. The colored box represents the interquartile range, while the horizontal lines correspond to the 95% CI. Pairwise significant differences are denoted by braces.
Pairwise Spearman rank-order correlation coefficients between (a) GAD-10, PHQ-9, and confounding variables and (b) accounting for shared and unique variance between PHQ-9 and GAD-10. Note. Significance of results is denoted below the coefficient value by **: p < 0.01 and *: p < 0.05.
Anxiety and Depression are Associated with More Distorted Thinking on Social Media: A Longitudinal Multi-Method Study

March 2025 · 34 Reads · 1 Citation

Cognitive Therapy and Research

Background: Depression and anxiety are associated with patterns of negative thinking that can be targeted through cognitive restructuring as a part of cognitive therapy (CT) or cognitive behavioral therapy (CBT). Our team has created a set of cognitive distortion schemata (CDS) n-grams based on theories underlying CT to measure the linguistic markers that indicate cognitive vulnerability to depression. These CDS were specifically designed to examine online language. Our prior work supports a relationship between CDS and a diagnosis of depression, but less is known about the relationship between online language, CDS, and anxiety. The current study examines whether CDS can be detected in people who report anxiety symptoms, and whether CDS increase with symptom severity. Methods: 1,377 participants were recruited from a study assessing social media use and mental health symptoms, the Studies of Online Cohorts of Internalizing Symptoms and Language (SOCIAL). From this, 804 timelines were harvested, and after removing missing data and bots, our final sample was 537 respondents who posted 999,859 tweets. This is a longitudinal, multi-method design, using surveys and text-based analysis of social media timelines. We used bootstrap resampling to compare differences in CDS prevalence in anxious and depressed participants. Results: CDS can be observed in anxiety disorders, significantly increase as a function of anxiety symptom severity, and are related to depression and anxiety comorbidity. Conclusions: Using behavioral, affective, and cognitive indicators of distorted thinking from social media may yield new insight into the trajectories of depression and anxiety. This work has implications for the future of CT/CBT and other online interventions that target distorted thinking styles.
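
The abstract above mentions bootstrap resampling to compare CDS prevalence between groups. The following is a minimal sketch of a percentile bootstrap for a difference in prevalence, not the authors' actual pipeline. The function name `bootstrap_diff_ci`, the 0/1 encoding (1 = a post contains at least one CDS n-gram), and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_b) - mean(group_a).

    Hypothetical helper: each group is a list of 0/1 indicators, so the
    mean of a group is its prevalence.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_b) - statistics.fmean(resample_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only: 20% prevalence in one group, 80% in the other.
    low_severity = [1] * 10 + [0] * 40
    high_severity = [1] * 40 + [0] * 10
    print(bootstrap_diff_ci(low_severity, high_severity))
```

If the resulting interval excludes zero, the two groups differ in prevalence at roughly the chosen confidence level; a real analysis would also need to decide what the resampling unit is (posts or participants), which this sketch does not address.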


Social inequality and cultural factors impact the awareness and reaction during the cryptic transmission period of pandemic

February 2025 · 8 Reads

PNAS Nexus

The World Health Organization (WHO) declared the COVID-19 outbreak a Public Health Emergency of International Concern (PHEIC) on January 31, 2020. However, rumors of a "mysterious virus" had already been circulating in China in December 2019, possibly preceding the first confirmed COVID-19 case. Understanding how awareness about an emerging pandemic spreads through society is vital not only for enhancing disease surveillance, but also for mitigating demand shocks and social inequities, such as shortages of personal protective equipment (PPE) and essential supplies. Here we leverage a massive e-commerce dataset comprising 150 billion online queries and purchase records from 94 million people to detect the traces of early awareness and public response during the cryptic transmission period of COVID-19. Our analysis focuses on identifying information gaps across different demographic cohorts, revealing significant social inequities and the role of cultural factors in shaping awareness diffusion and response behaviors. By modeling awareness diffusion in heterogeneous social networks and analyzing online shopping behavior, we uncover the evolving characteristics of vulnerable populations. Our findings expand the theoretical understanding of awareness spread and social inequality in the early stages of a pandemic, highlighting the critical importance of e-commerce data and social network data in addressing future pandemic challenges effectively and in a timely manner. We also provide actionable recommendations to better manage and mitigate dynamic social inequalities in public health crises.


Fig. 3. Awareness (percentage) patterns for different occupation groups (upper sub-plot); a base-10 log scale is applied to the Y-axis; the left cut-out zooms in on the 5 days around the Wuhan lockdown (01/23/2020); the right cut-out zooms in on the 7 days around the WHO's declaration of the new coronavirus outbreak (01/31/2020) in the post-peak phase. Hospital staff had the highest awareness percentage (0.16%-0.45%) throughout the beginning phase. In the growth phase, the education/research group surpassed hospital staff and became the most aware group (0.39%-25%), while agriculture, forestry, animal husbandry, and fishery was the least aware group (0.16%-10.09%). The peak phase showed a similar pattern: education/research was the most aware group (37.24%-66.25%), while agriculture, forestry, animal husbandry, and fishery was the least aware (16.86%-44.03%). The gaps between occupation groups shrank during the post-peak phase. The lower sub-plot visualizes four representative days from different phases; its Y-axis is the average purchasing power of the aware population in each occupation group. Results show that high-income people respond to the emerging pandemic more quickly than low-income people.
Five Different Phases [31] of the Cryptic Period of COVID-19 with Specific Real-World Events
Social inequality and cultural factors impact the awareness and reaction during the cryptic transmission period of pandemic

February 2025 · 18 Reads



Social media use and transdiagnostic dimensions of psychopathology: Relative effects of frequency and problematic social media use

November 2024 · 131 Reads

Background: Young adults in the United States are heavy users of social media, which has been identified as a possible risk factor for depression. However, it is unclear whether social media use is associated with internalizing disorder symptom vulnerability beyond depression or whether it is related to other relevant symptom dimensions like externalizing symptoms (e.g., substance use). Moreover, it is unclear what aspects of social media use beyond frequency are associated with symptoms. Methods: A large cohort of undergraduate students from a Midwest university (N = 7123) participated in a screening of transdiagnostic symptoms of psychopathology. Data collected included: time spent on social media, social media platforms used, and aspects of problematic social media use (i.e., use for emotion regulation, compulsive use, or social-media-related impairment). We also collected sociodemographics and self-reported symptoms: depression, panic, generalized anxiety, social anxiety, alcohol use, drug use, insomnia, and pain. We correlated depression scores with time spent on social media to confirm prior findings of an association between social media use and depression. We then used a principal components analysis to identify shared dimensions of psychopathology across these symptoms and explored the correlations between symptom dimensions and elements of social media use. Results: Participants who reported higher social media usage exhibited significantly higher self-reported depression. Our PCA suggested two dimensions of psychopathology consistent with the HiTOP model, which we dubbed “emotional” and “externalizing” symptoms. Time spent on social media was correlated with emotional symptoms and with externalizing symptoms, albeit to a lesser extent. However, these associations were not significant after controlling for problematic social media use. Moreover, problematic social media use had differential relations to dimensions of psychopathology. For example, using social media for emotion regulation was associated with more severe internalizing symptoms but less severe externalizing symptoms. E-value analysis suggested that unmeasured confounds could potentially explain these associations. Discussion: Our results suggest that the association between time spent on social media and psychopathology could be accounted for by the way in which social media is used, which in turn is differentially associated with emotional and externalizing symptoms. However, unmeasured confounding remains a threat to these inferences.


Anxiety and depression are associated with more distorted thinking on social media: Longitudinal Observational Study (Preprint)

November 2024 · 9 Reads

BACKGROUND: Depression and anxiety are associated with patterns of negative thinking that can be targeted through cognitive restructuring as a part of cognitive behavioral therapy (CBT). Our team has created a set of cognitive distortion schemata (CDS) n-grams based on theories underlying CBT to measure the linguistic markers that indicate cognitive vulnerability to depression. These CDS were specifically designed to examine online language. OBJECTIVE: Our prior work supports a relationship between CDS and a diagnosis of depression, but less is known about the relationship between online language, CDS, and anxiety. The current study examines whether CDS can be detected in people who report anxiety symptoms, and whether CDS increase with symptom severity. METHODS: 691 participants were recruited from a study assessing social media use and mental health symptoms, the Studies of Online Cohorts of Internalizing Symptoms and Language (SOCIAL). We used bootstrap resampling to compare differences in CDS prevalence in anxious and depressed participants. RESULTS: CDS can be observed in anxiety disorders, increase as a function of anxiety symptom severity, and are related to depression and anxiety comorbidity. CONCLUSIONS: Using behavioral, affective, and cognitive indicators of distorted thinking from social media may yield new insight into the trajectories of depression and anxiety. This work has implications for the future of CBT and other online interventions that target distorted thinking styles.


A heart model of Earth Stewardship: Shaking up science for positive futures

October 2024 · 260 Reads · 2 Citations

Few disagree that we should pass on the Earth in good shape to future generations, and many scientists want their work to contribute to that goal. Recent work has shown that hopelessness stands in the way of people taking an active attitude. At the same time, it is becoming clear what can be done about that: providing compelling visions of attractive futures and highlighting feasible pathways. Currently, science and the humanities are not well designed for this task. Practices that stand in the way of a more holistic change‐making approach include proposal‐based funding, paralyzing rigor requirements, and a focus on explanation rather than action. Removing those barriers may require culture shifts, a notoriously difficult and slow kind of change. Meanwhile, realistic inspiring future scenarios can be developed by bringing diverse thinkers together in environments where time, space, and immediate outcomes are not pressing.


Citations (56)


... The established links between distortions and mental health conditions have motivated language analysis on social networks for early detection of markers of depression in social media posts (Ophir et al., 2017; Bathina et al., 2021; A. Rutter et al., 2025). Our study underscores that NLP models of cognitive distortions effectively align language with actual mental health conditions, and contributes to real-world monitoring or intervention strategies through advanced detection capabilities. ...

Reference:

Linking Language-based Distortion Detection to Mental Health Outcomes
Anxiety and Depression are Associated with More Distorted Thinking on Social Media: A Longitudinal Multi-Method Study

Cognitive Therapy and Research

... While considerable research has been dedicated to text classification in various domains, such as blogs and social media reviews [20][21][22], there has been relatively limited focus on sentiment analysis within the economic and financial sectors. Recent research in text mining for financial content has predominantly relied on word categorization methods such as the "Bag of Words" approach [23]. ...

Happiness is assortative in online social networks
  • Citing Preprint
  • March 2011

... The RA played a pivotal role in shaping the SES field by developing and synthesizing foundational concepts such as resilience thinking (Folke et al. 2005, 2010), the adaptive cycle and panarchy (Gunderson and Holling 2002), and social-ecological traps (Platt 1973, Allison and Hobbs 2004, Carpenter and Brock 2008, Bowles et al. 2011). It pioneered a distinctive mode of collaboration through small, interdisciplinary workshops designed to foster trust, creativity, and theoretical integration (Walker et al. 2004, Parker and Hackett 2012, Scheffer et al. 2024), an approach that remains a hallmark of SES research today. Beyond shaping theory, the RA helped legitimize the emerging SES field through the launch of Ecology and Society, one of the first open-access, online-only journals for interdisciplinary sustainability science (Folke et al. 2016, Schlüter et al. 2019, Manyani et al. 2024). ...

A heart model of Earth Stewardship: Shaking up science for positive futures
  • Citing Article
  • October 2024

... On the other hand, negative affect is associated with a greater predisposition to experience negative emotions, affecting life satisfaction and quality of life (Dufey & Fernández, 2012; Martín et al., 2015). In addition, high levels of negative affect characterize anxiety and depressive disorders (Rutter et al., 2024). This relationship between affect and psychological variables related to well-being and mental health becomes more evident in the educational context. ...

Negative affect variability differs between anxiety and depression on social media

... In particular, the digital nature of online activities simplifies assessment and opens up the potential for real-time practical applications. Knowledge of the relationship between online information-seeking patterns and mental health can inform the development of tools that could complement existing interventions, such as screen time awareness tools [18, 19], and digital phenotyping methods [20][21][22][23][24][25][26]. ...

Quantifying the Digital Phenotype of Loneliness on Twitter

... The low endorsement of mpox stigma may also be related to overall stigma reduction efforts related to experiences and identities of marginalized identities, particularly in the HIV and STI context. In general, effective communication can enable or dispel myths and misconceptions about mpox in targeted digital media, news sources, large events, or through sexual networks (Banjar & Alaqeel, 2024;Edinger et al., 2023). While greater mpox knowledge could lead to reduced forms of stigma, this contrasts with HIV and HIV-related stigma, as people living with HIV who are well-informed about HIV still experience stigma and discrimination (Letshwenyo-Maruatona et al., 2019;Nyblade, 2006;Yang et al., 2006). ...

Misinformation and Public Health Messaging in the Early Stages of the Mpox Outbreak: Mapping the Twitter Narrative with Deep Learning (Preprint)

Journal of Medical Internet Research

... This enables a detailed analysis of social networks and their impact on pandemic awareness and disaster planning. By leveraging these unique data characteristics, eCommerce data facilitates innovative research that can track the diffusion of pandemic awareness with high resolution across social, temporal, demographic, and geographical dimensions, at truly societal scales (a majority of the Chinese population) [3]. This data also allows for early detection of shifts in consumer behavior related to health concerns, providing valuable signals for timely public health interventions. ...

Declining Well-Being During the COVID-19 Pandemic Reveals U.S. Social Inequities
  • Citing Article
  • January 2021

SSRN Electronic Journal

... For the study we required ADHD as an eligibility requirement. Self-diagnosis was accepted due to the work of Rutter et al. [19] who found self-diagnosis as a reasonable indicator of an internalised condition and to fall in line with similar works [11]. ...

“I haven’t been diagnosed, but I should be”: Insight into self-diagnoses of common mental health disorders (Preprint)

JMIR Formative Research

... Pro-anorexia communities fulfill that need for significance by allowing individuals to set body goals (e.g., "lose 20 lbs before prom") and track progress for accountability (both common hashtags in our data). Communities also create a collective identity [71] by having members self-label themselves with hashtags that express solidarity with a chosen group [72], e.g., "edtwt" or "ricecaketwt". Once individuals join a pro-anorexia community, or radical network, group dynamic processes keep them engaged [70]. ...

Quantifying collective identity online from self-defining hashtags

... To evaluate the bots' influence on elections, we retrieved all replies under the posts of each party's secretary during the last month of elections, between 23 August and 23 September 2022. To detect bots among the repliers, we employed Botometer [61], a widespread ML-based tool used in the literature (e.g., [48,32]) that distinguishes between legitimate users and bots. This approach was chosen because as of February 2023, the Italian language is yet to have a strong Among the metrics, Botometer returns, for each checked account, the following scores: ...

Studies of Online Cohorts for Internalizing symptoms and Language (SOCIAL) I and II: Triangulating surveys and social media data (Preprint)
  • Citing Article
  • May 2022

JMIR Formative Research