Ceren Budak’s research while affiliated with University of Michigan and other places


Publications (76)


Plurals: A System for Guiding LLMs via Simulated Social Ensembles
  • Conference Paper

April 2025 · 4 Reads

Emily Fry · Narendra Edara · [...] · Ceren Budak

Figure 1: Distribution of user engagement and video duration for each of the Tenet Media podcast hosts.
Figure 2: Tenet Media videos with top user engagement: a) most viewed video, b) most commented on video, c) most upvoted video, and d) most downvoted video.
Outsourcing an Information Operation: A Complete Dataset of Tenet Media's Podcasts on Rumble
  • Preprint
  • File available

March 2025 · 14 Reads

Tenet Media, a U.S.-based, right-wing media company, hired six established podcasters to create content related to U.S. politics and culture during the 2024 U.S. presidential election cycle. After publishing content on YouTube and Rumble for nearly a year, Tenet Media was declared by the U.S. government to be funded entirely by Russia -- making it effectively an outsourced state-sponsored information operation (SSIO). We present a complete dataset of the 560 podcast videos published by the Tenet Media channel on the video-sharing platform Rumble between November 2023 and September 2024. Our dataset includes video metadata and user comments, as well as high-quality video transcriptions, representing over 300 hours of video content. This dataset provides researchers with material to study a Russian SSIO, notably on Rumble, an understudied platform in SSIO scholarship.


Figure 3: ROC-AUC scores at the 30% threshold for each LLM and prompt combination, with and without adding discourse-level signal from SBERT. Adding discourse-level metaphorical associations improves performance across all LLMs and prompts.
Figure A1: Inter-annotator agreement (Krippendorff's α) for each concept and domain-agnostic metaphor.
Figure A2: Comparison of GPT-4o-based metaphor scoring models that vary in prompt (Simple or Descriptive) and whether document-level associations are incorporated with SBERT embeddings. The x-axis represents different classification thresholds (i.e., percent of annotators who label a tweet as metaphorical). Across all thresholds, including SBERT improves performance.
Figure A3: Boxplot showing distribution of metaphor scores for each source domain across all 400K tweets. White dots represent mean scores.
When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models

February 2025 · 58 Reads

Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. "water" or "vermin"). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.
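The abstract describes combining word-level and document-level signals, with the document-level signal coming from SBERT embeddings (per the figures above). The paper's actual pipeline is not reproduced here; as a hedged sketch, a document-level metaphorical-association score of the kind described can be approximated as cosine similarity between a text embedding and a source-domain concept embedding. The 3-d vectors below are toy stand-ins for real SBERT outputs:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def document_level_score(tweet_vec, concept_vec):
    # Higher similarity to the source-domain concept (e.g. "water")
    # is read as a stronger metaphorical association.
    return cosine_similarity(tweet_vec, concept_vec)

# Toy 3-d embeddings standing in for SBERT sentence embeddings.
water_concept = [0.9, 0.1, 0.0]
tweet_flood   = [0.8, 0.2, 0.1]   # e.g. "a flood of migrants..."
tweet_policy  = [0.1, 0.9, 0.2]   # literal policy discussion

score_flood = document_level_score(tweet_flood, water_concept)
score_policy = document_level_score(tweet_policy, water_concept)
```

In a real setting the toy vectors would be replaced by embeddings from a sentence-encoder model, and the score would be combined with word-level signals as the paper describes.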


Figure 1: System diagram of Plurals, an end-to-end generator of simulated social ensembles. (1) Agents complete tasks within (2) Structures, with communication optionally summarized by (3) Moderators. Plurals integrates with government datasets (1a) and templates inspired by democratic deliberation theory (1b). The building block is Agents, which are large language models (LLMs) that have system instructions and tasks. System instructions can be generated from user input, government datasets (American National Election Studies; ANES), and templates from deliberative democracy literature [14]. Agents exist within Structures, which define what information is shared. Combination instructions tell Agents how to combine the responses of other Agents when deliberating in the Structure. Users can customize an Agent's combination instructions or use existing templates drawn from deliberation literature and beyond. Moderators aggregate responses from multi-agent deliberation.
Plurals: A System for Guiding LLMs Via Simulated Social Ensembles

September 2024 · 46 Reads

Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a 'view from nowhere' but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by democratic deliberation theory, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI. The Plurals library is available at https://github.com/josh-ashkinaze/plurals and will be continually updated.
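The Agent/Structure/Moderator architecture described above can be illustrated schematically. The classes below are stand-ins written for this summary, not the actual Plurals API (see the GitHub link for that), and the "LLM" is a stub so the sketch runs without a model:

```python
class Agent:
    """Stand-in for an LLM agent with a persona (not the Plurals API)."""
    def __init__(self, persona, respond):
        self.persona = persona
        self.respond = respond  # callable standing in for an LLM call

class Ensemble:
    """Minimal 'Structure': every agent answers the task independently."""
    def __init__(self, agents):
        self.agents = agents
    def deliberate(self, task):
        return [(a.persona, a.respond(task)) for a in self.agents]

class Moderator:
    """Aggregates agent responses into a single output."""
    def summarize(self, responses):
        return " | ".join(f"{p}: {r}" for p, r in responses)

# Stub "LLM" so the sketch is runnable without any model.
make_llm = lambda style: (lambda task: f"[{style} take on: {task}]")

ensemble = Ensemble([
    Agent("rural conservative", make_llm("skeptical")),
    Agent("urban liberal", make_llm("supportive")),
])
summary = Moderator().summarize(ensemble.deliberate("Discuss transit funding"))
```

Other Structures described in the paper (e.g. chains where agents see prior responses, governed by combination instructions) would change only how `deliberate` shares information among agents.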


Understanding the rationales and information environments for early, late, and nonadopters of the COVID-19 vaccine

September 2024 · 50 Reads · 2 Citations

npj Vaccines

Anti-vaccine sentiment during the COVID-19 pandemic grew at an alarming rate, leaving much to understand about the relationship between people’s vaccination status and the information they were exposed to. This study investigated the relationship between vaccine behavior, decision rationales, and information exposure on social media over time. Using a cohort study that consisted of a nationally representative survey of American adults, three subpopulations (early adopters, late adopters, and nonadopters) were analyzed through a combination of statistical analysis, network analysis, and semi-supervised topic modeling. The main reasons Americans reported choosing to get vaccinated were safety and health. However, work requirements and travel were more important for late adopters than early adopters (95% CI on OR of [0.121, 0.453]). While late adopters’ and nonadopters’ primary reason for not getting vaccinated was it being too early, late adopters also mentioned safety issues more often and nonadopters mentioned government distrust (95% CI on OR of [0.125, 0.763]). Among those who shared Twitter/X accounts, early adopters and nonadopters followed a larger fraction of highly partisan political accounts compared to late adopters, and late adopters were exposed to more neutral and pro-vaccine messaging than nonadopters. Together, these findings suggest that the decision-making process and the information environments of these subpopulations have notable differences, and any online vaccination campaigns need to consider these differences when attempting to provide accurate vaccine information to all three subpopulations.
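The abstract reports odds ratios with 95% confidence intervals. As a hedged illustration of how such an interval is typically computed (not the study's actual code, and with invented counts), the Wald CI for an odds ratio from a 2x2 table is:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table:
        group 1: a (reason cited) | b (not cited)
        group 2: c (reason cited) | d (not cited)
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical counts: early vs. late adopters citing "work requirements".
or_, lo, hi = odds_ratio_ci(a=30, b=170, c=90, d=110)
```

An interval entirely below 1 (as with the hypothetical counts here, and as with the study's reported CIs such as [0.121, 0.453]) indicates the reason was cited significantly less often in the first group.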


Wikipedia in Wartime: Experiences of Wikipedians Maintaining Articles About the Russia-Ukraine War

September 2024 · 50 Reads

How do Wikipedians maintain an accurate encyclopedia during an ongoing geopolitical conflict where state actors might seek to spread disinformation or conduct an information operation? In the context of the Russia-Ukraine War, this question becomes more pressing, given the Russian government's extensive history of orchestrating information campaigns. We conducted an interview study with 13 expert Wikipedians involved in the Russo-Ukrainian War topic area on the English-language edition of Wikipedia. While our participants did not perceive there to be clear evidence of a state-backed information operation, they agreed that war-related articles experienced high levels of disruptive editing from both Russia-aligned and Ukraine-aligned accounts. The English-language edition of Wikipedia had existing policies and processes at its disposal to counter such disruption. State-backed or not, the disruptive activity created time-intensive maintenance work for our participants. Finally, participants considered English-language Wikipedia to be more resilient than social media in preventing the spread of false information online. We conclude by discussing sociotechnical implications for Wikipedia and social platforms.


Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

July 2024 · 76 Reads · 1 Citation

Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
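The "high-recall but low-precision editing" finding can be made concrete: comparing the set of words an LLM removes against the set Wikipedia editors removed yields standard precision and recall. This is a hedged sketch with invented word sets, not the paper's evaluation code:

```python
def precision_recall(model_removed, editor_removed):
    """Precision/recall of model word removals vs. editor removals."""
    model_removed, editor_removed = set(model_removed), set(editor_removed)
    true_positives = model_removed & editor_removed
    precision = len(true_positives) / len(model_removed)
    recall = len(true_positives) / len(editor_removed)
    return precision, recall

# Hypothetical edit: editors removed two loaded words; the LLM removed
# both, plus three extra words unrelated to neutrality.
editor = {"legendary", "brilliantly"}
model = {"legendary", "brilliantly", "the", "quite", "also"}
p, r = precision_recall(model, editor)
```

Here recall is perfect but precision suffers from the extraneous changes, which is the pattern the abstract describes (79% of editor-removed words caught, but many additional edits made).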


Framing Social Movements on Social Media: Unpacking Diagnostic, Prognostic, and Motivational Strategies

June 2024 · 104 Reads

Social media enables activists to directly communicate with the public and provides a space for movement leaders, participants, bystanders, and opponents to collectively construct and contest narratives. Focusing on Twitter messages from social movements surrounding three issues in 2018-2019 (guns, immigration, and LGBTQ rights), we create a codebook, annotated dataset, and computational models to detect diagnostic (problem identification and attribution), prognostic (proposed solutions and tactics), and motivational (calls to action) framing strategies. We conduct an in-depth unsupervised linguistic analysis of each framing strategy, and uncover cross-movement similarities in associations between framing and linguistic features such as pronouns and deontic modal verbs. Finally, we compare framing strategies across issues and other social, cultural, and interactional contexts. For example, we show that diagnostic framing is more common in replies than original broadcast posts, and that social movement organizations focus much more on prognostic and motivational framing than journalists and ordinary citizens.


Misunderstanding the harms of online misinformation

June 2024 · 415 Reads · 35 Citations

Nature

The controversy over online misinformation and social media has opened a gap between public discourse and scientific research. Public intellectuals and journalists frequently make sweeping claims about the effects of exposure to false content online that are inconsistent with much of the current empirical evidence. Here we identify three common misperceptions: that average exposure to problematic content is high, that algorithms are largely responsible for this exposure and that social media is a primary cause of broader social problems such as polarization. In our review of behavioural science research on online misinformation, we document a pattern of low exposure to false and inflammatory content that is concentrated among a narrow fringe with strong motivations to seek out such information. In response, we recommend holding platforms accountable for facilitating exposure to false and extreme content in the tails of the distribution, where consumption is highest and the risk of real-world harm is greatest. We also call for increased platform transparency, including collaborations with outside researchers, to better evaluate the effects of online misinformation and the most effective responses to it. Taking these steps is especially important outside the USA and Western Europe, where research and data are scant and harms may be more severe.


Framing Social Movements on Social Media: Unpacking Diagnostic, Prognostic, and Motivational Strategies

May 2024 · 238 Reads · 4 Citations

Journal of Quantitative Description: Digital Media

Social media enables activists to directly communicate with the public and provides a space for movement leaders, participants, bystanders, and opponents to collectively construct and contest narratives. Focusing on Twitter messages from social movements surrounding three issues in 2018-2019 (guns, immigration, and LGBTQ rights), we create a codebook, annotated dataset, and computational models to detect diagnostic (problem identification and attribution), prognostic (proposed solutions and tactics), and motivational (calls to action) framing strategies. We conduct an in-depth unsupervised linguistic analysis of each framing strategy, and uncover cross-movement similarities in associations between framing and linguistic features such as pronouns and deontic modal verbs. Finally, we compare framing strategies across issues and other social, cultural, and interactional contexts. For example, we show that diagnostic framing is more common in replies than original broadcast posts, and that social movement organizations focus much more on prognostic and motivational framing than journalists and ordinary citizens.


Citations (53)


... During the COVID-19 pandemic, the major challenge faced by public health authorities in developing a vaccinated population was not the technical challenge of creating vaccines, but the socio-political challenge of getting them into arms. To study this, we conducted a probability survey of Americans, asking for both basic demographic information and vaccination status for each respondent (Singh et al., 2024). For a subset of respondents who used Twitter, we asked whether they would share their handle and allow us to collect their data to support computer science and social science research. ...

Reference:

Proximal Iteration for Nonlinear Adaptive Lasso
Understanding the rationales and information environments for early, late, and nonadopters of the COVID-19 vaccine

npj Vaccines

... While LLMs can efficiently process large datasets and identify potential citation issues, researchers caution against viewing these models as a replacement of human judgment. Ashkinaze et al. (2024) emphasize that evaluating citations often requires a deeper understanding of context, source reliability, and potential biases. In these areas, human editors are superior to current AI models. ...

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

... Rather than viewing misinformation solely as a crisis to be solved, however, it is more productive to examine why misinformation emerges, how it operates in the current information ecosystem, and how it influences public reasoning. Framing misinformation as a unique crisis may obscure its role as a persistent part of democratic discourse, which involves competing claims; misinformation is often a byproduct of evolving knowledge, ideological conflicts, and contested narratives within political and scientific deliberations (Budak et al., 2024; Krause et al., 2022; Scheufele, Krause, et al., 2021). Take science communication as an example: the scientific process is inherently provisional, with findings continually subject to revision. ...

Misunderstanding the harms of online misinformation
  • Citing Article
  • June 2024

Nature

... In the context of social media, the process of policy communication has seen the emergence of a new logic of "connective action" based on personalized content and framing. This logic emphasizes interactions between individuals and networked mobilization, rather than the traditional logic of collective action that relies on organizational intermediaries and collective identity [36]. These communities play a crucial role in the dissemination of policy information. ...

Framing Social Movements on Social Media: Unpacking Diagnostic, Prognostic, and Motivational Strategies

Journal of Quantitative Description: Digital Media

... For these users, the presence of a community note may discourage engagement with the content (Chuai et al. 2024a). In contrast, a fact-checked user's followers may remain loyal because their relationship with the user is driven more by social or ideological factors (Aiello et al. 2012; Barberá 2015) than by content credibility (Ashkinaze, Gilbert, and Budak 2024). This interpretation aligns with evidence from prior research showing that while fact-checks can reduce beliefs in false claims, they often fail to change attitudes towards the person who authored them, particularly in political contexts (Swire-Thompson et al. 2020; Nyhan et al. 2020). ...

The Dynamics of (Not) Unfollowing Misinformation Spreaders
  • Citing Conference Paper
  • May 2024

... The HCI community has developed various approaches to facilitate learning and awareness-building, from e-learning platforms or applications [32], to personal informatics [43] and chatbots [29]. Among these, serious games, designed for purposes beyond entertainment [46], have demonstrated particular effectiveness across diverse domains, such as emotional regulation [16], political discussion [54], health behavior [49, 51], or disability awareness education [21, 36]. ...

GuesSync!: An Online Casual Game To Reduce Affective Polarization
  • Citing Article
  • October 2023

Proceedings of the ACM on Human-Computer Interaction

... Surprisingly, the results demonstrate that the selection of natural language has little bearing on the structure of global hypertext in any nation. According to Mendelsohn et al. (2023), the role of vestibules and websites in social structure reproduction is thus deemed more significant. This finding implies that data regarding communication in bilingual or weblog contexts may significantly overstate the role of natural language as a catalyst for international communication. ...

Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media
  • Citing Article
  • June 2023

Proceedings of the International AAAI Conference on Web and Social Media

... Additionally, as crises evolve, the nature of misinformation evolves quickly, as exemplified by the COVID "infodemic" (cf. [91, 224]). That is why relying on the wisdom of "crowds of ordinary users" to flag dis/misinformation by accounting for "content facticity, user intent, and perceived harm" has proven to be a more effective moderation policy [22]. In our case, users might trigger the CDC AI assistant by mentioning @CDCGov and using a hashtag like mis/disinformation, which would then be annotated and responded to by CDC staff as introduced above. ...

Wisdom of Two Crowds: Misinformation Moderation on Reddit and How to Improve this Process---A Case Study of COVID-19
  • Citing Article
  • April 2023

Proceedings of the ACM on Human-Computer Interaction

... We build on prior work analysing the ecosystem supporting misinformation websites [29][30][31][32][33] and programmatic advertising [34] by matching millions of instances of advertising companies appearing across thousands of news outlets with data on misinformation websites, thereby providing large-scale evidence of the ecosystem that sustains online misinformation over a consistent period of three years. Additionally, we present descriptive evidence about the relative roles of advertising companies and digital advertising platforms in financing misinformation. ...

An Analysis of the Partnership between Retailers and Low-credibility News Publishers

Journal of Quantitative Description: Digital Media

... Regarding social capital, those societies with a sense of social obligation to help others, where (in)tangible relations and resources are voluntarily allocated to charitable initiatives, provide a supportive environment for the success of DCF (Aprilia and Wibowo 2017; Kshetri 2015). Accordingly, drivers mainly respond to the maturity of charity marketplaces (Meer 2014, 2017; Budak and Rao 2016; Ghosh and Mahdian 2008), and their political, cultural, financial, and regulatory aspects (Bernardino and Santos 2016; Body and Breeze 2016; Bellio et al. 2015). Some authors proved the relation between other charitable mechanisms (i.e., matching grants), the existing dynamics (i.e., competition, efficiency), and the likelihood of a charitable campaign to succeed via DCF (Meer 2014, 2017; Budak and Rao 2016; Kshetri 2015). ...

Measuring the Efficiency of Charitable Giving with Content Analysis and Crowdsourcing
  • Citing Article
  • August 2021

Proceedings of the International AAAI Conference on Web and Social Media