Sarah Rajtmajer’s research while affiliated with Pennsylvania State University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (72)


Figure 1: Box-plot comparison of correctly identified real, fake, and total stories by humans and GPT-4o.
Figure 2: Density plots comparing the performance of humans and GPT-4 models (batch and single modes) across four metrics. Top-left: Precision; Top-right: Recall; Bottom-left: False Positive Rate (FPR); Bottom-right: False Negative Rate (FNR).
Figure A3: Scatter plot displaying the frequency of terms occurring in the indicators generated by Gemini (Temp_best = 0.3), with the x-axis representing terms more frequently associated with news incorrectly identified as real by Gemini, and the y-axis representing terms more frequently associated with news correctly identified as fake by Gemini.
Figure A4: Scatter plot displaying the frequency of terms occurring in the indicators generated by Llama-3.1 (Temp_best = 0.3), with the x-axis representing terms more frequently associated with news incorrectly identified as real by Llama-3.1, and the y-axis representing terms more frequently associated with news correctly identified as fake by Llama-3.1.
The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News
  • Preprint
  • File available

October 2024 · 52 Reads

Wenbo Zhang · Sai Koneru · [...]

With the rise of AI-generated content produced at scale by large language models (LLMs), genuine concerns about the spread of fake news have intensified. The perceived ability of LLMs to produce convincing fake news at scale poses new challenges for both human and automated fake news detection systems. To address these challenges, this work presents findings from a university-level competition that aimed to explore how humans can use LLMs to create fake news, and to assess the ability of human annotators and AI models to detect it. A total of 110 participants used LLMs to create 252 unique fake news stories, and 84 annotators participated in the detection tasks. Our findings indicate that LLMs are ~68% more effective at detecting real news than humans. However, for fake news detection, the performance of LLMs and humans remains comparable (~60% accuracy). We additionally examine the impact of visual elements (e.g., pictures) on the accuracy of detecting fake news stories. Finally, we examine various strategies used by fake news creators to enhance the credibility of their AI-generated content. This work highlights the increasing complexity of detecting AI-generated fake news, particularly in collaborative human-AI settings.
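As a reference for the metrics reported in Figure 2, here is a minimal sketch of how precision, recall, FPR, and FNR are computed for a binary fake-news detection task; the function and label names are illustrative, not taken from the paper.

```python
def confusion_metrics(y_true, y_pred, positive="fake"):
    """Precision, recall, FPR, and FNR with "fake" as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall":    tp / (tp + fn) if tp + fn else 0.0,
        "fpr":       fp / (fp + tn) if fp + tn else 0.0,  # real news flagged as fake
        "fnr":       fn / (fn + tp) if fn + tp else 0.0,  # fake news passed as real
    }

# Illustrative example: ground truth vs. one annotator's calls on four stories.
print(confusion_metrics(["fake", "real", "fake", "real"],
                        ["fake", "fake", "real", "real"]))
```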


Figure 1: NLP papers on misinformation detection between 2016 and 2023 (orange = all papers; blue = papers focused on low-resource languages (LRLs))
Figure 2: General pipeline of monolingual and multilingual misinformation detection for LRLs
Figure 3: Approaches in monolingual and multilingual misinformation detection for LRLs
Figure A1: PRISMA diagram for the selection of papers presenting datasets and detection algorithms of misinformation detection
Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

October 2024 · 16 Reads · 1 Citation

In today's global digital landscape, misinformation transcends linguistic boundaries, posing a significant challenge for moderation systems. While substantial advances have been made in misinformation detection, the focus remains largely on monolingual high-resource contexts, with low-resource languages often overlooked. This survey aims to bridge that gap by providing a comprehensive overview of current research on low-resource language misinformation detection in both monolingual and multilingual settings. We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to data resources, model development, cultural and linguistic context, real-world applications, and research efforts. We also examine emerging approaches, such as language-agnostic models and multi-modal techniques, while emphasizing the need for improved data collection practices, interdisciplinary collaboration, and stronger incentives for socially responsible AI research. Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.
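To make the survey's notion of a language-agnostic model concrete, here is a minimal zero-shot sketch built on a multilingual NLI encoder from Hugging Face transformers. The checkpoint, label set, and example claims are illustrative assumptions, not recommendations from the survey.

```python
from transformers import pipeline

# One multilingual checkpoint scores claims in many languages without
# language-specific training data (illustrative model choice).
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

claims = [
    "Drinking hot water cures the flu.",      # English
    "El agua caliente cura la gripe.",        # Spanish
    "Kunywa maji ya moto kunaponya mafua.",   # Swahili (low-resource)
]
for claim in claims:
    result = classifier(claim, candidate_labels=["misinformation", "credible"])
    print(f"{result['labels'][0]} ({result['scores'][0]:.2f})  <- {claim}")
```

Zero-shot NLI is only a crude proxy; the systems surveyed typically pair such encoders with labeled data, fact-checking evidence, or transfer from high-resource languages.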


Toward Context-Aware Privacy Enhancing Technologies for Online Self-Disclosure

October 2024 · 14 Reads

Proceedings of the AAAI Conference on Human Computation and Crowdsourcing

Voluntary sharing of personal information is at the heart of user engagement on social media and central to platforms' business models. From the users' perspective, so-called self-disclosure is closely connected with both privacy risks and social rewards. Prior work has studied contextual influences on self-disclosure, from platform affordances and interface design to user demographics and perceived social capital. Our work takes a mixed-methods approach to understand the contextual information that might be integrated into the development of privacy-enhancing technologies. Through an observational study of several Reddit communities, we explore the ways in which topic of discussion, group norms, peer effects, and audience size are correlated with personal information sharing. We then build and test a prototype privacy-enhancing tool that exposes these contextual factors. Our work culminates in a browser extension that automatically detects instances of self-disclosure in Reddit posts at the time of posting and provides additional context to users before they post, supporting enhanced privacy decision-making. We share this prototype with social media users, solicit their feedback, and outline a path forward for privacy-enhancing technologies in this space.
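A minimal sketch of the detection step such a tool performs: flagging a few common self-disclosure categories in a draft post before submission. The regex patterns and category names are illustrative stand-ins; the paper's prototype relies on a trained detection model rather than hand-written rules.

```python
import re

# Illustrative patterns for three self-disclosure categories often seen
# on Reddit ("28F", "I am a 30 year old ...", locations, diagnoses).
PATTERNS = {
    "age_gender": re.compile(r"\b\d{2}\s*[MF]\b|\bI(?:'| a)m a \d{1,2}[- ]year[- ]old\b", re.I),
    "location":   re.compile(r"\bI live in [A-Z][a-z]+"),
    "health":     re.compile(r"\bI was diagnosed with\b", re.I),
}

def detect_self_disclosure(post: str) -> list[str]:
    """Return the disclosure categories matched in a draft post."""
    return [name for name, rx in PATTERNS.items() if rx.search(post)]

draft = "28F here. I was diagnosed with ADHD last year and I live in Denver."
print(detect_self_disclosure(draft))  # ['age_gender', 'location', 'health']
```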


The False Promises of Application-Driven Learning: Mathematical Thinking in Today’s Rapidly Evolving Technology Landscape

June 2024 · 2 Reads

Undergraduate curricula in informatics programs typically include several courses in mathematics. In my time teaching mathematics in IST, I have noticed continual and increasing calls from colleagues and administrators to prove to students the relevance of these courses through application-driven lesson plans and preferential treatment of topics with well-understood connection to today’s technologies. This chapter offers a counterpoint to these calls. I suggest that the treatment of foundational mathematical concepts as a means to well-defined and highly circumscribed ends perpetuates math anxiety and undermines flexibility of mathematical reasoning critical to highly dynamic IST careers. The examples I provide center around a course on discrete mathematics. However, the message is general.


Fig. 1. Heat map of user activity. Last status vs. record time (aggregated by month).
Fig. 2. Average public metrics and statuses count for active users.
Fig. 3. Dynamic frequency of Twitter conversations related to academics migrating to Mastodon.
The Failed Migration of Academic Twitter

June 2024 · 97 Reads

Following the change in Twitter's ownership and subsequent changes to content moderation policies, many in academia looked to move their discourse elsewhere, and some pursued migration to Mastodon. Our study examines the dynamics of this migration. Utilizing publicly available user account data, we track the posting activity of academics on Mastodon over a one-year period. Our analyses reveal significant challenges in sustaining user engagement on Mastodon due to its decentralized structure as well as competition from other platforms such as Bluesky and Threads. The movement lost momentum after an initial surge of enthusiasm: most users did not maintain their activity levels, and those who did faced lower levels of engagement than on Twitter. Our findings highlight the challenges involved in transitioning professional communities to decentralized platforms, emphasizing the need to focus on migrating social connections for long-term user engagement.
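A minimal sketch of collecting the kind of public account data the study tracks, via Mastodon's public REST API. The instance, handle, and month-level aggregation are illustrative assumptions (the aggregation mirrors the heat map in Fig. 1), not the paper's actual pipeline.

```python
from collections import Counter
import requests

INSTANCE = "https://mastodon.social"   # example instance
HANDLE = "some_academic"               # hypothetical account handle

# Resolve the account id from its handle (public endpoint).
acct = requests.get(f"{INSTANCE}/api/v1/accounts/lookup",
                    params={"acct": HANDLE}, timeout=10).json()

# Fetch the account's most recent public statuses.
statuses = requests.get(f"{INSTANCE}/api/v1/accounts/{acct['id']}/statuses",
                        params={"limit": 40, "exclude_reblogs": "true"},
                        timeout=10).json()

# Aggregate posting activity by month.
by_month = Counter(s["created_at"][:7] for s in statuses)
print(acct.get("followers_count"), "followers;", dict(by_month))
```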


Figure 2: Example of prompts and detection model performance to showcase the importance of context and intent in understanding online abuse.
Summary of existing taxonomies of digital abuse on social media.
The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content

May 2024 · 76 Reads

As social media has become a predominant mode of communication globally, the rise of abusive content threatens to undermine civil discourse. Recognizing the critical nature of this issue, a significant body of research has been dedicated to developing language models that can detect various types of online abuse, e.g., hate speech and cyberbullying. However, there exists a notable disconnect between platform policies, which often consider the author's intention as a criterion for content moderation, and the current capabilities of detection models, which typically make no effort to capture intent. This paper examines the role of intent in content moderation systems. We review state-of-the-art detection models and benchmark training datasets for online abuse to assess their awareness of, and ability to capture, intent. We propose strategic changes to the design and development of automated detection and moderation systems to improve their alignment with ethical and policy conceptualizations of abuse.



Reacting to Generative AI: Insights from Student and Faculty Discussions on Reddit

January 2024 · 19 Reads · 3 Citations

Generative artificial intelligence (GenAI) such as ChatGPT has elicited strong reactions from almost all stakeholders across the education system. Education-oriented and academic social media communities provide an important venue for these stakeholders to share experiences and exchange ideas about GenAI, which is constructive for developing human-centered policies. This study examines early user reactions to GenAI through 725 Reddit threads posted between 06/2022 and 05/2023. Through natural language processing (NLP) and content analysis, we observe increasingly negative sentiment in the discussion and identify six main categories of student and faculty experiences of GenAI in education. These experiences reflect concerns about academic integrity and AI's negative impact on the value of traditional education. Our analysis also highlights the additional workload imposed by new technologies. Our findings suggest that dialogue between stakeholders in the education community is critical and can mitigate sources of tension between students and faculty.
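A minimal sketch of charting sentiment drift over time in such a corpus, using NLTK's VADER scorer. The study does not name its sentiment tool, and the example threads below are invented stand-ins rather than data from the paper.

```python
from collections import defaultdict
from statistics import mean
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Requires: pip install nltk; then nltk.download("vader_lexicon")

threads = [  # (month, text) pairs -- illustrative examples only
    ("2022-12", "ChatGPT helped me outline my lab report, amazing tool."),
    ("2023-03", "Caught three students submitting AI-written essays, exhausted."),
    ("2023-05", "Grading now means policing AI use on top of everything else."),
]

sia = SentimentIntensityAnalyzer()
by_month = defaultdict(list)
for month, text in threads:
    by_month[month].append(sia.polarity_scores(text)["compound"])

for month in sorted(by_month):
    print(month, f"{mean(by_month[month]):+.2f}")  # compound score in [-1, 1]
```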


Citations (40)


... Future models could benefit from leveraging existing cultural knowledge datasets to enhance accuracy. Resources such as multilingual corpora with annotated cultural references [45], regional news datasets, and linguistically diverse fact-checking sources would help capture the unique attributes of each language. Also, a future enhancement could involve transfer learning from high-resource languages to lower-resource languages. ...

Reference:

LLaMA 3 vs. State-of-the-Art Large Language Models: Performance in Detecting Nuanced Fake News
Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

... While guiding students through classroom-level management may be an effective way to maintain academic quality [61] and address potential issues brought by AI, it should be noted that this responsibility may lead to an increased workload for faculty. Recent research has shown that faculty could complain about the increased workload due to the need to manage more AI-related academic integrity problems [62]. This is especially important as the lack of work-life balance has been a known problem for junior faculty in the United States [63]. ...

Reacting to Generative AI: Insights from Student and Faculty Discussions on Reddit
  • Citing Conference Paper
  • May 2024

... These studies collectively contribute to understanding the impact of LLMs in education, demonstrating their potential to enhance learning and creativity while also highlighting the importance of addressing trust, dependency, and ethical concerns [20]. However, we found that previous research primarily emphasizes the advantages of LLMs as tools, highlighting their effectiveness [25,28,49], or focuses on attitudes towards their use [41,47], such as concerns or trust [22], and discussions on learning models remain relatively under explored. Therefore, while our findings align with the overall trend in the literature, that LLMs are effective but can be optimized further, we specifically address research questions related to learning models, especially intentional learning and incidental learning. ...

Reacting to Generative AI: Insights from Student and Faculty Discussions on Reddit
  • Citing Preprint
  • January 2024

... Other works developed detection models that consider any type of personal information revealed by users as a single category [10,73,97]. A few studies also developed models capable of more fine-grained detection of multiple self-disclosure categories at once [4,52]. A major limitation in past work is that it mainly centers around operational changes for enhancing model performance in detecting self-disclosures, such as improving their accuracy or expanding their granularity of categories. ...

Online Self-Disclosure, Social Support, and User Engagement During the COVID-19 Pandemic
  • Citing Article
  • September 2023

ACM Transactions on Social Computing

... Previous research has explored the activity of IO accounts, such as state-sponsored trolls targeting the #BlackLivesMatter movement (Stewart, Arif, and Starbird 2018) and the 2016 U.S. Election (Badawy et al. 2019), and the differences in their activities across campaigns (Zannettou et al. 2019b). Some studies have shown how IO accounts leverage inauthentic or automated accounts to increase their prominence and artificially amplify messages (Linvill and Warren 2020;Elmas 2023), while being resilient to large-scale shutdown (Merhi, Rajtmajer, and Lee 2023). Researchers have reported on different tactics used by IO accounts, such as trolling (Zannettou et al. 2019a ...

Information Operations in Turkey: Manufacturing Resilience with Free Twitter Accounts
  • Citing Article
  • June 2023

Proceedings of the International AAAI Conference on Web and Social Media

... Similarly, studies have uncovered IOs backed by the Chinese government Carley, 2022, 2024). In contrast, IO detection in Spanish (explored in this paper) has received relatively less attention, despite Spanish-language accounts backed by the Cuban and Venezuelan governments playing a significant role in spreading misinformation and shaping political opinions on social media (Wang et al., 2023). Furthermore, Bogonez Muñoz (2023) demonstrated how Russian trolls strategically posted in Spanish to influence Spanish-speaking audiences during the Russia-Ukraine war. ...

Evidence of inter-state coordination amongst state-backed information operations

... It often employs ambiguous, metaphorical, and roundabout language (Giglietto and Lee 2017;Ana 1999), making identification challenging (Paz, Montero-Díaz, and Moreno-Delgado 2020). This complexity is compounded by the fact that hate speech can be intention-driven, and its interpretation can vary depending on the context and cultural background (Paz, Montero-Díaz, and Moreno-Delgado 2020;Wang, Wu, and Rajtmajer 2023). In the annotation guidelines for their hate speech dataset, Kennedy et al. (2022) define it as "language that intends to attack the dignity of a group of people -through rhetorical devices and contextual references -either by inciting violence, encouraging the incitement to violence, or inciting hatred." ...

From Yellow Peril to Model Minority: Asian stereotypes in social media during the COVID-19 pandemic
  • Citing Conference Paper
  • April 2023

... Non-Algorithmic Frameworks promoting inclusivity of diverse cultural, geographical, and demographic backgrounds when recruiting annotators and producing labels with uniform conceptualization. Additionally, a framework proposed by CruzCort et al. [24] advocates for employing structural changes; their framework (RISE) is established on four principles: Reformulate, Identify, Structuralize, and Expand. It takes a broader approach to consider sources of harm during problem formulation, analysis, and stakeholder identification. ...

Locality of Technical Objects and the Role of Structural Interventions for Systemic Change
  • Citing Conference Paper
  • June 2022

... A nascent literature is exploring artificial prediction markets -numerically simulated markets, populated by artificial agents (bot traders) for supervised learning of probability estimators [3]. Early work has demonstrated the plausibility of using a trained market as a supervised learning algorithm, achieving comparable performance to standard approaches on simple classification tasks [3,4,16,20]. We suggest the most promising opportunity afforded by artificial prediction markets is eventual human-AI collaboration -a market framework that supports human traders participating alongside agents to evaluate outcomes. ...

Design and analysis of a synthetic prediction market using dynamic convex sets
  • Citing Article
  • September 2021

Results in Control and Optimization
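The snippet above describes artificial prediction markets as supervised probability estimators. Here is a minimal sketch of the idea: bot traders with noisy private beliefs trade against a logarithmic market scoring rule (LMSR) market maker, and the closing price serves as the aggregated probability estimate. The parameters and trading rule are illustrative, not the cited papers' design.

```python
import math
import random

B = 10.0  # LMSR liquidity parameter

def price(q_yes: float, q_no: float) -> float:
    """Instantaneous LMSR price of the YES share (a probability)."""
    e_yes, e_no = math.exp(q_yes / B), math.exp(q_no / B)
    return e_yes / (e_yes + e_no)

random.seed(0)
true_prob = 0.7              # ground truth the market should recover
q_yes = q_no = 0.0
for _ in range(500):         # each bot nudges the price toward its belief
    belief = min(max(random.gauss(true_prob, 0.15), 0.01), 0.99)
    if belief > price(q_yes, q_no):
        q_yes += 1.0         # buy YES when the market looks underpriced
    else:
        q_no += 1.0          # buy NO otherwise

print(f"market estimate: {price(q_yes, q_no):.2f} (ground truth {true_prob})")
```

The price settles near the median trader belief, which is the sense in which such a market can act as a probability estimator once the bot traders are trained rather than random.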

... This also highlights the importance of empathy in eliciting social concerns [63]. In regard to the comparison between health professionals and laypeople, the study found a significant difference in disclosing (...). This may be due to the health literacy gap [67] between the two groups or the easier access to medical resources for health professionals during the lockdown period. ...

Self-disclosure on Twitter During the COVID-19 Pandemic: A Network Perspective
  • Citing Chapter
  • September 2021

Lecture Notes in Computer Science