Danaë Metaxa’s research while affiliated with University of Pennsylvania and other places


Publications (27)


Learning About Algorithm Auditing in Five Steps: Scaffolding How High School Youth Can Systematically and Critically Evaluate Machine Learning Applications
  • Article

April 2025 · 1 Read · Proceedings of the AAAI Conference on Artificial Intelligence

Lauren Vogelstein · [...] · Danaë Metaxa

While there is widespread interest in supporting young people to critically evaluate machine learning-powered systems, there is little research on how we can support them in inquiring about how these systems work and what their limitations and implications may be. Outside of K-12 education, an effective strategy in evaluating black-boxed systems is algorithm auditing—a method for understanding algorithmic systems’ opaque inner workings and external impacts from the outside in. In this paper, we review how expert researchers conduct algorithm audits and how end users engage in auditing practices to propose five steps that, when incorporated into learning activities, can support young people in auditing algorithms. We present a case study of a team of teenagers engaging with each step during an out-of-school workshop in which they audited peer-designed generative AI TikTok filters. We discuss the kind of scaffolds we provided to support youth in algorithm auditing and directions and challenges for integrating algorithm auditing into classroom activities. This paper contributes: (a) a conceptualization of five steps to scaffold algorithm auditing learning activities, and (b) examples of how youth engaged with each step during our pilot study.
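A minimal sketch of the generic auditing loop this abstract builds on (state a hypothesis, query the black-boxed system with planned inputs, and record input-output pairs with notes). The function names and the apply_filter callable are illustrative placeholders, not the workshop's actual materials.

```python
# Illustrative audit loop: query a black-box system (e.g., a generative image
# filter) with planned inputs and log each input-output pair with auditor notes.
def run_audit(hypothesis, test_inputs, apply_filter, take_notes):
    """Return a list of records documenting how the system behaved on each input."""
    records = []
    for item in test_inputs:
        output = apply_filter(item)  # the system under audit is treated as a black box
        records.append({
            "hypothesis": hypothesis,   # e.g., "the filter lightens darker skin tones"
            "input": item,
            "output": output,
            "notes": take_notes(item, output),  # auditor's observations for this pair
        })
    return records
```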


Figure 1: Examples of faces altered by TikTok filters from the EAP workshop.
Figure 2: Two teachers presenting their auditing projects at the Teen Advisory Session.
Teacher-designers' auditing projects (hypothesis, input images, and results)
Youth as Advisors in Participatory Design: Situating Teens' Expertise in Everyday Algorithm Auditing with Teachers and Researchers
  • Preprint
  • File available

April 2025 · 15 Reads

Research on children and youth's participation in different roles in the design of technologies is one of the core contributions in child-computer interaction studies. Building on this work, we situate youth as advisors to a group of high school computer science teacher- and researcher-designers creating learning activities in the context of emerging technologies. Specifically, we explore algorithm auditing as a potential entry point for youth and adults to critically evaluate generative AI algorithmic systems, with the goal of designing classroom lessons. Through a two-hour session where three teenagers (16-18 years) served as advisors, we (1) examine the types of expertise the teens shared and (2) identify back stage design elements that fostered their agency and voice in this advisory role. Our discussion considers opportunities and challenges in situating youth as advisors, providing recommendations for actions that researchers, facilitators, and teachers can take to make this unusual arrangement feasible and productive.


Fig. 1. Participant self-reported household income from $0 to $1,500,000, with a median of $100,000.
Fig. 3. Participants reported applying to 0 to 600 jobs, with 50% applying to more than 20.
Fig. 4. Participants used referrals in anywhere from 0% to 100% of applications, with over half of participants having at least one referral.
Fig. 5. Participants reporting higher annual household incomes were also more likely to receive one or more job offers.
Navigating Automated Hiring: Perceptions, Strategy Use, and Outcomes Among Young Job Seekers

February 2025 · 59 Reads

As the use of automated employment decision tools (AEDTs) has rapidly increased in hiring contexts, especially for computing jobs, there is still limited work on applicants' perceptions of these emerging tools and their experiences navigating them. To investigate, we conducted a survey with 448 computer science students (young, current technology job-seekers) about perceptions of the procedural fairness of AEDTs, their willingness to be evaluated by different AEDTs, the strategies they use relating to automation in the hiring process, and their job seeking success. We find that young job seekers' procedural fairness perceptions of and willingness to be evaluated by AEDTs varied with the level of automation involved in the AEDT, the technical nature of the task being evaluated, and their own use of strategies, such as job referrals. Notably, examining the relationship between their strategies and job outcomes, we find that referrals and family household income have significant and positive impacts on hiring success, while more egalitarian strategies (using free online coding assessment practice or adding keywords to resumes) did not. Overall, our work speaks to young job seekers' distrust of automation in hiring contexts, as well as the continued role of social and socioeconomic privilege in job seeking, despite the use of AEDTs that promise to make hiring "unbiased."


Figure 1: Ziyi tested the filter with the default images available on Effect House. Figure displays two input and output pairs.
Figure 2: Inputs generated by Ziyi and Ishmael on the two-axis organizer.
Figure 3: Table with input and output pairs with notes.
Figure 4: Input-output pair for the first test conducted by Ziyi and Ishmael.
Auditing steps and scaffolds provided to youth.
Learning About Algorithm Auditing in Five Steps: Scaffolding How High School Youth Can Systematically and Critically Evaluate Machine Learning Applications

December 2024 · 46 Reads

While there is widespread interest in supporting young people to critically evaluate machine learning-powered systems, there is little research on how we can support them in inquiring about how these systems work and what their limitations and implications may be. Outside of K-12 education, an effective strategy in evaluating black-boxed systems is algorithm auditing, a method for understanding algorithmic systems' opaque inner workings and external impacts from the outside in. In this paper, we review how expert researchers conduct algorithm audits and how end users engage in auditing practices to propose five steps that, when incorporated into learning activities, can support young people in auditing algorithms. We present a case study of a team of teenagers engaging with each step during an out-of-school workshop in which they audited peer-designed generative AI TikTok filters. We discuss the kind of scaffolds we provided to support youth in algorithm auditing and directions and challenges for integrating algorithm auditing into classroom activities. This paper contributes: (a) a conceptualization of five steps to scaffold algorithm auditing learning activities, and (b) examples of how youth engaged with each step during our pilot study.


Lower Quantity, Higher Quality: Auditing News Content and User Perceptions on Twitter/X Algorithmic versus Chronological Timelines

November 2024 · 13 Reads · 2 Citations · Proceedings of the ACM on Human-Computer Interaction

Social media personalization algorithms increasingly influence the flow of civic information through society, resulting in concerns about "filter bubbles", "echo chambers", and other ways they might exacerbate ideological segregation and fan the spread of polarizing content. To address these concerns, we designed and conducted a sociotechnical audit (STA) to investigate how Twitter/X's timeline algorithm affects news curation while also tracking how user perceptions change in response. We deployed a custom-built system that, over the course of three weeks, passively tracked all tweets loaded in users' browsers in the first week, then in the second week enacted an intervention to users' Twitter/X homepage to restrict their view to only the algorithmic or chronological timeline (randomized). We flipped this condition for each user in the third week. We ran our audit in late 2023, collecting user-centered metrics (self-reported survey measures) and platform-centered metrics (views, clicks, likes) for 243 users, along with over 800,000 tweets. Using the STA framework, our results are two-fold: (1) Our algorithm audit finds that Twitter/X's algorithmic timeline resulted in a lower quantity but higher quality of news (less ideologically congruent, less extreme, and slightly more reliable) compared to the chronological timeline. (2) Our user audit suggests that although our timeline intervention had significant effects on users' behaviors, it had little impact on their overall perceptions of the platform. Our paper discusses these findings and their broader implications in the context of algorithmic news curation, user-centric audits, and avenues for independent social science research.
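A rough sketch of the three-week crossover design described above: week one is passive observation, week two randomly restricts each user to the algorithmic or chronological timeline, and week three flips that assignment. The function and condition labels are illustrative, not the authors' deployed system.

```python
# Illustrative condition assignment for a within-subjects crossover audit.
import hashlib

def timeline_condition(user_id: str, week: int) -> str:
    """Return 'observe', 'algorithmic', or 'chronological' for a given user and week."""
    if week == 1:
        return "observe"  # week 1: passively track loaded tweets, no intervention
    # Deterministic per-user coin flip so the assignment is stable across sessions.
    coin = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 2
    week2 = "algorithmic" if coin == 0 else "chronological"
    week3 = "chronological" if coin == 0 else "algorithmic"
    return week2 if week == 2 else week3  # week 3 flips each user's week-2 condition
```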



User-Centric Behavioral Tracking: Lessons from Three Case Studies with Do-It-Yourself Computational Pipelines

October 2024 · 15 Reads · 1 Citation


User-centric behavioral tracking: Lessons from three case studies with do-it-yourself computational pipelines

October 2024 · 5 Reads

User-centric behavioral tracking, a cutting-edge computational social science tool, holds tremendous promise for advertising research. The article introduces the technique and presents three Do-It-Yourself (DIY) case studies, in which researchers develop tracking applications and platforms, build infrastructures that host participants, maintain computational pipelines logging user behavior and content exposure, and manage logistics such as onboarding, offboarding, and compensation all by themselves. We share our lessons, discuss the challenges ahead for DIY user-centric behavioral tracking, and advocate for computational advertising scholars to become pioneers in this emerging body of work.
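As a loose illustration of the participant-side logging such DIY pipelines involve, the sketch below appends behavioral events (e.g., content exposures and clicks) to a newline-delimited JSON log; all names and fields are assumptions for illustration, not the case studies' actual schema.

```python
# Illustrative event logger for user-centric behavioral tracking.
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class ExposureEvent:
    participant_id: str
    platform: str      # e.g., "twitter", "youtube" (hypothetical labels)
    content_id: str    # identifier of the post or ad the participant was exposed to
    event_type: str    # "impression", "click", ...
    timestamp: float

def log_event(event: ExposureEvent, path: str = "exposure_log.jsonl") -> None:
    """Append one event as a JSON line to the local log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# Example usage:
# log_event(ExposureEvent("p042", "twitter", "tweet:123", "impression", time.time()))
```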


Fig. 1. Profile Photos. The two image sets, one per demographic group, used in each study.
Fig. 4. The Effect of AI Suspicion on Quality Evaluations. AI Suspicion negatively impacts evaluations of writing quality.
Fig. 5. Interaction Effects between Writing Style and Gender. Participants evaluated the control more favorably than the AI-inducing writing style.
Fig. 10. The Effect of AI Suspicion on Hiring. Participants are less likely to hire a freelancer they suspect of using AI.
Generative AI and Perceptual Harms: Who's Suspected of using LLMs?

October 2024 · 53 Reads

Large language models (LLMs) are increasingly integrated into a variety of writing tasks. While these tools can help people by generating ideas or producing higher quality work, like many other AI tools they may risk causing a variety of harms, disproportionately burdening historically marginalized groups. In this work, we introduce and evaluate perceptual harm, a term for the harm caused to users when others perceive or suspect them of using AI. We examined perceptual harms in three online experiments, each of which entailed human participants evaluating the profiles of fictional freelance writers. We asked participants whether they suspected the freelancers of using AI, how they rated the quality of their writing, and whether they would hire them. We found some support for perceptual harms against certain demographic groups, but also that perceptions of AI use negatively impacted writing evaluations and hiring outcomes across the board.


The automated content moderation APIs and associated characteristics.
Bounds on the flagging threshold for scores for each of OpenAI's moderation endpoint categories based on running all instances of each dataset through that API.
Identity-related Speech Suppression in Generative AI Content Moderation

September 2024 · 42 Reads

Automated content moderation has long been used to help identify and filter undesired user-generated content online. Generative AI systems now use such filters to keep undesired generated content from being created by or shown to users. From classrooms to Hollywood, as generative AI is increasingly used for creative or expressive text generation, whose stories will these technologies allow to be told, and whose will they suppress? In this paper, we define and introduce measures of speech suppression, focusing on speech related to different identity groups incorrectly filtered by a range of content moderation APIs. Using both short-form, user-generated datasets traditional in content moderation and longer generative AI-focused data, including two datasets we introduce in this work, we create a benchmark for measurement of speech suppression for nine identity groups. Across one traditional and four generative AI-focused automated content moderation services tested, we find that identity-related speech is more likely to be incorrectly suppressed than other speech except in the cases of a few non-marginalized groups. Additionally, we find differences between APIs in their abilities to correctly moderate generative AI content.
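One plausible way to operationalize the "incorrectly suppressed" measurement the abstract describes is a per-group false-positive rate: among texts human raters consider benign, the share that a moderation filter nonetheless flags, broken out by the identity group each text concerns. This is a sketch under assumed data fields, not the paper's benchmark code; the commented-out backend uses OpenAI's moderation endpoint, one of the APIs the paper tests.

```python
# Illustrative per-group false-positive ("speech suppression") rates.
from collections import defaultdict

def false_positive_rates(examples, is_flagged):
    """examples: iterable of dicts with 'text', 'identity_group', and a human 'toxic'
    label (assumed fields); is_flagged: callable mapping text -> bool."""
    flagged = defaultdict(int)
    benign = defaultdict(int)
    for ex in examples:
        if ex["toxic"]:  # only benign texts can be *incorrectly* suppressed
            continue
        benign[ex["identity_group"]] += 1
        if is_flagged(ex["text"]):
            flagged[ex["identity_group"]] += 1
    return {group: flagged[group] / benign[group] for group in benign}

# Example backend using OpenAI's moderation endpoint:
# from openai import OpenAI
# client = OpenAI()
# def is_flagged(text: str) -> bool:
#     return client.moderations.create(input=text).results[0].flagged
```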


Citations (16)


... In this paper, we query the r/popular feed for 11 months and examine its ranking decisions to infer details about its internals and impact on engagement. Our study complements the large corpus of existing social media algorithm audits performed on platforms like X/Twitter (Wang et al. 2024), YouTube (Ribeiro et al. 2020; Liu, Wu, and Resnick 2024), Facebook (González-Bailón et al. 2023), and TikTok (Mousavi, Gummadi, and Zannettou 2024), to name a few. Additionally, prior studies often focus on political bias that may be built into curation algorithms, which differs from our focus on the factors that influence, and are influenced by, popularity. ...

Reference:

Examining Algorithmic Curation on Social Media: An Empirical Audit of Reddit's r/popular Feed
Lower Quantity, Higher Quality: Auditing News Content and User Perceptions on Twitter/X Algorithmic versus Chronological Timelines
  • Citing Article
  • November 2024

Proceedings of the ACM on Human-Computer Interaction

... Research also shows candidates distrust feedback from LLMs in mock interviews [140]. In addition, prior work shows AI-based resume screening can incorporate bias against marginalized populations [11,154], such as bias based on ethnicity/race [14], disability [55], and gender [91]. In this work, we aim to explore methods to incorporate evidence demonstrating the technical abilities of candidates into modern AI-based hiring pipelines. ...

The Silicon Ceiling: Auditing GPT’s Race and Gender Biases in Hiring
  • Citing Conference Paper
  • October 2024

... Whereas most algorithm auditing research has been conducted by expert researchers [2], research on everyday algorithm auditing recognizes that end-users who are non-expert auditors are able to detect harmful algorithmic behaviors and make contributions that would otherwise be unnoticed [12,45]. For instance, research shows that young people are capable of identifying harmful behaviors in the technologies they are familiar with [30,48]. A more recent study with teens found that they were able to conduct an algorithm audit, which included articulating hypotheses, conducting tests, and producing reports, uncovering behaviors that had not been previously studied by expert researchers [31]. ...

Youth as Peer Auditors: Engaging Teenagers with Algorithm Auditing of Machine Learning Applications

... It is widely used to measure the toxic outcomes and biases of LLMs [32], and it has been suggested that Perspective API has become a cornerstone for academic research on online abuse and incivility [75]. Meanwhile, OpenAI's Moderation API functions as the hate speech filter for ChatGPT and GPT models [57]. ...

Auditing GPT's Content Moderation Guardrails: Can ChatGPT Write Your Favorite TV Show?
  • Citing Conference Paper
  • June 2024

... Patient-centered care has been at the forefront of healthcare for over a decade, with previous work identifying how this critical issue has been a concern for both healthcare organizations and patients [1]. With the rapid advancement of artificial intelligence in healthcare, specifically generative AI [2], ensuring that AI-assisted systems integrated into health infrastructures do not compromise patient-centered care has become imperative. ...

Explainable Notes: Examining How to Unlock Meaning in Medical Notes with Interactivity and Artificial Intelligence
  • Citing Conference Paper
  • May 2024

... While workers recognized the advantages of these features, they also expressed concerns about the technical proficiency needed to effectively utilize custom-made GAI tools in their work. In response, several studies used off-the-shelf GAI technologies (e.g., ChatGPT) to understand the benefits and liabilities these tools brought to individuals' work [57,71,80]. Mirowski et al. [75] conducted an experimental study with screenwriters that revealed tensions between the writers' creative instincts and GAI's functionalities, highlighting the limitations of decontextualized evaluations of such systems and underscoring the need for a deeper understanding of GAI integration in writing fields. ...

The Role of Inclusion, Control, and Ownership in Workplace AI-Mediated Communication
  • Citing Conference Paper
  • May 2024

... Past research shows that AI users can detect and raise awareness of biased and harmful AI behaviors by leveraging their lived experiences, building on one another's findings, and generating and testing hypotheses about AI harms [36,108]. There has been rapidly growing interest among both researchers and practitioners in engaging end users in auditing AI products and services: one line of recent research has explored systems and processes to enable user engagement in auditing [6,17,31,56,63,84,85], while another has focused on understanding industry AI practitioners' needs and challenges around effective user engagement [30,34,86]. However, a critical gap remains in connecting the needs, challenges, and perspectives of these two groups of stakeholders. ...

Supporting User Engagement in Testing, Auditing, and Contesting AI
  • Citing Conference Paper
  • October 2023

... Algorithm auditing involves "repeatedly querying an algorithm and observing its output in order to draw conclusions about the algorithm's opaque inner workings and possible external impact" (Metaxa et al. 2021b). The majority of algorithm audits have been conducted either by experts or adult end-users to identify potential problematic system behaviors in AI/ML-powered systems (Bandy 2021;Lam et al. 2023). More recently, this research has been expanded to examine how high school youth participate in auditing practices (Solyst et al. 2023;Walker, Sherif, and Breazeal 2022). ...

Sociotechnical Audits: Broadening the Algorithm Auditing Lens to Investigate Targeted Advertising
  • Citing Article
  • October 2023

Proceedings of the ACM on Human-Computer Interaction

... There are also demographic disparities in individuals' exposure to harmful ad content, as older adults and racial minorities in the US have been found to encounter problematic ads on Facebook more often than other groups [6]. Ad content can also be particularly harmful to LGBTQ+ individuals with its lack of queer representation and tokenizing nature [84]. ...

Representation, Self-Determination, and Refusal: Queer People’s Experiences with Targeted Advertising
  • Citing Conference Paper
  • June 2023

... First, platforms should conduct regular, independent audits to assess their effectiveness in maintaining inclusivity. For example, effective audit systems are often sought in research examining algorithmic harm [46]. Similarly, on a novel online platform like social VR (unlike the traditional 2D social media that we are more familiar with), such systems are crucial. ...

End-User Audits: A System Empowering Communities to Lead Large-Scale Investigations of Harmful Algorithmic Behavior
  • Citing Article
  • November 2022

Proceedings of the ACM on Human-Computer Interaction