Figure (available from Sexuality & Culture): Tweet 10—toxicity level: 91.88%

Source publication
Article
Full-text available
Link to access full content: rdcu.be/b90gk
Companies operating internet platforms are developing artificial intelligence tools for content moderation purposes. This paper discusses technologies developed to measure the ‘toxicity’ of text-based content. The research builds upon queer linguistic studies that have indicated the use of ‘mock impoliteness’...
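The scoring audited in the article can be illustrated with a short sketch. The snippet below uses the open-source Detoxify model purely as a stand-in for the commercial toxicity APIs the paper examines; the example phrases are illustrative assumptions, not data from the study.

```python
# Minimal sketch of machine-scored 'toxicity', assuming the open-source
# Detoxify model as a stand-in for the commercial APIs the article audits.
# pip install detoxify
from detoxify import Detoxify

examples = [
    "I'm a proud dyke and I love my community.",  # reclaimed, in-group speech
    "You people should not be allowed online.",   # targeted hostility
]

model = Detoxify("original")        # loads a pretrained toxicity classifier
scores = model.predict(examples)    # dict of per-label score lists

for text, tox in zip(examples, scores["toxicity"]):
    print(f"{tox:6.2%}  {text}")

# Reclaimed slurs often score highly even in clearly benign, in-group
# contexts, which is the false-positive pattern the article documents.
```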

Citations

... whether and how their own design and processes incentivise or enable online harms. For example, platforms could audit whether their automated approaches to content moderation (including their moderation of humour) over-identify, or fail to protect, those most affected by online harms, as current research suggests (Buolamwini & Gebru, 2018; Dias Oliva et al., 2021; Paasonen et al., 2019). These new regulations may also push platforms to seriously assess the risks associated with a steady build-up of unsavoury (though not individually harmful) content/conduct, including some forms of humour, and to tap into a broader suite of proportionate remedies to address the wide variety and complex manifestations of online harms, beyond content removal and user bans. ...
Article
Full-text available
This paper makes a case for addressing humour as an online safety issue so that social media platforms can include it in their risk assessments and harm mitigation strategies. We take the ‘online safety’ regulation debate, especially as it is taking place in the UK and the European Union, as an opportunity to reconsider how and when humour targeted at historically marginalised groups can cause harm. Drawing on sociolegal literature, we argue that in their online safety efforts, platforms should address lawful humour targeted at historically marginalised groups because it can cause individual harm via its cumulative effects and contribute to broader social harms. We also demonstrate how principles and concepts from critical humour studies and Feminist Standpoint Theory can help platforms assess the differential impacts of humour.
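The audit that the excerpt above calls for, checking whether automated moderation over-identifies content from the groups most affected by online harms, could in principle start from per-group false positive rates on a hand-labelled sample. The sketch below is hypothetical: the record fields, group labels and 0.5 flagging threshold are illustrative assumptions, not taken from the cited papers.

```python
# Hypothetical audit sketch: compare false positive rates of an automated
# moderation model across self-identified groups in a labelled sample.
# Field names, group labels and the 0.5 threshold are illustrative only.
from collections import defaultdict

def false_positive_rates(records, threshold=0.5):
    """records: iterable of dicts with keys 'group', 'score', 'is_harmful'."""
    flagged_benign = defaultdict(int)   # benign posts the model flagged
    total_benign = defaultdict(int)     # all benign posts, per group
    for r in records:
        if not r["is_harmful"]:
            total_benign[r["group"]] += 1
            if r["score"] >= threshold:
                flagged_benign[r["group"]] += 1
    return {g: flagged_benign[g] / n for g, n in total_benign.items() if n}

sample = [
    {"group": "lgbtq", "score": 0.91, "is_harmful": False},
    {"group": "lgbtq", "score": 0.20, "is_harmful": False},
    {"group": "baseline", "score": 0.10, "is_harmful": False},
    {"group": "baseline", "score": 0.95, "is_harmful": True},
]
print(false_positive_rates(sample))   # {'lgbtq': 0.5, 'baseline': 0.0}
```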
... Students could be asked to provide examples of how other groups are affected by false negatives or positives in content moderation. Scholarly works about bias against marginalized groups in speech toxicity detection systems could also be discussed [14,27,30]. Instructors using this assignment could consider using in-class discussion rather than written responses for the reflection piece of the assignment. ...
Preprint
Full-text available
There is a growing movement in undergraduate computer science (CS) programs to embed ethics across CS classes rather than relying solely on standalone ethics courses. One strategy is creating assignments that encourage students to reflect on ethical issues inherent to the code they write. Building off prior work that has surveyed students after doing such assignments in class, we conducted focus groups with students who reviewed a new introductory ethics-based CS assignment. In this experience report, we present a case study describing our process of designing an ethics-based assignment and proposing the assignment to students for feedback. Participants in our focus groups not only shared feedback on the assignment, but also on the integration of ethics into coding assignments in general, revealing the benefits and challenges of this work from a student perspective. We also generated novel ethics-oriented assignment concepts alongside students. Deriving from tech controversies that participants felt most affected by, we created a bank of ideas as a starting point for further curriculum development.
... Indeed, xenophobic harms pertaining to civic ostracism may follow, for example, from online public shaming [8] and amplified hate speech aimed at marginalised communities. Finally, xenophobic harm may also result indirectly from the shaping of public opinion, as well as directly from the silencing of minority views [126,230] and the weaponising of content moderation against foreign individuals, resulting in their effective exclusion from public forums and an inability to influence future decisions. To address these concerns, greater transparency and contestability need to be incorporated into content moderation system design [313], to ensure fair processes and help mitigate the consequences of potential algorithmic biases. ...
Preprint
Xenophobia is one of the key drivers of marginalisation, discrimination, and conflict, yet many prominent machine learning (ML) fairness frameworks fail to comprehensively measure or mitigate the resulting xenophobic harms. Here we aim to bridge this conceptual gap and help facilitate safe and ethical design of artificial intelligence (AI) solutions. We ground our analysis of the impact of xenophobia by first identifying distinct types of xenophobic harms, and then applying this framework across a number of prominent AI application domains, reviewing the potential interplay between AI and xenophobia on social media and recommendation systems, healthcare, immigration, employment, as well as biases in large pre-trained models. These help inform our recommendations towards an inclusive, xenophilic design of future AI systems.
... Such controlling images have appeared in ranking and retrieval systems, including reinforcing false perceptions of criminality by displaying ads for bail bond businesses when searching for Black-sounding names versus white-sounding names [197]. Similarly, patterns of demeaning imagery have been found in hateful natural language predictions about Muslim people [1], and in toxicity and sentiment classifiers that are more likely to classify descriptions or mentions of disabilities [99,197] and LGBTQ identities [200,223] as toxic or negative. As these identities are often weaponized, models struggle with the social nuance and context required to distinguish between hateful and non-hateful speech [223]. ...
... 20). Similarly, for content creators, the desire to maintain visibility or avoid shadow banning may lead them to make their content more conforming [200]. ...
Preprint
Full-text available
Understanding the landscape of potential harms from algorithmic systems enables practitioners to better anticipate consequences of the systems they build. It also supports the prospect of incorporating controls to help minimize harms that emerge from the interplay of technologies and social and cultural dynamics. A growing body of scholarship has identified a wide range of harms across different algorithmic technologies. However, computing research and practitioners lack a high level and synthesized overview of harms from algorithmic systems arising at the micro-, meso-, and macro-levels of society. We present an applied taxonomy of sociotechnical harms to support more systematic surfacing of potential harms in algorithmic systems. Based on a scoping review of computing research (n=172), we identified five major themes related to sociotechnical harms — representational, allocative, quality-of-service, interpersonal harms, and social system/societal harms — and sub-themes. We describe these categories and conclude with a discussion of challenges and opportunities for future research.
... These models are especially important to historically marginalized groups, who are more frequently the target of online harassment and hate speech (International, 2018; Vogels, 2021). However, previous research indicates that these models often have higher false positive rates for marginalized communities, such as the Black community, women, and the LGBTQ community (Sap et al., 2019; Oliva et al., 2021; Park et al., 2018). Within the context of social media, higher false positive rates for a specific subgroup pose the risk of reduced visibility, where the community loses the opportunity to voice their opinion on the platform. ...
... "dyke marches" or "slut walks" to draw awareness to issues of stigma and discrimination) (Brontsema, 2004;Nunberg, 2018). Re-appropriation can be leveraged to express ingroup solidarity and shared history (Croom, 2011;Ritchie, 2017) and "mock impoliteness" has been demonstrated to help LGBTQ people deal with hostility (Oliva et al., 2021;Murray, 1979;Jones Jr, 2007;McKinnon, 2017). ...
Preprint
Harmful content detection models tend to have higher false positive rates for content from marginalized groups. In the context of marginal abuse modeling on Twitter, such disproportionate penalization poses the risk of reduced visibility, where marginalized communities lose the opportunity to voice their opinion on the platform. Current approaches to algorithmic harm mitigation and bias detection for NLP models are often ad hoc and subject to human bias. We make two main contributions in this paper. First, we design a novel methodology, which provides a principled approach to detecting and measuring the severity of potential harms associated with a text-based model. Second, we apply our methodology to audit Twitter's English marginal abuse model, which is used for removing amplification eligibility of marginally abusive content. Without utilizing demographic labels or dialect classifiers, we are still able to detect and measure the severity of issues related to the over-penalization of the speech of marginalized communities, such as the use of reclaimed speech, counterspeech, and identity-related terms. In order to mitigate the associated harms, we experiment with adding additional true negative examples and find that doing so provides improvements to our fairness metrics without large degradations in model performance.
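The mitigation reported in this abstract, adding true negative examples, can be pictured as a small data-augmentation step before retraining. The sketch below is one plausible reading of that step rather than the authors' pipeline; the helper function, toy data and mixing ratio are assumptions.

```python
# Sketch of mitigating over-penalization by adding true negatives: benign
# posts the model tends to flag (reclaimed speech, counterspeech) are mixed
# into the training set as non-abusive examples before retraining.
# The helper, toy data and mixing ratio are illustrative assumptions.
import random

def augment_with_true_negatives(train_set, true_negatives, ratio=0.05):
    """Mix a capped number of (text, label=0) examples into the training set."""
    extra = [(text, 0) for text in true_negatives]
    random.shuffle(extra)
    cap = int(ratio * len(train_set))   # keep class balance roughly intact
    return train_set + extra[:cap]

train = [("you are trash", 1), ("have a nice day", 0)] * 1000   # toy data
reclaimed = [
    "as a proud dyke i loved the march today",
    "queer folks deserve to feel safe here",
]
augmented = augment_with_true_negatives(train, reclaimed)
# The augmented set would then be used to retrain the abuse model and the
# fairness metrics (e.g. subgroup false positive rates) re-evaluated.
```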
... This is a tradeoff one has to make to measure the amount of violating content with a wide lens across an entire platform. And although we believe that the eight categories of macro norms we used in our analysis give us a more nuanced understanding of what is going on in these communities than catch-all metrics (e.g., toxicity), we note that any work that aims to take such an approach needs to be cautious so as to not unfairly flag minority communities as being more violating [57] just because they deviate from certain macro norms. ...
Preprint
Full-text available
With increasing attention to online anti-social behaviors such as personal attacks and bigotry, it is critical to have an accurate accounting of how widespread anti-social behaviors are. In this paper, we empirically measure the prevalence of anti-social behavior in one of the world's most popular online community platforms. We operationalize this goal as measuring the proportion of unmoderated comments in the 97 most popular communities on Reddit that violate eight widely accepted platform norms. To achieve this goal, we contribute a human-AI pipeline for identifying these violations and a bootstrap sampling method to quantify measurement uncertainty. We find that 6.25% (95% Confidence Interval [5.36%, 7.13%]) of all comments in 2016, and 4.28% (95% CI [2.50%, 6.26%]) in 2020-2021, are violations of these norms. Most anti-social behaviors remain unmoderated: moderators only removed one in twenty violating comments in 2016, and one in ten violating comments in 2020. Personal attacks were the most prevalent category of norm violation; pornography and bigotry were the most likely to be moderated, while politically inflammatory comments and misogyny/vulgarity were the least likely to be moderated. This paper offers a method and set of empirical results for tracking these phenomena as both the social practices (e.g., moderation) and technical practices (e.g., design) evolve.
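The bootstrap sampling mentioned above is a standard way to attach a confidence interval to a prevalence estimate of this kind. The sketch below resamples simulated 0/1 violation labels and illustrates the generic technique, not the paper's exact pipeline.

```python
# Generic bootstrap confidence interval for a prevalence estimate
# (share of comments that violate a norm).  The labels are simulated;
# this is not the paper's pipeline, only the underlying technique.
import random

def bootstrap_ci(labels, n_boot=2000, alpha=0.05):
    """labels: list of 0/1 flags (1 = violating comment)."""
    n = len(labels)
    estimates = sorted(
        sum(random.choices(labels, k=n)) / n   # resample with replacement
        for _ in range(n_boot)
    )
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return sum(labels) / n, (lo, hi)

random.seed(0)
labels = [1] * 125 + [0] * 1875            # ~6.25% violating, 2,000 comments
point, (lo, hi) = bootstrap_ci(labels)
print(f"prevalence {point:.2%}, 95% CI [{lo:.2%}, {hi:.2%}]")
```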
... For example, despite Facebook's claim to defend all races and genders equally (Angwin et al., 2017), marginalized users who rearticulate the hate speech directed at them in their effort to expose and confront the injustice face the same content removal as their antagonists (Jan and Dwoskin, 2017). The wide margins of error and the lack of context and nuance of current computational solutions to content moderation make such interventions prone to overcorrection and thus inadequate for many users, especially the most marginalized (see also Buolamwini and Gebru, 2018; Oliva et al., 2021). ...
Article
Full-text available
Research suggests that marginalized social media users face disproportionate content moderation and removal. However, when content is removed or accounts suspended, the processes governing content moderation are largely invisible, making assessing content moderation bias difficult. To study this bias, we conducted a digital ethnography of marginalized users on Reddit’s /r/FTM subreddit and Twitch’s “Just Chatting” and “Pools, Hot Tubs, and Beaches” categories, observing content moderation visibility in real time. We found that on Reddit, a text-based platform, platform tools make content moderation practices invisible to users, but moderators make their practices visible through communication with users. Yet on Twitch, a live chat and streaming platform, content moderation practices are visible in channel live chats, “unban appeal” streams, and “back from my ban” streams. Our ethnography shows how content moderation visibility differs in important ways between social media platforms, at times harming those who must see offensive content and at other times allowing for increased platform accountability.
... This enables the misclassification of positive phrases like "she makes me happy to be gay". Even Twitter accounts belonging to drag queens have been rated higher in terms of average toxicity than the accounts associated with white nationalists (Oliva et al., 2021). These findings underline how language models with faulty correlations can facilitate the censorship of productive conversations held by marginalized communities. ...
Preprint
Full-text available
Uses of pejorative expressions can be benign or actively empowering. When models for abuse detection misclassify these expressions as derogatory, they inadvertently censor productive conversations held by marginalized groups. One way to engage with non-dominant perspectives is to add context around conversations. Previous research has leveraged user- and thread-level features, but it often neglects the spaces within which productive conversations take place. Our paper highlights how community context can improve classification outcomes in abusive language detection. We make two main contributions to this end. First, we demonstrate that online communities cluster by the nature of their support towards victims of abuse. Second, we establish how community context improves accuracy and reduces the false positive rates of state-of-the-art abusive language classifiers. These findings suggest a promising direction for context-aware models in abusive language research.
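One simple way to give a classifier the community context this abstract argues for is to prepend a community identifier to each post before encoding, so that the community itself becomes a feature. The sketch below illustrates that idea with scikit-learn on made-up data; it is not necessarily the feature set or model used in the preprint.

```python
# Illustrative sketch: injecting community context by prepending a community
# identifier to each post before vectorization.  Community names and posts
# are made up; this is not necessarily the cited preprint's approach.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def with_context(community, text):
    return f"[{community}] {text}"      # the community token becomes a feature

posts = [
    ("lgbtq_support", "proud dyke marching this weekend", 0),
    ("lgbtq_support", "love my queer family", 0),
    ("hate_forum", "keep those people off our streets", 1),
    ("hate_forum", "they do not belong here", 1),
]
X = [with_context(c, t) for c, t, _ in posts]
y = [label for _, _, label in posts]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X, y)
print(clf.predict([with_context("lgbtq_support", "dyke march was amazing")]))
```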
... Noble's "Algorithms of Oppression" discusses how the outcomes of aggregated data analysis often reflect societal attitudes as opposed to truths (e.g., criminalized black men and sexualization of black girls in Google search) [74]. Similarly, queer users often find themselves more heavily moderated, their posts scoring as more "toxic" than racist content due to the use of "mock impoliteness" in queer communities [36]. Prior work has also shown queer folk, especially transgender individuals, feel at risk of harassment even in safe spaces on public platforms [87]. ...
Article
Full-text available
Centralized online social networks --- e.g., Facebook, Twitter and TikTok --- help drive social connection on the Internet, but have nigh unfettered access to monitor and monetize the personal data of their users. This centralization can especially undermine the use of the social internet by minority populations, who disproportionately bear the costs of institutional surveillance. We introduce a new class of privacy-enhancing technology --- decentralized privacy overlays (DePOs) --- that helps cOSN users regain some control over their personal data by allowing them to selectively share secret content on cOSNs through decentralized content distribution networks. As a first step, we present an implementation and user evaluation of Image DePO, a proof-of-concept design probe that allows users to upload and share secret photos on Facebook through the Interplanetary File System peer-to-peer protocol. We qualitatively evaluated Image DePO in a controlled, test environment with 19 queer and Black, Indigenous, (and) Person of Color (BIPOC) participants. We found that while Image DePO could help address the institutional threats with which our participants expressed concern, interpersonal threats were the more salient concern in their decisions to share content. Accordingly, we argue that in order to see widespread use, DePOs must align protection against abstract institutional threats with protection against the more salient interpersonal threats users consider when making specific sharing decisions.
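The Image DePO flow described in this abstract (keep plaintext off the centralized platform, publish only ciphertext to a peer-to-peer network, and share the pointer and key selectively) can be sketched conceptually as follows. The sketch assumes a local IPFS daemon with the ipfs CLI installed and the Python cryptography package; it is a conceptual illustration, not the authors' implementation.

```python
# Conceptual sketch of the Image DePO idea: encrypt a photo locally, add only
# the ciphertext to IPFS, and share the content identifier plus key out of
# band.  Assumes a running local IPFS daemon and the `ipfs` CLI; this is an
# illustration, not the authors' implementation.
import subprocess
from cryptography.fernet import Fernet   # pip install cryptography

def share_secret_photo(path):
    key = Fernet.generate_key()
    with open(path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    enc_path = path + ".enc"
    with open(enc_path, "wb") as f:
        f.write(ciphertext)
    # `ipfs add -Q` prints only the content identifier (CID) of the added file
    result = subprocess.run(["ipfs", "add", "-Q", enc_path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip(), key   # post the CID publicly, share the key privately

# cid, key = share_secret_photo("photo.jpg")
```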
... The prayer read "God, before the end of this holy day forgive our sins, bless us and our loved ones in this life and the afterlife with your mercy almighty." Further, as found by Oliva et al. (2021), such technologies are just not cut out to pick up on the language used, for example, by the LGBTQ community, whose 'mock impoliteness' and use of terms such as 'dyke', 'fag' and 'tranny' occurs as a form of reclamation of power and a means of preparing members of this community to 'cope with hostility.' Oliva et al. (2021) give several reports from LGBTQ activists of content removal, such as the banning of a trans woman from Facebook after she displayed a photograph of her new hairstyle and referred to herself as a 'tranny.' ...
Preprint
Artificial Intelligence is increasingly being used by social media platforms to tackle online hate speech. The sheer quantity of content, the speed at which it is developed and the increased pressure that States place on companies to remove hate speech quickly from their platforms have led to a tricky situation. This commentary argues that automated mechanisms, which may have biased datasets and be unable to pick up on the nuances of language, should not be left unattended to deal with hate speech, as this can lead to violations of freedom of expression and the right to non-discrimination.