Debora Nozza

Università commerciale Luigi Bocconi | Bocconi · Department of Computing Science

About

45 Publications
2,396 Reads
857 Citations
Citations since 2016
44 Research Items
857 Citations
Introduction
Debora Nozza is a Postdoctoral Research Fellow in Computer Science at the DMI unit of Bocconi University. Her research focuses on Natural Language Processing, specifically on detecting and counteracting hate speech and algorithmic bias in social media data.
Additional affiliations
October 2016 - present
Università degli Studi di Milano-Bicocca
Position
  • Laboratory Assistant
Description
  • Laboratory assistant for the course Mathematics and Computer Science Laboratory (Laboratorio di Matematica e Informatica), Bachelor Degree in Mathematics.
Education
November 2011 - July 2014
Università degli Studi di Milano-Bicocca
Field of study
  • Computer Science

Publications (45)
Preprint
Full-text available
Scandinavian countries are perceived as role models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exists in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manu...
Preprint
Machine learning models are now able to convert user-written text descriptions into naturalistic images. These models are available to anyone online and are being used to generate millions of images a day. We investigate these models and find that they amplify dangerous and complex stereotypes. Moreover, we find that the amplified stereotypes are d...
Preprint
Full-text available
Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the development of more effective hate speech detection models in hundreds of languages spoken by billions across the world. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful...
Preprint
Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech is harmful to readers, and increases its internet frequency. While...
Preprint
Full-text available
Language is constantly changing and evolving, leaving language models to quickly become outdated, both factually and linguistically. Recent research proposes we continuously update our models using new data. Continuous training allows us to teach language models about new events and facts and changing norms. However, continuous training also means...
Preprint
Full-text available
Many interpretability tools allow practitioners and researchers to explain Natural Language Processing systems. However, each tool requires different configurations and provides explanations in different forms, hindering the possibility of assessing and comparing them. A principled, unified evaluation benchmark will guide the users through the cent...
Preprint
Full-text available
Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced fu...
Preprint
Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. For example, neural hate speech detection models are strongly influenced by identity terms like gay or women, resulting in false positives, severe unintended bias, and lower performance. Mos...
Preprint
Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, sentiment, entailment, or speaker properties should be the same in a text and its translation. We introduce language invariant properties: i.e., properties that should not change when we transform text, and...
Article
The task of Named Entity Recognition (NER) is aimed at identifying named entities in a given text and classifying them into pre-defined domain entity types such as persons, organizations, and locations. Most existing NER systems use generic entity type classification schemas; however, the comparison and integration of (more or less) diff...
Conference Paper
Full-text available
Many data sets (e.g., reviews, forums, news, etc.) exist in parallel in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-words-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addresse...
Preprint
Full-text available
Many data sets in a domain (reviews, forums, news, etc.) exist in parallel languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-words-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addressed by...
Preprint
Full-text available
Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT (Bidirectional Encoder Representations from Transformers), which enables researchers to obtain state-of-the-art per...
Article
In this paper we deal with complex attributed graphs which can exhibit rich connectivity patterns and whose nodes are often associated with attributes, such as text or images. In order to analyze these graphs, the primary challenge is to find an effective way to represent them by preserving both structural properties and node attribute information....
Conference Paper
In recent years, the phenomenon of hate against women has increased exponentially, especially in online environments such as microblogs. Although this alarming phenomenon has triggered many studies from both computational linguistics and machine learning points of view, less effort has been spent on analyzing whether those misogyny detection models are af...
Chapter
The amount of textual user-generated content on the Web has grown enormously in the last decade, creating new opportunities for different real-world applications and domains. In particular, microblogging platforms enable the collection of continuously and instantly updated information. The organization and extraction of valuable know...
Chapter
Automatic Misogyny Identification (AMI) is a new shared task proposed for the first time at the Evalita 2018 evaluation campaign. The AMI challenge, based on both Italian and English tweets, is divided into two subtasks, i.e. Subtask A on misogyny identification and Subtask B on misogynistic behaviour categorization and target classificati...
Conference Paper
Numerous state-of-the-art Named Entity Recognition (NER) systems use different classification schemas/ontologies. Comparison and integration among NER systems thus become complex. In this paper, we propose a transfer-learning approach where we use supervised learning methods to automatically learn mappings between ontologies of NER systems, whe...
Article
Numerous state-of-the-art Named Entity Recognition (NER) systems use different classification schemas/ontologies. Comparison and integration among NER systems thus become complex. In this paper, we propose a transfer-learning approach where we use supervised learning methods to automatically learn mappings between ontologies of NER systems, wh...
Conference Paper
Full-text available
This paper describes the framework proposed by the UNIMIB Team for the task of Named Entity Recognition and Linking of Italian tweets (NEEL-IT). The proposed pipeline, which represents an entry-level system, is composed of three main steps: (1) Named Entity Recognition using Conditional Random Fields, (2) Named Entity Linking by considerin...
Conference Paper
Full-text available
The growing availability of social media platforms, in particular microblogs such as Twitter, has opened new ways for people to express their opinions. Sentiment Analysis aims at inferring the polarity of these opinions, but most of the existing approaches are based only on text, disregarding information that comes from the relationships among users...
