Milagros Miceli
Weizenbaum Institute for the Networked Society
Doctor of Engineering
About
28
Publications
8,251
Reads
544
Citations
Introduction
I'm a doctoral researcher working at the intersection of data, humans, and power. I investigate how classification practices impact training data for machine learning.
Additional affiliations
April 2022 - present
DAIR Institute
Position
- Research fellow
Education
November 2019 - December 2022
April 2016 - September 2019
April 2015 - February 2016
Publications (28)
The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotat...
In industrial computer vision, discretionary decisions surrounding the production of image training data remain widely undocumented. Recent research taking issue with such opacity has proposed standardized processes for dataset documentation. In this paper, we expand this space of inquiry through fieldwork at two data processing companies and thirt...
Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for historical inequities, labor...
Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by study...
In this paper, we analyze the relation between biased data-driven outcomes and practices of data annotation for vision models, by placing them in the context of market economy. Understanding data annotation as a sense-making process, we investigate which goals are prioritized by decision-makers throughout the annotation of datasets. Following a qu...
Data work plays a fundamental role in the development of algorithmic systems and the AI industry. It is often performed in business process outsourcing (BPO) companies and crowdsourcing platforms, involving a global and distributed workforce as well as networks of collaborative actors. Previous work on community building among data workers centers...
Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by study...
The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets towards documenting data production....
The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets toward documenting data production....
Research in machine learning (ML) has argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for historical inequities, labor condition...
Even after decades of intensive research and public debates, the topic of data privacy remains surrounded by confusion and misinformation. Many people still struggle to grasp the importance of privacy, which has far-reaching consequences for social norms, jurisprudence, and legislation. Discussions on personal data misuse often revolve around a few...
Algorithmic and data-driven systems have been introduced to assist Public Employment Services (PES) in various countries. However, their deployment has been heavily criticized. This paper is based on a workshop organized by a distributed team of researchers in AI ethics and adjacent fields, which brought together academics, system developers, rep...
This abstract has been worked out for an open space presentation at the virtual DigiMeet networking event at the Weizenbaum Institute for the Networked Society. In the session, we aim at uncovering the domain of sustainable data production for increasingly networked societies. Combining human-computer interaction (HCI) research with social science...
Developers of computer vision algorithms outsource some of the labor involved in annotating training data through business process outsourcing companies and crowdsourcing platforms. Many data annotators are situated in the Global South and are considered independent contractors. This paper focuses on the experiences of Argentinian and Venezuelan an...
The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotat...
The work of data annotators is fundamental to machine learning (ML). This paper summarizes the goals and preliminary findings of our investigation into work practices at the intersection of data annotation and machine learning engineering. We conducted several weeks of fieldwork at two annotation companies, analyzing which structures, power relatio...
Data is the fuel of machine learning. How training datasets are produced, i.e. preconceptions, interests, and power imbalances encoded in data, decisively shapes ML systems. This paper summarizes an ongoing research project, its goals, and results at the intersection of data creation and deployment in machine learning products. We argue for the inc...
The work of data annotators is fundamental to machine learning and, more broadly, to contemporary knowledge production. This paper summarizes the goals and results of our investigation into work practices of data annotation. Guided by Grounded Theory, we conducted several weeks of fieldwork at two annotation companies, analyzing which structures, p...
The work of data annotators is fundamental to machine learning and, more broadly, to contemporary knowledge production. This paper summarizes the preliminary results of our investigation into data annotation for vision models. Following a qualitative design, this research project analyzes labeling practices and their possible effects on data and sy...
Following a qualitative design, this paper analyzes the relation between biased data-driven outcomes and practices of data annotation for vision models and investigates which goals are prioritized by decision-makers throughout the annotation of datasets.
Building on Bourdieu’s concept of symbolic power, this article analyses the key role of classifications for automation and investigates how power is increasingly exercised through machine learning systems by means of categorization and designation. Previous research in different disciplines has raised questions on accountability for algorithms' per...