Milagros Miceli

Weizenbaum Institute for the Networked Society

Doctor of Engineering

About

Publications: 28
Reads: 8,251
Citations: 544
Introduction
I'm a doctoral researcher working at the intersection of data, humans, and power. I investigate how classification practices affect training data for machine learning.
Additional affiliations
April 2022 - present
DAIR Institute
Position
  • Research fellow
Education
November 2019 - December 2022
Technische Universität Berlin
Field of study
  • Computer Science
April 2016 - September 2019
Humboldt-Universität zu Berlin
Field of study
  • Social Sciences
April 2015 - February 2016
Humboldt-Universität zu Berlin
Field of study
  • Social Sciences

Publications

Preprint
The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotat...
Conference Paper
In industrial computer vision, discretionary decisions surrounding the production of image training data remain widely undocumented. Recent research taking issue with such opacity has proposed standardized processes for dataset documentation. In this paper, we expand this space of inquiry through fieldwork at two data processing companies and thirt...
Preprint
Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for historical inequities, labor...
Preprint
Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by study...
Conference Paper
In this paper, we analyze the relation between biased data-driven outcomes and practices of data annotation for vision models, by placing them in the context of market economy. Understanding data annotation as a sense-making process, we investigate which goals are prioritized by decision-makers throughout the annotation of datasets. Following a qu...
Article
Data work plays a fundamental role in the development of algorithmic systems and the AI industry. It is often performed in business process outsourcing (BPO) companies and crowdsourcing platforms, involving a global and distributed workforce as well as networks of collaborative actors. Previous work on community building among data workers centers...
Article
Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by study...
Article
The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets towards documenting data production....
Preprint
The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets toward documenting data production....
Article
Research in machine learning (ML) has argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for historical inequities, labor condition...
Preprint
Even after decades of intensive research and public debates, the topic of data privacy remains surrounded by confusion and misinformation. Many people still struggle to grasp the importance of privacy, which has far-reaching consequences for social norms, jurisprudence, and legislation. Discussions on personal data misuse often revolve around a few...
Conference Paper
Algorithmic and data-driven systems have been introduced to assist Public Employment Services (PES) in various countries. However, their deployment has been heavily criticized. This paper is based on a workshop organized by a distributed team of researchers in AI ethics and adjacent fields, which brought together academics, system developers, rep...
Poster
This abstract was prepared for an open-space presentation at the virtual DigiMeet networking event at the Weizenbaum Institute for the Networked Society. In the session, we aim to uncover the domain of sustainable data production for increasingly networked societies. Combining human-computer interaction (HCI) research with social science...
Preprint
Developers of computer vision algorithms outsource some of the labor involved in annotating training data through business process outsourcing companies and crowdsourcing platforms. Many data annotators are situated in the Global South and are considered independent contractors. This paper focuses on the experiences of Argentinian and Venezuelan an...
Article
The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotat...
Conference Paper
The work of data annotators is fundamental to machine learning (ML). This paper summarizes the goals and preliminary findings of our investigation into work practices at the intersection of data annotation and machine learning engineering. We conducted several weeks of fieldwork at two annotation companies, analyzing which structures, power relatio...
Conference Paper
Data is the fuel of machine learning. How training datasets are produced, i.e., the preconceptions, interests, and power imbalances encoded in data, decisively shapes ML systems. This paper summarizes an ongoing research project, its goals, and results at the intersection of data creation and deployment in machine learning products. We argue for the inc...
Conference Paper
The work of data annotators is fundamental to machine learning and, more broadly, to contemporary knowledge production. This paper summarizes the goals and results of our investigation into work practices of data annotation. Guided by Grounded Theory, we conducted several weeks of fieldwork at two annotation companies, analyzing which structures, p...
Conference Paper
The work of data annotators is fundamental to machine learning and, more broadly, to contemporary knowledge production. This paper summarizes the preliminary results of our investigation into data annotation for vision models. Following a qualitative design, this research project analyzes labeling practices and their possible effects on data and sy...
Conference Paper
Following a qualitative design, this paper analyzes the relation between biased data-driven outcomes and practices of data annotation for vision models and investigates which goals are prioritized by decision-makers throughout the annotation of datasets.
Conference Paper
Building on Bourdieu’s concept of symbolic power, this article analyses the key role of classifications for automation and investigates how power is increasingly exercised through machine learning systems by means of categorization and designation. Previous research in different disciplines has raised questions on accountability for algorithms' per...
