L. Elisa Celis’s research while affiliated with Yale University and other places

Publications (74)


Algorithmic Fairness From the Perspective of Legal Anti-discrimination Principles
  • Article

October 2024 · 14 Reads · Vijay Keswani · L. Elisa Celis

Real-world applications of machine learning (ML) algorithms often propagate negative stereotypes and social biases against marginalized groups. In response, the field of fair machine learning has proposed technical solutions for a variety of settings that aim to correct the biases in algorithmic predictions. These solutions remove the dependence of the final prediction on the protected attributes (like gender or race) and/or ensure that prediction performance is similar across demographic groups. Yet, recent studies assessing the impact of these solutions in practice demonstrate their ineffectiveness in tackling real-world inequalities. Given this lack of real-world success, it is essential to take a step back and question the design motivations of algorithmic fairness interventions. We use popular legal anti-discriminatory principles, specifically anti-classification and anti-subordination principles, to study the motivations of fairness interventions and their applications. The anti-classification principle suggests addressing discrimination by ensuring that decision processes and outcomes are independent of the protected attributes of individuals. The anti-subordination principle, on the other hand, argues that decision-making policies can provide equal protection to all only by actively tackling societal hierarchies that enable structural discrimination, even if that requires using protected attributes to address historical inequalities. Through a survey of the fairness mechanisms and applications, we assess different components of fair ML approaches from the perspective of these principles. We argue that the observed shortcomings of fair ML algorithms are similar to the failures of anti-classification policies and that these shortcomings constitute violations of the anti-subordination principle. Correspondingly, we propose guidelines for algorithmic fairness interventions to adhere to the anti-subordination principle. In doing so, we hope to bridge critical concepts between legal frameworks for non-discrimination and fairness in machine learning.
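
As an aside to the abstract above, the two families of criteria it refers to are usually quantified as gaps in group-wise selection rates (an anti-classification-style check of independence from the protected attribute) and gaps in group-wise error rates such as the true positive rate. The snippet below is a minimal illustrative sketch on synthetic data, not code from the paper; all variable names and data are hypothetical.

```python
# Minimal sketch (not the paper's method): two common ways fair-ML
# interventions are evaluated, as described in the abstract above.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)    # protected attribute (0/1), synthetic
y_true = rng.integers(0, 2, size=1000)   # ground-truth labels, synthetic
y_pred = rng.integers(0, 2, size=1000)   # model predictions, synthetic

def selection_rate(pred, mask):
    """Fraction of positive predictions within a group."""
    return pred[mask].mean()

def true_positive_rate(pred, true, mask):
    """TPR within a group (performance-parity style criterion)."""
    pos = mask & (true == 1)
    return pred[pos].mean()

# Anti-classification-flavoured check: are outcomes independent of the group?
dp_gap = abs(selection_rate(y_pred, group == 0) - selection_rate(y_pred, group == 1))

# Performance-parity check: is the TPR similar across groups?
tpr_gap = abs(true_positive_rate(y_pred, y_true, group == 0)
              - true_positive_rate(y_pred, y_true, group == 1))

print(f"demographic parity gap: {dp_gap:.3f}, TPR gap: {tpr_gap:.3f}")
```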


Figure 2: Preference-based fairness as measured by P^(1), with synthetic data where non-i.i.d. preferences are generated from Mallows distributions. The x-axis denotes γ, the Kendall-Tau distance between the central rankings, and the error bars denote the standard error of the mean over 50 iterations. We observe that institution-wise constraints achieve higher preference-based fairness than group-wise and unconstrained settings.
Figure 3: Illustration of the proof of Theorem 7.28: the figure on the left plots the density f(x); the quantities A_1, A_2, A_3 are the areas under the curve over the intervals [0, ∆], [∆, ∆/β], [∆/β, 1]. The figure on the right plots g(x) := ln(f(x)), which is a concave function; the line L(x) is shown in red.
Figure 4: P^(1), P^(3), and U measured for synthetic data when the dispersion parameter of the Mallows distribution is varied over ϕ ∈ [0, 1]. Panels (a)-(c) show, respectively, the preference-based fairness P^(1), the preference-based fairness P^(3), and the utility ratio U when utilities are generated from D_Gauss; panels (d)-(f) show the same quantities when utilities are generated from D_Pareto. Our main observation is that ϕ does not have a large impact on preference-based fairness for A_inst-wise, while it generally increases with ϕ for A_group and A_st. The x-axis denotes α, the y-axis denotes P^(1), P^(3), or U, and the error bars denote the standard error of the mean over 50 iterations. See Appendix B.2 for details and discussion.
Centralized Selection with Preferences in the Presence of Biases
  • Preprint
  • File available

September 2024 · 7 Reads

This paper considers a scenario with multiple institutions, each with a limited capacity for candidates, and candidates, each with preferences over the institutions. A central entity evaluates the utility of each candidate to the institutions, and the goal is to select candidates for each institution in a way that maximizes utility while also considering the candidates' preferences. The paper focuses on the setting in which candidates are divided into multiple groups and the observed utilities of candidates in some groups are biased, i.e., systematically lower than their true utilities. The first result is that, in these biased settings, prior algorithms can lead to selections with sub-optimal true utility and significant discrepancies in the fraction of candidates from each group that get their preferred choices. Subsequently, an algorithm is presented, along with a proof that it produces selections that achieve near-optimal group fairness with respect to preferences while also nearly maximizing the true utility under distributional assumptions. Finally, extensive empirical validation of these results in real-world and synthetic settings, in which the distributional assumptions may not hold, is presented.
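
A rough feel for the bias phenomenon described in the abstract can be obtained with a toy simulation. The sketch below is not the paper's algorithm: it collapses the multi-institution, preference-aware setting into a single selection of size k, assumes a multiplicative bias factor beta on one group's observed utilities and Pareto-distributed true utilities, and compares an unconstrained top-k selection with one using proportional per-group quotas.

```python
# Illustrative sketch only, not the paper's algorithm: a multiplicative bias
# (beta < 1) on one group's observed utilities lowers the true utility of an
# unconstrained top-k selection; a group-proportional selection can recover it.
import numpy as np

rng = np.random.default_rng(1)
n, k, beta = 1000, 100, 0.7
group = rng.integers(0, 2, size=n)                 # two demographic groups
true_util = rng.pareto(3.0, size=n) + 1            # positive true utilities (illustrative)
observed = np.where(group == 1, beta * true_util, true_util)  # group 1 is under-estimated

# Unconstrained baseline: pick the k candidates with the highest observed utility.
unconstrained = np.argsort(-observed)[:k]

# Group-fair variant: per-group quotas proportional to group sizes,
# filled with the top candidates of each group by observed utility.
constrained = []
for g in (0, 1):
    idx = np.flatnonzero(group == g)
    quota = round(k * len(idx) / n)                # may be off by one due to rounding
    constrained.extend(idx[np.argsort(-observed[idx])[:quota]])
constrained = np.array(constrained[:k])

print("true utility of unconstrained selection:", round(true_util[unconstrained].sum(), 1))
print("true utility of constrained selection:  ", round(true_util[constrained].sum(), 1))
```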



Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions

June 2023 · 25 Reads
Niclas Boehmer · L. Elisa Celis · [...] · Nisheeth K. Vishnoi

We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest "quality" subset. Score functions from the multiwinner voting literature have been used to aggregate rankings into quality scores for subsets. We study this setting of subset selection problems when, in addition, rankings may contain systemic or unconscious biases toward a group of items. For a general model of input rankings and biases, we show that requiring the selected subset to satisfy group fairness constraints can improve the quality of the selection with respect to unbiased rankings. Importantly, we show that for fairness constraints to be effective, different multiwinner score functions may require a drastically different number of rankings: while for some functions, fairness constraints need an exponential number of rankings to recover a close-to-optimal solution, for others, this dependency is only polynomial. This result relies on a novel notion of "smoothness" of submodular functions in this setting that quantifies how well a function can "correctly" assess the quality of items in the presence of bias. The results in this paper can be used to guide the choice of multiwinner score functions for the subset selection setting considered here; we additionally provide a tool to empirically enable this.
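
For intuition, the sketch below aggregates several rankings with a Borda-style score (one of many multiwinner score functions, not necessarily one analysed in the paper) and selects a size-k subset with and without a lower-bound fairness constraint on a protected group. The rankings here are unbiased random permutations; a bias model could be simulated by systematically demoting protected-group items in the input rankings.

```python
# Minimal sketch, assuming a Borda-style multiwinner score: aggregate several
# rankings into item scores and pick a size-k subset, optionally subject to a
# lower-bound fairness constraint on a protected group. Illustration only.
import numpy as np

rng = np.random.default_rng(2)
m, k = 20, 5                                        # number of items, subset size
group = np.arange(m) % 2                            # items with label 1 form the protected group
rankings = [rng.permutation(m) for _ in range(7)]   # 7 input rankings (item ids, best first)

def borda_scores(rankings, m):
    """Borda count: an item in position p of a ranking earns m - 1 - p points."""
    scores = np.zeros(m)
    for r in rankings:
        for pos, item in enumerate(r):
            scores[item] += m - 1 - pos
    return scores

def select(scores, k, min_from_group=0):
    """Top-k by score, forcing at least `min_from_group` items from group 1."""
    order = np.argsort(-scores)
    forced = [int(i) for i in order if group[i] == 1][:min_from_group]
    rest = [int(i) for i in order if int(i) not in forced][: k - len(forced)]
    return sorted(forced + rest)

scores = borda_scores(rankings, m)
print("unconstrained selection:       ", select(scores, k))
print("with group fairness constraint:", select(scores, k, min_from_group=k // 2))
```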


Figure 2: Synthetic clusters in §3.
Figure 3: Accuracy vs. the noise parameter s used in dSim for the task allocation experiments presented in §3.
Designing Closed-Loop Models for Task Allocation

May 2023 · 34 Reads

Automatically assigning tasks to people is challenging because human performance can vary across tasks for many reasons. This challenge is further compounded in real-life settings in which no oracle exists to assess the quality of human decisions and task assignments made. Instead, we find ourselves in a "closed" decision-making loop in which the same fallible human decisions we rely on in practice must also be used to guide task allocation. How can imperfect and potentially biased human decisions train an accurate allocation model? Our key insight is to exploit weak prior information on human-task similarity to bootstrap model training. We show that the use of such a weak prior can improve task allocation accuracy, even when human decision-makers are fallible and biased. We present both theoretical analysis and empirical evaluation over synthetic data and a social media toxicity detection task. Results demonstrate the efficacy of our approach.
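
The closed-loop idea can be illustrated with a toy simulation that is our own simplification, not the paper's model: per-worker, per-task-cluster accuracy estimates are initialized from a noisy "weak prior" and then updated using only agreement with a fallible peer majority, i.e., without any oracle labels.

```python
# Toy sketch (our own simplification, not the paper's model or guarantees):
# task allocation bootstrapped from a weak prior on human-task similarity,
# updated only from fallible peer agreement rather than ground truth.
import numpy as np

rng = np.random.default_rng(3)
n_workers, n_clusters, rounds = 5, 3, 2000
true_acc = rng.uniform(0.55, 0.9, size=(n_workers, n_clusters))  # unknown to the allocator
# Weak prior on human-task similarity: a noisy version of the true accuracies.
est = np.clip(true_acc + rng.normal(0, 0.15, size=true_acc.shape), 0.5, 1.0)
counts = np.ones_like(est)

for _ in range(rounds):
    c = int(rng.integers(n_clusters))            # a task from some cluster arrives
    w = int(np.argmax(est[:, c]))                # allocate to the best worker per current estimate
    correct = rng.random() < true_acc[w, c]      # the worker's decision (possibly wrong)
    peers = [rng.random() < true_acc[p, c] for p in range(n_workers) if p != w]
    peer_majority_correct = sum(peers) > len(peers) / 2
    # Closed loop: no oracle labels; agreement with a fallible peer majority is the only signal.
    agree = correct == peer_majority_correct
    counts[w, c] += 1
    est[w, c] += (agree - est[w, c]) / counts[w, c]

print("clusters where the allocator's favourite worker is also the truly best:",
      int((np.argmax(est, axis=0) == np.argmax(true_acc, axis=0)).sum()), "of", n_clusters)
```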


Revisiting Group Fairness Metrics: The Effect of Networks

November 2022 · 10 Reads · 8 Citations
Proceedings of the ACM on Human-Computer Interaction

An increasing amount of work studies fairness in socio-technical settings from a computational perspective. This work has introduced a variety of metrics to measure fairness in different settings. Most of these metrics, however, do not account for the interactions between individuals or evaluate any underlying network's effect on the outcomes measured. While a wide body of work studies the organization of individuals into a network structure and how individuals access resources in networks, the impact of network structure on fairness has been largely unexplored. We introduce templates for group fairness metrics that account for network structure. More specifically, we present two types of group fairness metrics that measure distinct yet complementary forms of bias in networks. The first type of metric evaluates how access to others in the network is distributed across groups. The second type of metric evaluates how groups distribute their interactions across other groups, and hence captures inter-group biases. We find that ignoring the network can lead to spurious fairness evaluations by either not capturing imbalances in influence and reach illuminated by the first type of metric, or by overlooking interaction biases as evaluated by the second type of metric. Our empirical study illustrates these pronounced differences between network and non-network evaluations of fairness.
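
To make the two metric types concrete, the hedged sketch below computes simple proxies on a random graph: mean closeness centrality per group as a stand-in for access, and the fraction of each group's interactions that cross group lines. These are illustrative proxies, not the paper's exact metric templates.

```python
# Hedged sketch of the two metric *types* described above (not the paper's
# exact formulas): (1) how access/reach is distributed across groups and
# (2) how each group distributes its interactions across groups.
import networkx as nx
import numpy as np

rng = np.random.default_rng(4)
G = nx.erdos_renyi_graph(200, 0.05, seed=4)
group = {v: int(rng.random() < 0.3) for v in G}   # ~30% of nodes in group 1 (arbitrary)

# Type 1: access across groups, proxied here by mean closeness centrality.
closeness = nx.closeness_centrality(G)
for g in (0, 1):
    vals = [closeness[v] for v in G if group[v] == g]
    print(f"group {g}: mean closeness = {np.mean(vals):.3f}")

# Type 2: inter-group interaction -- for each group, the fraction of its
# edge endpoints that point to the other group.
cross = {0: 0, 1: 0}
total = {0: 0, 1: 0}
for u, v in G.edges():
    for a, b in ((u, v), (v, u)):
        total[group[a]] += 1
        cross[group[a]] += group[a] != group[b]
for g in (0, 1):
    print(f"group {g}: fraction of cross-group interactions = {cross[g] / total[g]:.3f}")
```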



Attitudes towards interpretability across real-world AI applications
Joyplot visualizes the distributions of interpretability ratings, averaged across recommend and decide versions. Participants (N = 170) responded to the question “how important is it that the AI in this application is explainable, even if it performs accurately?” on a 5-point rating scale (1 = not at all important, 5 = extremely important).
Exemplary instructions from Study 2
Schematic representation of the instructions for the vaccine application with its four versions. Each version was shown on a separate page, with the same general scenario described at the top. The depicted bolding and underlining correspond to the format shown to participants.
Results for Study 2
Participants’ responses from Study 2 (N = 84) to the question “In this case, how important is it that the AI is explainable?” on a continuous slider-scale from “not at all important” (1) to “extremely important” (5). All panels show the jittered raw data, its density, the point estimate of the mean with its 95% confidence intervals, and interquartile ranges; all grouped by stakes (indicated by fill colour; low stakes = yellow, high stakes = red) and scarcity (indicated on x-axes). In summary, participants rated interpretability as more important for high-stakes and high-scarcity situations. Main effects for stakes and scarcity were not qualified by an interaction. a Data aggregated across all five applications; triangle-shaped data points represent averages for the five applications. b–f Non-aggregated data for each individual application; circle-shaped data points represent individual responses.
Results for Study 3A
Participants’ responses from Study 3A (N = 261) to the question “How important is it that the given AI model is explainable?” on continuous slider-scales from “not at all important” (1) to “extremely important” (5). For each AI model with a given level of accuracy, there was a separate slider-scale. Panels show the jittered raw data, its density, the point estimate of the mean with its 95% confidence intervals, and interquartile ranges. Overall, there was a slight tendency for participants to rate interpretability as less important for more accurate models. a Data aggregated across all five applications; triangle-shaped data points represent averages for each of the five applications. b–f Non-aggregated data for each individual application; circle-shaped data points represent individual responses.
Dependent variable and results for Studies 3B and 3C
a Dependent variable on which participants were asked to move the slider to a position representing their preference for the interpretability–accuracy tradeoff. The order of attributes, and hence the direction of the slider, was counterbalanced across participants. b Tradeoff preferences from Study 3B (N = 112; within-subjects design), aggregating across all five applications. c Tradeoff preferences from Study 3C (N = 1344; between-subjects design), aggregating across all five applications.
Public attitudes value interpretability but prioritize accuracy in Artificial Intelligence

October 2022 · 154 Reads · 66 Citations

As Artificial Intelligence (AI) proliferates across important social institutions, many of the most powerful AI systems available are difficult to interpret for end-users and engineers alike. Here, we sought to characterize public attitudes towards AI interpretability. Across seven studies (N = 2475), we demonstrate robust and positive attitudes towards interpretable AI among non-experts that generalize across a variety of real-world applications and follow predictable patterns. Participants value interpretability positively across different levels of AI autonomy and accuracy, and rate interpretability as more important for AI decisions involving high stakes and scarce resources. Crucially, when AI interpretability trades off against AI accuracy, participants prioritize accuracy over interpretability under the same conditions driving positive attitudes towards interpretability in the first place: amidst high stakes and scarce resources. These attitudes could drive a proliferation of AI systems making high-impact ethical decisions that are difficult to explain and understand.


Diverse Representation via Computational Participatory Elections -- Lessons from a Case Study

May 2022 · 14 Reads

Elections are the central institution of democratic processes, and the elected body, in either public or private governance, is often a committee of individuals. To ensure the legitimacy of elected bodies, electoral processes should guarantee that diverse groups are represented, in particular members of groups that are marginalized due to gender, ethnicity, or other socially salient attributes. To address this challenge of representation, we have designed a novel participatory electoral process, termed the Representation Pact, implemented with the support of a computational system. The process explicitly enables voters to flexibly decide on representation criteria in a first round, and then lets them vote for candidates in a second round. After the two rounds, a counting method is applied that selects the committee of candidates maximizing the number of votes received in the second round, conditioned on satisfying the criteria provided in the first round. With the help of a detailed use case that applied this process in a primary election of 96 representatives in Switzerland, we explain how this method contributes to fairness in political elections by achieving a better "descriptive representation". Further, based on this use case, we identify lessons learnt that are applicable to participatory computational systems used in societal or political contexts, and present good practices.
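
The counting method described above can be illustrated with a tiny brute-force example (our own toy data and criterion, not the system used in the Swiss election): enumerate all committees of the required size, keep those satisfying the first-round representation criteria, and return the one with the most second-round votes.

```python
# Illustrative sketch of the counting idea with made-up data: pick the
# committee that maximizes second-round votes subject to the representation
# criteria fixed in the first round. Brute force over a tiny candidate pool.
from itertools import combinations

# Candidate: (name, votes_in_round_two, attributes) -- all hypothetical.
candidates = [
    ("A", 120, {"gender": "f"}), ("B", 150, {"gender": "m"}),
    ("C", 90,  {"gender": "f"}), ("D", 200, {"gender": "m"}),
    ("E", 80,  {"gender": "f"}), ("F", 170, {"gender": "m"}),
]
committee_size = 3

# Round-one criterion (hypothetical): at least half of the committee are women.
def satisfies_criteria(committee):
    women = sum(1 for _, _, attrs in committee if attrs["gender"] == "f")
    return women * 2 >= committee_size

best = max(
    (c for c in combinations(candidates, committee_size) if satisfies_criteria(c)),
    key=lambda c: sum(votes for _, votes, _ in c),
)
print("selected committee:", [name for name, _, _ in best],
      "total votes:", sum(votes for _, votes, _ in best))
```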


Addressing Strategic Manipulation Disparities in Fair Classification

May 2022 · 5 Reads

In real-world classification settings, individuals respond to classifier predictions by updating their features to increase their likelihood of receiving a particular (positive) decision (at a certain cost). Yet, when different demographic groups have different feature distributions or different cost functions, prior work has shown that individuals from minority groups often pay a higher cost to update their features. Fair classification aims to address such classifier performance disparities by constraining the classifiers to satisfy statistical fairness properties. However, we show that standard fairness constraints do not guarantee that the constrained classifier reduces the disparity in strategic manipulation cost. To address such biases in strategic settings and provide equal opportunities for strategic manipulation, we propose a constrained optimization framework that constructs classifiers that lower the strategic manipulation cost for the minority groups. We develop our framework by studying theoretical connections between group-specific strategic cost disparity and standard selection rate fairness metrics (e.g., statistical rate and true positive rate). Empirically, we show the efficacy of this approach over multiple real-world datasets.
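
For a linear classifier, the cheapest feature change that flips a negative decision has a closed form, which gives a convenient (if simplified) proxy for strategic manipulation cost. The sketch below is our own illustration, not the paper's framework: group 1's features are shifted away from the decision boundary by an arbitrary amount, and we compare selection rates and mean manipulation costs across groups.

```python
# Minimal sketch (our construction, not the paper's framework): for a linear
# classifier, the cheapest feature change that flips a negative decision has
# L2 cost |w.x + b| / ||w||; we compare this manipulation cost across groups.
import numpy as np

rng = np.random.default_rng(5)
n, d = 2000, 5
group = rng.integers(0, 2, size=n)
w, b = np.ones(d) / np.sqrt(d), 0.0     # some fixed linear classifier (unit-norm w)
X = rng.normal(0, 1, size=(n, d))
X[group == 1] -= 0.8 * w                # hypothetical disparity: group 1 sits farther from the boundary

scores = X @ w + b
negative = scores < 0                   # individuals who would need to manipulate
cost = np.abs(scores) / np.linalg.norm(w)   # cheapest L2 feature change that flips the decision

for g in (0, 1):
    in_group = group == g
    print(f"group {g}: selection rate = {(~negative)[in_group].mean():.2f}, "
          f"mean manipulation cost = {cost[negative & in_group].mean():.2f}")
```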


Citations (46)


... To address fairness, we define the disparity between the TPRs of two groups. This fairness notion has been studied previously in the context of strategic classification (e.g., [Keswani and Celis, 2023]). ...

Reference:

Two Tickets are Better than One: Fair and Accurate Hiring Under Strategic LLM Manipulations
Addressing Strategic Manipulation Disparities in Fair Classification
  • Citing Conference Paper
  • October 2023

... Recently, questions of fairness of information access of individuals and demographic groups in a social network have come to the fore [6, 7, 23-27]. Much work focuses on the probability that an individual receives information that spreads through a social network. ...

Revisiting Group Fairness Metrics: The Effect of Networks
  • Citing Article
  • November 2022

Proceedings of the ACM on Human-Computer Interaction

... Bolton et al. [13] similarly explore AI tools for antimicrobial use to balance tradeoffs between immediate patient needs and long-term impacts. In the electoral domain, Evequoz et al. [41] use computational preference elicitation to create a participatory multi-winner electoral process where voters provide preferences about the representation criteria elected bodies should satisfy and a computational process selects the group of popular candidates who satisfy the voters' representation preferences. ...

Diverse Representation via Computational Participatory Elections - Lessons from a Case Study
  • Citing Conference Paper
  • October 2022

... This view that people's knowledge and ideas about AI are influenced by what they receive in media coverage is also expressed in news coverage itself [e.g., Naughton, 2019; see also Kalwa, 2022]. Ouchchy et al. [2020] and Nussberger et al. [2022] go even further, as they believe that news portrayal of AI could even influence AI research and development, legislation and regulation. ...

Public attitudes value interpretability but prioritize accuracy in Artificial Intelligence

... The reasons for exclusion of non-binary identities are diverse, each pointing to different aspects of research design and data handling (see Table 7). Firstly, a prevalent reason is the lack of data (Booth et al. 2021; Meister et al. 2023; Hoque et al. 2020; Keswani and Celis 2021; Dominguez-Catena, Paternain, and Galar 2023; Hirota, Nakashima, and Garcia 2022; Alasadi, Al Hilli, and Singh 2019). Numerous studies point to an absence or scarcity of data representing non-binary individuals. ...

Auditing for Diversity Using Representative Examples
  • Citing Conference Paper
  • August 2021

... Despite advancements, bias remains a concern in automated summarization (Dash et al., 2019; Jung et al., 2019; Keswani and Celis, 2021; Olabisi et al., 2022) as most existing summarization methods focus on quality but fall short in optimizing fairness. This gap leads to the key question: if a summarization method is optimized for fairness, how does it affect the overall summary quality? ...

Dialect Diversity in Text Summarization on Twitter
  • Citing Conference Paper
  • April 2021

... Among them, Heidari and Kleinberg (2021) consider a model of intergenerational opportunities, showing that it can be economically efficient to allocate (e.g., educational) opportunities to individuals of lower socioeconomic status rather than higher-performing individuals of higher socioeconomic status. Other work has considered how affirmative action policies may help inform firms about the capabilities of, or counteract the biases against, minority workers (Coate and Loury, 1993; Celis et al., 2021; Kleinberg and Raghavan, 2018). Our paper complements this literature by providing a microfoundation for cross-group link recommendations in professional networking. ...

The Effect of the Rooney Rule on Implicit Bias in the Long Term
  • Citing Conference Paper
  • March 2021

... The work by Buolamwini and Gebru [2] shows that current commercial facial recognition systems tend to produce more errors when detecting and recognizing the faces of dark-skinned women. Celis and Keswani [3] have proposed a model to assess socially biased image retrieval. They found that search queries often carry regional social, cultural and demographic characteristics, thus generating photo representations that have biased characteristics. ...

Implicit Diversity in Image Summarization
  • Citing Article
  • October 2020

Proceedings of the ACM on Human-Computer Interaction