Blossom Metevier’s research while affiliated with University of Massachusetts Amherst and other places

Publications (7)


Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics
  • Conference Paper

June 2024 · 2 Citations

Min-Hsuan Yeh · Blossom Metevier · Austin Hoag · Philip Thomas

Pursuing Social Good: An Overview of Short- and Long-Term Fairness in Classification

April 2024 · 2 Reads

ACM SIGCAS Computers and Society

Machine learning (ML) models are increasingly being used to aid decision-making in high-risk applications. However, these models can perpetuate biases present in their training data or the systems in which they are integrated. When unaddressed, these biases can lead to harmful outcomes, such as misdiagnoses in healthcare [11], wrongful denials of loan applications [9], and over-policing of minority communities [2, 4]. Consequently, the fair ML community is dedicated to developing algorithms that minimize the influence of data and model bias.


Matched Pair Calibration for Ranking Fairness

June 2023 · 9 Reads

Hannah Korevaar · Chris McConnell · Edmund Tong · [...] · Khalid El-Arini

We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
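The matching idea in the abstract can be sketched as follows: bucket items by identical model score, then compare subgroup outcome rates within each bucket. This is an illustrative toy, not the paper's estimator; the function name, data layout, and gap statistic are all assumptions:

```python
from collections import defaultdict

def matched_pair_gap(items):
    """Toy sketch of matched-pair comparison (illustrative only).

    items: list of (score, group, outcome) tuples, with group in
    {"A", "B"} and outcome in {0, 1}. Items are matched on identical
    scores; within each score bucket we compare mean outcomes between
    the two subgroups. Returns the average A-minus-B outcome gap over
    buckets that contain both subgroups.
    """
    buckets = defaultdict(lambda: {"A": [], "B": []})
    for score, group, outcome in items:
        buckets[score][group].append(outcome)
    gaps = []
    for by_group in buckets.values():
        if by_group["A"] and by_group["B"]:  # a matched pair exists
            mean_a = sum(by_group["A"]) / len(by_group["A"])
            mean_b = sum(by_group["B"]) / len(by_group["B"])
            gaps.append(mean_a - mean_b)
    return sum(gaps) / len(gaps) if gaps else 0.0
```

Because comparisons happen only between identically scored items, a nonzero gap cannot be explained away by score differences, which is the intuition the abstract describes.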


Enforcing Delayed-Impact Fairness Guarantees

August 2022 · 12 Reads · 1 Citation

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity or demographic parity. However, enforcing constraints of this type may result in models that have negative long-term impact on disadvantaged individuals and communities. We introduce ELF (Enforcing Long-term Fairness), the first classification algorithm that provides high-confidence fairness guarantees in terms of long-term, or delayed, impact. We prove that the probability that ELF returns an unfair solution is less than a user-specified tolerance and that, under mild assumptions and given sufficient training data, ELF is able to find and return a fair solution if one exists. We show experimentally that our algorithm can successfully mitigate long-term unfairness.
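ELF's high-confidence guarantee is in the Seldonian spirit: a candidate model is returned only if a statistical bound certifies, at the user's tolerance, that it is fair. A minimal sketch of such a safety test using a one-sided Hoeffding bound — the names, the bound choice, and the scalar unfairness measure are assumptions, not ELF's actual procedure:

```python
import math

def hoeffding_upper_bound(samples, delta):
    """One-sided Hoeffding upper bound on the mean of samples in [0, 1],
    holding with probability at least 1 - delta."""
    n = len(samples)
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1 / delta) / (2 * n))

def safety_test(unfairness_samples, epsilon, delta):
    """Seldonian-style safety check (simplified sketch, not ELF itself):
    accept the candidate model only if a high-confidence upper bound on
    its unfairness is at most the threshold epsilon; otherwise report
    that no solution was found rather than return an unsafe model."""
    ub = hoeffding_upper_bound(unfairness_samples, delta)
    return "ACCEPT" if ub <= epsilon else "NO SOLUTION FOUND"
```

The key design point matches the abstract's guarantee: by construction, the probability of accepting a model whose true unfairness exceeds epsilon is at most delta.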


Reinforcement Learning When All Actions Are Not Always Available

April 2020 · 44 Reads · 6 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which better captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs can suffer from potential divergence issues, and present new policy gradient algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches on tasks inspired by real-life use cases wherein the action set is stochastic.
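The core modeling twist — a stochastic set of available actions at each step — can be illustrated with a small sketch: the agent observes which actions are currently available and renormalizes its base policy over that subset before sampling. This is a generic illustration of acting in a SAS-MDP, not the paper's variance-reduced policy gradient algorithms; all names here are assumptions:

```python
import random

def sas_policy_step(base_probs, available):
    """Sample an action when only a subset of actions is available.

    base_probs: dict mapping action -> probability under the base policy.
    available: non-empty list of actions available at this time step
    (in a SAS-MDP this set is drawn stochastically by the environment).

    The base policy is renormalized over the available set, so the
    relative preferences among available actions are preserved.
    """
    total = sum(base_probs[a] for a in available)
    r, acc = random.random() * total, 0.0
    for a in available:
        acc += base_probs[a]
        if r <= acc:
            return a
    return available[-1]  # guard against floating-point rounding
```

Existing methods that require evaluating the full action set every step become expensive when action availability is combinatorial, which is the setting the paper's algorithms target.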


Reinforcement Learning When All Actions are Not Always Available

June 2019 · 45 Reads

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.


Lexicase Selection Beyond Genetic Programming

January 2019 · 28 Reads · 40 Citations

Lexicase selection is a selection method that was developed for parent selection in genetic programming. In this chapter, we present a study of lexicase selection in a non-genetic-programming context, conducted to investigate the broader applicability of the technique. Specifically, we present a framework for solving Boolean constraint satisfaction problems using a traditional genetic algorithm, with linear genomes of fixed length. We present results of experiments in this framework using three parent selection algorithms: lexicase selection, tournament selection (with several tournament sizes), and fitness-proportionate selection. The results show that when lexicase selection is used, more solutions are found, fewer generations are required to find those solutions, and more diverse populations are maintained. We discuss the implications of these results for the utility of lexicase selection more generally.
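The selection procedure studied in the chapter can be sketched directly: shuffle the training cases, then repeatedly keep only the candidates with the lowest error on the next case until one candidate remains. A minimal sketch — the function names and the error-function interface are assumptions, not the chapter's code:

```python
import random

def lexicase_select(population, cases, error):
    """Select one parent via lexicase selection.

    population: list of candidate individuals
    cases: list of training cases
    error: function (individual, case) -> numeric error (lower is better)

    Cases are considered one at a time in random order; each case
    filters the pool down to the candidates that do best on it.
    """
    candidates = list(population)
    for case in random.sample(cases, len(cases)):  # random case ordering
        best = min(error(ind, case) for ind in candidates)
        candidates = [ind for ind in candidates if error(ind, case) == best]
        if len(candidates) == 1:
            return candidates[0]
    return random.choice(candidates)  # break remaining ties randomly
```

Because each selection event uses a fresh random ordering of cases, different specialists can win different events — the mechanism behind the diversity maintenance reported in the results.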

Citations (3)


... It's common in the fairness-aware machine learning literature for fairness measures to be defined such that the optimization goal is a ratio with a value of 1.0 or a difference with a value of 0.0, and previous work on bias in content moderation has used the difference [10]. Following recent work that demonstrates that the ratio is more appropriate for most fairness contexts [28], we define speech suppression accordingly: ...

Reference:

Identity-related Speech Suppression in Generative AI Content Moderation
Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics
  • Citing Conference Paper
  • June 2024

... They showed that the invalid action mask method scales better than the penalty method, but it would still suffer from Kullback-Leibler (KL) divergence explosion when dealing with more challenging tasks. The challenge cannot be underestimated in the deterministic case, and it becomes even more difficult in the stochastic case, where the problem becomes a stochastic action set Markov decision process (SAS-MDP) [10], [16]. In ATFM, the weather is stochastic in nature and can strongly influence flights [27]; this is reflected in the changing sector capacity in our problem. Moreover, current methods still require knowledge of all available actions, which can incur huge computational complexity with the combinatorial action space. ...

Reinforcement Learning When All Actions Are Not Always Available
  • Citing Article
  • April 2020

Proceedings of the AAAI Conference on Artificial Intelligence

... Many GP selection strategies aggregate each program's performance across all training cases to produce one fitness score that can be used for selection. In contrast, lexicase selection (Spector, 2012) avoids aggregation and considers each training case separately, which has been shown to improve diversity maintenance (Helmuth et al., 2016; Dolson and Ofria, 2018) and problem-solving success across a wide range of domains (Moore and Stanton, 2017; Metevier et al., 2019; Aenugu and Spector, 2019; Ding and Spector, 2021; Lalejini et al., 2022). ...

Lexicase Selection Beyond Genetic Programming
  • Citing Chapter
  • January 2019