Chapter

Fairness and Explainability for Enabling Trust in AI Systems

Authors:

Abstract

This chapter discusses the ethical complications and challenges arising from the use of AI systems in our everyday lives. It outlines recent and upcoming regulations and policies regarding the use of AI systems, and dives into the topics of explainability and fairness. We argue that explainability lies at the heart of trustworthiness, and thus we present ideas and techniques aimed at making AI systems more understandable and ultimately more trustworthy for humans. Moreover, we discuss the topic of algorithmic fairness and the requirement that AI systems are free from biases and do not discriminate against individuals on the basis of protected attributes. In all cases, we present a brief summary of the most important concepts and results in the literature. In conclusion, we present some ideas for future research and sketch open challenges in the field.


... Perceived risk is the psychological effect of potential uncertainties and ambiguities associated with technology usage (Li et al., 2021). Global AI concerns include discrimination, privacy loss, and human rights violations, with educational challenges such as algorithmic bias and surveillance issues (Kamalul Ariffin et al., 2018; Federspiel et al., 2023; Sacharidis et al., 2024). Faculty concerns about AI, including risks and potential disruptions, significantly shape attitudes towards, and the use of, AI in higher education for administrative duties and student involvement (Wu et al., 2022; Rahiman & Kodikal, 2024). ...
Article
The incorporation of artificial intelligence (AI) into education is becoming more important over time, although faculty viewpoints on this integration are not well recognized. To analyze educators' attitudes towards AI tools in Bangladesh, this research built a modified model that included components from the technology acceptance model (TAM), the unified theory of acceptance and use of technology (UTAUT), and the DeLone model. The study's purpose was to assess faculty attitudes towards AI adoption by analysing system quality, information quality, facilitating conditions, performance expectancy, effort expectancy, and behavioural intention. An online survey was administered to 543 faculty members from private universities in Bangladesh using a cluster and multi-stage sampling technique. The results were analysed using structural equation modelling and complemented by in-depth interviews with 20 educators. The statistically significant model (p < 0.05) explained 37.4% of the variation in educators' attitudes towards AI. The findings indicated that performance expectancy, effort expectancy, and information quality all had a favourable influence on attitudes, whereas behavioural intention had a negligible negative effect. The research, which uses the DeLone model in a novel way, concludes that information quality had a negligible positive influence on the attitude of educators. In contrast, system quality was not a significant determinant. This study highlights the relevance of knowing faculty attitudes towards AI and the necessity to concentrate on performance and effort expectations to drive successful AI integration methods in education.
Article
Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A better understanding of the needs of XAI users, as well as human-centered evaluations of explainable models, are both a necessity and a challenge. In this paper, we explore how human-computer interaction (HCI) and AI researchers conduct user studies in XAI applications based on a systematic literature review. After identifying and thoroughly analyzing 97 core papers with human-based XAI evaluations over the past five years, we categorize them along the measured characteristics of explanatory methods, namely trust, understanding, usability, and human-AI collaboration performance. Our research shows that XAI is spreading more rapidly in certain application domains, such as recommender systems, than in others, but that user evaluations are still rather sparse and incorporate hardly any insights from cognitive or social sciences. Based on a comprehensive discussion of best practices, i.e., common models, design choices, and measures in user studies, we propose practical guidelines on designing and conducting user studies for XAI researchers and practitioners. Lastly, this survey also highlights several open research directions, particularly linking psychological science and human-centered XAI.
Article
Explainable AI provides insights to users into the why for model predictions, offering potential for users to better understand and trust a model, and to recognize and correct AI predictions that are incorrect. Prior research on human and explainable AI interactions has focused on measures such as interpretability, trust, and usability of the explanation. There are mixed findings on whether explainable AI can improve actual human decision-making and the ability to identify the problems with the underlying model. Using real datasets, we compare objective human decision accuracy without AI (control), with an AI prediction (no explanation), and with an AI prediction plus explanation. We find that providing any kind of AI prediction tends to improve user decision accuracy, but no conclusive evidence that explainable AI has a meaningful impact. Moreover, we observed that the strongest predictor of human decision accuracy was AI accuracy and that users were somewhat able to detect when the AI was correct vs. incorrect, but this was not significantly affected by including an explanation. Our results indicate that, at least in some situations, the why information provided in explainable AI may not enhance user decision-making, and further research may be needed to understand how to integrate explainable AI into real systems.
Article
Algorithmic fairness is typically studied from the perspective of predictions. Instead, here we investigate fairness from the perspective of recourse actions suggested to individuals to remedy an unfavourable classification. We propose two new fairness criteria at the group and individual level, which—unlike prior work on equalising the average group-wise distance from the decision boundary—explicitly account for causal relationships between features, thereby capturing downstream effects of recourse actions performed in the physical world. We explore how our criteria relate to others, such as counterfactual fairness, and show that fairness of recourse is complementary to fairness of prediction. We study theoretically and empirically how to enforce fair causal recourse by altering the classifier and perform a case study on the Adult dataset. Finally, we discuss whether fairness violations in the data generating process revealed by our criteria may be better addressed by societal interventions as opposed to constraints on the classifier.
Article
We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems among others are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important concerns regarding the fairness of such systems. In this work, we aim at presenting a toolkit of definitions, models and methods used for ensuring fairness in rankings and recommendations. Our objectives are threefold: (a) to provide a solid framework on a novel, quickly evolving and impactful domain, (b) to present related methods and put them into perspective and (c) to highlight open challenges and research paths for future work.
Conference Paper
Microlending can lead to improved access to capital in impoverished countries. Recommender systems could be used in microlending to provide efficient and personalized service to lenders. However, increasing concerns about discrimination in machine learning hinder the application of recommender systems to the microfinance industry. Most previous recommender systems focus on pure personalization, with fairness issues largely ignored. A desirable fairness property in microlending is to give borrowers from different demographic groups a fair chance of being recommended, as stated by Kiva. To achieve this goal, we propose a Fairness-Aware Re-ranking (FAR) algorithm to balance ranking quality and borrower-side fairness. Furthermore, we take into consideration that lenders may differ in their receptivity to the diversification of recommended loans, and develop a Personalized Fairness-Aware Re-ranking (PFAR) algorithm. Experiments on a real-world dataset from Kiva.org show that our re-ranking algorithm can significantly promote fairness with little sacrifice in accuracy, and be attentive to individual lender preferences on loan diversity.
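To make the re-ranking idea above concrete, the Python sketch below greedily trades off relevance against borrower-group exposure in the current prefix. It is an illustrative approximation and not the FAR/PFAR algorithm from the paper; the loan IDs, group labels, scores, and the lambda_fair penalty are hypothetical.

```python
# A minimal greedy re-ranking sketch in the spirit of fairness-aware
# re-ranking: at each position pick the candidate loan that maximizes
# relevance minus a penalty on over-represented borrower groups.
# Illustrative approximation only; lambda_fair and the penalty form
# are assumptions, not the paper's FAR/PFAR algorithm.
from collections import Counter

def fairness_aware_rerank(candidates, k, lambda_fair=0.5):
    """candidates: list of (loan_id, group, relevance_score)."""
    selected, group_counts = [], Counter()
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        def adjusted(item):
            _, group, score = item
            # Penalize groups already over-represented in the prefix.
            share = group_counts[group] / (len(selected) + 1)
            return score - lambda_fair * share
        best = max(pool, key=adjusted)
        pool.remove(best)
        selected.append(best)
        group_counts[best[1]] += 1
    return selected

# Example: three borrower groups with different relevance scores.
cands = [("l1", "A", 0.9), ("l2", "A", 0.85), ("l3", "B", 0.8),
         ("l4", "C", 0.75), ("l5", "A", 0.7)]
print(fairness_aware_rerank(cands, k=3))
```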
Preprint
Combining big data and machine learning algorithms, the power of automatic decision tools induces as much hope as fear. Recently enacted European legislation (the GDPR) and French laws attempt to regulate the use of these tools. Leaving aside the well-identified problems of data confidentiality and impediments to competition, we focus on the risks of discrimination, the problems of transparency and the quality of algorithmic decisions. The detailed perspective of the legal texts, faced with the complexity and opacity of the learning algorithms, reveals the need for important technological disruptions for the detection or reduction of the discrimination risk, and for addressing the right to obtain an explanation of an automatic decision. Since the trust of developers and, above all, of users (citizens, litigants, customers) is essential, algorithms exploiting personal data must be deployed in a strict ethical framework. In conclusion, to answer this need, we list some forms of control to be developed: institutional control, ethical charters, and external audits attached to the issuing of a label.
Conference Paper
Classification models are often used to make decisions that affect humans: whether to approve a loan application, extend a job offer, or provide insurance. In such applications, individuals should have the ability to change the decision of the model. When a person is denied a loan by a credit scoring model, for example, they should be able to change the input variables of the model in a way that will guarantee approval. Otherwise, this person will be denied the loan so long as the model is deployed, and -- more importantly -- will lack agency over a decision that affects their livelihood. In this paper, we propose to evaluate a linear classification model in terms of recourse, which we define as the ability of a person to change the decision of the model through actionable input variables (e.g., income vs. age or marital status). We present an integer programming toolkit to: (i) measure the feasibility and difficulty of recourse in a target population; and (ii) generate a list of actionable changes for a person to obtain a desired outcome. We discuss how our tools can inform different stakeholders by using them to audit recourse for credit scoring models built with real-world datasets. Our results illustrate how recourse can be significantly affected by common modeling practices, and motivate the need to evaluate recourse in algorithmic decision-making.
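The following sketch illustrates the notion of recourse for a fixed linear classifier by brute-force search over a small grid of changes to actionable features (income, debt) while an immutable one (age) stays fixed. It is a toy stand-in for the paper's integer-programming toolkit; the weights, step grids, and cost function are assumptions.

```python
# Brute-force recourse search for a linear classifier: enumerate small
# changes to actionable features and report the cheapest one that flips
# the decision. A toy stand-in for the paper's integer-programming
# toolkit; feature names, weights, and step grids are illustrative.
import itertools

weights = {"income": 0.8, "debt": -0.5, "age": 0.1}   # fixed linear model
bias = -0.2
actionable = {"income": [0.0, 0.1, 0.2, 0.3], "debt": [0.0, -0.1, -0.2]}

def score(x):
    return bias + sum(weights[f] * v for f, v in x.items())

def find_recourse(x):
    best = None
    for deltas in itertools.product(*actionable.values()):
        change = dict(zip(actionable.keys(), deltas))
        x_new = {f: x[f] + change.get(f, 0.0) for f in x}
        cost = sum(abs(d) for d in deltas)
        if score(x_new) > 0 and (best is None or cost < best[0]):
            best = (cost, change)
    return best

applicant = {"income": 0.4, "debt": 0.5, "age": 0.3}
print("denied" if score(applicant) <= 0 else "approved")
print("cheapest actionable change:", find_recourse(applicant))
```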
Conference Paper
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.
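A simplified check of this intuition is sketched below: the model is queried on the same individual under both values of the protected attribute. A faithful counterfactual-fairness test would use a causal model to also update the descendants of the protected attribute; here only the attribute itself is flipped, and the scoring rule is a made-up stand-in for a trained classifier.

```python
# Simplified check inspired by counterfactual fairness: compare the
# prediction on an individual with the prediction after intervening on
# the protected attribute. A faithful test would use a causal model to
# also update descendants of the protected attribute; here we only flip
# the attribute itself, so this is an illustrative approximation.
def predict(x):
    # hypothetical scoring rule (stand-in for a trained classifier)
    return 1 if 0.6 * x["lsat"] + 0.4 * x["gpa"] - 0.1 * x["group"] > 0.5 else 0

def counterfactually_stable(x, protected="group", values=(0, 1)):
    preds = {v: predict({**x, protected: v}) for v in values}
    return len(set(preds.values())) == 1, preds

individual = {"lsat": 0.6, "gpa": 0.55, "group": 1}
stable, preds = counterfactually_stable(individual)
print("same decision in both worlds:", stable, preds)
```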
Article
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.
Conference Paper
In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n ≫ k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above or indistinguishable from a given minimum. Utility is operationalized in two ways: (i) every candidate included in the top-k should be more qualified than every candidate not included; and (ii) for every pair of candidates in the top-k, the more qualified candidate should be ranked above. An efficient algorithm is presented for producing the Fair Top-k Ranking, and tested experimentally on existing datasets as well as new datasets released with this paper, showing that our approach yields small distortions with respect to rankings that maximize utility without considering fairness criteria. To the best of our knowledge, this is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.
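The prefix-wise statistical test behind this ranked group fairness definition can be sketched as follows: for every prefix, a one-sided binomial test checks whether the number of protected candidates is implausibly low relative to a target proportion p. The multiple-testing correction used in the paper is omitted, and the values of p and alpha below are arbitrary illustrative choices.

```python
# Prefix-wise check in the spirit of ranked group fairness: for every
# prefix of the ranking, the count of protected candidates should not be
# significantly below a target proportion p (one-sided binomial test).
# This sketch ignores the multiple-testing correction used in the paper.
from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def fair_prefixes(ranking, p=0.4, alpha=0.1):
    """ranking: list of booleans, True = protected candidate."""
    protected_seen = 0
    for i, is_protected in enumerate(ranking, start=1):
        protected_seen += is_protected
        # Probability of seeing this few protected candidates by chance.
        if binom_cdf(protected_seen, i, p) < alpha:
            return False, i
    return True, None

ranking = [False, True, False, False, True, False, False, False]
print(fair_prefixes(ranking, p=0.4, alpha=0.1))
```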
Article
There has been much discussion of the right to explanation in the EU General Data Protection Regulation, and its existence, merits, and disadvantages. Implementing a right to explanation that opens the black box of algorithmic decision-making faces major legal and technical barriers. Explaining the functionality of complex algorithmic decision-making systems and their rationale in specific cases is a technically challenging problem. Some explanations may offer little meaningful information to data subjects, raising questions around their value. Explanations of automated decisions need not hinge on the general public understanding how algorithmic systems function. Even though such interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the black box. Looking at explanations as a means to help a data subject act rather than merely understand, one could gauge the scope and content of explanations according to the specific goal or action they are intended to support. From the perspective of individuals affected by automated decision-making, we propose three aims for explanations: (1) to inform and help the individual understand why a particular decision was reached, (2) to provide grounds to contest the decision if the outcome is undesired, and (3) to understand what would need to change in order to receive a desired result in the future, based on the current decision-making model. We assess how each of these goals finds support in the GDPR. We suggest data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims. These counterfactual explanations describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system.
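The "smallest change to the world" idea can be illustrated with a simple counterfactual search: minimize the squared gap to the desired prediction plus a distance penalty, here via coordinate-wise hill climbing on a toy logistic model. The paper itself is legal and conceptual; this optimization form follows the counterfactual-explanation literature, and all weights, step sizes, and the penalty strength below are assumptions.

```python
# Counterfactual-explanation search sketch: minimize
#   (f(x') - target)^2 + lam * ||x' - x||_1
# by coordinate-wise hill climbing on a toy logistic model.
# Model weights, step size, and lam are illustrative assumptions.
import math

w, b = [1.5, -2.0], -0.1           # toy logistic model
def f(x):                          # predicted probability of the good outcome
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

def counterfactual(x, target=0.5, lam=0.1, step=0.05, iters=500):
    xc = list(x)
    def loss(v):
        return (f(v) - target) ** 2 + lam * sum(abs(a - c) for a, c in zip(v, x))
    for _ in range(iters):
        improved = False
        for i in range(len(xc)):
            for d in (step, -step):
                cand = list(xc)
                cand[i] += d
                if loss(cand) < loss(xc):
                    xc, improved = cand, True
        if not improved:
            break
    return xc

x = [0.2, 0.6]                     # instance currently classified unfavourably
print("f(x) =", round(f(x), 3), "-> counterfactual", [round(v, 2) for v in counterfactual(x)])
```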
Article
The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider them a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as a categorization scheme for systematically reviewing the evaluation practices of more than 300 papers published in the last 7 years at major AI and ML conferences that introduce an XAI method. We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users. This survey also contributes to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. Our systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods. The Co-12 categorization scheme and our identified evaluation methods open up opportunities to include quantitative metrics as optimization criteria during model training in order to optimize for accuracy and interpretability simultaneously.
Chapter
In this paper, we use counterfactual explanations to offer a new perspective on fairness that, besides accuracy, also accounts for the difficulty or burden of achieving fairness. We first gather a set of fairness-related datasets and implement a classifier to extract the set of false negative test instances to generate different counterfactual explanations on them. We subsequently calculate two measures: the false negative ratio of the set of test instances, and the distance (also called burden) from these instances to their corresponding counterfactuals, aggregated by sensitive feature groups. The first measure is an accuracy-based estimation of the classifier biases against sensitive groups, whilst the second is a counterfactual-based assessment of the difficulty each of these groups has in reaching their corresponding desired ground truth label. We promote the idea that a counterfactual and an accuracy-based fairness measure may assess fairness in a more holistic manner, whilst also providing interpretability. We then propose and evaluate, on these datasets, a measure called Normalized Accuracy Weighted Burden, which is more consistent than only its accuracy or its counterfactual components alone, considering both false negative ratios and counterfactual distance per sensitive feature. We believe this measure would be more adequate to assess classifier fairness and promote the design of better performing algorithms.

Keywords: Algorithmic fairness, Counterfactual explanations, Bias
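A rough sketch of the bookkeeping behind such a measure is given below: per sensitive group, the false negative ratio and the mean counterfactual distance (burden) of its false negatives are computed. The final combination shown (FNR times mean burden) is only one plausible aggregation; the exact normalization of the Normalized Accuracy Weighted Burden measure may differ, and the records are invented.

```python
# Per-group "burden" bookkeeping in the spirit of the chapter: the false
# negative ratio of each sensitive group and the mean distance from its
# false negatives to their counterfactuals. The combination below
# (FNR * mean burden) is only one plausible aggregation; the paper's
# Normalized Accuracy Weighted Burden may normalize differently.
from collections import defaultdict

# (group, true_label, predicted_label, counterfactual_distance or None)
records = [
    ("A", 1, 0, 0.8), ("A", 1, 1, None), ("A", 0, 0, None),
    ("B", 1, 0, 0.3), ("B", 1, 0, 0.4), ("B", 1, 1, None),
]

stats = defaultdict(lambda: {"pos": 0, "fn": 0, "dist": 0.0})
for group, y, y_hat, dist in records:
    if y == 1:
        stats[group]["pos"] += 1
        if y_hat == 0:
            stats[group]["fn"] += 1
            stats[group]["dist"] += dist

for group, s in stats.items():
    fnr = s["fn"] / s["pos"]
    burden = s["dist"] / s["fn"] if s["fn"] else 0.0
    print(group, "FNR:", round(fnr, 2), "mean burden:", round(burden, 2),
          "combined:", round(fnr * burden, 2))
```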
Article
Fairness related to locations (i.e., "where") is critical for the use of machine learning in a variety of societal domains involving spatial datasets (e.g., agriculture, disaster response, urban planning). Spatial biases incurred by learning, if left unattended, may cause or exacerbate unfair distribution of resources, social division, spatial disparity, etc. The goal of this work is to develop statistically-robust formulations and model-agnostic learning strategies to understand and promote spatial fairness. The problem is challenging as locations are often from continuous spaces with no well-defined categories (e.g., gender), and statistical conclusions from spatial data are fragile to changes in spatial partitionings and scales. Existing studies in fairness-driven learning have generated valuable insights related to non-spatial factors including race, gender, education level, etc., but research to mitigate location-related biases still remains in its infancy, leaving the main challenges unaddressed. To bridge the gap, we first propose a robust space-as-distribution (SPAD) representation of spatial fairness to reduce statistical sensitivity related to partitioning and scales in continuous space. Furthermore, we propose a new SPAD-based stochastic strategy to efficiently optimize over an extensive distribution of fairness criteria, and a bi-level training framework to enforce fairness via adaptive adjustment of priorities among locations. Experiments on real-world crop monitoring show that SPAD can effectively reduce sensitivity in fairness evaluation and the stochastic bi-level training framework can greatly improve the fairness.
Article
While EXplainable Artificial Intelligence (XAI) approaches aim to improve human-AI collaborative decision-making by improving model transparency and mental model formation, experiential factors associated with human users can cause challenges in ways system designers do not anticipate. In this paper, we first showcase a user study on how anchoring bias can potentially affect mental model formation when users initially interact with an intelligent system, and the role of explanations in addressing this bias. Using a video activity recognition tool in the cooking domain, we asked participants to verify whether a set of kitchen policies are being followed, with each policy focusing on a weakness or a strength. We controlled the order of the policies and the presence of explanations to test our hypotheses. Our main finding shows that those who observed system strengths early on were more prone to automation bias and made significantly more errors due to positive first impressions of the system, while they built a more accurate mental model of the system competencies. On the other hand, those who encountered weaknesses earlier made significantly fewer errors, since they tended to rely more on themselves, while they also underestimated model competencies due to having a more negative first impression of the model. Motivated by these findings and similar existing work, we formalize and present a conceptual model of users' past experiences that examines the relations between users' backgrounds, experiences, and human factors in XAI systems based on usage time. Our work presents strong findings and implications, aiming to raise the awareness of AI designers of biases associated with user impressions and backgrounds.
Article
Machine learning is increasingly used to inform decision-making in sensitive situations where decisions have consequential effects on individuals' lives. In these settings, in addition to requiring models to be accurate and robust, socially relevant values such as fairness, privacy, accountability, and explainability play an important role in the adoption and impact of said technologies. In this work, we focus on algorithmic recourse, which is concerned with providing explanations and recommendations to individuals who are unfavorably treated by automated decision-making systems. We first perform an extensive literature review, and align the efforts of many authors by presenting unified definitions, formulations, and solutions to recourse. Then, we provide an overview of the prospective research directions towards which the community may engage, challenging existing assumptions and making explicit connections to other ethical challenges such as security, privacy, and fairness.
Chapter
This chapter argues that there is very little chance that we humans can specify our objectives completely and correctly, in such a way that the pursuit of those objectives by more capable machines is guaranteed to result in beneficial outcomes for humans. Consequently, this chapter defends and further articulates the need for “provably beneficial AI,” which is the idea that to the extent that human values are revealed in our behavior, we should be able to get machines to learn underlying human preferences from observing human behavior. It then discusses the technical challenges involved in building provably beneficial AI and responds to some possible concerns to this approach.
Article
In many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black box supervised learning models (e.g. complex trees, neural networks, boosted trees, random forests, nearest neighbours, local kernel‐weighted methods and support vector regression) in this regard is their lack of interpretability or transparency. Partial dependence plots, which are the most popular approach for visualizing the effects of the predictors with black box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. As an alternative to partial dependence plots, we present a new visualization approach that we term accumulated local effects plots, which do not require this unreliable extrapolation with correlated predictors. Moreover, accumulated local effects plots are far less computationally expensive than partial dependence plots. We also provide an R package ALEPlot as supplementary material to implement our proposed method.
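The accompanying implementation from the paper is the R package ALEPlot; the short Python sketch below re-derives first-order ALE for a single feature (quantile bins, locally accumulated prediction differences, centred curve), using toy correlated data and a toy model in place of a real black box.

```python
# Minimal first-order ALE sketch for a single feature (the paper's own
# implementation is the R package ALEPlot; this Python version is an
# illustrative re-derivation with a toy model standing in for a black box).
import numpy as np

def ale_1d(model, X, feature, n_bins=10):
    x = X[:, feature]
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    effects, counts = np.zeros(n_bins), np.zeros(n_bins)
    for k in range(n_bins):
        mask = idx == k
        if not mask.any():
            continue
        lo, hi = X[mask].copy(), X[mask].copy()
        lo[:, feature], hi[:, feature] = edges[k], edges[k + 1]
        # Local effect: prediction difference across the bin, averaged.
        effects[k] = (model(hi) - model(lo)).mean()
        counts[k] = mask.sum()
    ale = np.cumsum(effects)
    ale -= np.average(ale, weights=np.maximum(counts, 1))   # centre the curve
    return edges[1:], ale

# Toy correlated data and a toy "black box" with an interaction term.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X = np.column_stack([x1, x1 + 0.3 * rng.normal(size=500)])
model = lambda Z: Z[:, 0] ** 2 + 0.5 * Z[:, 0] * Z[:, 1]
print(ale_1d(model, X, feature=0, n_bins=5))
```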
Conference Paper
Location-based recommender systems learn from historical movement traces of users in order to make recommendations for places to visit, events to attend, itineraries to follow. As with other systems assisting humans in their decisions, there is an increasing need to scrutinize the implications of algorithmically made location recommendations. The challenge is that one can define different fairness concerns, as both users and locations may be subjects of unfair treatment. In this work, we propose a comprehensive framework that allows the expression of various fairness aspects, and quantify the degree to which the system is acting justly. In a case study, we focus on three fairness aspects, and investigate several types of location-based recommenders in terms of their ability to be fair under the studied aspects.
Article
Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box.
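As a minimal illustration of the "audit the black box in the vicinity of the target instance" step, the sketch below fits a LIME-style weighted linear surrogate around one instance. The paper advocates rule-based local explanations and their bottom-up generalization, which this simpler surrogate does not reproduce; the black box, kernel width, and sample count are assumptions.

```python
# Local-explanation sketch: audit a black box around one instance by
# perturbing it, weighting perturbations by proximity, and fitting a
# small weighted linear surrogate. Simpler than the rule-based local
# explanations the paper describes; purely illustrative.
import numpy as np

def local_surrogate(black_box, x, n_samples=500, scale=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + scale * rng.normal(size=(n_samples, len(x)))        # local perturbations
    y = black_box(Z)                                            # black-box outputs
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale**2))  # proximity weights
    A = np.hstack([Z, np.ones((n_samples, 1))])                 # add intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)  # weighted least squares
    return coef[:-1]                                            # per-feature local weights

# Toy black box: a nonlinear scoring function of two features.
black_box = lambda Z: 1 / (1 + np.exp(-(3 * Z[:, 0] ** 2 - 2 * Z[:, 1])))
x = np.array([0.5, 0.2])
print("local feature weights:", np.round(local_surrogate(black_box, x), 3))
```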
Conference Paper
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer to encourage improving this metric during model training and thus improve fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system's pairwise fairness.
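A pairwise fairness measurement can be sketched as below: for each item group, the fraction of (relevant, irrelevant) item pairs in which the relevant item is ranked above the irrelevant one. The randomized-experiment setup and the training-time regularizer from the paper are not reproduced; the toy ranking and field names are assumptions.

```python
# Pairwise fairness metric sketch: for each item group, the fraction of
# (relevant, irrelevant) pairs in which the relevant item is ranked
# above the irrelevant one. The paper's regularizer is not reproduced.
from itertools import product

# (item_id, group, rank_position, relevant)
ranked = [("i1", "A", 1, True), ("i2", "B", 2, False),
          ("i3", "B", 3, True), ("i4", "A", 4, False),
          ("i5", "B", 5, True)]

def pairwise_accuracy_by_group(items):
    relevant = [it for it in items if it[3]]
    irrelevant = [it for it in items if not it[3]]
    wins, totals = {}, {}
    for rel, irr in product(relevant, irrelevant):
        g = rel[1]                      # attribute the pair to the relevant item's group
        totals[g] = totals.get(g, 0) + 1
        wins[g] = wins.get(g, 0) + (rel[2] < irr[2])
    return {g: wins[g] / totals[g] for g in totals}

print(pairwise_accuracy_by_group(ranked))
```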
Conference Paper
Latent factor models (LFMs) such as matrix factorization have achieved the state-of-the-art performance among various collaborative filtering approaches for recommendation. Despite the high recommendation accuracy of LFMs, a critical issue to be resolved is their lack of interpretability. Extensive efforts have been devoted to interpreting the prediction results of LFMs. However, they either rely on auxiliary information which may not be available in practice, or sacrifice recommendation accuracy for interpretability. Influence functions, stemming from robust statistics, have been developed to understand the effect of training points on the predictions of black-box models. Inspired by this, we propose a novel explanation method named FIA (Fast Influence Analysis) to understand the prediction of trained LFMs by tracing back to the training data with influence functions. We present how to employ influence functions to measure the impact of historical user-item interactions on the prediction results of LFMs and provide intuitive neighbor-style explanations based on the most influential interactions. Our proposed FIA exploits the characteristics of two important LFMs, matrix factorization and neural collaborative filtering, and is capable of accelerating the overall influence analysis process. We provide a detailed complexity analysis for FIA over LFMs and conduct extensive experiments to evaluate its performance using real-world datasets. The results demonstrate the effectiveness and efficiency of FIA, and the usefulness of the generated explanations for the recommendation results.
Conference Paper
A barrier to the wider adoption of neural networks is their lack of interpretability. While local explanation methods exist for one prediction, most global attributions still reduce neural network decisions to a single set of features. In response, we present an approach for generating global attributions called GAM, which explains the landscape of neural network predictions across subpopulations. GAM augments global explanations with the proportion of samples that each attribution best explains and specifies which samples are described by each attribution. Global explanations also have tunable granularity to detect more or fewer subpopulations. We demonstrate that GAM's global explanations 1) yield the known feature importances of simulated data, 2) match feature weights of interpretable statistical models on real data, and 3) are intuitive to practitioners through user studies. With more transparent predictions, GAM can help ensure neural network decisions are generated for the right reasons.
Conference Paper
In this work, we introduce a novel metric for auditing group fairness in ranked lists. Our approach offers two benefits compared to the state of the art. First, we offer a blueprint for modeling of user attention. Rather than assuming a logarithmic loss in importance as a function of the rank, we can account for varying user behaviors through parametrization. For example, we expect a user to see more items during a viewing of a social media feed than when they inspect the results list of a single web search query. Second, we allow non-binary protected attributes to enable investigating inherently continuous attributes (e.g., political alignment on the liberal to conservative spectrum) as well as to facilitate measurements across aggregated sets of search results, rather than separately for each result list. By combining these two elements into our metric, we are able to better address the human factors inherent in this problem. We measure the whole sociotechnical system, consisting of a ranking algorithm and individuals using it, instead of exclusively focusing on the ranking algorithm. Finally, we use our metric to perform three simulated fairness audits. We show that determining fairness of a ranked output necessitates knowledge (or a model) of the end-users of the particular service. Depending on their attention distribution function, a fixed ranking of results can appear biased both in favor of and against a protected group.
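The effect of parametrizing attention can be illustrated with the sketch below, which computes attention-weighted exposure of a continuous protected attribute under a geometric browsing model. This is not the paper's exact metric; the attribute values and gamma parameters are made up, but they show how the same ranking can look quite different under different user models.

```python
# Attention-weighted exposure sketch: instead of a fixed logarithmic
# discount, user attention at rank i is parametrized (here a geometric
# continuation probability gamma), and the protected attribute may be
# continuous. Not the paper's exact metric; illustrative only.
def attention(rank, gamma):
    return gamma ** (rank - 1)          # probability the user reaches this rank

def weighted_attribute_exposure(ranking, gamma):
    """ranking: list of protected-attribute values (may be continuous)."""
    weights = [attention(i, gamma) for i in range(1, len(ranking) + 1)]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, ranking)) / total

# Same ranking of political-alignment scores in [-1, 1], two user models.
alignment = [0.9, 0.7, -0.2, -0.8, -0.9, -0.6]
print("patient user   (gamma=0.9):", round(weighted_attribute_exposure(alignment, 0.9), 3))
print("impatient user (gamma=0.5):", round(weighted_attribute_exposure(alignment, 0.5), 3))
```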
Conference Paper
In many settings it is required that items are recommended to a group of users instead of a single user. Conventionally, the objective is to maximize the overall satisfaction among group members. Recently, however, attention has shifted to ensuring that recommendations are fair in that they should minimize the feeling of dissatisfaction among members. In this work, we explore a simple but intuitive notion of fairness: the minimum utility a group member receives. We propose a technique that seeks to rank the Pareto, or unanimously, optimal items by considering all admissible ways in which a group might reach a decision. As our detailed experimental study shows, this results in top-N recommendations that not only achieve a high minimum utility compared to other fairness-aware techniques, but also a high average utility across all group members beating standard aggregation strategies.
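The minimum-utility notion can be contrasted with average aggregation in a few lines; the item utilities below are invented, and the paper's Pareto-optimal ranking procedure is not reproduced.

```python
# Least-misery vs. average aggregation for group recommendations,
# illustrating the "minimum utility a group member receives" notion.
# The paper's Pareto-optimal ranking procedure is not reproduced here.
utilities = {                     # item -> predicted utility per group member
    "item1": [0.9, 0.9, 0.2],
    "item2": [0.7, 0.6, 0.6],
    "item3": [0.5, 0.5, 0.5],
}

by_average = max(utilities, key=lambda i: sum(utilities[i]) / len(utilities[i]))
by_min     = max(utilities, key=lambda i: min(utilities[i]))
print("average aggregation picks:", by_average)   # item1, despite one unhappy member
print("least-misery picks:       ", by_min)       # item2, higher minimum utility
```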
Conference Paper
Kearns, Neel, Roth, and Wu [ICML 2018] recently proposed a notion of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks a statistical fairness constraint (say, equalizing false positive rates across protected groups), but then asks that this constraint hold over an exponentially or infinitely large collection of subgroups defined by a class of functions with bounded VC dimension. They give an algorithm guaranteed to learn subject to this constraint, under the condition that it has access to oracles for perfectly learning absent a fairness constraint. In this paper, we undertake an extensive empirical evaluation of the algorithm of Kearns et al. On four real datasets for which fairness is a concern, we investigate the basic convergence of the algorithm when instantiated with fast heuristics in place of learning oracles, measure the tradeoffs between fairness and accuracy, and compare this approach with the recent algorithm of Agarwal, Beygelzimer, Dudík, Langford, and Wallach [ICML 2018], which implements weaker and more traditional marginal fairness constraints defined by individual protected attributes. We find that in general, the Kearns et al. algorithm converges quickly, large gains in fairness can be obtained with mild costs to accuracy, and that optimizing accuracy subject only to marginal fairness leads to classifiers with substantial subgroup unfairness. We also provide a number of analyses and visualizations of the dynamics and behavior of the Kearns et al. algorithm. Overall we find this algorithm to be effective on real data, and rich subgroup fairness to be a viable notion in practice.
Conference Paper
Computers are increasingly used to make decisions that have significant impact on people's lives. Often, these predictions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers have appeared in the literature. This paper seeks to study the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we seek to bring attention to many under-appreciated aspects of such fairness-enhancing interventions that require investigation for these algorithms to receive broad adoption. We present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures and existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservations, many of these measures strongly correlate with one another. In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits) and to different forms of preprocessing, indicating that fairness interventions might be more brittle than previously thought.
Conference Paper
When a user has watched, say, 70 romance movies and 30 action movies, then it is reasonable to expect the personalized list of recommended movies to be comprised of about 70% romance and 30% action movies as well. This important property is known as calibration, and recently received renewed attention in the context of fairness in machine learning. In the recommended list of items, calibration ensures that the various (past) areas of interest of a user are reflected with their corresponding proportions. Calibration is especially important in light of the fact that recommender systems optimized toward accuracy (e.g., ranking metrics) in the usual offline setting can easily lead to recommendations where the lesser interests of a user get crowded out by the user's main interests, which we show empirically as well as in thought experiments. This can be prevented by calibrated recommendations. To this end, we outline metrics for quantifying the degree of calibration, as well as a simple yet effective re-ranking algorithm for post-processing the output of recommender systems.
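One way to quantify the degree of calibration, sketched below, is to compare the genre distribution of a user's history with that of the recommended list via a smoothed KL divergence; the smoothing constant and the toy romance/action history are assumptions of this sketch rather than the paper's exact formulation.

```python
# Calibration metric sketch: compare the genre distribution of a user's
# history p with that of the recommended list q via KL divergence, with
# q smoothed towards p so the divergence stays finite. The smoothing
# constant alpha is an assumption of this sketch.
from math import log

def genre_distribution(items):
    counts = {}
    for g in items:
        counts[g] = counts.get(g, 0) + 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def calibration_kl(history, recommended, alpha=0.01):
    p, q = genre_distribution(history), genre_distribution(recommended)
    kl = 0.0
    for g, pg in p.items():
        qg = (1 - alpha) * q.get(g, 0.0) + alpha * pg
        kl += pg * log(pg / qg)
    return kl

history = ["romance"] * 70 + ["action"] * 30
well_calibrated = ["romance"] * 7 + ["action"] * 3
crowded_out     = ["romance"] * 10
print(round(calibration_kl(history, well_calibrated), 4))   # ~0: proportions match
print(round(calibration_kl(history, crowded_out), 4))       # large: action crowded out
```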
Conference Paper
In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes; for example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.
Conference Paper
The wide-scale use of machine learning algorithms to drive decision-making has highlighted the critical importance of ensuring the interpretability of such models in order to engender trust in their output. State-of-the-art recommendation systems use black-box latent factor models that provide no explanation of why a recommendation has been made, as they abstract their decision processes to a high-dimensional latent space which is beyond the direct comprehension of humans. We propose a novel approach for extracting explanations from latent factor recommendation systems by training association rules on the output of a matrix factorisation black-box model. By taking advantage of the interpretable structure of association rules, we demonstrate that the predictive accuracy of the recommendation model can be maintained whilst yielding explanations with high fidelity to the black-box model on a unique industry dataset. Our approach mitigates the accuracy-interpretability trade-off whilst avoiding the need to sacrifice flexibility or use external data sources. We also contribute to the ill-defined problem of evaluating interpretability.
Article
The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined groups, and then ask for parity of some statistic of the classifier across these groups. Constraints of this form are susceptible to (intentional or inadvertent) "fairness gerrymandering", in which a classifier appears to be fair on each individual group, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness and recently proposed individual notions of fairness, but it raises several computational challenges. It is no longer clear how to even audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning --- which means it is computationally hard in the worst case, even for simple structured subclasses. However, it also suggests that common heuristics for learning can be applied to successfully solve the auditing problem in practice. We then derive an algorithm that provably converges to the best fair distribution over classifiers in a given class, given access to oracles which can solve the agnostic learning and auditing problems. The algorithm is based on a formulation of subgroup fairness as fictitious play in a two-player zero-sum game between a Learner and an Auditor. We implement our algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets.
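Fairness gerrymandering can be shown in miniature: in the hypothetical decision rates below, positive rates are equal across genders and across races separately, yet the intersectional subgroups are treated very differently (equal subgroup sizes are assumed). This only illustrates why auditing over structured subgroups matters; it is not the paper's auditing algorithm.

```python
# Fairness gerrymandering in miniature: positive rates can be equal for
# each marginal group (by gender and by race separately) while a
# structured subgroup is treated very differently. Subgroups are assumed
# to be of equal size, so marginal rates are simple averages.
from itertools import product

# (gender, race) -> positive decision rate of a hypothetical classifier
rates = {("F", "black"): 0.2, ("F", "white"): 0.8,
         ("M", "black"): 0.8, ("M", "white"): 0.2}

def marginal_rate(attr_index, value):
    vals = [r for key, r in rates.items() if key[attr_index] == value]
    return sum(vals) / len(vals)

for gender in ("F", "M"):
    print("gender", gender, "positive rate:", marginal_rate(0, gender))
for race in ("black", "white"):
    print("race  ", race, "positive rate:", marginal_rate(1, race))
for subgroup in product(("F", "M"), ("black", "white")):
    print("subgroup", subgroup, "positive rate:", rates[subgroup])
```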
Article
Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively. Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from "What is the right fairness criterion?" to "What do we want to assume about the causal data generating process?" Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.
Conference Paper
Ranking and scoring are ubiquitous. We consider the setting in which an institution, called a ranker, evaluates a set of individuals based on demographic, behavioral or other characteristics. The final output is a ranking that represents the relative quality of the individuals. While automatic and therefore seemingly objective, rankers can, and often do, discriminate against individuals and systematically disadvantage members of protected groups. This warrants a careful study of the fairness of a ranking scheme, to enable data science for social good applications, among others. In this paper we propose fairness measures for ranked outputs. We develop a data generation procedure that allows us to systematically control the degree of unfairness in the output, and study the behavior of our measures on these datasets. We then apply our proposed measures to several real datasets, and detect cases of bias. Finally, we show preliminary results of incorporating our ranked fairness measures into an optimization framework, and show potential for improving fairness of ranked outputs while maintaining accuracy. The code implementing all parts of this work is publicly available at https://github.com/DataResponsibly/FairRank.
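A prefix-based measure in the spirit of those proposed can be sketched as follows: at several cut-off points, compare the protected-group proportion in the prefix with the overall proportion, applying a logarithmic discount. The normalization by the maximum attainable value used in the paper is omitted, and the example rankings are synthetic.

```python
# Sketch of a prefix-based ranked fairness measure in the spirit of
# normalized discounted difference: at a set of cut-offs, compare the
# protected proportion in the prefix with the overall proportion,
# discounted logarithmically. The paper's normalization by the maximum
# possible value is omitted here.
from math import log2

def discounted_difference(ranking, cutoffs):
    """ranking: list of booleans, True = protected group member."""
    overall = sum(ranking) / len(ranking)
    score = 0.0
    for i in cutoffs:
        prefix = ranking[:i]
        score += abs(sum(prefix) / i - overall) / log2(i + 1)
    return score

balanced  = [True, False] * 10
clustered = [False] * 10 + [True] * 10
cutoffs = [5, 10, 15, 20]
print("balanced ranking :", round(discounted_difference(balanced, cutoffs), 3))
print("clustered ranking:", round(discounted_difference(clustered, cutoffs), 3))
```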