Beyond Predictions: A Participatory Framework for Multi-Stakeholder
Decision-Making
VITTORIA VINEIS, Sapienza University of Rome, Italy
GIUSEPPE PERELLI, Sapienza University of Rome, Italy
GABRIELE TOLOMEI, Sapienza University of Rome, Italy
Conventional decision-support systems, primarily based on supervised learning, focus on outcome prediction models to recommend actions. However, they often fail to account for the complexities of multi-actor environments, where diverse and potentially conflicting stakeholder preferences must be balanced.
In this paper, we propose a novel participatory framework that redefines decision-making as a multi-stakeholder optimization problem, capturing each actor’s preferences through context-dependent reward functions. Our framework leverages 𝑘-fold cross-validation to fine-tune user-provided outcome prediction models and evaluate decision strategies, including compromise functions mediating stakeholder trade-offs. We introduce a synthetic scoring mechanism that exploits user-defined preferences across multiple metrics to rank decision-making strategies and identify the optimal decision-maker. The selected decision-maker can then be used to generate actionable recommendations for new data. We validate our framework using two real-world use cases, demonstrating its ability to deliver recommendations that effectively balance multiple metrics, achieving results that are often beyond the scope of purely prediction-based methods. Ablation studies demonstrate that our framework, with its modular, model-agnostic, and inherently transparent design, integrates seamlessly with various predictive models, reward structures, evaluation metrics, and sample sizes, making it particularly suited for complex, high-stakes decision-making contexts.
CCS Concepts: • Computing methodologies → Learning paradigms; Modeling methodologies; Model verification and validation; • Human-centered computing → Collaborative and social computing theory, concepts and paradigms.
Additional Key Words and Phrases: Participatory Artificial Intelligence, Participatory Training, Multi-Stakeholder Decision-Making
1 Introduction
Advances in articial intelligence (AI) and machine learning (ML) have fueled the development of automated decision
support systems in high-stakes domains such as healthcare, nance, and public policy [
11
]. However, the predominant
top-down design paradigm and the traditional optimization-oriented setup for these systems are increasingly under
scrutiny, particularly due to their tendency to overlook critical stakeholder perspectives and broader societal impacts
[
18
]. In many cases, algorithmic solutions emphasize predictive accuracy above all else, neglecting or underperforming
in aspects such as fairness [
35
], transparency [
20
], and the consideration of heterogeneous and potentially conicting
interests [
27
]. This narrow focus on outcome prediction has led to impact-blind recommendations, which risk per-
petuating biases and exacerbating inequities [
5
], highlighting the need to regulate and promote the creation of more
trustworthy systems [
12
,
16
,
29
]. For example, in healthcare, predictive models used for diagnosing diseases or allocating
resources often fail to account for disparities in access to medical services or variations in patient demographics, leading
to biased outcomes that disproportionately harm marginalized communities [33]. Similarly, in finance, credit scoring algorithms frequently rely on historical data embedded with systemic biases, which can unjustly exclude individuals from economic opportunities [21], while also ignoring the broader effects such choices may have on the wider social and economic system.
Recent eorts to mitigate these issues often involve post-hoc fairness adjustments or the adoption of generalized
fairness metrics (e.g., [
31
,
34
]). While these approaches represent steps toward more equitable AI, they often fall short of
addressing the nuanced, context-dependent priorities of real-world stakeholders. In particular, many current solutions
remain single-actor in design treating the decision-making process as if there were only one objective thereby
missing the opportunity to model and reconcile multiple, potentially conicting objectives in a transparent manner. In
this context, recent developments in participatory approaches to AI design oer promising avenues for improvement
[13]. However, in most cases, these approaches still lack solutions that are versatile enough for diverse use cases.
To address these limitations, in this paper we introduce a novel multi-actor decision framework that reinterprets the offline training phase of standard predictive models used to inform decisions as a multi-stakeholder optimization problem. This shift from a single-objective to a multi-actor perspective is particularly important in domains where disparate needs and values must be accommodated and justified.
Our proposed framework is characterized by several key features which make it particularly suitable for applications in high-stakes domains: (i) it explicitly models stakeholder interests by directly encoding diverse preferences into the decision-making process, ensuring the system reflects multiple viewpoints and priorities; (ii) it incorporates a compromise-driven action selection mechanism to identify actions that balance trade-offs across diverse objectives; (iii) its model-agnostic flexibility enables seamless adaptation to various predictive models and application contexts, making it suitable for a wide range of real-world scenarios; and (iv) it is inherently explainable and transparent, maintaining an interpretable pipeline and outputs that clarify how different objectives influence the final decision and how conflicts are resolved.
Overall, this work oers the following novel contributions:
We bridge foundational contributions from reward-based learning, game theory, welfare economics, computa-
tional social choice, and optimization to advance the formalization of participatory AI solutions.
We introduce a theoretically grounded framework that overcomes the limitations of traditional single-
perspective prediction-based systems by systematically modeling and reconciling diverse and potentially
conicting objectives, while providing context-aware solutions.
We demonstrate the eectiveness and generalizability of the proposed framework through rigorous
experiments on two real-world case studies, showing how incorporating stakeholder diversity into the AI
training pipeline improves decision-making outcomes compared to purely predictive baselines when evaluated
across a diverse set of metrics.
To enhance transparency and facilitate broader adoption, we provide complete access to the source code and
experimental setup, enabling reproducibility and promoting its application on new use cases.
The remainder of the paper is organized as follows. Section 2 provides a review of the relevant literature. In Section 3,
we present our proposed framework, which is validated through extensive experiments in Section 4. Section 5 outlines
key practical implications of our framework, while Section 6 concludes the paper, mentioning potential directions for
future research.
2 Related Work
The topics of fairness and accountability in automated decision-making (ADM) systems have garnered significant attention among scholars and practitioners due to their increasingly pervasive deployment in multiple domains [11] and their potential social impact [2]. Research in this area has extensively explored pre-, in-, and post-processing techniques to mitigate biases in data and modeling pipelines (for comprehensive reviews see, for instance, [31, 34]), contributing to the promotion of fairness in algorithmically supported decision systems. However, studies have demonstrated that fair algorithms alone cannot guarantee fairness in practice [19, 26], and aspects such as interpretability and fairness are inherently interdependent factors adding complexity to their operationalization in real-world AI systems [14, 25, 37, 40].
The multifaceted nature of fairness is further influenced by cultural and social contexts, complicating efforts to develop fairness frameworks that extend beyond pre-defined universal metrics [41]. Moreover, some authors argue that, while fairness-aware methods often succeed in reducing biases in model outputs, they frequently fail to address systemic inequities and, for this reason, approaches that more effectively incorporate diverse stakeholder interests and account for broader societal impacts are needed to promote more equitable social outcomes [17].
Especially in dynamic and interactive decision-making settings, where multiple actors’ interests come into play, multi-agent systems may offer a promising framework. Contributions to multi-agent reinforcement learning [47] and multi-agent multi-armed bandit frameworks [24] leverage, for instance, welfare functions to foster fairness in multi-agent decision settings. In this context, Wen et al. [44] advance these efforts by integrating feedback effects into Markov decision processes, enabling the modeling of dynamic, long-term impacts of decisions on fairness outcomes, as demonstrated through a loan approval scenario. Despite their valuable contribution to integrating multiple actors into ADM systems, however, most approaches rely on predefined fairness definitions and require specific problem structures, limiting their adaptability to evolving stakeholder preferences and use-case-specific requirements.
Against this backdrop, Participatory AI has emerged as a significant paradigm for integrating diverse stakeholder perspectives throughout the AI lifecycle, offering opportunities to foster context-dependent fairness and promote accountability [7]. This paradigm emphasizes collaboration and co-creation, promoting inclusivity across both technical and non-technical domains [6, 23]. Contributions in this area span diverse fields, including healthcare [15], judicial systems [4], civic engagement [1], philanthropy [28], and urban planning [36]. More technical applications include collective debiasing [9], collaborative debugging [32], ranking with partial preferences [8], and web-based tools for democratizing ML workflows [46]. Collectively, these contributions reflect what has been described as a "participatory turn" in AI design [13].
However, current challenges such as the technical complexity of AI systems, structural and social barriers to participation, and power asymmetries hinder broader adoption and expose some applications to the risk of what has been described as "participation washing" [43]. Consequently, the scalability and generalizability of Participatory AI remain limited [13], compounded by the lack of flexible, generalizable frameworks that can be applied to a wide range of use cases.
3 The Participatory Training Framework
3.1 Problem Seing
In traditional prediction-oriented decision-making systems the goal is to recommend an action to be taken based on a
set of features and a predicted outcome. In contrast to these approaches, which rely solely on outcome predictions, we
formulate the task of suggesting an action as a multi-actor decision-making problem, where each actor has individual
preferences over possible actions and outcomes.
Formally, let I = {1, 2, . . . , 𝑁} represent a set of stakeholders or actors. Without loss of generality, in our framework, we define an actor as any real or symbolic entity that is influenced by the decisions suggested by the system and, therefore, holds a direct stake in its resulting outputs. In a grant-lending scenario, the categories of actors might include the financial institution and its clients, whereas in a healthcare context, they could be represented by the hospital and the patients. In our context, each actor 𝑖 ∈ I evaluates potential actions based on their own preferences and seeks outcomes that align with these preferences. The decision space consists of a set of possible actions A = {𝑎1, 𝑎2, . . . , 𝑎𝑘} and a set of feasible outcomes O = {𝑜1, 𝑜2, . . . , 𝑜𝑚}. This decision-making process occurs within a context 𝒙 ∈ X, representing exogenous conditions or features that can influence the resulting outcome.
Figure 1 provides a high-level view of our proposed framework, illustrating how it builds upon and extends a traditional ML pipeline for automated decision support. The white components depict the standard ML pipeline for predictive modeling, while the green components show the additional elements introduced by our approach. The diagram encompasses both the (offline) training phase, where historical data are used, and the (online) inference phase, demonstrating how the framework operates with new data.
At its core, our framework is domain- and model-agnostic and designed to integrate multiple actor perspectives through dynamic reward modeling and principled decision-making strategies. It evaluates and ranks decision-making strategies based on performance across diverse evaluation metrics, addressing heterogeneous stakeholder preferences. As for its outputs, our framework expands upon traditional approaches by preserving the strengths of purely prediction-based decision support systems while offering enhanced context-awareness, greater transparency, and a more comprehensive representation of diverse perspectives. In other words, the proposed method does not diminish the capabilities of prediction-based decision systems; instead, it complements their suggested decisions with a set of viable alternatives, thus enriching the expressivity of the decision-making process.
Action spaces and outcomes are handled flexibly: the action space A may be finite or continuous, and outcomes O may be discrete, continuous, or discretized from a continuous domain. When O is discrete, a classification-oriented formulation arises; for a continuous O, a regression-oriented model is employed. Without loss of generality, continuous outcomes can be discretized into a finite set of intervals, yielding a unified perspective for both discrete and continuous cases. All components of the framework and its end-to-end workflow are discussed below.
3.2 Core Components of the Framework
3.2.1 Actor-Based Rewards and Payoff Matrices. A central element of our framework is the assignment of rewards to each action-outcome pair, extending conventional prediction-focused methods to address the diverse preferences of multiple stakeholders. Inspired by reward-based learning approaches, this mechanism leverages feedback signals to guide decision-making in a manner consistent with stakeholder priorities.
Formally, each actor 𝑖 ∈ I specifies a reward function:
\[ R_i : \mathcal{X} \times \mathcal{A} \times \mathcal{O} \rightarrow [0, 1], \]
which assigns a score between 0 and 1 to every action-outcome pair (𝑎, 𝑜), given the vector of contextual information 𝒙. Higher values of 𝑅𝑖(𝒙, 𝑎, 𝑜) indicate outcomes more desirable to actor 𝑖.
From a computational standpoint, 𝑅𝑖(𝒙, 𝑎, 𝑜) may be derived via static, domain-specific rules or through a learned function 𝑞𝑖(𝒙, 𝑎, 𝑜) that predicts the reward each actor would associate with each triplet (𝒙, 𝑎, 𝑜), based on a set of rewards associated with the historical dataset.
Fig. 1. Overview of the proposed framework as an integration of the traditional ML decision-support pipeline
In the latter case, 𝑞𝑖 approximates 𝑅𝑖 by training on historical records of actions, outcomes, and contextual features, subject to bounded approximation error. This flexible design supports both expert-driven reward models and data-driven approaches. Similarly, the historical rewards used to train actor-specific reward models can be collected from real-world data or generated synthetically, for instance using a Large Language Model prompted to mimic the decision-making of a particular actor category1.
In scenarios where actions and outcomes are discrete or discretized, we arrange these rewards into a payoff matrix. Here, the rows correspond to actions, the columns represent outcomes, and each matrix cell specifies the reward value associated with a particular (𝑎, 𝑜) tuple given the context 𝒙:
\[
\mathbf{R}_i(\boldsymbol{x}) =
\begin{pmatrix}
R_i(\boldsymbol{x}, a_1, o_1) & R_i(\boldsymbol{x}, a_1, o_2) & \cdots & R_i(\boldsymbol{x}, a_1, o_m) \\
R_i(\boldsymbol{x}, a_2, o_1) & R_i(\boldsymbol{x}, a_2, o_2) & \cdots & R_i(\boldsymbol{x}, a_2, o_m) \\
\vdots & \vdots & \ddots & \vdots \\
R_i(\boldsymbol{x}, a_k, o_1) & R_i(\boldsymbol{x}, a_k, o_2) & \cdots & R_i(\boldsymbol{x}, a_k, o_m)
\end{pmatrix}.
\]
1Refer to Section 5 for further insights and implications related to modeling actors’ reward functions.
If either A or O is continuous, this matrix-based representation extends naturally to a continuous function 𝑅𝑖(𝒙, 𝑎, 𝑜), provided 𝑅𝑖 remains well-defined for all (𝑎, 𝑜).
3.2.2 Outcome Predictions and Expected Rewards. Another key component of the framework, which serves as the cornerstone of traditional approaches to automated decision-support systems, is the outcome prediction model. This model learns to predict outcomes based on historical data comprising context vectors and past actions. Formally, it can be represented as a predictive function:
\[ f : \mathcal{X} \times \mathcal{A} \rightarrow \Delta(\mathcal{O}) \quad \text{or} \quad f : \mathcal{X} \times \mathcal{A} \rightarrow \mathbb{R}, \]
depending on whether the task is cast as classification or regression, where Δ(O) is the probability simplex over O.
An innovative aspect of our framework lies in its integration of predicted outcomes to compute the expected reward for each actor. Specifically, the predicted outcomes are combined with the actor’s reward function 𝑅𝑖 to evaluate the desirability of selecting action 𝑎 given context 𝒙. For discrete or discretized outcomes, this is expressed as:
\[ \mathbb{E}[R_i(a \mid \boldsymbol{x})] = \sum_{q=1}^{m} P(o_q \mid \boldsymbol{x}, a)\, R_i(\boldsymbol{x}, a, o_q), \]
where 𝑃(𝑜𝑞 | 𝒙, 𝑎) is derived from the prediction model 𝑓. For regression tasks with continuous outcomes, the expectation generalizes to:
\[ \mathbb{E}[R_i(a \mid \boldsymbol{x})] = \int_{\mathcal{O}} R_i(\boldsymbol{x}, a, o)\, p(o \mid \boldsymbol{x}, a)\, \mathrm{d}o, \]
where 𝑝(𝑜 | 𝒙, 𝑎) denotes the predicted outcome density. Alternatively, in deterministic regression models where 𝑓(𝒙, 𝑎) directly outputs a single most likely outcome 𝑜̂, the expected reward simplifies to:
\[ \mathbb{E}[R_i(a \mid \boldsymbol{x})] = R_i(\boldsymbol{x}, a, \hat{o}). \]
Regardless of the specific prediction approach, the expected rewards for actor 𝑖 across all actions can be aggregated into a vector:
\[ \mathbb{E}[R_i(a \mid \boldsymbol{x})] = \big[\, \mathbb{E}[R_i(a_1 \mid \boldsymbol{x})], \ldots, \mathbb{E}[R_i(a_k \mid \boldsymbol{x})] \,\big], \tag{1} \]
providing a concise representation of the actor’s preferences over the action space A.
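As a minimal sketch of this computation for discrete outcomes, the snippet below combines the outcome probabilities 𝑃(𝑜 | 𝒙, 𝑎) produced by any fitted classifier exposing a predict_proba interface with an actor’s payoff matrix. The encoding of the action as an extra feature and the variable names are assumptions made for illustration, not the released implementation.

```python
import numpy as np

# E[R_i(a | x)] = sum_q P(o_q | x, a) * R_i(x, a, o_q) for discrete outcomes.
# `outcome_model` is any fitted classifier over (context, action) pairs whose
# class order matches the columns of `payoff` (an |A| x |O| matrix).

def expected_rewards(outcome_model, payoff: np.ndarray,
                     x: np.ndarray, n_actions: int) -> np.ndarray:
    """Return the vector [E[R_i(a_1|x)], ..., E[R_i(a_k|x)]] for one actor."""
    exp_r = np.zeros(n_actions)
    for a in range(n_actions):
        xa = np.concatenate([x, [a]]).reshape(1, -1)     # context + action code
        p_outcomes = outcome_model.predict_proba(xa)[0]  # P(o | x, a)
        exp_r[a] = p_outcomes @ payoff[a]                # dot with R_i(x, a, .)
    return exp_r
```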
3.2.3 Decision Strategies. Once the vectors of actor-specific expected rewards are computed based on the predicted outcomes, a set of decision strategies can be applied to derive the action suggested by the system. In our framework, we define D = C ∪ B as the set of decision functions that take the expected rewards as input and output the suggested action from the action space.
In particular, the decision strategies are categorized into two main groups: (i) a set of baseline strategies B, and (ii) a set of strategies designed to find compromise solutions that balance the preferences of multiple actors, denoted as C.
Compromise Functions. In multi-agent settings, selecting a single action requires balancing the competing preferences of multiple actors. To achieve this, we define a set of compromise functions C = {𝐶1, 𝐶2, . . . , 𝐶𝑙}, where each 𝐶𝑗 represents a decision-making strategy that aggregates the expected rewards E[𝑅𝑖(𝑎|𝒙)] across all actors 𝑖 ∈ I and actions 𝑎 ∈ A. These compromise functions encode various principles of decision-making and welfare distribution, ranging from efficiency-focused strategies to fairness-based approaches.
Formally, each compromise function 𝐶𝑗 takes as input the matrix of expected rewards E[𝑅(𝑎|𝒙)] ∈ [0, 1]^{𝑁×𝐾}, where 𝑁 is the number of actors and 𝐾 = |A| is the number of actions, along with a set of additional parameters
𝒑 ∈ P. These parameters 𝒑 may include actor-specific baseline values (e.g., disagreement or ideal points) or contextual information relevant to the decision-making process. The output is the selected action 𝑎 ∈ A, aligned with the specified decision-making principle.
The mapping implemented by 𝐶𝑗 is expressed as:
\[ C_j : [0,1]^{N \times K} \times \mathcal{P} \rightarrow \mathcal{A}, \qquad C_j\big(\mathbb{E}[R(a \mid \boldsymbol{x})], \boldsymbol{p}\big) = \arg\max_{a \in \mathcal{A}} \Phi_j\big(\mathbb{E}[R_1(a \mid \boldsymbol{x})], \ldots, \mathbb{E}[R_N(a \mid \boldsymbol{x})]; \boldsymbol{p}\big), \]
where Φ𝑗 is a scalar-valued scoring function that encapsulates the decision-making principle.
Examples of compromise principles include maximizing the total reward (utilitarianism), ensuring equal reward distribution (egalitarianism), or maximizing the minimum reward (max-min fairness). Table 1 summarizes these and other principles derived from game theory, computational social choice, and welfare economics, illustrating how different normative criteria can guide decision-making in multi-agent settings.
Table 1. Examples of Compromise Functions
• Nash Bargaining Solution: $C_{NBS} = \arg\max_{a \in \mathcal{A}} \prod_{i=1}^{N} \big(\mathbb{E}[R_i(a \mid \boldsymbol{x})] - d_i\big)$. Maximizes the product of utility gains above actor-specific disagreement payoffs, balancing fairness and efficiency.
• Proportional Fairness: $C_{PF} = \arg\max_{a \in \mathcal{A}} \sum_{i=1}^{N} \log \mathbb{E}[R_i(a \mid \boldsymbol{x})]$. Promotes balanced improvements in collective well-being, ensuring fair trade-offs.
• Nash Social Welfare: $C_{NSW} = \arg\max_{a \in \mathcal{A}} \prod_{i=1}^{N} \mathbb{E}[R_i(a \mid \boldsymbol{x})]$. Maximizes the product of actors’ utilities, equivalent to proportional fairness under a logarithmic transformation.
• Maximin: $C_{MM} = \arg\max_{a \in \mathcal{A}} \min_{i=1,\ldots,N} \mathbb{E}[R_i(a \mid \boldsymbol{x})]$. Safeguards the most disadvantaged actor by maximizing the minimum utility across actors.
• Compromise Programming: $C_{CP,L_2} = \arg\min_{a \in \mathcal{A}} \sum_{i=1}^{N} w_i \big(u_i^{*} - \mathbb{E}[R_i(a \mid \boldsymbol{x})]\big)^2$. Minimizes the weighted Euclidean distance between actors’ utilities and their ideal points $u_i^{*}$.
• Kalai-Smorodinsky Solution: $C_{KS} = \arg\max_{a \in \mathcal{A}} \min_{i=1,\ldots,N} \frac{\mathbb{E}[R_i(a \mid \boldsymbol{x})] - d_i}{u_i^{*} - d_i}$. Maximizes proportional gains toward each actor’s ideal payoff relative to their disagreement payoff.
Baseline Strategies. To contextualize and evaluate the proposed compromise functions, we compare their performance against two baselines: the outcome predictor baseline ($B_{\mathrm{pred}}$) and the individual reward maximization baseline ($B^{(i)}_{\max}$). These baselines serve as reference points and represent simple decision strategies that ignore the complexities of a multi-actor setting.
The outcome predictor baseline provides a straightforward decision-making strategy, leveraging a predefined mapping from predicted outcomes to corresponding actions. This baseline operates under two paradigms, namely (i) a bijective mapping where each outcome is uniquely associated with the action that optimizes utility for that outcome, and (ii) a heuristic or naive rule, according to which actions are determined based on the predicted outcome or another variable of interest, without requiring a strict one-to-one correspondence. For instance, in a loan approval scenario, if repayment is the most likely predicted outcome, this baseline recommends granting the loan. Formally, let 𝑜best denote the outcome
with the highest predicted probability given context 𝒙. The recommended action is:
\[ a_{\mathrm{pred}} = \arg\max_{a \in \mathcal{A}} P(o_{\mathrm{best}} \mid \boldsymbol{x}, a). \]
Alternatively, in cases where decision-making depends on the predicted value of a variable, this baseline prioritizes actions that optimize that variable. For example, in medical treatment selection, a naive strategy might choose the treatment that maximizes the predicted effect size or minimizes the predicted cost.
The individual reward maximization baseline, on the other hand, focuses solely on the perspective of a single actor, recommending actions that maximize that actor’s expected reward without considering the impact on others. For each actor 𝑖, the recommended action is:
\[ a^{(i)}_{\max} = \arg\max_{a \in \mathcal{A}} \mathbb{E}[R_i(a \mid \boldsymbol{x})]. \]
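The two baselines admit an equally compact sketch. Below, p is the |A| × |O| matrix of predicted outcome probabilities for a given context and E is the N × K expected-reward matrix from the earlier examples; treating 𝑜best as the globally most likely outcome is one simple reading of the definition above, stated here as an assumption.

```python
import numpy as np

# Baseline strategies: outcome-predictor baseline and individual maximization.

def outcome_predictor_baseline(p: np.ndarray) -> int:
    """p[a, q] = P(o_q | x, a); pick argmax_a P(o_best | x, a)."""
    o_best = int(np.argmax(p.max(axis=0)))   # most likely outcome overall
    return int(np.argmax(p[:, o_best]))

def individual_max_baseline(E: np.ndarray, actor: int) -> int:
    """Pick the action maximizing one actor's expected reward, ignoring others."""
    return int(np.argmax(E[actor]))
```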
3.2.4 Evaluation Metrics and Optimal Action Selection. Our framework can present all actions recommended by different decision strategies, giving end users tools to make more informed decisions. However, if the goal is to identify the decision function that yields the best performance across a set of user-defined metrics for decision-making on new data, a scoring mechanism can be employed to rank the decision strategies and select the most effective one. These metrics may capture multiple objectives, ranging from standard performance and fairness measures to domain-specific criteria (e.g., profitability in loan decisions, treatment costs in healthcare).
In this context, our framework leverages a 𝑘-fold cross-validation mechanism not only to tune the outcome prediction and reward models but also to determine the best decision function based on historical data.
Consider a dataset of exhaustive triplets {𝒙𝑡, 𝑎, 𝑜}, where 𝒙𝑡 ∈ X represents the observed context for each sample, 𝑎 ∈ A denotes a potential action, and 𝑜 ∈ O corresponds to a possible outcome. For a dataset with 𝑇 samples, the total number of triplets is 𝑇 · |A| · |O|, since each sample is associated with every possible action–outcome pair.
To evaluate a decision function 𝐷 ∈ D against a set of user-defined metrics M = {𝑀1, 𝑀2, . . . , 𝑀𝑧}, we compute its performance across all triplets:
\[ \mathcal{P}(D, M) = \frac{1}{T^{\alpha}} \sum_{t=1}^{T} \sum_{a \in \mathcal{A}} \sum_{o \in \mathcal{O}} \Theta\big(D, M \mid \boldsymbol{x}_t, a, o\big), \]
where Θ(𝐷, 𝑀 | 𝒙𝑡, 𝑎, 𝑜) quantifies the performance of 𝐷 with respect to metric 𝑀 for the triplet (𝒙𝑡, 𝑎, 𝑜). The parameter 𝛼 indicates whether metrics are averaged (𝛼 = 1, e.g., accuracy) or summed (𝛼 = 0, e.g., total cost).
To ensure comparability across metrics with different scales, the raw performance scores P(𝐷, 𝑀) are normalized to the [0, 1] range:
\[ \tilde{P}(D, M) = \frac{\mathcal{P}(D, M) - \min_{D \in \mathcal{D}} \mathcal{P}(D, M)}{\max_{D \in \mathcal{D}} \mathcal{P}(D, M) - \min_{D \in \mathcal{D}} \mathcal{P}(D, M)}, \]
where the maximum and minimum denote the highest and lowest performance values across all decision functions for metric 𝑀. Because each metric 𝑀 targets a specific optimization objective (maximization, e.g., accuracy; minimization, e.g., cost; or convergence to a target value, e.g., zero disparity across groups), this normalization is adapted so that more desirable outcomes map to 1 and less desirable outcomes map to lower scores.
We then compute an aggregated score 𝑆(𝐷) by combining the normalized metrics:
\[ S(D) = \sum_{\ell=1}^{z} w_{\ell}\, \tilde{P}(D, M_{\ell}), \]
where $w_{\ell}$ represents the relative importance of metric $M_{\ell}$, subject to $\sum_{\ell=1}^{z} w_{\ell} = 1$. If all metrics are of equal importance, then $w_{\ell} = 1/z$. Naturally, both the selection of metrics and the assignment of weights can be tailored by the user to match specific preferences.
At this point, we identify the decision function with the highest aggregated score:
\[ D^{*} = \arg\max_{D \in \mathcal{D}} S(D). \]
Assuming the historical data and future contexts share a consistent distribution, $D^{*}$ or the induced ranking guides future decisions.
When presented with a new context 𝒙, the outcome prediction model estimates the likelihood of each potential outcome (or produces a real-valued prediction), and the expected rewards $\{\mathbb{E}[R_i(a \mid \boldsymbol{x})]\}_{i=1}^{N}$ for all actions 𝑎 ∈ A are computed. The optimal action $a^{*}$ is then determined via $D^{*}$:
\[ a^{*} = \arg\max_{a \in \mathcal{A}} D^{*}\big(\{\mathbb{E}[R_i(a \mid \boldsymbol{x})]\}_{i=1}^{N}\big). \]
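A minimal sketch of this selection step is given below, assuming the per-metric performance values P(𝐷, 𝑀) have already been computed on the validation folds. The array names, the equal-weight example, and the boolean flag distinguishing maximization from minimization metrics are illustrative choices, not part of the paper’s specification.

```python
import numpy as np

# perf[d, m] holds the raw performance P(D_d, M_m); maximize[m] is True when
# higher raw values of metric m are better. Scores are min-max normalized,
# weighted, and summed to obtain S(D); the top-ranked decision function wins.

def rank_decision_functions(perf: np.ndarray, weights: np.ndarray,
                            maximize: np.ndarray) -> int:
    lo, hi = perf.min(axis=0), perf.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid division by zero
    norm = (perf - lo) / span                     # map each metric to [0, 1]
    norm = np.where(maximize, norm, 1.0 - norm)   # flip minimization metrics
    scores = norm @ weights                       # S(D) = sum_l w_l * P~(D, M_l)
    return int(np.argmax(scores))

# Example: 3 decision functions, 2 metrics (accuracy up, total cost down).
perf = np.array([[0.80, 120.0], [0.75, 90.0], [0.82, 150.0]])
best = rank_decision_functions(perf, weights=np.array([0.5, 0.5]),
                               maximize=np.array([True, False]))
```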
3.3 Computational Complexity Analysis
The additional computational complexity of the proposed framework, beyond classical cross-validation, arises primarily from the reward prediction models, expected reward computation, and decision-making processes. Training the reward models scales with the number of reward types (𝑁) and the size of the training dataset (𝑇), with the complexity determined by the underlying model architecture. This complexity can be expressed as 𝑂(𝑐train(𝑇, 𝑔)), where 𝑔 is the number of features considered in the reward models.
During the evaluation phase, the expected reward computation requires predicting rewards for each combination of actions and data points, with the test or validation set size denoted as 𝑇. This step scales linearly with the number of features considered in the reward prediction model (𝑔), the number of actions (|A|), and the number of reward types (𝑁): the resulting complexity is given by 𝑂(𝑇 · 𝑐inf(𝑔) · |A| · 𝑁), where 𝑐inf(𝑔) represents the inference cost of a single prediction of the reward model. Similarly, decision-making strategies involve generating compromise and baseline solutions by iterating over all actions and reward types for each data point. This results in an overall complexity of 𝑂(𝑇 · |D| · |A| · 𝑁), where |D| is the total number of decision strategies.
The evaluation of decision outcomes involves calculating fairness-related, performance-related, and case-specific metrics. This process usually scales proportionally with the number of metrics (|M|) and the number of data points being evaluated (𝑇). Ranking and weighted aggregation of normalized scores introduce additional computational overhead, with complexity scaling as 𝑂(|M| · 𝑁 · |A|).
Since the framework scales linearly with its core components, it can handle moderately large datasets while ensuring robust and explainable decision recommendations. This linear scalability makes the framework efficient for real-world applications, where the number of actors, actions, metrics, and decision strategies is typically limited.
4 Experiments
Given the specic scope of our framework, its primary outputs are: (i) a list of actions suggested by each decision
strategy for a given context vector, and (ii) a ranking of decision functions derived from historical data and user-
dened preferences regarding the relative importance of various evaluation metrics. Consequently, the eectiveness of
the framework stems from its ability to provide users with an interpretable tool for making decisions that are both
context-sensitive and aligned with the interests of diverse stakeholders. For this reason, it is not feasible to establish a
universally applicable evaluation framework, as its relevance is highly dependent on the specific use case. Consequently, the primary objectives of this Section are to demonstrate the versatility of the framework across diverse use cases and to analyze how its outputs adapt to variations in its core components. To demonstrate the practical applicability of the proposed framework, we present two representative scenarios: a loan approval scenario, showcasing decision-making in a multi-classification problem, and a health treatment scenario, illustrating a causal inference-based regression problem. In both scenarios, reward structures are predefined using stakeholder-specific heuristics that aim to mimic and approximate real-world preferences and objectives. To reflect the inherent variability in human preferences and assess the robustness of the learned reward models, uniform noise is added to the synthetic rewards used for training.
While the task of modeling actor-specific rewards is critical and complex, it lies beyond the primary scope of this work. Instead, the reward structures in our experiments serve as illustrative examples to demonstrate the utility and advantages of the framework. Furthermore, although the actors in the two scenarios represent prototypical stakeholder groups, the framework is designed to naturally support any number and type of actors, from broad categories to individualized reward models tailored to specific decision-making processes. This adaptability ensures relevance across diverse domains and decision-making contexts. The code to reproduce the experiments is fully documented and available in the following GitHub repository: https://anonymous.4open.science/r/participatory_training-502B
4.1 Real-World Use Cases
4.1.1 Use Case 1: Lending Scenario. In the lending scenario, real-world data from the Lending Club database2 is used. Our setup is structured as a 3×3 problem, where the decision space A comprises three options ("Grant," "Grant lower amount," and "Not Grant") and the outcome space O consists of three discrete options ("Fully Repaid," "Partially Repaid," or "Not Repaid") depending on the real repayment status reported in the dataset. The context includes applicant-specific features such as credit score, income, financial history, and demographic attributes.
This scenario involves three key stakeholder categories:
Bank, who seeks to maximize protability. Its rewards are tied to repayment probabilities, assigning higher
rewards for fully repaid loans and lower rewards for partial or non-repayment.
Applicant, who prioritize loan access to meet their nancial needs. Their rewards reect the utility derived from
loan approval, modulated by the obligations of repayment. Full approval typically yields higher rewards, while
partial or no approval reduces applicant satisfaction.
Regulatory Body, responsible for ensuring nancial stability and inclusivity in lending practices. Its rewards
balance the stability of the nancial system with the willingness to promote social benets, placing particular
value on providing access to credit for vulnerable applicants.
In addition to the baseline strategies described earlier, we benchmark the performance of compromise functions against an Oracle strategy, which assumes a bijective mapping between the actual outcomes and the optimal actions, simulating an idealized decision-making process. Besides performance- and fairness-oriented metrics, we include case-specific evaluation metrics, namely the percentage of total profit achieved by the bank, the percentage of losses relative to the total potential loss, and the proportion of unexploited profit resulting from suboptimal granting decisions.
In the baseline example, we use a Random Forest as the outcome prediction model. For the ablation study, we examine the framework’s behavior using a simpler model, specifically a k-Nearest Neighbor algorithm. In both this and the subsequent example, the reward prediction models, which are used to mimic actor-based preferences given a new context vector, are also implemented as Random Forests, as this model architecture is widely employed by real-world practitioners and provides sufficient learning power for our experiments.
2 https://www.kaggle.com/datasets/wordsforthewise/lending-club
4.1.2 Use Case 2: Healthcare Scenario. For the healthcare scenario, we use the first realization of the Infant Health and Development Program (IHDP) dataset, as reported by [30]3. This dataset, firstly introduced by [22], originates from a randomized experiment studying the effect of home visits on cognitive test scores for infants. It has been widely used in the causal inference literature [30, 42, 45].
In this setting, the action space consists of two possible actions: assigning or not assigning the treatment to the
candidate patient. The outcome is represented as a scalar real value, indicating the cognitive score achieved by the child
under the given treatment. The context vector comprises 25 binary and continuous features describing both the child
and their family.
It is important to note that, as this is a synthetic benchmark dataset, the true values of the outcomes under treatment and control are available. These true values are used to evaluate the performance of the causal effect estimation model but are, of course, not utilized during training. We use this dataset to demonstrate that our framework can be extended to contexts where the predicted target is linked to a causal effect, thereby showcasing its ability to enhance decision-making in causal inference scenarios as well. In this case, the stakeholder categories considered are the following:
• Healthcare Provider, focused on improving patient outcomes while managing costs. Its rewards are based on normalized outcome improvements relative to a baseline, with penalties for higher treatment costs.
• Policy Maker, committed to maximizing societal benefits and promoting fairness. Its rewards emphasize outcome improvements normalized by potential gains, with additional weighting to promote equity across demographic groups.
• Parent, prioritizing the well-being of their child. Their rewards are directly proportional to normalized outcomes, reflecting the straightforward utility parents derive from improved health or cognitive scores.
As in the previous example, random noise is introduced to the rewards to account for real-world variability. In addition to individual maximization strategies, we also consider a baseline strategy that aims to maximize the achieved cognitive score. As a natural consequence, this strategy would suggest treating all potential patients. As case-specific evaluation metrics, we used the mean outcome values for the treated and control groups, as well as the absolute difference between these values. To estimate the Conditional Average Treatment Effect (CATE), we use an X-regressor meta-learner [10] that leverages XGBRegressor as its base learner.
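As a rough sketch of this estimation step, the snippet below pairs an X-learner with XGBRegressor, assuming the causalml meta-learner interface (BaseXRegressor with fit_predict); the random placeholder data and variable names stand in for the IHDP features, treatment indicator, and cognitive-score outcome and are not the paper’s exact pipeline.

```python
import numpy as np
from xgboost import XGBRegressor
from causalml.inference.meta import BaseXRegressor

# Placeholder data standing in for the IHDP covariates, treatment, and outcome.
rng = np.random.default_rng(0)
X = rng.random((200, 25))            # 25 binary/continuous context features
treatment = rng.integers(0, 2, 200)  # 1 = home-visit treatment, 0 = control
y = rng.normal(size=200)             # cognitive test score

# X-learner meta-learner with XGBRegressor as base learner (assumed API).
x_learner = BaseXRegressor(learner=XGBRegressor())
cate = x_learner.fit_predict(X=X, treatment=treatment, y=y)  # per-unit CATE

# The estimated effect per candidate patient can then be fed into the
# actor-specific reward models to obtain expected rewards for each action.
```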
4.2 Discussion of Key Insights
Figures 2, 3 and 4 illustrate how variations in key aspects of the framework influence the performance of different decision functions (baseline strategies and compromise functions) across various evaluation metrics related to decision accuracy, fairness, and case-specific outcomes. The mean values presented in all Figures are calculated over four runs using different random seeds. These Figures provide a comparative analysis of the performance of decision functions relative to one another and the Oracle, highlighting trade-offs among the metrics, in the previously described lending scenario. In general, baseline strategies achieve the highest decision accuracy compared to real outcomes. However, they demonstrate suboptimal performance in terms of fairness (see subplots on Demographic_Parity), where compromise functions tend to enable more equitable outcomes. This is particularly evident when compared to the outcome prediction baseline (Outcome_Pred_Model in the plots). Notably, even though all decision functions share the same predicted outcome values, since they are based on identical vectors of expected rewards derived from the context vectors, compromise functions implicitly introduce a fairness correction. This correction mitigates biases in baseline strategies, which often favor specific groups, thereby improving fairness in the decision-making process.
3 https://github.com/AMLab-Amsterdam/CEVAE
Figure 2 specically examines how variations in the reward structures of the actors (namely the Bank and the
Applicant) aect decision outcomes suggested by each strategy, as reected in the evaluation metrics. In this analysis,
two distinct reward structures are compared: a balanced baseline version and a stricter conguration, where the Bank
values granting loans only when full repayment is expected, and the Applicant prioritizes loan approval regardless of
repayment likelihood. As shown in Figure 2, more "selsh" individual preferences exacerbate the trade-os between
performance- and fairness-oriented metrics, leading to more conservative decisions.
In a similar manner, Figure 3 showcases the eect of training sample sizes. As expected, larger sample sizes consistently
improve all metrics, primarily thanks to the greater accuracy of the outcome prediction model. However, interestingly,
the percentage improvement in case-specic metrics such as the percentage of prot achieved by the Bank relative
to the total obtainable amount under omniscient conditions (Total_Prot) and the percentage of loans fully granted
(Percentage_Grant) exceeds the improvement in decision accuracy. This suggests that variations in the performance of
the outcome model have diversied eects on dierent metrics, inuenced by the unique interrelation of predictions
with the stakeholders’ reward models. Nevertheless, the positive impact of increased sample sizes is also evident in
the improved alignment between performance on the training and test sets. Figure 5 illustrates the mean absolute
dierence in the metrics between training and test data, across all decision functions. Notably, as the number of training
samples increases, this dierence diminishes, indicating that larger sample sizes enable the training performance
to more accurately reect future performance on the test set. This trend underscores the value of larger datasets in
enhancing the reliability and generalization of decision strategies.
Figure 4 explores the impact of the complexity of the outcome prediction model by comparing k-Nearest Neighbor (kNN) and Random Forest (RF) models. As can be seen, RF consistently outperforms kNN in metrics like accuracy and precision, highlighting the benefits of increased model complexity for predictive performance. However, as previously observed, fairness and case-specific metrics experience different variations compared to prediction accuracy as the outcome model improves, as they are influenced by the interplay between the reward models and the rationale behind decision strategies.
Finally, Figure 6 refers to the healthcare scenario and demonstrates how the proposed framework can also be used to evaluate the distribution of treatment effects across patients, based on treatment assignment decisions and the expected treatment effects. This analysis underscores the framework’s ability to assess not only traditional metrics associated with causal effects but also the equity and impact of treatment allocation across different population groups. Thus, once again, it highlights the framework’s versatility in addressing diverse contexts and applications, as well as its potential to enhance equity and transparency in decision-making, including scenarios involving causal effect estimation.
5 Practical Implications
When applying the framework, a couple of important considerations must be taken into account to ensure its effective and ethical use. A critical aspect is the requirement for a vector of rewards associated with the historical dataset to enable training and mimic a participatory decision-making process on new data. This step must be handled carefully, as it carries the risk of introducing bias. Ideally, these reward vectors should be informed by real stakeholder participation or objective real-world measurements of the effects actions have on the actors, ensuring a comprehensive and accurate representation of diverse perspectives and impacts. In fact, designing rewards artificially or "on desk" risks oversimplifying the complexities of real-world scenarios and misinterpreting the needs and interests of underprivileged stakeholders, particularly if they are not directly consulted.
Fig. 2. Eects of reward structures on average test set performance: comparison of evaluation metrics across decision functions in the
lending scenario
Fig. 3. Eects of training sample sizes on average test set performance: comparison of evaluation metrics across decision functions in
the lending scenario
Fig. 4. Eects of outcome prediction model on average test set performance: comparison of evaluation metrics across decision
functions in the lending scenario
Fig. 5. Impact of sample size on training-test performance alignment (Mean Absolute Dierence) across evaluation metrics
For this reason, while the modeling of reward functions allows multiple viewpoints to be incorporated, the construction of these reward functions is critical. Proper modeling ensures that the resulting decisions are equitable, contextually appropriate, and truly reflective of the diversity of stakeholder preferences, rather than inadvertently reinforcing existing biases or misrepresenting key perspectives.
Fig. 6. Key evaluation metrics across decision functions in the causal healthcare scenario
Another structural characteristic of the framework lies in the assumption of a non-adversarial nature among
stakeholders’ preferences used to train the reward models. This technical choice was motivated by the need to allow
the algorithm to learn from, at least theoretically, the actors’ honest preferences. However, it is important to clarify
that the framework is not intended to mimic a real collective decision-making process, where actors interact and can
strategically adapt their declared preferences; instead, it is designed to systematically represent and reconcile diverse
perspectives and interests in a structured manner.
6 Conclusion and Future Work
In many traditional data-driven systems, a single predictive model recommends an action solely based on historical data and fixed performance metrics. Such methods often overlook the diversity of stakeholder preferences and run the risk of propagating historical biases in action assignments. By contrast, our multi-actor framework explicitly models the distinct interests of various stakeholders, integrating their preferences into the decision-making process. This compromise-driven mechanism allows the identification of actions that balance these different preferences while still optimizing user-defined metrics. Moreover, because the framework is model-agnostic, it can be paired with a wide range of predictive models and applied across diverse domains. Its design also emphasizes explainability, providing clear insight into how actions are chosen and trade-offs are managed. A further advantage is that our approach can indirectly foster fairer outcomes even without constraining the framework to a specific fairness definition or modifying the underlying outcome prediction model. As our experimental results illustrate, incorporating multiple stakeholder perspectives can effectively mitigate biases stemming from historically skewed action assignments. In doing so, the system can promote more equitable decision-making, outperforming single-model approaches on both case-specific and fairness metrics. Overall, our framework is built on structural choices that ensure decision recommendations are based on representations of real stakeholder preferences or the actual effects of actions on them, thereby enhancing the expressive power of prediction-based decision support systems. While this approach allows for a fair representation of various perspectives and provides an equitable foundation for identifying compromise solutions, the framework does not capture the dynamic and strategic interactions that often arise in real-world consultation processes. In such settings, stakeholders may strategically adjust their revealed preferences in response to those of other actors to influence outcomes in their favour. Addressing these competitive or adversarial dynamics, for instance by analysing the strategy-proofness [3, 38, 39] of this approach, represents a significant avenue for future work.
References
[1]
M. Arana-Catania, F. A. V. Lier, R. Procter, N. Tkachenko, Y. He, A. Zubiaga, and M. Liakata. 2021. Citizen participation and machine learning for a
better democracy. Digital Government: Research and Practice 2, 3 (2021), 1–22.
[2]
Theo Araujo, Natali Helberger, Sanne Kruikemeier, and Claes H De Vreese. 2020. In AI we trust? Perceptions about automated decision-making by
articial intelligence. AI & society 35, 3 (2020), 611–623.
[3] Kenneth J. Arrow. 2012. Social Choice and Individual Values. Yale University Press. http://www.jstor.org/stable/j.ctt1nqb90
[4]
C. Barabas, C. Doyle, J. B. Rubinovitz, and K. Dinakar. 2020. Studying up: reorienting the study of algorithmic fairness around issues of power. In
Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 167–176.
[5] Solon Barocas and Andrew D Selbst. 2016. Big data’s disparate impact. Calif. L. Rev. 104 (2016), 671.
[6] A. Berditchevskaia, E. Malliaraki, and K. Peach. 2021. Participatory AI for humanitarian innovation. Nesta, London.
[7]
Abeba Birhane, William Isaac, Vinodkumar Prabhakaran, Mark Diaz, Madeleine Clare Elish, Iason Gabriel, and Shakir Mohamed. 2022. Power
to the people? Opportunities and challenges for participatory AI. In Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms,
Mechanisms, and Optimization. 1–8.
[8]
Kathleen Cachel and Elke Rundensteiner. 2024. PreFAIR: Combining Partial Preferences for Fair Consensus Decision-making. In The 2024 ACM
Conference on Fairness, Accountability, and Transparency. 1133–1149.
[9]
Eunice Chan, Zhining Liu, Ruizhong Qiu, Yuheng Zhang, Ross Maciejewski, and Hanghang Tong. 2024. Group Fairness via Group Consensus. In
The 2024 ACM Conference on Fairness, Accountability, and Transparency. 1788–1808.
[10]
Huigang Chen, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. 2020. Causalml: Python package for causal machine learning. arXiv
preprint arXiv:2002.11631 (2020).
[11] Fabio Chiusi, Brigitte Alfter, Minna Ruckenstein, and Tuukka Lehtiniemi. 2020. Automating society report 2020. (2020).
[12]
EP Council. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying Down Harmonised Rules on
Articial Intelligence and Amending Regulations (EC) No 300/2008,(EU) No 167/2013,(EU) No 168/2013,(EU) 2018/858,(EU) 2018/1139 and (EU)
2019/2144 and Directives 2014/90/EU,(EU) 2016/797 and (EU) 2020/1828 (Articial Intelligence Act) O. J. Eur. Union 50 (2024), 202.
[13]
Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang. 2023. The participatory turn in ai design: Theoretical foundations and the
current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. 1–23.
[14]
Jonathan Dodge, Q Vera Liao,Yunfeng Zhang, Rachel KE Bellamy, and Casey Dugan. 2019. Explaining models: an empirical study of how explanations
impact fairness judgment. In Proceedings of the 24th international conference on intelligent user interfaces. 275–285.
[15]
J. Donia and J. A. Shaw. 2021. Co-design and ethical articial intelligence for health: An agenda for critical research and practice. Big Data & Society
8, 2 (2021), 20539517211065248.
[16] Luciano Floridi et al. 2021. Ethics, governance, and policies in articial intelligence. Springer.
[17] A. Gerdes. 2022. A participatory data-centric approach to AI Ethics by Design. Applied Articial Intelligence 36, 1 (2022), 2009222.
[18]
Frederic Gerdon, Ruben L Bach, Christoph Kern, and Frauke Kreuter. 2022. Social impacts of algorithmic decision-making: A research agenda for
the social sciences. Big Data & Society 9, 1 (2022), 20539517221089305.
[19]
N. Goel, A. Amayuelas, A. Deshpande, and A. Sharma. 2021. The importance of modeling data missingness in algorithmic fairness: A causal
perspective. In Proceedings of the AAAI Conference on Articial Intelligence, Vol. 35. 7564–7573.
[20]
Stephan Grimmelikhuijsen. 2023. Explaining why the computer says no: Algorithmic transparency aects the perceived trustworthiness of
automated decision-making. Public Administration Review 83, 2 (2023), 241–262.
[21]
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in neural information processing systems 29
(2016).
[22]
Jennifer L Hill. 2011. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20, 1 (2011), 217–240.
[23]
S. Hossain and S. I. Ahmed. 2021. Towards a New Participatory Approach for Designing Articial Intelligence and Data-Driven Technologies. arXiv
preprint arXiv:2104.04072 (2021).
[24]
Safwan Hossain, Evi Micha, and Nisarg Shah. 2021. Fair algorithms for multi-agent multi-armed bandits. Advances in Neural Information Processing
Systems 34 (2021), 24005–24017.
[25] Aditya Jain, Manish Ravula, and Joydeep Ghosh. 2020. Biased models have biased explanations. arXiv preprint arXiv:2012.10986 (2020).
[26]
H. Jeong, H. Wang, and F. P. Calmon. 2022. Fairness without imputation: A decision tree approach for fair prediction with missing values. In
Proceedings of the AAAI Conference on Articial Intelligence, Vol. 36. 9558–9566.
[27]
Benjamin Laufer, Thomas Gilbert, and Helen Nissenbaum. 2023. Optimization’s Neglected Normative Commitments. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency. 50–63.
[28]
M. K. Lee, D. Kusbit, A. Kahng, J. T. Kim, X. Yuan, A. Chan, and A. D. Procaccia. 2019. WeBuildAI: Participatory framework for algorithmic
governance. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–35.
[29]
Bruno Lepri, Nuria Oliver, Emmanuel Letouzé, Alex Pentland, and Patrick Vinck. 2018. Fair, transparent, and accountable algorithmic decision-making
processes: The premise, the proposed solutions, and the open challenges. Philosophy & Technology 31 (2018), 611–627.
[30]
Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. Causal eect inference with deep latent-variable
models. Advances in neural information processing systems 30 (2017).
ArXiv preprint. Under review.
16 Vineis et al.
[31] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR) 54, 6 (2021), 1–35.
[32] Yuri Nakao, Simone Stumpf, Subeida Ahmed, Aisha Naseer, and Lorenzo Strappelli. 2022. Toward involving end-users in interactive human-in-the-loop AI fairness. ACM Transactions on Interactive Intelligent Systems (TiiS) 12, 3 (2022), 1–30.
[33] Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 6464 (2019), 447–453.
[34] Dana Pessach and Erez Shmueli. 2022. A review on fairness in machine learning. ACM Computing Surveys (CSUR) 55, 3 (2022), 1–44.
[35] Eike Petersen, Melanie Ganz, Sune Holm, and Aasa Feragen. 2023. On (assessing) the fairness of risk score models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 817–829.
[36] S. J. Quan, J. Park, A. Economou, and S. Lee. 2019. Artificial intelligence-aided design: Smart design for sustainable city development. Environment and Planning B: Urban Analytics and City Science 46, 8 (2019), 1581–1599.
[37] Resmi Ramachandranpillai, Ricardo Baeza-Yates, and Fredrik Heintz. 2023. FairXAI - A Taxonomy and Framework for Fairness and Explainability Synergy in Machine Learning. Authorea Preprints (2023).
[38] Mark Allen Satterthwaite. 1975. The Existence of a Strategy Proof Voting Procedure. Center for Mathematical Studies in Economics and Management Science, Kellogg School of Management, Northwestern University, Evanston, IL (1975).
[39] Mark Allen Satterthwaite. 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory 10, 2 (1975), 187–217. doi:10.1016/0022-0531(75)90050-2
[40] Jakob Schoeffer, Maria De-Arteaga, and Niklas Kuehl. 2022. On the relationship between explanations, fairness perceptions, and decisions. arXiv preprint arXiv:2204.13156 (2022).
[41] Andrew D. Selbst, Danah Boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 59–68.
[42] Uri Shalit, Fredrik D. Johansson, and David Sontag. 2017. Estimating individual treatment effect: Generalization bounds and algorithms. In International Conference on Machine Learning. PMLR, 3076–3085.
[43] M. Sloane, E. Moss, O. Awomolo, and L. Forlano. 2022. Participation is not a design fix for machine learning. In Equity and Access in Algorithms, Mechanisms, and Optimization. 1–6.
[44] Min Wen, Osbert Bastani, and Ufuk Topcu. 2021. Algorithms for fairness in sequential decision making. In International Conference on Artificial Intelligence and Statistics. PMLR, 1144–1152.
[45] Liuyi Yao, Sheng Li, Yaliang Li, Mengdi Huai, Jing Gao, and Aidong Zhang. 2018. Representation learning for treatment effect estimation from observational data. Advances in Neural Information Processing Systems 31 (2018).
[46] Angie Zhang, Olympia Walker, Kaci Nguyen, Jiajun Dai, Anqing Chen, and Min Kyung Lee. 2023. Deliberating with AI: Improving decision-making for the future through participatory AI design and stakeholder deliberation. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–32.
[47] Matthieu Zimmer, Claire Glanois, Umer Siddique, and Paul Weng. 2021. Learning fair policies in decentralized cooperative multi-agent reinforcement learning. In International Conference on Machine Learning. PMLR, 12967–12978.