Beyond Predictions: A Participatory Framework for Multi-Stakeholder
Decision-Making
VITTORIA VINEIS, Sapienza University of Rome, Italy
GIUSEPPE PERELLI, Sapienza University of Rome, Italy
GABRIELE TOLOMEI, Sapienza University of Rome, Italy
Conventional decision-support systems, primarily based on supervised learning, focus on outcome prediction models to recommend actions. However, they often fail to account for the complexities of multi-actor environments, where diverse and potentially conflicting stakeholder preferences must be balanced.
In this paper, we propose a novel participatory framework that redefines decision-making as a multi-stakeholder optimization problem, capturing each actor's preferences through context-dependent reward functions. Our framework leverages $k$-fold cross-validation to fine-tune user-provided outcome prediction models and evaluate decision strategies, including compromise functions mediating stakeholder trade-offs. We introduce a synthetic scoring mechanism that exploits user-defined preferences across multiple metrics to rank decision-making strategies and identify the optimal decision-maker. The selected decision-maker can then be used to generate actionable recommendations for new data. We validate our framework using two real-world use cases, demonstrating its ability to deliver recommendations that effectively balance multiple metrics, achieving results that are often beyond the scope of purely prediction-based methods. Ablation studies demonstrate that our framework, with its modular, model-agnostic, and inherently transparent design, integrates seamlessly with various predictive models, reward structures, evaluation metrics, and sample sizes, making it particularly suited for complex, high-stakes decision-making contexts.
CCS Concepts: • Computing methodologies → Learning paradigms; Modeling methodologies; Model verification and validation; • Human-centered computing → Collaborative and social computing theory, concepts and paradigms.
Additional Key Words and Phrases: Participatory Artificial Intelligence, Participatory Training, Multi-Stakeholder Decision-Making
1 Introduction
Advances in artificial intelligence (AI) and machine learning (ML) have fueled the development of automated decision support systems in high-stakes domains such as healthcare, finance, and public policy [11]. However, the predominant top-down design paradigm and the traditional optimization-oriented setup for these systems are increasingly under scrutiny, particularly due to their tendency to overlook critical stakeholder perspectives and broader societal impacts [18]. In many cases, algorithmic solutions emphasize predictive accuracy above all else, neglecting or underperforming in aspects such as fairness [35], transparency [20], and the consideration of heterogeneous and potentially conflicting interests [27]. This narrow focus on outcome prediction has led to impact-blind recommendations, which risk perpetuating biases and exacerbating inequities [5], highlighting the need to regulate and promote the creation of more trustworthy systems [12, 16, 29]. For example, in healthcare, predictive models used for diagnosing diseases or allocating resources often fail to account for disparities in access to medical services or variations in patient demographics, leading to biased outcomes that disproportionately harm marginalized communities [33].
Similarly, in finance, credit scoring algorithms frequently rely on historical data embedded with systemic biases, which can unjustly exclude individuals from economic opportunities [21], while also ignoring the broader effects such choices may have on the wider social and economic system.
Recent efforts to mitigate these issues often involve post-hoc fairness adjustments or the adoption of generalized fairness metrics (e.g., [31, 34]). While these approaches represent steps toward more equitable AI, they often fall short of addressing the nuanced, context-dependent priorities of real-world stakeholders. In particular, many current solutions remain single-actor in design – treating the decision-making process as if there were only one objective – thereby missing the opportunity to model and reconcile multiple, potentially conflicting objectives in a transparent manner. In this context, recent developments in participatory approaches to AI design offer promising avenues for improvement [13]. However, in most cases, these approaches still lack solutions that are versatile enough for diverse use cases.
To address these limitations, in this paper we introduce a novel multi-actor decision framework that reinterprets the offline training phase of standard predictive models used to inform decisions as a multi-stakeholder optimization problem. This shift from a single-objective to a multi-actor perspective is particularly important in domains where disparate needs and values must be accommodated and justified.
Our proposed framework is characterized by several key features which make it particularly suitable for applications in high-stakes domains: (i) it explicitly models stakeholder interests by directly encoding diverse preferences into the decision-making process, ensuring the system reflects multiple viewpoints and priorities; (ii) it incorporates a compromise-driven action selection mechanism to identify actions that balance trade-offs across diverse objectives; (iii) its model-agnostic flexibility enables seamless adaptation to various predictive models and application contexts, making it suitable for a wide range of real-world scenarios; and (iv) it is inherently explainable and transparent, maintaining an interpretable pipeline and outputs that clarify how different objectives influence the final decision and how conflicts are resolved.
Overall, this work offers the following novel contributions:
• We bridge foundational contributions from reward-based learning, game theory, welfare economics, computational social choice, and optimization to advance the formalization of participatory AI solutions.
• We introduce a theoretically grounded framework that overcomes the limitations of traditional single-perspective prediction-based systems by systematically modeling and reconciling diverse and potentially conflicting objectives, while providing context-aware solutions.
• We demonstrate the effectiveness and generalizability of the proposed framework through rigorous experiments on two real-world case studies, showing how incorporating stakeholder diversity into the AI training pipeline improves decision-making outcomes compared to purely predictive baselines when evaluated across a diverse set of metrics.
• To enhance transparency and facilitate broader adoption, we provide complete access to the source code and experimental setup, enabling reproducibility and promoting its application on new use cases.
The remainder of the paper is organized as follows. Section 2 provides a review of the relevant literature. In Section 3,
we present our proposed framework, which is validated through extensive experiments in Section 4. Section 5 outlines
key practical implications of our framework, while Section 6 concludes the paper, mentioning potential directions for
future research.
2 Related Work
The topics of fairness and accountability in automated decision-making (ADM) systems have garnered significant attention among scholars and practitioners due to their increasingly pervasive deployment in multiple domains [11] and their potential social impact [2]. Research in this area has extensively explored pre-, in-, and post-processing techniques to mitigate biases in data and modeling pipelines (for comprehensive reviews see, for instance, [31, 34]), contributing to the promotion of fairness in algorithmically supported decision systems. However, studies have demonstrated that fair algorithms alone cannot guarantee fairness in practice [19, 26], and aspects such as interpretability and fairness are inherently interdependent factors adding complexity to their operationalization in real-world AI systems [14, 25, 37, 40].
The multifaceted nature of fairness is further influenced by cultural and social contexts, complicating efforts to develop fairness frameworks that extend beyond pre-defined universal metrics [41]. Moreover, some authors argue that, while fairness-aware methods often succeed in reducing biases in model outputs, they frequently fail to address systemic inequities; for this reason, approaches that more effectively incorporate diverse stakeholder interests and account for broader societal impacts are needed to promote more equitable social outcomes [17].
Especially in dynamic and interactive decision-making settings, where multiple actors' interests come into play, multi-agent systems may offer a promising framework. Contributions to multi-agent reinforcement learning [47] and multi-agent multi-armed bandit frameworks [24] leverage, for instance, welfare functions to foster fairness in multi-agent decision settings. In this context, Wen et al. [44] advance these efforts by integrating feedback effects into Markov decision processes, enabling the modeling of dynamic, long-term impacts of decisions on fairness outcomes, as demonstrated through a loan approval scenario. Despite their valuable contribution to integrating the presence of multiple actors in ADM systems, though, most approaches rely on predefined fairness definitions and require specific problem structures, limiting their adaptability to evolving stakeholder preferences and use-case-specific requirements.
Against this backdrop, Participatory AI has emerged as a significant paradigm for integrating diverse stakeholder perspectives throughout the AI lifecycle, offering opportunities to foster context-dependent fairness and promote accountability [7]. This paradigm emphasizes collaboration and co-creation, promoting inclusivity across both technical and non-technical domains [6, 23]. Contributions in this area span diverse fields, including healthcare [15], judicial systems [4], civic engagement [1], philanthropy [28], and urban planning [36]. More technical applications include collective debiasing [9], collaborative debugging [32], ranking with partial preferences [8], and web-based tools for democratizing ML workflows [46]. Collectively, these contributions reflect what has been described as a "participatory turn" in AI design [13].
However, current challenges such as the technical complexity of AI systems, structural and social barriers to participation, and power asymmetries hinder broader adoption and expose some applications to the risk of what has been described as "participation washing" [43]. Consequently, the scalability and generalizability of Participatory AI remain limited [13], compounded by the lack of flexible, generalizable frameworks that can be applied to a wide range of use cases.
3 The Participatory Training Framework
3.1 Problem Setting
In traditional prediction-oriented decision-making systems, the goal is to recommend an action to be taken based on a set of features and a predicted outcome. In contrast to these approaches, which rely solely on outcome predictions, we
formulate the task of suggesting an action as a multi-actor decision-making problem, where each actor has individual
preferences over possible actions and outcomes.
Formally, let $\mathcal{I} = \{1, 2, \ldots, N\}$ represent a set of stakeholders or actors. Without loss of generality, in our framework, we define an actor as any real or symbolic entity that is influenced by the decisions suggested by the system and, therefore, holds a direct stake in its resulting outputs. In a grant-lending scenario, the categories of actors might include the financial institution and its clients, whereas in a healthcare context, they could be represented by the hospital and the patients. In our context, each actor $i \in \mathcal{I}$ evaluates potential actions based on their own preferences and seeks outcomes that align with these preferences. The decision space consists of a set of possible actions $\mathcal{A} = \{a_1, a_2, \ldots, a_k\}$ and a set of feasible outcomes $\mathcal{O} = \{o_1, o_2, \ldots, o_m\}$. This decision-making process occurs within a context $\boldsymbol{x} \in \mathcal{X}$, representing exogenous conditions or features that can influence the resulting outcome.
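For illustration purposes only, this setting can be encoded directly in code. The following minimal Python sketch (with hypothetical names, not drawn from our released implementation) instantiates the actor set $\mathcal{I}$, the action space $\mathcal{A}$, the outcome space $\mathcal{O}$, and a context vector for a grant-lending scenario:

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical encoding of the multi-actor setting for a grant-lending scenario.
ACTORS = ["bank", "applicant", "regulator"]                    # stakeholder set I
ACTIONS = ["grant", "grant_lower", "not_grant"]                # action space A
OUTCOMES = ["fully_repaid", "partially_repaid", "not_repaid"]  # outcome space O

@dataclass
class Context:
    """Exogenous features x in X describing a single decision instance."""
    credit_score: float
    income: float

    def as_vector(self) -> np.ndarray:
        return np.array([self.credit_score, self.income])

x = Context(credit_score=0.72, income=54_000.0).as_vector()
```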
Figure 1 provides a high-level view of our proposed framework, illustrating how it builds upon and extends a traditional ML pipeline for automated-supported decision-making. The white components depict the standard ML pipeline for predictive modeling, while the green components show the additional elements introduced by our approach. The diagram encompasses both the (offline) training phase, where historical data are used, and the (online) inference phase, demonstrating how the framework operates with new data.
At its core, our framework is domain- and model-agnostic and designed to integrate multiple actor perspectives through dynamic reward modeling and principled decision-making strategies. It evaluates and ranks decision-making strategies based on performance across diverse evaluation metrics, addressing heterogeneous stakeholder preferences. As for its outputs, our framework expands upon traditional approaches by preserving the strengths of purely prediction-based decision support systems while offering enhanced context-awareness, greater transparency, and a more comprehensive representation of diverse perspectives. In other words, the proposed method does not diminish the capabilities of prediction-based decision systems; instead, it complements their suggested decisions with a set of viable alternatives, thus enriching the expressivity of the decision-making process.
Action spaces and outcomes are handled flexibly: the action space $\mathcal{A}$ may be finite or continuous, and outcomes $\mathcal{O}$ may be discrete, continuous, or discretized from a continuous domain. When $\mathcal{O}$ is discrete, a classification-oriented formulation arises; for a continuous $\mathcal{O}$, a regression-oriented model is employed. Without loss of generality, continuous outcomes can be discretized into a finite set of intervals, yielding a unified perspective for both discrete and continuous cases. All components of the framework and its end-to-end workflow are discussed below.
3.2 Core Components of the Framework
3.2.1 Actor-Based Rewards and Payoff Matrices. A central element of our framework is the assignment of rewards to each action-outcome pair, extending conventional prediction-focused methods to address the diverse preferences of multiple stakeholders. Inspired by reward-based learning approaches, this mechanism leverages feedback signals to guide decision-making in a manner consistent with stakeholder priorities.
Fig. 1. Overview of the proposed framework as an integration of the traditional ML decision-support pipeline.

Formally, each actor $i \in \mathcal{I}$ specifies a reward function
$$R_i : \mathcal{X} \times \mathcal{A} \times \mathcal{O} \to [0, 1],$$
which assigns a score between 0 and 1 to every action-outcome pair $(a, o)$, given the vector of contextual information $\boldsymbol{x}$. Higher values of $R_i(\boldsymbol{x}, a, o)$ indicate outcomes more desirable to actor $i$.
From a computational standpoint, $R_i(\boldsymbol{x}, a, o)$ may be derived via static, domain-specific rules or through a learned function $q_i(\boldsymbol{x}, a, o)$ that predicts the reward each actor would associate with each triplet $(\boldsymbol{x}, a, o)$, based on a set of rewards associated with the historical data. In the latter case, $q_i$ approximates $R_i$ by training on historical records of actions, outcomes, and contextual features, subject to bounded approximation error. This flexible design supports both expert-driven reward models and data-driven approaches. Similarly, the historical rewards used to train actor-specific reward models can be collected from real-world data or generated synthetically – for instance, using a Large Language Model prompted to mimic the decision-making of a particular actor category¹.
In scenarios where actions and outcomes are discrete or discretized, we arrange these rewards into a payoff matrix. Here, the rows correspond to actions, the columns represent outcomes, and each matrix cell specifies the reward value associated with a particular $(a, o)$ tuple given the context $\boldsymbol{x}$:
$$
\mathbf{R}_i(\boldsymbol{x}) =
\begin{bmatrix}
R_i(\boldsymbol{x}, a_1, o_1) & R_i(\boldsymbol{x}, a_1, o_2) & \cdots & R_i(\boldsymbol{x}, a_1, o_m) \\
R_i(\boldsymbol{x}, a_2, o_1) & R_i(\boldsymbol{x}, a_2, o_2) & \cdots & R_i(\boldsymbol{x}, a_2, o_m) \\
\vdots & \vdots & \ddots & \vdots \\
R_i(\boldsymbol{x}, a_k, o_1) & R_i(\boldsymbol{x}, a_k, o_2) & \cdots & R_i(\boldsymbol{x}, a_k, o_m)
\end{bmatrix}.
$$
¹ Refer to Section 5 for further insights and implications related to modeling actors' reward functions.
If either $\mathcal{A}$ or $\mathcal{O}$ is continuous, this matrix-based representation extends naturally to a continuous function $R_i(\boldsymbol{x}, a, o)$, provided $R_i$ remains well-defined for all $(a, o)$.
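To make the reward formalism concrete, the sketch below shows how a rule-based reward function $R_i$ and the corresponding payoff matrix $\mathbf{R}_i(\boldsymbol{x})$ could be implemented for the lending encoding sketched above; the specific reward values are invented for illustration and are not those used in our experiments:

```python
import numpy as np

# Illustrative rule-based reward R_i(x, a, o) in [0, 1] for a "bank" actor;
# the numerical values below are invented for this sketch.
BANK_REWARDS = {
    ("grant", "fully_repaid"): 1.0,
    ("grant", "partially_repaid"): 0.4,
    ("grant", "not_repaid"): 0.0,
    ("grant_lower", "fully_repaid"): 0.8,
    ("grant_lower", "partially_repaid"): 0.5,
    ("grant_lower", "not_repaid"): 0.1,
    ("not_grant", "fully_repaid"): 0.3,      # forgone profit
    ("not_grant", "partially_repaid"): 0.6,
    ("not_grant", "not_repaid"): 0.9,        # loss avoided
}

def bank_reward(x: np.ndarray, action: str, outcome: str) -> float:
    """Static reward function; a learned q_i(x, a, o) would condition on x."""
    return BANK_REWARDS[(action, outcome)]

def payoff_matrix(reward_fn, x, actions, outcomes) -> np.ndarray:
    """k x m payoff matrix R_i(x): rows index actions, columns index outcomes."""
    return np.array([[reward_fn(x, a, o) for o in outcomes] for a in actions])
```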
3.2.2 Outcome Predictions and Expected Rewards. Another key component of the framework, which serves as the cornerstone of traditional approaches to automated-supported decision systems, is the outcome prediction model. This model learns to predict outcomes based on historical data comprising context vectors and past actions. Formally, it can be represented as a predictive function
$$f : \mathcal{X} \times \mathcal{A} \to \Delta(\mathcal{O}) \quad \text{or} \quad f : \mathcal{X} \times \mathcal{A} \to \mathbb{R},$$
depending on whether the task is cast as classification or regression, where $\Delta(\mathcal{O})$ is the probability simplex over $\mathcal{O}$.
An innovative aspect of our framework lies in its integration of predicted outcomes to compute the expected reward for each actor. Specifically, the predicted outcomes are combined with the actor's reward function $R_i$ to evaluate the desirability of selecting action $a$ given context $\boldsymbol{x}$. For discrete or discretized outcomes, this is expressed as:
$$\mathbb{E}[R_i(a \mid \boldsymbol{x})] = \sum_{q=1}^{m} P(o_q \mid \boldsymbol{x}, a)\, R_i(\boldsymbol{x}, a, o_q),$$
where $P(o_q \mid \boldsymbol{x}, a)$ is derived from the prediction model $f$. For regression tasks with continuous outcomes, the expectation generalizes to:
$$\mathbb{E}[R_i(a \mid \boldsymbol{x})] = \int_{\mathcal{O}} R_i(\boldsymbol{x}, a, o)\, p(o \mid \boldsymbol{x}, a)\, do,$$
where $p(o \mid \boldsymbol{x}, a)$ denotes the predicted outcome density. Alternatively, in deterministic regression models where $f(\boldsymbol{x}, a)$ directly outputs a single most likely outcome $\hat{o}$, the expected reward simplifies to:
$$\mathbb{E}[R_i(a \mid \boldsymbol{x})] = R_i(\boldsymbol{x}, a, \hat{o}).$$
Regardless of the specific prediction approach, the expected rewards for actor $i$ across all actions can be aggregated into a vector:
$$\mathbb{E}[R_i(a \mid \boldsymbol{x})] = \big[\mathbb{E}[R_i(a_1 \mid \boldsymbol{x})], \ldots, \mathbb{E}[R_i(a_k \mid \boldsymbol{x})]\big]^\top, \tag{1}$$
providing a concise representation of the actor's preferences over the action space $\mathcal{A}$.
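In the discrete case, Equation (1) reduces to a row-wise dot product between the payoff matrix and the predicted outcome distribution. A minimal sketch (assuming outcome probabilities $P(o_q \mid \boldsymbol{x}, a)$ obtained, e.g., from a classifier's predict_proba call per action):

```python
import numpy as np

def expected_rewards(payoff: np.ndarray, outcome_probs: np.ndarray) -> np.ndarray:
    """Vector of E[R_i(a | x)] = sum_q P(o_q | x, a) * R_i(x, a, o_q).

    payoff:        (k, m) payoff matrix R_i(x)
    outcome_probs: (k, m) matrix whose a-th row holds P(o_q | x, a)
    returns:       (k,) expected reward per action
    """
    return (payoff * outcome_probs).sum(axis=1)
```

Stacking one such vector per actor yields the $N \times K$ matrix of expected rewards consumed by the decision strategies described next.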
3.2.3 Decision Strategies. Once the vectors of actor-specific expected rewards are computed based on the predicted outcomes, a set of decision strategies can be applied to derive the action suggested by the system. In our framework, we define $\mathcal{D} = \mathcal{C} \cup \mathcal{B}$ as the set of decision functions that take the expected rewards as input and output the suggested action from the action space.
In particular, the decision strategies are categorized into two main groups: (i) a set of baseline strategies $\mathcal{B}$, and (ii) a set of strategies designed to find compromise solutions that balance the preferences of multiple actors, denoted as $\mathcal{C}$.
Compromise Functions. In multi-agent settings, selecting a single action requires balancing the competing preferences of multiple actors. To achieve this, we define a set of compromise functions $\mathcal{C} = \{C_1, C_2, \ldots, C_l\}$, where each $C_j$ represents a decision-making strategy that aggregates the expected rewards $\mathbb{E}[R_i(a \mid \boldsymbol{x})]$ across all actors $i \in \mathcal{I}$ and actions $a \in \mathcal{A}$. These compromise functions encode various principles of decision-making and welfare distribution, ranging from efficiency-focused strategies to fairness-based approaches.
Formally, each compromise function $C_j$ takes as input the matrix of expected rewards $\mathbb{E}[R(a \mid \boldsymbol{x})] \in [0, 1]^{N \times K}$, where $N$ is the number of actors and $K = |\mathcal{A}|$ is the number of actions, along with a set of additional parameters $\boldsymbol{p} \in \mathcal{P}$. These parameters $\boldsymbol{p}$ may include actor-specific baseline values (e.g., disagreement or ideal points) or contextual information relevant to the decision-making process. The output is the selected action $a^* \in \mathcal{A}$, aligned with the specified decision-making principle.
The mapping implemented by $C_j$ is expressed as:
$$C_j : [0,1]^{N \times K} \times \mathcal{P} \to \mathcal{A}, \qquad C_j\big(\mathbb{E}[R(a \mid \boldsymbol{x})], \boldsymbol{p}\big) = \arg\max_{a \in \mathcal{A}} \Phi_j\big(\mathbb{E}[R_1(a \mid \boldsymbol{x})], \ldots, \mathbb{E}[R_N(a \mid \boldsymbol{x})]; \boldsymbol{p}\big),$$
where $\Phi_j$ is a scalar-valued scoring function that encapsulates the decision-making principle.
Examples of compromise principles include maximizing the total reward (utilitarianism), ensuring equal reward distribution (egalitarianism), or maximizing the minimum reward (max-min fairness). Table 1 summarizes these and other principles derived from game theory, computational social choice, and welfare economics, illustrating how different normative criteria can guide decision-making in multi-agent settings.
Table 1. Examples of Compromise Functions

| Function | Formula | Description |
| --- | --- | --- |
| Nash Bargaining Solution | $C_{NBS} = \arg\max_{a \in \mathcal{A}} \prod_{i=1}^{N} \big(\mathbb{E}[R_i(a \mid \boldsymbol{x})] - d_i\big)$ | Maximizes the product of utility gains above actor-specific disagreement payoffs, balancing fairness and efficiency. |
| Proportional Fairness | $C_{PF} = \arg\max_{a \in \mathcal{A}} \sum_{i=1}^{N} \log \mathbb{E}[R_i(a \mid \boldsymbol{x})]$ | Promotes balanced improvements in collective well-being, ensuring fair trade-offs. |
| Nash Social Welfare | $C_{NSW} = \arg\max_{a \in \mathcal{A}} \prod_{i=1}^{N} \mathbb{E}[R_i(a \mid \boldsymbol{x})]$ | Maximizes the product of actors' utilities, equivalent to proportional fairness under a logarithmic transformation. |
| Maximin | $C_{MM} = \arg\max_{a \in \mathcal{A}} \min_{i=1}^{N} \mathbb{E}[R_i(a \mid \boldsymbol{x})]$ | Safeguards the most disadvantaged actor by maximizing the minimum utility across actors. |
| Compromise Programming | $C_{CP\text{-}L2} = \arg\min_{a \in \mathcal{A}} \sum_{i=1}^{N} w_i \big(u_i^* - \mathbb{E}[R_i(a \mid \boldsymbol{x})]\big)^2$ | Minimizes the weighted Euclidean distance between actors' utilities and their ideal points $u_i^*$. |
| Kalai-Smorodinsky Solution | $C_{KS} = \arg\max_{a \in \mathcal{A}} \min_{i=1}^{N} \frac{\mathbb{E}[R_i(a \mid \boldsymbol{x})] - d_i}{u_i^* - d_i}$ | Maximizes proportional gains toward each actor's ideal payoff relative to their disagreement payoff. |
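For concreteness, a few of the compromise functions in Table 1 can be sketched as follows, operating on the $N \times K$ matrix of expected rewards and returning the index of the selected action. Disagreement points $d_i$ and ideal points $u_i^*$ are assumed to be user-supplied, and the numerical guard in the Nash solution is an implementation detail of this sketch rather than part of the formal definition:

```python
import numpy as np

def nash_bargaining(E: np.ndarray, d: np.ndarray) -> int:
    """C_NBS: maximize prod_i (E[R_i(a|x)] - d_i) over actions.
    E has shape (N, K); d has shape (N,)."""
    gains = np.clip(E - d[:, None], 1e-12, None)  # guard against non-positive gains
    return int(np.argmax(np.prod(gains, axis=0)))

def maximin(E: np.ndarray) -> int:
    """C_MM: maximize the minimum expected reward across actors."""
    return int(np.argmax(E.min(axis=0)))

def kalai_smorodinsky(E: np.ndarray, d: np.ndarray, u_star: np.ndarray) -> int:
    """C_KS: maximize the minimum normalized gain (E - d) / (u* - d).
    Assumes u*_i > d_i for every actor."""
    ratios = (E - d[:, None]) / (u_star - d)[:, None]
    return int(np.argmax(ratios.min(axis=0)))
```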
Baseline Strategies. To contextualize and evaluate the proposed compromise functions, we compare their performance against two baselines: the outcome predictor baseline ($B_{\text{pred}}$) and the individual reward maximization baseline ($B^{i}_{\max}$). These baselines serve as reference points and represent simple decision strategies that ignore the complexities of a multi-actor setting.
The outcome predictor baseline provides a straightforward decision-making strategy, leveraging a predefined mapping from predicted outcomes to corresponding actions. This baseline operates under two paradigms, namely (i) a bijective mapping where each outcome is uniquely associated with the action that optimizes utility for that outcome, and (ii) a heuristic or naive rule, according to which actions are determined based on the predicted outcome or another variable of interest, without requiring a strict one-to-one correspondence. For instance, in a loan approval scenario, if repayment is the most likely predicted outcome, this baseline recommends granting the loan. Formally, let $o_{\text{best}}$ denote the outcome with the highest predicted probability given context $\boldsymbol{x}$. The recommended action is:
$$a_{\text{pred}} = \arg\max_{a \in \mathcal{A}} P(o_{\text{best}} \mid \boldsymbol{x}, a).$$
Alternatively, in cases where decision-making depends on the predicted value of a variable, this baseline prioritizes actions that optimize that variable. For example, in medical treatment selection, a naive strategy might choose the treatment that maximizes the predicted effect size or minimizes the predicted cost.
The individual reward maximization baseline, on the other hand, focuses solely on the perspective of a single actor, recommending actions that maximize that actor's expected reward without considering the impact on others. For each actor $i$, the recommended action is:
$$a^{(i)}_{\max} = \arg\max_{a \in \mathcal{A}} \mathbb{E}[R_i(a \mid \boldsymbol{x})].$$
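Both baselines admit equally compact sketches over the same inputs (again, names are illustrative, and the rule used here to identify $o_{\text{best}}$ by maximizing over actions is one possible reading of the definition above):

```python
import numpy as np

def individual_max(E: np.ndarray, actor: int) -> int:
    """B_max^i: pick the action maximizing actor i's expected reward only."""
    return int(np.argmax(E[actor]))

def outcome_pred_baseline(outcome_probs: np.ndarray) -> int:
    """B_pred: identify the most likely outcome o_best and pick the action
    that maximizes P(o_best | x, a). outcome_probs has shape (k, m)."""
    o_best = int(np.argmax(outcome_probs.max(axis=0)))  # most probable outcome
    return int(np.argmax(outcome_probs[:, o_best]))
```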
3.2.4 Evaluation Metrics and Optimal Action Selection. Our framework can present all actions recommended by different decision strategies, giving end users tools to make more informed decisions. However, if the goal is to identify the decision function that yields the best performance across a set of user-defined metrics for decision-making on new data, a scoring mechanism can be employed to rank the decision strategies and select the most effective one. These metrics may capture multiple objectives, ranging from standard performance and fairness measures to domain-specific criteria (e.g., profitability in loan decisions, treatment costs in healthcare).
In this context, our framework leverages a $k$-fold cross-validation mechanism not only to tune the outcome prediction and reward models but also to determine the best decision function based on historical data.
Consider a dataset of exhaustive triplets $\{\boldsymbol{x}_t, a, o\}$, where $\boldsymbol{x}_t \in \mathcal{X}$ represents the observed context for each sample, $a \in \mathcal{A}$ denotes a potential action, and $o \in \mathcal{O}$ corresponds to a possible outcome. For a dataset with $T$ samples, the total number of triplets is $T \cdot |\mathcal{A}| \cdot |\mathcal{O}|$, since each sample is associated with every possible action-outcome pair.
To evaluate a decision function $D \in \mathcal{D}$ against a set of user-defined metrics $\mathcal{M} = \{M_1, M_2, \ldots, M_z\}$, we compute its performance across all triplets:
$$\mathcal{P}(D, M) = \frac{1}{T^{\alpha}} \sum_{t=1}^{T} \sum_{a \in \mathcal{A}} \sum_{o \in \mathcal{O}} \Theta\big(D, M \mid \boldsymbol{x}_t, a, o\big),$$
where $\Theta\big(D, M \mid \boldsymbol{x}_t, a, o\big)$ quantifies the performance of $D$ with respect to metric $M$ for the triplet $(\boldsymbol{x}_t, a, o)$. The parameter $\alpha$ indicates whether metrics are averaged ($\alpha = 1$, e.g., accuracy) or summed ($\alpha = 0$, e.g., total cost).
To ensure comparability across metrics with different scales, the raw performance scores $\mathcal{P}(D, M)$ are normalized to the $[0, 1]$ range:
$$\tilde{P}(D, M) = \frac{\mathcal{P}(D, M) - \min_{D' \in \mathcal{D}} \mathcal{P}(D', M)}{\max_{D' \in \mathcal{D}} \mathcal{P}(D', M) - \min_{D' \in \mathcal{D}} \mathcal{P}(D', M)},$$
where $\max_{D' \in \mathcal{D}} \mathcal{P}(D', M)$ and $\min_{D' \in \mathcal{D}} \mathcal{P}(D', M)$ denote the highest and lowest performance values across all decision functions for metric $M$. Because each metric $M$ targets a specific optimization objective – maximization (e.g., accuracy), minimization (e.g., cost), or convergence to a target value (e.g., zero disparity across groups) – this normalization is adapted so that more desirable outcomes map to 1, and less desirable outcomes map to lower scores.
We then compute an aggregated score $S(D)$ by combining the normalized metrics:
$$S(D) = \sum_{h=1}^{z} w_h\, \tilde{P}(D, M_h),$$
where $w_h$ represents the relative importance of metric $M_h$, subject to $\sum_{h=1}^{z} w_h = 1$. If all metrics are of equal importance, then $w_h = 1/z$. Naturally, both the selection of metrics and the assignment of weights can be tailored by the user to match specific preferences.
At this point, we identify the decision function with the highest aggregated score:
$$D^* = \arg\max_{D \in \mathcal{D}} S(D).$$
Assuming the historical data and future contexts share a consistent distribution, $D^*$ or the induced ranking guides future decisions.
When presented with a new context $\boldsymbol{x}$, the outcome prediction model estimates the likelihood of each potential outcome (or produces a real-valued prediction), and the expected rewards $\{\mathbb{E}[R_i(a \mid \boldsymbol{x})]\}_{i=1}^{N}$ for all actions $a \in \mathcal{A}$ are computed. The optimal action $a^*$ is then determined by applying $D^*$:
$$a^* = D^*\big(\{\mathbb{E}[R_i(a \mid \boldsymbol{x})]\}_{i=1}^{N}\big).$$
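The scoring-and-selection step can be summarized by the following sketch, which normalizes each raw metric score across decision functions, aggregates the normalized scores with the user weights $w_h$, and returns the top-ranked strategy. For brevity it assumes all metrics are maximization-oriented; minimization or target-convergence metrics would first be transformed as described above:

```python
import numpy as np

def select_best_strategy(raw_scores: dict[str, dict[str, float]],
                         weights: dict[str, float]) -> str:
    """raw_scores[D][M] = P(D, M); weights[M] sum to 1. Returns argmax_D S(D)."""
    strategies = list(raw_scores)
    S = {D: 0.0 for D in strategies}
    for M, w in weights.items():
        vals = np.array([raw_scores[D][M] for D in strategies])
        lo, hi = vals.min(), vals.max()
        # Min-max normalization across decision functions for this metric.
        normed = (vals - lo) / (hi - lo) if hi > lo else np.ones_like(vals)
        for D, p in zip(strategies, normed):
            S[D] += w * float(p)   # S(D) = sum_h w_h * normalized P(D, M_h)
    return max(S, key=S.get)
```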
3.3 Computational Complexity Analysis
The additional computational complexity of the proposed framework, beyond classical cross-validation, arises primarily from the reward prediction models, expected reward computation, and decision-making processes. Training the reward models scales with the number of reward types ($N$) and the size of the training dataset ($T$), with the complexity determined by the underlying model architecture. This complexity can be expressed as $O(c_{\text{train}}(T, g))$, where $g$ is the number of features considered in the reward models.
During the evaluation phase, the expected reward computation requires predicting rewards for each combination of actions and data points, with the test or validation set size denoted as $T'$. This step scales linearly with the number of features considered in the reward prediction model ($g$), the number of actions ($|\mathcal{A}|$), and the number of reward types ($N$): the resulting complexity is given by $O(T' \cdot c_{\text{inf}}(g) \cdot |\mathcal{A}| \cdot N)$, where $c_{\text{inf}}(g)$ represents the inference cost of a single prediction of the reward model. Similarly, decision-making strategies involve generating compromise and baseline solutions by iterating over all actions and reward types for each data point. This results in an overall complexity of $O(T' \cdot |\mathcal{D}| \cdot |\mathcal{A}| \cdot N)$, where $|\mathcal{D}|$ is the total number of decision strategies.
The evaluation of decision outcomes involves calculating fairness-related, performance-related, and case-specific metrics. This process usually scales proportionally with the number of metrics ($|\mathcal{M}|$) and the number of data points being evaluated ($T'$). Ranking and weighted aggregation of normalized scores introduce additional computational overhead, with complexity scaling as $O(|\mathcal{M}| \cdot N \cdot |\mathcal{A}|)$.
Since the framework scales linearly with its core components, it can handle moderately large datasets while ensuring robust and explainable decision recommendations. This linear scalability makes the framework efficient for real-world applications, where the number of actors, actions, metrics, and decision strategies is typically limited.
4 Experiments
Given the specific scope of our framework, its primary outputs are: (i) a list of actions suggested by each decision strategy for a given context vector, and (ii) a ranking of decision functions derived from historical data and user-defined preferences regarding the relative importance of various evaluation metrics. Consequently, the effectiveness of the framework stems from its ability to provide users with an interpretable tool for making decisions that are both context-sensitive and aligned with the interests of diverse stakeholders. For this reason, it is not feasible to establish a universally applicable evaluation framework, as its relevance is highly dependent on the specific use case. Consequently, the primary objectives of this section are to demonstrate the versatility of the framework across diverse use cases and to analyze how its outputs adapt to variations in its core components. To demonstrate the practical applicability of the proposed framework, we present two representative scenarios: a loan approval scenario, showcasing decision-making in a multi-classification problem, and a health treatment scenario, illustrating a causal inference-based regression problem.
In both scenarios, reward structures are predefined using stakeholder-specific heuristics that aim to mimic and approximate real-world preferences and objectives. To reflect the inherent variability in human preferences and assess the robustness of the learned reward models, uniform noise is added to the synthetic rewards used for training.
While the task of modeling actor-specific rewards is critical and complex, it lies beyond the primary scope of this work. Instead, the reward structures in our experiments serve as illustrative examples to demonstrate the utility and advantages of the framework. Furthermore, although the actors in the two scenarios represent prototypical stakeholder groups, the framework is designed to naturally support any number and type of actors – from broad categories to individualized reward models tailored to specific decision-making processes. This adaptability ensures relevance across diverse domains and decision-making contexts. The code to reproduce the experiments is fully documented and available in the following GitHub repository: https://anonymous.4open.science/r/participatory_training-502B
4.1 Real-World Use Cases
4.1.1 Use Case 1: Lending Scenario. In the lending scenario, real-world data from the Lending Club database² is used. Our setup is structured as a 3×3 problem, where the decision space $\mathcal{A}$ comprises three options ("Grant," "Grant lower amount," and "Not Grant") and the outcome space $\mathcal{O}$ consists of three discrete options ("Fully Repaid," "Partially Repaid," or "Not Repaid") depending on the real repayment status reported in the dataset. The context includes applicant-specific features such as credit score, income, financial history, and demographic attributes.
This scenario involves three key stakeholder categories:
• Bank, which seeks to maximize profitability. Its rewards are tied to repayment probabilities, assigning higher rewards for fully repaid loans and lower rewards for partial or non-repayment.
• Applicant, who prioritizes loan access to meet their financial needs. Their rewards reflect the utility derived from loan approval, modulated by the obligations of repayment. Full approval typically yields higher rewards, while partial or no approval reduces applicant satisfaction.
• Regulatory Body, responsible for ensuring financial stability and inclusivity in lending practices. Its rewards balance the stability of the financial system with the willingness to promote social benefits, placing particular value on providing access to credit for vulnerable applicants.
In addition to the baseline strategies described earlier, we benchmark the performance of compromise functions against an Oracle strategy, which assumes a bijective mapping between the actual outcomes and the optimal actions, simulating an idealized decision-making process. Besides performance- and fairness-oriented metrics, we include case-specific evaluation metrics, namely the percentage of total profit achieved by the bank, the percentage of losses relative to the total potential loss, and the proportion of unexploited profit resulting from suboptimal granting decisions.
In the baseline example, we use a Random Forest as the outcome prediction model. For the ablation study, we examine the framework's behavior using a simpler model, specifically a k-Nearest Neighbors algorithm. In both this and the subsequent example, the reward prediction models, which are used to mimic actor-based preferences given a new context vector, are also implemented as Random Forests, as this model architecture is widely employed by real-world practitioners and provides sufficient learning power for our experiments.
² https://www.kaggle.com/datasets/wordsforthewise/lending-club
4.1.2 Use Case 2: Healthcare Scenario. For the healthcare scenario, we use the first realization of the Infant Health and Development Program (IHDP) dataset, as reported by [30]³. This dataset, first introduced by [22], originates from a randomized experiment studying the effect of home visits on cognitive test scores for infants. It has been widely used in the causal inference literature [30, 42, 45].
³ https://github.com/AMLab-Amsterdam/CEVAE
In this setting, the action space consists of two possible actions: assigning or not assigning the treatment to the candidate patient. The outcome is represented as a scalar real value, indicating the cognitive score achieved by the child under the given treatment. The context vector comprises 25 binary and continuous features describing both the child and their family.
It is important to note that, as this is a synthetic benchmark dataset, the true values of the outcomes under treatment and control are available. These true values are used to evaluate the performance of the causal effect estimation model but are, of course, not utilized during training. We use this dataset to demonstrate that our framework can be extended to contexts where the predicted target is linked to a causal effect, thereby showcasing its ability to enhance decision-making also in causal inference scenarios. In this case, the stakeholder categories considered are the following:
• Healthcare Provider, focused on improving patient outcomes while managing costs. Its rewards are based on normalized outcome improvements relative to a baseline, with penalties for higher treatment costs.
• Policy Maker, committed to maximizing societal benefits and promoting fairness. Its rewards emphasize outcome improvements normalized by potential gains, with additional weighting to promote equity across demographic groups.
• Parent, prioritizing the well-being of their child. Their rewards are directly proportional to normalized outcomes, reflecting the straightforward utility parents derive from improved health or cognitive scores.
As in the previous example, random noise is introduced to the rewards to account for real-world variability. In addition to individual maximization strategies, we also consider a baseline strategy that aims to maximize the achieved cognitive score. As a natural consequence, this strategy would suggest treating all potential patients. As case-specific evaluation metrics, we use the mean outcome values for the treated and control groups, as well as the absolute difference between these values. To estimate the Conditional Average Treatment Effect (CATE), we use an X-regressor meta-learner [10] that leverages XGBRegressor as its base learner.
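For reference, this CATE estimation step can be reproduced with the causalml package [10]. The following is a minimal sketch assuming causalml's BaseXRegressor meta-learner API, with synthetic placeholder data standing in for the IHDP features:

```python
import numpy as np
from causalml.inference.meta import BaseXRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 25))        # 25 contextual features per child (placeholder)
treatment = rng.integers(0, 2, 100)   # binary treatment assignment (placeholder)
y = rng.normal(size=100)              # observed cognitive score (placeholder)

# X-learner meta-learner with XGBRegressor base learners.
x_learner = BaseXRegressor(learner=XGBRegressor())
cate = x_learner.fit_predict(X=X, treatment=treatment, y=y)  # per-unit CATE estimates
```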
4.2 Discussion of Key Insights
Figures 2, 3 and 4 illustrate how variations in key aspects of the framework influence the performance of different decision functions (baseline strategies and compromise functions) across various evaluation metrics related to decision accuracy, fairness, and case-specific outcomes. The mean values presented in all figures are calculated over four runs using different random seeds. These figures provide a comparative analysis of the performance of decision functions relative to one another and the Oracle, highlighting trade-offs among the metrics, in the previously described lending scenario. In general, baseline strategies achieve the highest decision accuracy compared to real outcomes. However, they demonstrate suboptimal performance in terms of fairness (see subplots on Demographic_Parity), where compromise functions tend to enable more equitable outcomes. This is particularly evident when compared to the outcome prediction baseline (Outcome_Pred_Model in the plots). Notably, even though all decision functions share the same predicted outcome values, being based on identical vectors of expected rewards derived from the context vectors, compromise functions implicitly introduce a fairness correction. This correction mitigates biases in baseline strategies, which often favor specific groups, thereby improving fairness in the decision-making process.
Figure 2 specifically examines how variations in the reward structures of the actors (namely the Bank and the Applicant) affect decision outcomes suggested by each strategy, as reflected in the evaluation metrics. In this analysis, two distinct reward structures are compared: a balanced baseline version and a stricter configuration, where the Bank values granting loans only when full repayment is expected, and the Applicant prioritizes loan approval regardless of repayment likelihood. As shown in Figure 2, more "selfish" individual preferences exacerbate the trade-offs between performance- and fairness-oriented metrics, leading to more conservative decisions.
In a similar manner, Figure 3 showcases the effect of training sample sizes. As expected, larger sample sizes consistently improve all metrics, primarily thanks to the greater accuracy of the outcome prediction model. Interestingly, however, the percentage improvement in case-specific metrics such as the percentage of profit achieved by the Bank relative to the total obtainable amount under omniscient conditions (Total_Profit) and the percentage of loans fully granted (Percentage_Grant) exceeds the improvement in decision accuracy. This suggests that variations in the performance of the outcome model have diversified effects on different metrics, influenced by the unique interrelation of predictions with the stakeholders' reward models. Nevertheless, the positive impact of increased sample sizes is also evident in the improved alignment between performance on the training and test sets. Figure 5 illustrates the mean absolute difference in the metrics between training and test data, across all decision functions. Notably, as the number of training samples increases, this difference diminishes, indicating that larger sample sizes enable the training performance to more accurately reflect future performance on the test set. This trend underscores the value of larger datasets in enhancing the reliability and generalization of decision strategies.
Figure 4 explores the impact of the complexity of the outcome prediction model by comparing k-Nearest Neighbors (kNN) and Random Forest (RF) models. As can be seen, RF consistently outperforms kNN in metrics like accuracy and precision, highlighting the benefits of increased model complexity for predictive performance. However, as previously observed, fairness and case-specific metrics experience different variations compared to prediction accuracy as the outcome model improves, as they are influenced by the interplay between the reward models and the rationale behind decision strategies.
Finally, Figure 6 refers to the healthcare scenario and demonstrates how the proposed framework can also be used to evaluate the distribution of treatment effects across patients, based on treatment assignment decisions and the expected treatment effects. This analysis underscores the framework's ability to assess not only traditional metrics associated with causal effects but also the equity and impact of treatment allocation across different population groups. Thus, once again, it highlights the framework's versatility in addressing diverse contexts and applications, as well as its potential to enhance equity and transparency in decision-making, also in scenarios involving causal effect estimation.
Fig. 2. Effects of reward structures on average test set performance: comparison of evaluation metrics across decision functions in the lending scenario.
Fig. 3. Effects of training sample sizes on average test set performance: comparison of evaluation metrics across decision functions in the lending scenario.
Fig. 4. Effects of outcome prediction model on average test set performance: comparison of evaluation metrics across decision functions in the lending scenario.
Fig. 5. Impact of sample size on training-test performance alignment (Mean Absolute Difference) across evaluation metrics.
Fig. 6. Key evaluation metrics across decision functions in the causal healthcare scenario.

5 Practical Implications
When applying the framework, a couple of important considerations must be taken into account to ensure its effective and ethical use. A critical aspect is the requirement for a vector of rewards associated with the historical dataset to enable training and mimic a participatory decision-making process on new data. This step must be handled carefully, as it carries the risk of introducing bias. Ideally, these reward vectors should be informed by real stakeholder participation or objective real-world measurements of the effects actions have on the actors, ensuring a comprehensive and accurate representation of diverse perspectives and impacts. In fact, designing rewards artificially or "on desk" risks oversimplifying the complexities of real-world scenarios and misinterpreting the needs and interests of underprivileged stakeholders, particularly if they are not directly consulted. For this reason, while the modeling of reward functions allows multiple viewpoints to be incorporated, the construction of these reward functions is critical. Proper modeling ensures that the resulting decisions are equitable, contextually appropriate, and truly reflective of the diversity of stakeholder preferences, rather than inadvertently reinforcing existing biases or misrepresenting key perspectives.
Another structural characteristic of the framework lies in the assumption of a non-adversarial nature among
stakeholders’ preferences used to train the reward models. This technical choice was motivated by the need to allow
the algorithm to learn from, at least theoretically, the actors’ honest preferences. However, it is important to clarify
that the framework is not intended to mimic a real collective decision-making process, where actors interact and can
strategically adapt their declared preferences; instead, it is designed to systematically represent and reconcile diverse
perspectives and interests in a structured manner.
6 Conclusion and Future Work
In many traditional data-driven systems, a single predictive model recommends an action solely based on historical data and fixed performance metrics. Such methods often overlook the diversity of stakeholder preferences and run the risk of propagating historical biases in action assignments. By contrast, our multi-actor framework explicitly models the distinct interests of various stakeholders, integrating their preferences into the decision-making process. This compromise-driven mechanism allows us to find actions that balance these different preferences while still optimizing user-defined metrics. Moreover, because the framework is model-agnostic, it can be paired with a wide range of predictive models and applied across diverse domains. Its design also emphasizes explainability, providing clear insight into how actions are chosen and trade-offs are managed. A further advantage is that our approach can indirectly foster fairer outcomes even without constraining the framework to a specific fairness definition or modifying the underlying outcome prediction model. As our experimental results illustrate, incorporating multiple stakeholder perspectives can effectively mitigate biases stemming from historically skewed action assignments. In doing so, the system can promote more equitable decision-making, outperforming single-model approaches on both case-specific and fairness metrics.
Overall, our framework is built on structural choices that ensure decision recommendations are based on representations of real stakeholder preferences or the actual effects of actions on them, thereby enhancing the expressive power of prediction-based decision support systems. While this approach allows for a fair representation of various perspectives and provides an equitable foundation for identifying compromise solutions, the framework does not capture the dynamic and strategic interactions that often arise in real-world consultation processes. In such settings, stakeholders may strategically adjust their revealed preferences in response to those of other actors to influence outcomes in their favour. Addressing these competitive or adversarial dynamics, for instance by analysing the strategy-proofness [3, 38, 39] of this approach, represents a significant avenue for future work.
References
[1] M. Arana-Catania, F. A. V. Lier, R. Procter, N. Tkachenko, Y. He, A. Zubiaga, and M. Liakata. 2021. Citizen participation and machine learning for a better democracy. Digital Government: Research and Practice 2, 3 (2021), 1–22.
[2] Theo Araujo, Natali Helberger, Sanne Kruikemeier, and Claes H De Vreese. 2020. In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI & Society 35, 3 (2020), 611–623.
[3] Kenneth J. Arrow. 2012. Social Choice and Individual Values. Yale University Press. http://www.jstor.org/stable/j.ctt1nqb90
[4] C. Barabas, C. Doyle, J. B. Rubinovitz, and K. Dinakar. 2020. Studying up: reorienting the study of algorithmic fairness around issues of power. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 167–176.
[5] Solon Barocas and Andrew D Selbst. 2016. Big data's disparate impact. Calif. L. Rev. 104 (2016), 671.
[6] A. Berditchevskaia, E. Malliaraki, and K. Peach. 2021. Participatory AI for humanitarian innovation. Nesta, London.
[7] Abeba Birhane, William Isaac, Vinodkumar Prabhakaran, Mark Diaz, Madeleine Clare Elish, Iason Gabriel, and Shakir Mohamed. 2022. Power to the people? Opportunities and challenges for participatory AI. In Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. 1–8.
[8] Kathleen Cachel and Elke Rundensteiner. 2024. PreFAIR: Combining Partial Preferences for Fair Consensus Decision-making. In The 2024 ACM Conference on Fairness, Accountability, and Transparency. 1133–1149.
[9] Eunice Chan, Zhining Liu, Ruizhong Qiu, Yuheng Zhang, Ross Maciejewski, and Hanghang Tong. 2024. Group Fairness via Group Consensus. In The 2024 ACM Conference on Fairness, Accountability, and Transparency. 1788–1808.
[10] Huigang Chen, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. 2020. CausalML: Python package for causal machine learning. arXiv preprint arXiv:2002.11631 (2020).
[11] Fabio Chiusi, Brigitte Alfter, Minna Ruckenstein, and Tuukka Lehtiniemi. 2020. Automating Society Report 2020. (2020).
[12] EP Council. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying Down Harmonised Rules on Artificial Intelligence and Amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act). Off. J. Eur. Union 50 (2024), 202.
[13] Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang. 2023. The participatory turn in AI design: Theoretical foundations and the current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization. 1–23.
[14] Jonathan Dodge, Q Vera Liao, Yunfeng Zhang, Rachel KE Bellamy, and Casey Dugan. 2019. Explaining models: an empirical study of how explanations impact fairness judgment. In Proceedings of the 24th International Conference on Intelligent User Interfaces. 275–285.
[15] J. Donia and J. A. Shaw. 2021. Co-design and ethical artificial intelligence for health: An agenda for critical research and practice. Big Data & Society 8, 2 (2021), 20539517211065248.
[16] Luciano Floridi et al. 2021. Ethics, Governance, and Policies in Artificial Intelligence. Springer.
[17] A. Gerdes. 2022. A participatory data-centric approach to AI Ethics by Design. Applied Artificial Intelligence 36, 1 (2022), 2009222.
[18] Frederic Gerdon, Ruben L Bach, Christoph Kern, and Frauke Kreuter. 2022. Social impacts of algorithmic decision-making: A research agenda for the social sciences. Big Data & Society 9, 1 (2022), 20539517221089305.
[19] N. Goel, A. Amayuelas, A. Deshpande, and A. Sharma. 2021. The importance of modeling data missingness in algorithmic fairness: A causal perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 7564–7573.
[20] Stephan Grimmelikhuijsen. 2023. Explaining why the computer says no: Algorithmic transparency affects the perceived trustworthiness of automated decision-making. Public Administration Review 83, 2 (2023), 241–262.
[21] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems 29 (2016).
[22] Jennifer L Hill. 2011. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20, 1 (2011), 217–240.
[23] S. Hossain and S. I. Ahmed. 2021. Towards a New Participatory Approach for Designing Artificial Intelligence and Data-Driven Technologies. arXiv preprint arXiv:2104.04072 (2021).
[24] Safwan Hossain, Evi Micha, and Nisarg Shah. 2021. Fair algorithms for multi-agent multi-armed bandits. Advances in Neural Information Processing Systems 34 (2021), 24005–24017.
[25] Aditya Jain, Manish Ravula, and Joydeep Ghosh. 2020. Biased models have biased explanations. arXiv preprint arXiv:2012.10986 (2020).
[26] H. Jeong, H. Wang, and F. P. Calmon. 2022. Fairness without imputation: A decision tree approach for fair prediction with missing values. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 9558–9566.
[27] Benjamin Laufer, Thomas Gilbert, and Helen Nissenbaum. 2023. Optimization's Neglected Normative Commitments. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 50–63.
[28] M. K. Lee, D. Kusbit, A. Kahng, J. T. Kim, X. Yuan, A. Chan, and A. D. Procaccia. 2019. WeBuildAI: Participatory framework for algorithmic governance. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–35.
[29] Bruno Lepri, Nuria Oliver, Emmanuel Letouzé, Alex Pentland, and Patrick Vinck. 2018. Fair, transparent, and accountable algorithmic decision-making processes: The premise, the proposed solutions, and the open challenges. Philosophy & Technology 31 (2018), 611–627.
[30] Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. Causal effect inference with deep latent-variable models. Advances in Neural Information Processing Systems 30 (2017).
[31] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR) 54, 6 (2021), 1–35.
[32] Yuri Nakao, Simone Stumpf, Subeida Ahmed, Aisha Naseer, and Lorenzo Strappelli. 2022. Toward involving end-users in interactive human-in-the-loop AI fairness. ACM Transactions on Interactive Intelligent Systems (TiiS) 12, 3 (2022), 1–30.
[33] Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 6464 (2019), 447–453.
[34] Dana Pessach and Erez Shmueli. 2022. A review on fairness in machine learning. ACM Computing Surveys (CSUR) 55, 3 (2022), 1–44.
[35] Eike Petersen, Melanie Ganz, Sune Holm, and Aasa Feragen. 2023. On (assessing) the fairness of risk score models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 817–829.
[36] S. J. Quan, J. Park, A. Economou, and S. Lee. 2019. Artificial intelligence-aided design: Smart design for sustainable city development. Environment and Planning B: Urban Analytics and City Science 46, 8 (2019), 1581–1599.
[37] Resmi Ramachandranpillai, Ricardo Baeza-Yates, and Fredrik Heintz. 2023. FairXAI – A Taxonomy and Framework for Fairness and Explainability Synergy in Machine Learning. Authorea Preprints (2023).
[38] Mark Allen Satterthwaite. 1975. The Existence of a Strategy Proof Voting Procedure. Evanston, IL: Northwestern University, Kellogg School of Management, Center for Mathematical Studies in Economics and Management Science (1975).
[39] Mark Allen Satterthwaite. 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory 10, 2 (1975), 187–217. doi:10.1016/0022-0531(75)90050-2
[40] Jakob Schoeffer, Maria De-Arteaga, and Niklas Kuehl. 2022. On the relationship between explanations, fairness perceptions, and decisions. arXiv preprint arXiv:2204.13156 (2022).
[41] Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 59–68.
[42] Uri Shalit, Fredrik D Johansson, and David Sontag. 2017. Estimating individual treatment effect: generalization bounds and algorithms. In International Conference on Machine Learning. PMLR, 3076–3085.
[43] M. Sloane, E. Moss, O. Awomolo, and L. Forlano. 2022. Participation is not a design fix for machine learning. In Equity and Access in Algorithms, Mechanisms, and Optimization. 1–6.
[44] Min Wen, Osbert Bastani, and Ufuk Topcu. 2021. Algorithms for fairness in sequential decision making. In International Conference on Artificial Intelligence and Statistics. PMLR, 1144–1152.
[45] Liuyi Yao, Sheng Li, Yaliang Li, Mengdi Huai, Jing Gao, and Aidong Zhang. 2018. Representation learning for treatment effect estimation from observational data. Advances in Neural Information Processing Systems 31 (2018).
[46] Angie Zhang, Olympia Walker, Kaci Nguyen, Jiajun Dai, Anqing Chen, and Min Kyung Lee. 2023. Deliberating with AI: improving decision-making for the future through participatory AI design and stakeholder deliberation. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–32.
[47] Matthieu Zimmer, Claire Glanois, Umer Siddique, and Paul Weng. 2021. Learning fair policies in decentralized cooperative multi-agent reinforcement learning. In International Conference on Machine Learning. PMLR, 12967–12978.