“It is currently hodgepodge”: Examining AI/ML Practitioners’
Challenges during Co-production of Responsible AI Values
Rama Adithya Varanasi
Information Science
Cornell University
New York, NY, USA
rv288@cornell.edu
Nitesh Goyal
Google Research
Google
New York, NY, USA
teshgoyal@acm.org
ABSTRACT
Recently, the AI/ML research community has indicated an urgent
need to establish Responsible AI (RAI) values and practices as part
of the AI/ML lifecycle. Several organizations and communities are
responding to this call by sharing RAI guidelines. However, there
are gaps in awareness, deliberation, and execution of such practices
for multi-disciplinary ML practitioners. This work contributes to
the discussion by unpacking co-production challenges faced by
practitioners as they align their RAI values. We interviewed 23
individuals, across 10 organizations, tasked to ship AI/ML based
products while upholding RAI norms and found that both top-down
and bottom-up institutional structures create burden for different roles preventing them from upholding RAI values, a challenge that is further exacerbated when executing conflicted values. We share multiple value levers used as strategies by the practitioners to resolve their challenges. We end our paper with recommendations for inclusive and equitable RAI value-practices, creating supportive organizational structures and opportunities to further aid practitioners.
CCS CONCEPTS
• Human-centered computing → Empirical studies in HCI.
KEYWORDS
Responsible AI, RAI, ethical AI, value levers, co-production, collaboration, XAI, FAT, fairness, transparency, accountability, explainability
ACM Reference Format:
Rama Adithya Varanasi and Nitesh Goyal. 2023. "It is currently hodgepodge": Examining AI/ML Practitioners' Challenges during Co-production of Responsible AI Values. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23–28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3544548.3580903
This work is licensed under a Creative Commons Attribution International 4.0 License.
CHI '23, April 23–28, 2023, Hamburg, Germany
© 2023 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9421-5/23/04.
https://doi.org/10.1145/3544548.3580903

1 INTRODUCTION
In November 2021, the UN Educational, Scientific, and Cultural Organization (UNESCO) signed a historic agreement outlining shared values needed to ensure the development of Responsible Artificial Intelligence (RAI) [99]. RAI is an umbrella term that comprises different human values, principles, and actions to develop AI ethically and responsibly [3, 21, 37, 82]. Through UNESCO's agreement, for the first time, 193 countries have standardized recommendations on the ethics of AI. While unprecedented, this agreement is just one of several efforts providing recommendations on different RAI values to be implemented within AI/ML systems [40, 101, 102].
In response, several industry organizations have begun to implement the recommendations, creating cross-functional RAI institutional structures and activities that enable practitioners to engage with RAI values. For instance, several big-tech companies are implementing common RAI values, such as Fairness, Transparency, Accountability, and Privacy, as part of their RAI initiatives [9, 29, 41, 51, 74]. However, such RAI values have minimal overlap with the values prescribed by UNESCO's framework, which promotes non-maleficence, diversity, inclusion, and harmony [100].
Scholars have attributed the lack of overlap to the different business and institutional contexts involved in developing AI/ML systems (hereafter, we use "AI/ML systems" to refer to both AI products and ML models) [50, 56]. Subsequently, it is essential to understand these contexts by engaging with practitioners across multiple roles who come together to co-produce and enact such RAI values. Co-production is an iterative process through which organizations produce collective knowledge [58]. During co-production, individual practitioners may hold certain values (e.g., social justice), yet their teams might prioritize other values. Rakova et al. [82] hint at potential challenges that can arise due to such mismatches in RAI values. Our study builds on this critical gap by giving a detailed analysis of those challenges and the strategies (if any) devised to overcome such strains as practitioners co-produce AI/ML systems.
We interviewed 23 practitioners across a variety of roles to understand their RAI value practices and challenges. Our findings show that institutional structures around RAI value co-production contributed to key challenges for the practitioners. We also discovered multiple tensions that arose between roles and organizations during prioritization, deliberation, and implementation. Interestingly, we also observed the development of ten different RAI value levers [91, 92]. These are creative activities meant to engage individuals in value conversations that help reduce value tensions. In the remainder of the paper, we first discuss related work about collective values in Responsible AI from an HCI perspective and outline our research questions. We then present the research methodology and results of the study. We conclude with a discussion of our contributions to improving the RAI co-production practices of AI practitioners. Overall, this work makes several contributions. First, we describe the experiences and the organizational environment within which AI/ML
practitioners co-produce RAI values. Second, we illustrate multiple challenges faced by AI practitioners owing to different organizational structures, resulting in several tensions in co-production. Third, we unpack ten RAI value levers as strategies to overcome challenges and map them onto the RAI values. Lastly, we provide essential strategies at different levels (individual and organizational) to better facilitate and sustain RAI value co-production.
2 RELATED WORK
2.1 The Field of Responsible AI
In the last decade, Responsible AI (RAI) has grown into an overarching field that aims to make AI/ML more accountable for its outcomes [1, 3, 20]. One of the field's roots lies in Ethical AI, where critical engagement with ethical values in the otherwise traditional AI/ML field has been encouraged [2, 39, 63, 104]. Example studies include engagement with core ethical values to provide nuance in technical AI/ML discourse [47], translation of ethical values into implementation scenarios [39, 49] and AI/ML guidelines [45, 47, 52, 60, 103], and seminal studies that brought critical ethical problems to the forefront [4, 24, 76, 113].
RAI has also drawn its inspiration from AI for Social Good (AI4SG [89]) research to study human values more broadly, going beyond "ethical" values. AI4SG helped the RAI field translate such values embedded in AI/ML systems into positive community outcomes [6] by eliciting specific values (e.g., solidarity [31]), developing methods (e.g., capabilities approach [11]), and producing representations (e.g., explanations [79]) that strongly align with community goals (e.g., the UN Sustainable Development Goals [37]). For example, studies have explicitly engaged with underserved communities to examine the impact of the embedded values within AI/ML systems on their lives (e.g., in the agriculture [70], health [53, 107], and education [14] domains). More recent studies have shed light on how certain practitioners' (e.g., data and crowd workers) practices, contributions, and values are often ignored while developing AI/ML systems [85, 97, 105]. Others have looked at different values (e.g., fairness) that are often ignored by discriminatory algorithms [16, 46, 61]. Recent work at the intersection of these two strands by Goyal et al. [42] has highlighted the impact of data workers from marginalized communities on AI/ML algorithms, surfacing the complexity of building for RAI values like equity.
Lastly, RAI has also drawn motivation from recent movements associated with specific value(s). One such movement is around the value of explainability (or explainable AI), which arose from the need to make AI/ML systems more accountable and trustworthy [5, 22, 25]. A similar movement within RAI's purview focused on FATE values (Fairness, Accountability, Transparency, and Ethics/Explainability) [18, 66, 75, 93]. While both movements have challenged the notion of universal applicability of RAI values, our study illustrates how these challenges do indeed appear in practice and the strategies used by practitioners on the ground to resolve them.
Taken together, RAI has emerged as an umbrella term that encapsulates the above movements, encouraging critical value discourses to produce a positive impact. At the same time, departing from previous movements that focused on specific issues within AI/ML practices, RAI takes a broad institutional approach that encourages disparate AI/ML practitioners to come together, share, and act on key values [82]. Our study expands the RAI discipline by surfacing on-ground challenges of diverse AI/ML practitioners attempting to engage in shared RAI responsibilities, such as collaborative value discourse and implementation. In the next section, we further unpack the notion of values, examining their roots in HCI and their current role in RAI.
2.2 Collective Values in Responsible AI: HCI
perspectives
The Science & Technology Studies field has long examined how values are embedded in technology systems in various social and political contexts [67, 96, 109]. In recent years, studies within HCI have built on this foundation to bring a critical lens to the development of technology. Initial studies conducted by Nissenbaum and colleagues argued against the previously held belief that technology is "value-neutral", showcasing how practitioners embed specific values through their deliberate design decisions [30, 33]. Value-Sensitive Design (VSD) by Friedman et al. [36] was another step in this direction. It has been used as a reflective lens to explore technological affordances (through conceptual and empirical inquiry) as well as an action lens to create technological solutions (technological inquiry) [34, 48]. While VSD's core philosophy has remained the same, it has been extended, often in response to its criticisms [54, 68].
A criticism relevant to this study concerns practitioners' ease in applying VSD in industry contexts [90]. VSD is perceived to have a relatively long turnaround time, often requiring specialists for implementation. To overcome these challenges, Shilton proposed 'value levers', a low-cost entry point for value-oriented conversations while building technology artifacts in the organization [90, 91]. Value levers are open-ended activities that engage participants in value-oriented discourses to develop common ground. With creative representations, value levers can transform slow and cumbersome value conversations into creative and fruitful engagements [90]. While previous studies have applied and shaped the notion of value levers in a very specific set of contexts, such as showcasing how designers employ them in their practices [90], this work shows a broader utility of value levers among a diverse set of practitioners navigating the complex space of RAI value discourse.
Within AI/ML research, initial explorations of values were still primarily computational in nature, such as performance [15], generalizability [55], and efficiency [27]. With the advent of HCI and critical studies focusing on discriminatory algorithms [10] and responsible AI, the discussions shifted to much broader values, such as societal and ethical values, within the AI/ML field [9]. These studies focused on exposing inherent biases in models due to the absence of substantive social and ethical values. For instance, Burrell [13] demonstrated how complex ML models have inherent interpretability issues stemming from a lack of transparency about how predictions were achieved. Another set of studies by Eubanks [28] and Noble [78] scrutinized several algorithms governing the digital infrastructures employed in our daily lives to expose discriminatory behaviors in different situations, especially in the context of fairness, against marginalized populations. In a similar vein, several studies have explored individual values that they felt were critical for models, such as fairness [16, 32, 75], explainability [22], non-malfeasance [60, 77], and justice [8, 46, 61], reflecting societal norms. A common underlying factor among several of these studies was that they focused on individual values enacted in their own spaces. Recently, however, a few studies have adopted contrasting perspectives which argue that values do not exist in isolation, but often occupy overlapping and contested spaces [36, 114]. Our study aims to provide much-needed deeper insights within this complex space by showing how practitioners engage with and prioritize multiple values in a contested space.
Another value dimension explored is "whose values should be considered while producing AI/ML algorithms?" [12]. Most studies have engaged with end-users' values, lending a critical lens to the deployed models and their implications on society [64, 84, 94, 110]. These studies challenged the notion that developing fair algorithms is primarily a technical task that can proceed without considering end-users' values [50, 83]. Subsequently, researchers leveraged action research (e.g., participatory approaches [19]) to design toolkits, frameworks, and guidelines that accommodate end-user values in producing ML models [62, 65, 88].
A more relevant set of studies has recognized the importance of understanding the values that different practitioner roles embed while producing responsible algorithms [81, 86, 112]. Such practitioner-focused studies are critical in understanding "how" and "why" particular values are embedded in AI/ML models early on in the life cycle. However, these studies have explored particular practitioners' values in silos, leaving much to be learned about their collective value deliberations. A nascent group of studies has answered this call. For example, Madaio et al. [71] focused on controlled settings in which specific practitioners could co-design a fairness checklist as one of their RAI values. Jakesch et al. [56] explored a broader set of practitioners' values and compared them with those of end-users in an experimental setting. Another relevant study by Rakova et al. [82] explored RAI decision-making in an organizational setting, laying a roadmap to get from current conditions to aspirational RAI practices.
Our study contributes to this developing literature in four ways. First, within the context of Responsible AI practices, our study goes beyond scenario-based, controlled, or experimental setups by focusing on natural work settings [56, 71], which echoes the sentiment of some of the previous open-ended qualitative studies that were conducted in organizations [81, 112], but not in the context of Responsible AI practices. Second, we focus on a diversity of stakeholder roles who are making an explicit effort to recognize and incorporate RAI values, unlike previously siloed studies. Third, we leverage the lens of co-production [58] to study RAI values in natural work settings. Fourth, our study extends [82] by explicitly unpacking the co-production challenges deeply rooted in RAI values. To this end, we answer two research questions:

(RQ-1): What challenges do AI/ML practitioners face when co-producing and implementing RAI values?

(RQ-2): In response, what strategies do practitioners use to overcome challenges as they implement RAI values?
2.3 Co-production as a Lens
To answer our research questions, we employed the conceptual framework of co-production proposed by Jasanoff [58]. She defined co-production as a symbiotic process in which the collective knowledge and innovations produced by knowledge societies are inseparable from the social order that governs society. Jasanoff characterized knowledge societies broadly to include both state actors (e.g., governments) and non-state actors (e.g., corporations, non-profits) that have an enormous impact on the communities they serve. Studying co-production can help scholars visualize the relationship between knowledge and practice. Such a relationship offers new ways to understand not only how establishments organize or express themselves but also what they value and how they assume responsibility for their innovations.

To operationalize co-production in our study, we invoke three investigation sites, as Jasanoff proposed. The first site of exploration is the institutions containing different structures that empower or hinder individuals to co-produce. The second site examines different types of discourse that occur as part of co-production activities. Solving technological problems often involves discourses producing new knowledge and linking such knowledge to practice. The last site of co-production is representations, produced both during co-production to facilitate discourses and after co-production in the form of the end-product. The three sites of the co-production framework are appropriate for understanding current industry challenges around RAI innovation for several reasons. Technological corporations developing AI/ML innovations have a robust bi-directional relationship with their end-user communities. Moreover, for successful RAI value implementation, practitioners need to leverage the complex structures within their organizations that are invisible to external communities. RAI value implementations occur through strategic discourses and deliberations that translate knowledge into effective execution. Lastly, in the process of RAI value deliberations, individuals co-create representations that further the implementation efforts of RAI.
3 METHODS
To answer our research questions, we conducted a qualitative study consisting of 23 interviews with active AI/ML practitioners from 10 different organizations that engaged in RAI practices. After receiving internal ethics approval from our organization, we conducted a three-month study (April–June 2022). In this section, we briefly describe the recruitment methods and participant details.
3.1 Participants Recruitment and Demographics
To recruit AI/ML practitioners who actively think about and apply RAI values in their day-to-day work, we partnered with a recruitment agency that had strong ties with different types of corporate organizations working in the AI/ML space. We provided diverse recruitment criteria to the agency based on several factors, including gender, role in the company, organization size, sector, type of AI/ML project, and involvement in different kinds of RAI activities. Using a quota sampling technique, the agency advertised and explained the purpose of our study in diverse avenues, such as social media, newsletters, mailing lists, and internal forums of different companies. For the participants that responded with interest, the agency arranged a phone call to capture their AI/ML experience, as well as their experience with different RAI values. Based on this information, we shortlisted and conducted interviews with 23 AI/ML practitioners who fit the diverse criteria mentioned above. The aforementioned factors were used to prioritize diverse participants with experience working on RAI projects within their teams in different capacities. For example, while shortlisting, we excluded students working on responsible AI projects as part of their internships and included individuals who were running startup RAI consultancy firms.
Out of the 23 practitioners, 10 identified themselves as women. Participants comprised product-facing roles, such as UX designers, UX researchers, program/product managers, and content & support executives; model-focused roles, such as engineers and data scientists; and governance-focused roles, such as policy advisors and auditors. Out of the 23 practitioners, all but one worked for a U.S.-based organization. However, participants were geographically based in both the Global North and the Global South. Participants also worked in a wide variety of domains, including health, energy, social media, personal apps, finance, and business, among others, lending diversity to the captured experiences. Three participants worked for independent organizations that focused exclusively on RAI initiatives and AI governance. Twelve participants had a technical background (e.g., HCI, computer programming), four had a business background, two had a law background, and one each specialized in journalism and ethics. For more details, please refer to Table 1.
3.2 Procedure
We conducted semi-structured interviews remotely via video calls. Before the start of each session, we obtained informed consent from the participants. We also familiarized participants with the objective of the study and explicitly mentioned the voluntary nature of the research. The interviews lasted between 40 minutes and 2 hours (avg. = 65 mins.) and were conducted in English. Interviews were recorded if participants provided consent. Our interview questions covered different co-production practices. First, in order to understand different co-production challenges (RQ-1), we asked questions about (1) how practitioners faced challenges when sharing RAI values across roles (e.g., "Can you describe a situation when you encountered problems in sharing your values?") and (2) how practitioners faced challenges when collaborating with different stakeholders (e.g., "What challenges did you face in your collaboration to arrive at shared common responsible values?"). Second, to understand different co-production strategies (RQ-2), we asked (3) how practitioners handled conflicts (e.g., "Can you give an example where you resisted opposing peers' values?") and (4) how practitioners sought assistance to achieve alignment in RAI values (e.g., "What was the most common strategy you took to resolve the conflict?"). To invoke conversations around RAI values, we used a list of RAI values prepared by Jakesch et al. [56] as an anchor for our conversations. After the first few rounds of interviews, we revised the interview script to ask newer questions that provided deeper understanding of our research questions. We stopped our interviews once we reached theoretical saturation within our data. We compensated participants with a $75 gift voucher for participation.
3.3 Data Collection and Analysis
Out of 23 participants, only three denied permission to record audio. We relied on extensive notes for these participants. Overall, 25.5 hours of audio-recorded interviews (transcribed verbatim) and several pages of interview notes were captured. We validated the accuracy of the notes with the respective participants. Subsequently, we engaged in thematic analysis using the NVivo tool. We started the analysis by undertaking multiple passes of our transcribed data to understand the breadth of the interviewees' accounts. During this stage, we also started creating memos. Subsequently, we conducted open-coding on the transcribed data while avoiding any preconceived notions, presupposed codes, or theoretical assumptions, resulting in 72 codes. We finalized our codes through several iterations of merging the overlapping codes and discarding the duplicate ones. To establish validity and to reduce bias in our coding process, all the authors were involved in prolonged engagement over multiple weeks. Important disagreements were resolved through peer-debriefing [17]. The resultant codebook consisted of 54 codes. Example codes included 'social factors', 'prior experience', 'enablers', and 'RAI pushback'. As a final step, we used an abductive approach [98] to further map, categorize, and structure the codes under appropriate themes. To achieve this, we used the three key instruments of the co-production framework developed by Jasanoff [58], namely making institutions, making discourses, and making representations. Examples of the resultant themes based on the co-production instruments included 'value ambiguity', 'exploration rigidity', 'value conflicts', and 'value lever strategies'. Based on the instruments of the co-production framework, we present our resultant findings in the next section.
4 FINDINGS
Our overall findings are divided based on the different sites of exploration proposed by Jasanoff [58]. The first section answers RQ-1 by exploring several institutional challenges that hinder the co-production of RAI values among practitioners (Section 4.1). The second section explores subsequent knock-on challenges that unstable institutional structures create in co-production discourses (Section 4.2). The last section answers RQ-2 by presenting carefully thought-out representations that overcome challenges in the deliberation and execution of RAI values, using the concept of value levers [90] (Section 4.3).
4.1 RAI Value Challenges within the
Institutional Structures
Institutional structures are essential in enabling the co-production of new knowledge [58]. It is these structures that facilitate relationships for deliberation, standardize democratic methods, and validate the safety of new technological systems before information is disseminated into society. We found two key institutional structures that facilitated deliberation around RAI values within AI/ML companies. These structures brought about different RAI challenges.
Participants: 23
Gender: Men: 13, Women: 10
Age (years): Min: 30-35; Max: 55-60; Avg: 35-40
Experience (years): Min: 1; Max: 15.6; Avg: 3.94
Company type: Product: 16, Service: 7
Company scale: Small: 4, Medium: 4, Large: 15
Region: Global North: 18, Global South: 5
Roles: Engineer: 5, UX Designer/Researcher: 3, Product/Program Manager: 5, Senior Management (e.g., director): 3, Content & Support: 2, Policy & Governance: 5
ML type: Supervised: 7, Unsupervised: 12, Reinforcement Learning: 9, Deep Learning: 13
ML application: Health: 6, Energy: 1, Social media: 3, Personal apps: 7, Finance: 2, Business: 5, Work: 2
Table 1: Practitioners' Demographic Details.

Bottom-up: Burdened Vigilantes. The first type of structures was bottom-up. Within these structures, RAI conversations developed through RAI value sharing in the lower echelons of organizations, often within AI/ML practitioners' own teams. In our interviews, eight practitioners, namely a UX researcher, designer, content designer, and program manager from two mid-size organizations and two large-size organizations, experienced or initiated bottom-up practices that engaged with RAI values. One of the enablers for such bottom-up innovation was individuals' sense of responsibility towards producing AI/ML models that did not contribute to any harm in society. A few other practitioners paid close attention to the social climate (e.g., LGBTQ month, hate speech incidents) to elicit particular RAI values. For instance, P08, a program manager in a large-scale technology company, took responsibility for RAI practices in their team but soon started supporting team members to come together and share RAI values:
"We cater to projects that are very self-determined, very bottom-up aligned with our values and priorities within the organization . . . These are what I call responsible innovation vigilantes around the company. I also started that way but have grown into something more than that. You'll see this at the product or research team level, where somebody will speak up and say, 'Hey, I want to be responsible for these RAI values, make this my job and find solutions'. So you start to see individuals in different pockets of the company popping up to do RAI stuff."
A key challenge with such bottom-up structures was that the responsibility of engaging with RAI value conversations implicitly fell on a few individual "vigilantes". They had to become stalwarts of particular RAI values and take substantial time out of their work to encourage and convince their teams to engage with RAI values. They also actively sought out RAI programs available within and outside their organization. When such RAI programs were not available, individuals took it upon themselves to create engagement opportunities with other members within the organization. These bottom-up structures were useful in breaking the norms of "boundary-work" that are often set within AI and similar technical organizational work, where only engineers and high officials in the company maintain control [38]. It allowed non-technical roles, such as user experience researchers, product managers, analysts, and content designers, to create a safe space and lead the RAI efforts. While such efforts early on in the AI/ML lifecycle minimized the potential harm of their ML models or AI products, they often came at the cost of overworking in their existing roles.
Bottom-up: Burdened with Educational Efforts. Apart from self-motivated vigilantes, the burden of RAI value exploration also fell on a few practitioners who were implicitly curious about RAI innovation. Unlike the vigilantes, these participants were pushed to become the face of their team's RAI initiatives since there was no one else who would. P14, a product manager working for two years at a medium-size company within the energy sector, shared:

"When I came in to this team, nobody really believed in it [RAI] or they really didn't think it [RAI] was important. I was personally interested so I was reading about some of these principles . . . When there was an indication of a future compliance requirement, people didn't want to take up this additional work . . . somebody had to do it."
Similarly, P05, a technical manager leading their team on data collection for the development of knowledge graphs, revealed how they were considered the face of privacy for the team. Therefore, P05 was expected to foster awareness and common understanding among internal stakeholders and external partners and ensure they strove towards similar RAI standards and appreciated data-hygiene practices (e.g., data cleaning and de-identification). Practitioners like P14 and P05 had to assume the responsibility of figuring out the RAI space by presenting their team's needs and asking formative questions even when their objectives around RAI were often not clear, such as which values to consider (e.g., "privacy or transparency?"), what certain values mean (e.g., "what trustworthiness as an RAI value should mean to the model and the team"), how to operationalize specific values (e.g., "How does trustworthiness apply to rule-based models? What kind of RAI values to invoke while collecting data?"), and how to interpret outcomes and map them onto their team's objectives.
Participants (n=5) shared how leading such RAI initiatives burdened their professional lives in various ways. Multiple participants reported that the RAI field was still in its infancy and taking up responsibilities in such conditions meant that their efforts were not deemed a priority or sometimes even officially recognized as AI/ML work [71]. Consequently, the practitioners possessed limited understanding of the direction to take to educate their team, convert their efforts into tangible outcomes, and effectively align their educational outcomes to the team's objectives. P13, an RAI enthusiast and an engineer at a large-scale social media company, shared how their RAI effort seemed endless: "At this point, I probably know more about what things (RAI values) we don't want in it (model) than what we do want in it . . . It's like I am just learning and figuring out what's missing as I take every step . . . It is unclear which [RAI] direction will benefit the team." Moreover, the burden of educating multiple team members fell on the shoulders of very few practitioners, amounting to substantial pressure.

Figure 1: A summary of co-production activities mapped to Jasanoff's [58] co-production sites, along with the themes, RAI values invoked, and key findings and takeaways.
Metcalf et al. [72], in their paper on technology ethics, put forward the term 'ethic owners'. This role shares similarities with the bottom-up vigilantes and educators, as both are motivated and self-aware practitioners invested in foregrounding human values by providing awareness and active assistance while institutionalizing the processes. However, Metcalf's ethic owners' responsibilities were clearly defined. Their tasks of working with teams or higher management were perceived as visible, prioritized work for which they would be credited for career growth or otherwise. While the bottom-up informal roles in our research performed a similar set of tasks, their efforts were seen as tangential, 'administrative', and underappreciated. It is not just that there was an additional burden, but even the outcomes of taking on this additional burden for bottom-up informal roles were dissimilar to those of the ethic owners. Taking up informal RAI work was more problematic when the requirements in the later stages of ML were unprompted, compelling practitioners to focus on these efforts at the expense of their own work.
In our findings, one form of the need came as academic criticism or critique around particular values that were seen as concerning for a particular product (e.g., "what steps are you taking to ensure that your model is equitable?"). Another form of need came from the behavior of end-users who experienced the models through a particular product. P20, a user experience researcher working with deep learning models in finance, shared how user feedback brought about new RAI needs that became their responsibility:

"Once the users use our product and we see the feedback, it makes us realize, oh, people are sometimes using this feature in an unintended way that might in turn impact the way we are going about certain values, say 'transparency' . . . Initially we were like, 'We should strive for transparency by adding a lot of explanations around how our model gave a particular output'. Later we realized too many explanations [for transparency] fostered inappropriate trust over the feature . . . UXR represents user needs so it's on me to update the team on the issues and suggest improvements."

A few practitioners (n=2) also mentioned how the constant juggling between their own role-based work and the unpredictability of the RAI work pushed them to give up the RAI responsibilities altogether.
Top-down: Rigidity in Open-discovery. While the burden of ownership and execution of RAI values in bottom-up structures fell on a small group of individuals, those individuals had the flexibility to choose RAI values that were contextual and mattered to their team's ML models or projects. On the contrary, we found that top-down institutional structures limited the teams' engagement to key RAI values that impacted the organization's core business values. For instance, P15's company had trust as a key value baked into their business, requiring P15 to focus on RAI values that directly reduced specific models' biases, thereby increasing the company's trust among their users. Consequently, several RAI practitioners had to skip RAI value exploration and sharing. Instead, they directly implemented RAI values predetermined by the management just before deployment. P06, an engineer at a large tech company working on conversational analysis models, described this lack of choice:

"To be honest, I imagine lots of the conversations, around the sort of values that need to go into the model, happened above my pay grade. By the time the project landed on my desk to execute, the ethics of it was cleared and we had specific values that we were implementing."
Public-oriented legal issues and ethical failures, especially when launching innovative models (e.g., transformer networks), also determined the RAI values that were prioritized and the subsequent formal RAI structures that were established by the organizations. P19, a policy director at an RAI consultation firm facilitating such structures, shared how such impromptu structures were quite common in response to ever-changing laws around AI governance:

"Even if you're conservative, the current climate is such that it's going to be a year or two max from now, where you will start to have an established, robust regulatory regime for several of these (RAI) issues. So a good way to be prepared is to create the [RAI] programs in whatever capacity that enables companies to comply with the new regulations, even if they are changing because if you have companies doing Responsible AI programs, it eventually gets compliance and executive buy-in."
Instead of perceiving RAI structures created to comply with different Tech Ethics regulations, such as codes of ethics, statements of principle, checklists, and ethics training, as meaningful, organizations perceive them as instruments of risk that they have to mitigate [72]. In line with previous literature [7, 44], our findings indicate that practitioners often find false solace in such structures, as they run the risk of being superficial and relatively ineffective in making structures and practices accountable and effective in their organizations. However, adding nuance to this argument in the case of RAI practices, we found that practitioners more broadly devoted time and energy to following established and prioritized values (e.g., trust or privacy) due to the directed and concerted focus. It allowed for organization-wide impact since the "buy-in" already existed [71].
Top-down: Under-developed Centralized Support. However, in the case of less clearly defined values (e.g., non-maleficence or safety), we observed limited scope for nuance, and despite best efforts, the centralized concerted direction did not always pan out as intended. Further, while laws continue to evolve in this space, participants felt that premeditated RAI values might not longitudinally satisfy the growing complexity of the ML models being implemented (e.g., multimodal models). Hence, while it might seem that setting up a centralized top-down approach would be efficient, the current execution leaves much to be desired. In fact, based on data from over half the participants, we found that five top-down structured companies integrated lesser-known RAI values into their workflows in multiple ways without establishing a centralized workflow. Those who did establish centralized workflows created consulting teams to advise on RAI practices (similar to the 'ethic owners' of Metcalf et al. [72]).
However, these top-down centralized RAI consulting teams were not always set up to succeed. As is the nature of consulting, people did not always know the point of contact or when and how to reach out. The consulting teams also needed to consider opportunities to advertise themselves and their engagement mechanisms, which was difficult due to the lack of context and nuance around the teams' projects. Consequently, it was difficult for such teams to generate organic interest, unless the teams were already aware of their RAI requirements and knew a point of contact. P10, a manager who facilitated one such top-down RAI program for AI/ML teams in a large-scale technology company, described the lack of fixed ways in which teams engaged with them on RAI values, making it a difficult engagement:

"We have a bunch of internal Web pages that point you in all different types of directions. We don't have a singular voice that the individuals can speak with . . . It's currently hodgepodge. Some teams come to us willingly. They had already thought about some harms that could occur. They say, 'Here's our list of harms, here's some ideas on what we want to do.' They'd already done pre-work and are looking for some feedback. Other teams come to us because they've been told to . . . They haven't thought much about RAI and need longer conversations . . . Other teams were told to go track down an individual or team because they are doing ML stuff that will require RAI assistance, but they don't know about us."
4.2 Challenges within RAI Value Discourses
Fruitful co-production requires well-established institutional structures that can empower stakeholders to engage in stable democratic discourses with the aim of knowledge production [58]. In the previous section, we uncovered different structural challenges at the institutional level; these contributed to knock-on effects, creating further challenges for practitioners during the co-production and implementation of RAI values.
Discourse: Insufficient RAI Knowledge. A key challenge that many practitioners experienced in co-producing RAI values in teams was the difficulty of engaging deeply with new and unfamiliar RAI values deemed important by the team members. P07, a policy advisor in a large technology company who regularly interacted with those who implemented RAI values, described the superficial engagement with values as an act of ineffective moralizing, wherein practitioners struggled to develop deeper interpretations of the team's shared values and contextualize them in relation to the ML models they were developing.

P07 mentioned several key critical-thinking questions that AI/ML practitioners did not deliberate within their teams, such as "Is this RAI value applicable to our product?", "how does this value translate in diverse use cases?", or "should this value be enabled through the product?" The need for deeper engagement becomes particularly important in high-stakes situations, such as healthcare, where certain conditions have an unequal impact on particular demographics. P12 experienced this complexity while overseeing the development of an ML model focused on health recommendations:
"So a lot of models are geared towards ensuring that we have we are predictive about a health event and that almost always depends on different clinical conditions. For example, certain ethnic groups can have more proclivity to certain health risks. So, if your model is learning correctly, it should make positive outcomes for this group more than the other groups. Now, if you blindly apply RAI values without thinking deeply about the context, it might seem that the model is biased against this group when in reality these group of people are just more likely to have this condition, which is a correct conclusion, not a biased one."
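P12's caution can be made concrete with a toy calculation. The following Python sketch is purely illustrative: the prevalence numbers are invented and are not drawn from the study. It shows how a well-calibrated model flags one group more often than another simply because the underlying condition is more common in that group, so a blanket demographic-parity check would read the gap as bias even though it mirrors clinical base rates.

# Hypothetical illustration (numbers invented, not from the paper): a calibrated
# health-risk model flags each group at roughly its true condition prevalence.
prevalence = {"group_A": 0.30, "group_B": 0.10}  # assumed true base rates

# Positive-prediction ("flag") rate of an ideal, well-calibrated model:
flag_rate = dict(prevalence)
print(flag_rate)  # {'group_A': 0.3, 'group_B': 0.1}

# A naive demographic-parity check compares flag rates directly ...
parity_gap = abs(flag_rate["group_A"] - flag_rate["group_B"])
print(f"Demographic parity gap: {parity_gap:.2f}")  # 0.20 -> looks "biased"

# ... yet the gap reflects real base rates, which is exactly P12's point about
# applying an RAI value such as fairness without clinical context.

Which fairness criterion is appropriate in such a setting (group parity, error-rate balance, or within-group calibration) is precisely the kind of contextual judgment the participants describe.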
Such deeper analysis requires hands-on practice, contextual training in the field, and formal RAI education. In our findings, top-down structures were only effective in filling this gap for key values that aligned with the company's vision, leaving a much-needed requirement for contextual, high-quality RAI education on more emergent RAI values that could be modularized for specific teams. P02, a content designer for a large health technology company, shared what this gap looked like for their team, which was designing content for a machine translation team:

"One thing that would have been beneficial is if I or my team could somehow get more insights on how to think about trustworthiness in the context of the content produced by our machine translation model and probably evaluate it . . . Often time, I just go to someone who is known to have done some work in this [RAI] and say, 'Hey, we want to design and publish the content for the model like in a month from now, what is bare minimum we could do from [RAI] principles point of view?' . . . Sometimes it's never-ending because they say I have not thought about this at all and that it is going to take a month or maybe much more longer to get these principles implemented."

Participants like P02 had no alternative but to reach out to their bottom-up structures to seek assistance, discuss, and reduce gaps in their RAI knowledge. On occasion, such avenues of discussion were non-conclusive. Prior literature in AI/ML and organization studies has shown how such unequal dependence on bottom-up structures over top-down ones in deliberation can contribute to tensions, and in turn proposes an open, federated system linking different actors, resources, and institutions to provide community-based support [82, 87].
Discourse: Deprioritized Unfamiliar & Abstract Values. Naturally, practitioners tried to solve the superficial engagement problem by de-prioritizing values that they found unfamiliar. In our study, most practitioners (n=18) said that they were familiar and comfortable talking about RAI values like privacy and security, as these values were already "established and had matured over time". They sharply contrasted this perceived familiarity with other RAI values like explainability and robustness. The familiar values were well backed with stable top-down structures and dedicated teams, such as compliance departments and dedicated RAI personnel, making it easy for practitioners to develop mental models of deeper engagement. P20 shared their experience in this regard in their organization:

"The ideal situation would be like, 'Oh, I have certain RAI principles that I want to make sure our product has or addresses'. In reality not all the principles are thought out the same way and applied in the first go. It usually happens in layers. First and foremost, people will look at privacy because that's super established, which means everyone knows about it, they already have done probably some work around it, so it's easy to implement. And then after that, they're like, 'Okay, now let's look at fairness or explainability' . . . We usually have to be quick with turnaround like one or two months. It's nice to bring up values that are new but naturally they also require familiarizing and implementation effort within the team and people see that."
Other practitioners (n=3) also followed a similar de-prioritization process for RAI values that they felt were abstract and did not have a measurement baseline (benchmarks), as opposed to RAI values that could be easily measured quantitatively against a baseline. An example observed in this category was the contrast between RAI values like interpretability, which had concrete implementation techniques and measurements (e.g., LIME), and non-maleficence, which did not have a clear implementation technique or measurements. Similarly, practitioners (n=2) who went out of their way to understand and suggest new interpretability techniques for model debugging (e.g., Integrated Gradients, SHAP) found it disempowering when their team members often negotiated for easier and computationally cheaper values like accuracy (e.g., P/E ratio) for implementation.
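The asymmetry the participants describe can be illustrated with a short sketch. The Python below is an assumption-laden example, not a workflow reported by the participants: it uses toy data and assumes the shap package is available, to show how a "measurable" value such as interpretability comes with off-the-shelf tooling and reportable numbers, whereas a value like non-maleficence offers no comparable drop-in metric.

# Illustrative only: interpretability has ready-made tooling and numbers to report,
# which is part of why participants found it easier to prioritize than abstract values.
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# A toy model on synthetic data stands in for a team's production model.
X, y = make_regression(n_samples=300, n_features=6, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# A few lines of "implementation" yield per-feature attribution scores (SHAP values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])      # shape: (100, 6)
mean_abs_attr = np.abs(shap_values).mean(axis=0)  # a reportable baseline metric
print("Mean |SHAP| per feature:", np.round(mean_abs_attr, 2))

# By contrast, a value like non-maleficence has no equivalent one-liner; the team
# must first define harms, stakeholders, and acceptable thresholds before anything
# can be measured, which is the extra work practitioners said gets deprioritized.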
Discourse: Value Interpretation Tensions. Even in situations when different practitioners took a similarly balanced approach to prioritization, tensions emerged as different roles interpreted and contextualized the RAI values differently during the value deliberations. We found these tensions occurring among practitioners when different practitioners defined RAI values (e.g., equity) and mapped them to RAI features and metrics (e.g., skin tone) differently. P18, a senior data scientist leading an AI team in a non-profit institute, shared one such tension among their team members working on the same project:

"Probably half of my colleagues do believe that there is a cultural, and historical set of RAI values that can be applied to all the products organization wide. Other half are vehemently opposed to that concept and say that [RAI] values are always model and project dependent. So if you are talking about our long-term goal to establish a set of RAI principles, whose perspectives should be considered? . . . This is an uneasy space that needs careful navigation."
While deliberations might occur between team members, they might also occur within a practitioner, or between the team and the end-consumers of the product/service. The latter two usually surfaced with user-facing roles, e.g., Product Managers or UX Researchers. These roles have the responsibility to understand, internalize, and embody end-user values in addition to their own values. Overall, we found that practitioners in these roles had to invest more effort to tease out their own values from those of the end-users. P04 was a user experience researcher working on interfacing a large language model for natural conversations with users. While P04 was interested in eliciting better insights from the model's behavior issues (i.e., interpretability [22]), end-users were interested in a simplified understanding of the model's opaque behavior (i.e., comprehensibility [22]). A UX Researcher is, however, expected to be the voice of the user in the process. Consequently, they had the constant burden of eliciting both sets of values appropriately.
Another set of tensions also occurred between practitioners and end-users. P22, an analyst in a financial firm, described how ML practitioners perceived RAI values to be mutable and negotiable, allowing them to implement a particular RAI value in stages instead of all at once. Such a process allowed P22 (and three other participants who reported similar narratives) to build the required experience and embed the value in the ML model or AI product. However, end-users expected these embedded RAI values to be absolute and non-negotiable, not on a sliding spectrum, because they are often "the list of ignored rights", leading to practitioner-user RAI tensions.
Our ndings show that tensions that arose from non-uniform
RAI value knowledge and subsequent disparate value interpreta-
tions were unproductive and a signicant obstacle in the overall
co-production process of RAI values. This can be attributed to a
nascent RAI eld that has given rise to new forms of values (e.g., ex-
plainability, interpretability) whose denitions and contexts which
keep changing. This is in contrast with prior value studies in HCI
studies where the tensions and conicts around relatively more
established values (e.g., privacy) do not occur until the implemen-
tation stage [
26
,
35
,
95
]. Our ndings show that the majority of
value tensions occur much earlier in the value interpretation stage,
often contributing to the abandonment of the value discussions
altogether.
Implementation: RAI Values and Conflicts Within. Implementation of RAI values was also not a straightforward process, as implementing certain RAI values created conflict with other RAI values. For instance, P1, an engineer working on classification models in VR environments, shared how their decision to improve accuracy by excluding instances of objects with sensitive cultural meanings (e.g., objects with LGBTQ references) also had direct repercussions on the diversity and fairness of the model. Implementing RAI values also created cascading dependencies on the inclusion of other RAI values. For instance, P16, a program manager working as an RAI facilitator for a big tech company, shared the issues team members experienced around cascading RAI values:

"One common issue I see is with teams that are dealing with model Fairness issues. Most often the solution for them is to improve their datasets or sometimes even collect new forms of demographic data to retrain their model and that opens up another rabbit hole around privacy that the team now has to navigate through and ensure that their data adhere to our privacy standards. More often than not, teams don't even realize they are creating a new issue while trying to solve their existing problem."
Implementation challenges also occurred when organizations' business values were in tension with those of external clients. In such cases, the team's commitment to engage with RAI was at odds with clients' business priorities. P02, a technical program manager for a service company that developed ML models for clients in the energy sector, had a similar issue when their team was building a model for street light automation. After P02's team received the data and started developing the model, they pushed for the value of safety. However, it was at odds with the company's value of efficiency:

"We should prioritize model optimization in those areas where there are higher crime rates . . . we don't want blackouts, right? . . . Their argument was if there was a very high crime rate, such areas will also have high rate of purposefully damaging the lighting infrastructure. Prioritizing service to such areas will only create high amounts of backlogs as people will just vandalize it again . . . So they just have different priorities. After that, our team just stopped following it up as it went into the backlog."

P02's team gave up RAI value deliberation and implementation altogether after their clients either deprioritized their multiple attempts to make them RAI ready or took an extremely long time to approve their requests.
Implementation: Unexpected Late-stage Value Changes. Another challenge practitioners faced was encountering new RAI values during late stages of implementation. These values were not initially shortlisted. Instead, they were brought out later and sometimes championed by a very vocal practitioner who felt deeply about them. Such late-stage RAI values also became a part of the discussion when practitioners in the team uncovered last-moment issues (e.g., bias) during implementation that significantly impacted the model. Several participants (n=3) shared how such late-stage RAI values decreased the productivity of their overall RAI discourse and implementation efforts, leading to a negative experience. While such last-minute changes were not welcomed, P12, an engineer, shared how they also give the developers an opportunity to ship a better product before any harm might have been done. This tension between potentially better outcomes and slower implementation was visible in how the implementation efforts and timelines were impacted [71].
Such values also disrupted a planned implementation by taking the spotlight and pushing the team into navigating the company's non-standardized approvals, thereby significantly altering the project timeline. For example, P23, an ML engineer, shared that when they received issues around fairness from other stakeholders, it meant substantial changes to the model from the ground up, "because most of the time, issues with fairness stem from the data". It meant revisiting the data and redoing data collection or further debugging to remove the issues. Moreover, when new and untested RAI values assumed prominence (e.g., interpretability), more time and effort was required from the practitioners during implementation. RAI facilitators are essential in easing the tension in such situations by engaging in back-and-forth conversations with the teams to reduce the effort, streamline the process, and help practitioners appreciate the eventual consequences of implementing the RAI values.
Implementation: Perceived Misuse of RAI values. Lastly, we
also observed tensions between individual efforts in implementing
RAI values and their organization's use of such efforts for overall
business purposes. For instance, P15, research director of a large-scale technology company overseeing research in large language
models, shared how he was actively supporting a few teams in his
company to co-produce and embed explainability into their models.
However, he also expressed his concern about how companies could
misrepresent such embedded RAI values,
I worry that explainable AI is largely an exercise in
persuasion. ‘This is why you should trust our software’
rather than ‘This is why our software is trustworthy’
. . . I’m not saying everybody who does explainable AI
is doing that kind of propaganda work, but it’s a risk.
Why do we want our AI to be explainable? Well, we’d
like people to accept it and use it . . . Explainability part
is ethically complicated . . . even for explainability for
the practitioners the company wants it to be explainable,
transparent, reliable, all those things as a means to an
end. And the end is ‘please like our model, please buy
our software’
We found two more practitioners who raised similar concerns
with other RAI values, such as privacy and trust. They were concerned that making their product completely responsible could
enable companies to market their products as nearly perfect, leading to overtrust and over-reliance. These findings align with the
ethics-washing phenomenon within the tech ethics literature, which
argues that companies sometimes invest in ethics teams and infrastructure, adopting the language of ethics to minimize external
controversies and superficially engage with proposed regulations [44, 106]. Practitioners who expressed these sentiments were
quite dissatisfied with their RAI implementation work, as they felt
their actions were merely a “band-aid” solution for the organization instead of meaningfully altering the organization's culture and
practices.
4.3 Representational Strategies to Mitigate RAI
Challenges
In response to the challenges mentioned in the aforementioned
sections, we saw several strategies used by the practitioners to
overcome the limitations in RAI value co-production. To present
the strategies, we use a form of representation called value levers
[90], a set of activities that facilitate opportunities to share and
collaborate around values. We show how different practitioners
use value levers to build upon current RAI institutional structures
and make their RAI co-production manageable. In principle, value
levers can be employed in any situation of RAI co-production.
For example, organizations created several formal
RAI structures for practitioners to facilitate sharing and deliberation of values. These included top-down standardized guidelines,
such as guidebooks (e.g., PAIR [80], HAX [73]) around established
RAI values, bringing in experts to share their experiences around co-production (lectures), and enabling shared spaces for co-production.
However, in this section, we will be looking at value levers specifically developed in response to the challenges experienced in RAI
value co-production.
Institutional Value Levers: External Expertise and Certifications to reduce Ambivalence. One of the ways in which organizations brought stability to their inconsistent top-down RAI
institutional structures was by taking the assistance of independent
agencies or professionals who specialized in establishing value
levers that helped streamline their existing structures. One such
value lever was Responsible AI certifications, designed to
bring different recognized and informal RAI co-production activities under one roof. These programs act as facilitators between the
non-technical and technical workforce by enabling co-production
around RAI values to make them compliant with upcoming regulations. Participants reported that different activities were packaged
into the RAI certification program, such as getting buy-in for particular RAI values, leveraging trusted partners for running impact
assessments, engaging key actors in value discovery and prioritization, and implementing appropriate RAI methods. P19, policy
director of one such RAI certification organization, shared how
these certifications are effective in sectors such as energy, mining,
and human resources, which often have a limited technology workforce. They described the effort of facilitating RAI value conversations
within their client teams as a key part of the certification process:
It is important to have everybody on board for those
[RAI] value conversations. So we try really hard to have
all the different teams like internal or external audit,
legal, business, data and AI team come together, brainstorm, discuss different [RAI] issues in specific contexts
and shortlist [the RAI values], even if we just get a little
bit of their time . . . everyone needs to be brought in early
because we conduct a lot of activities like audit analysis,
bias testing. . . it saves time, addresses several concerns,
and establishes streamlined [RAI] processes. . . . For simplicity, we just package all of the different activities we
do under RAI certification. . . . Sometimes a few activities
are already being executed by the organization, we just
do the job of aligning them in a way that works for the
organization.
Such external expertise and certifications can provide an opportunity for open discovery, bolster existing centralized support, and
identify RAI values that might otherwise be discovered at the last
stages.
Institutional Value Levers: Activities to Distribute RAI Burden. We also found several nascent but more focused value levers
in bottom-up institutions aimed at distributing the burden experienced by a few roles more widely within the team. These value
levers provided opportunities for increased participation from stakeholders, especially in the starting stages, by enabling them to bring
complementary RAI values into the team. The most commonly used
levers in this context included scenario-based narratives and role
plays, and open-ended activities that engaged practitioners in opinion formation and sharing. Other value levers included conducting a
literature review of specific RAI values and the applicable cutting-edge
methods, definitions, and guidelines around them to share with the
team and invite feedback. We also observed more experimental
value levers geared towards bringing the complementary
RAI values of external stakeholders (e.g., end-users) into the team.
For example, P18, a data scientist working in a startup, hosted a
panel to capture complementary perspectives around AI explainability. Visibility into how explainability was perceived differently
by different community members, such as NGOs and government,
contributed to a better understanding and alignment within the
team to develop explainable models. In a similar example, P09, an
engineer working on a reinforcement learning model for healthcare
in low-resource settings in India, facilitated field visits to the
end-user communities. Such exposure helped roles that were passive in sharing their values as well as roles that were thinking about
new values, such as social justice, in the RAI discourse. Overall,
these value levers (narratives, role plays, literature reviews, panels,
and field visits) focused primarily on bottom-up structures, which
helped reduce pressure on specific roles and limit superficial value
engagement.
Discourse Value Levers: Facilitating Disagreements. Moving
our focus to RAI value co-production, we saw user-facing practitioners create explicit opportunities for disagreements and healthy
conflicts to tackle the problem of superficial value engagement and
improve the quality of their teams' deliberations. Disagreements in
the co-production phase allowed practitioners like UX researchers and
product managers to think inclusively, capture diverse perspectives
and expert knowledge, and, more importantly, predict future value
conflicts. For example, P04, a UX researcher, created a bottom-up
adversarial prioritization framework. In the starting phases of this
framework, the UX researcher pushed team members to go broad
and co-produce values by wearing other practitioners' hats and
invoking their RAI values. This practice allowed them to bring forward interesting disagreements between different roles that were
then resolved and prioritized to achieve a small set of meaningful
RAI values. P04 recalled that the two values that received the most
disagreement were diversity and inclusion. Wearing complementary user hats enabled practitioners to familiarize themselves with values
that were otherwise unfamiliar in their own roles. Other top-down
RAI programs also facilitated similar structures, explicitly providing
narratives that brought out disagreements, for example:
Usually I will write something in the prompts that I
think that the team absolutely needs to hear about but
is controversial and opposing. But what I do is I put it in
the voice of their own team so that it is not coming from
us. It is not us scrutinizing them. That promotes interpersonal negotiation that pushes individuals to really
defend their values with appropriate reasoning.
According to P19, having such an RAI system in place early also allows
companies to benchmark their ML models against the competition.
Leveraging the adversarial prioritization framework
appropriately in both top-down and bottom-up structures can
enable open discovery and surface values and related conflicts
for resolution.
Discourse Value Levers: Model Cards & Visual Tools to Reduce Abstractness from Values. We found that practitioners also
created succinct representations and simplified documentation to
bring much-needed clarity to various RAI values and simplify the associated models. For instance, engineers shared documentation in the form of
model and data cards, making it easier for non-engineering and
engineering roles to grasp the information. P23, a senior engineer
at an AI startup looking into sustainability, shared the process:
Even we have introduced this concept of a model card,
wherein if a model is developed, the model card has to
be filled out. So what is a model card? A model card is
a series of questions that captures the basic facts about
a model at an individual model level.
What did you use to build a model? What was the population that was used? What is the scoring population?
It is like, having all of that in a centralized standard
format. Goes a long way to roll it up because the product
can be very complex as well, right? With multiple players and whatnot. But having that information collected
in this way benefits other roles that own the product to
think about different values that are missing.
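To make the model card idea concrete, below is a minimal sketch, assuming a Python dataclass representation, of the kind of standardized per-model questionnaire P23 describes; the field names and example values are illustrative assumptions, not the participant's actual template.

    # Minimal sketch of a model card as a standardized, per-model questionnaire.
    # Field names and the example instance are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ModelCard:
        model_name: str
        intended_use: str                # what the model is meant to do
        training_data: str               # what data was used to build the model
        training_population: str         # what population that data covers
        scoring_population: str          # who the model is scored on in deployment
        evaluation_metrics: List[str] = field(default_factory=list)
        known_limitations: List[str] = field(default_factory=list)  # e.g., fairness or privacy caveats
        rai_values_considered: List[str] = field(default_factory=list)

    # Hypothetical instance, loosely echoing the street-light example above.
    card = ModelCard(
        model_name="streetlight-maintenance-prioritizer",
        intended_use="Rank street-light repairs across districts",
        training_data="Historical outage and repair reports",
        training_population="Urban districts only",
        scoring_population="All districts, including under-served areas",
        evaluation_metrics=["precision@k"],
        known_limitations=["Under-represents low-income districts"],
        rai_values_considered=["fairness", "safety"],
    )

Keeping such facts in one centralized, standard format, as P23 notes, lets non-engineering roles spot values that are missing (e.g., an unrepresented population) without having to read the model code.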
UI/UX designers, UX researchers, and analysts also used similar
documentation tools to initiate discussions and receive feedback
from other practitioners in the team. P20, a UX researcher, used
presentation slides containing model features to facilitate brainstorming sessions and receive feedback from other roles. They
also repurposed tools and methods used in their own work to give
shape to their peers' abstract values. For example, P20 reused online jam-boards containing key RAI values and user findings for
affinity diagramming, enabling the team to “categorize the findings
and map them to specific RAI values”. Other RAI levers in this category included designing and sharing infographics, and regular RAI
standups in which practitioners took it upon themselves to be stewards of RAI principles for the team, giving updates, receiving feedback,
and learning about the team's perspectives on specific RAI values.
Implementation Value Levers: Office Hours, User Stories, Safe
Spaces to Reduce Tensions.
A few value levers that were part of top-down RAI programs
were also effective in reducing various value tensions that occurred
between different practitioners (n=2). One such program was RAI
oce hours that was available for elicitation and production, but
were also extremely eective for tension-resolution. A typical of-
ce hour was 30 minutes in which practitioners engaged with a
relevant expert and an experienced facilitator. One key way experts
solved the tensions in these sessions was by collecting and provid-
ing concrete case-study examples. For example, P21, an RAI oce
hour facilitator, shared an example about the use of his oce hours.
The practitioners were at odds with each other in implementing
explainability and trustworthy features. During the oce hours,
P21 responded by sharing an edge case scenario where even good
explanations might backre, such as, “If a pregnant woman had a
miscarriage, showing even good end-user explanations around why
they are seeing infant-related content can be very problematic. Ex-
plainability should be carefully teased out based on the context in
which it is applied.
Another set of value levers, used especially by roles facing
end-users, were user stories and scenarios to influence and persuade
others to change their value priorities and align with the rest of the team.
These levers were also used by these roles to converge on key values
after engaging in healthy conflicts within the divergence phase. For
example, P04 exposed different pain points and key user journeys
by highlighting the clip of a user that tells “a really, really amazing story
that is either very painful or poignant”. Interestingly, P04 was aware of
how such value levers had to be evoked carefully,
“If that story is not representative, I'm manipulating
the system. If it is representative, I'm influencing the
system. . . I will have to be careful not to operate on the side
of manipulation and try to be very squarely on the side
of influence. So, I do like regular checks for myself to
make sure that I am operating on influence, not manipulation, in terms of the stories that I am allowing people
to amplify.”
Lastly, in order to tackle several types of value conflicts in the
co-production of RAI values, we found different RAI value levers
that focused on improving value alignment. One key alignment
strategy was to create structures and activities that aligned the
team's RAI values early in the process. One such activity,
which we saw in both practitioners' own initiatives and formal RAI programs, was providing
a safe space that encouraged open discussions among individuals to
empathize with other members. P09 shared,
One alignment strategy was open discussion with the
safe space, where team members could fail, be called
out and learn from each other as we were developing
values. So say someone finds the value of democratization really important, they are made to articulate what
they mean by it. . . . It is easy if there are different buckets in which they can categorize and explain because
then people can easily surface all the different ways
they think and prioritize values and that helps with
alignment.
5 DISCUSSION AND FUTURE WORK
Overall, our findings show that co-production of RAI values in
practice is complicated by institutional structures that either support top-down decision-making by leadership or are inhabited by
bottom-up practitioners exercising voluntary agency (section 4.1).
In either case, multiple challenges exist when practitioners have
to reconcile their internally held values with the RAI values
expected of their roles, and their own values with those of their team
members. Our findings also show that discourse around alignment
and prioritization of RAI values can sometimes be unproductive,
inconclusive, and disempowering when practitioners have to
implement said RAI values (section 4.2). We observed a lack of
transparency and unequal participation within organizations, and
between organizations and the end-users of their products/services
(section 4.2). Despite the relatively complicated lay of the land,
practitioners have been pushing ahead and discovering multiple
strategies for making progress (section 4.3). In the subsections
below we unpack these challenges, strategies, and potential
future work across the three sites of co-production: institutions,
discourse, and representations.
5.1 Envisioning balanced Institutions:
Middle-out RAI Structures
Inequity in Burden. According to Jasanoff [57], strong institutions
provide a stable environment for effective knowledge co-production.
They can also act as safe spaces for nurturing and transforming contested ideas into effective practices, leading to a long-lasting impact on
the immediate ecosystem. Recent scholarship by Rakova et al. [82]
has put faith in an aspirational future where organizations would
have deployed strong institutional frameworks for RAI issues. Our
findings show that, as of today, top-down structures are underdeveloped. Organizations have deployed structures that range from
being reactive to external forces (e.g., compliance, public outcry)
by tracking teams that implement RAI, to proactively establishing
structures that make teams RAI-ready (e.g., office hours). Furthermore, stable workflows have been established for only a limited number
of values or use-cases, restricting the number of teams that could
leverage such workflows.
In the midst of restrictive structures, particular practitioner roles embraced the persona of bottom-up vigilantes and
appointed themselves champions of lesser-known RAI
values (e.g., non-maleficence and trustworthiness). They initiated
open-ended exploration for value discourses and subsequent value
implementation. However, such bottom-up structures also put burden and occupational stress on select roles, risking the implementation success of such RAI projects. In particular, we found
that roles like UX researchers, designers, product managers, project
managers, and ethicists have been taking the brunt of this work. These
findings build on previous work [71, 82], highlighting existing
inequity and the subsequent individual activism being performed by
some, either by volition or due to a lack of alternatives.
Enabling Equal Participation. Going forward, there is a need
for a holistic middle-out approach that seeks synergy
between top-down and bottom-up structures while accounting for
the challenges that each of these structures presents. For instance,
organizations can work with RAI value stalwarts and champions to
formalize and streamline bottom-up workflows, making it a standard practice for all teams to engage in open-ended exploration of
RAI values. Such a process can enable teams to look beyond loosely
applicable, organization-recommended RAI values and shortlist
those values that actually matter and apply to their team.
1. Challenges: Complexity of RAI values; unstable centralized structures.
   Strategies: RAI value certification to streamline ambivalent structures.
2. Challenges: Burden on a few vigilantes who have taken on responsibility; stress on a few curious practitioners to educate the team.
   Strategies: Scenario-based narratives, adversarial role plays, panels, and field visits to distribute RAI burden.
3. Challenges: Top-down RAI in response to external pressures; lesser RAI autonomy.
   Strategies: RAI value certification to streamline ambivalent structures.
4. Challenges: Lack of RAI knowledge; superficial engagement; RAI values without business case or benchmarks deprioritized.
   Strategies: Model cards & visual tools to provide clarity and initiate discussions.
5. Challenges: Certain RAI values in conflict with other values; organization values in conflict with clients' business needs; late-stage value changes and substantial development effort; practitioners' RAI values in conflict with organization's business values.
   Strategies: Prioritization framework for facilitating disagreements; office hours for conflict resolution, user stories, & safe spaces to resolve challenges.
Table 2: Summary of challenges mapped against strategies.
To standardize the structure, organizations can leverage independent (flat)
teams/roles that can guide the target team through the process while
giving enough room for exploration.
Organizations can also use a middle-out approach to reduce the
burden and occupational stress on specific roles through several
top-down activities. One such way is to facilitate structures that
lower the barrier for diverse internal stakeholders to engage in RAI
value co-production, regardless of their proximity to AI products
or ML models. For instance, data workers and teams/roles that do
internal testing of the models (dogfooding) can contribute to
RAI value co-production. The same structures can also enable engagement with external stakeholders, such as end-user communities,
policy experts, and governance agencies, in the initial stages of
value co-production. Consequently, practitioners' chances of foreseeing or anticipating changing requirements could improve, especially
in the later stages of the AI/ML lifecycle. Better yet, this could potentially improve not just the “user experience” of value discourse, but
also the efficiency of implementation, a goal valued by private
companies. This could be a win-win situation for multiple stakeholders by helping the top-down RAI structures align with business
goals. While our research uncovered only top-down and bottom-up structures, which were mutually exclusive, other structures might
exist. For example, while we envisage middle-out structures to be
advantageous, future research is needed to operationalize and simulate such structures and to discover existing implementations. There
might be challenges uniquely inherent in those structures.
We encourage future researchers to continue this line of enquiry.
5.2 Envisioning better Discourses: Enabling
Critical RAI Value Deliberations
Negativity in Deliberations. The ultimate aim of co-production
discourse is to engage with competing epistemological questions.
Jasanoff calls this interactional co-production [59, ch 8] because it
deals with explicitly acknowledging and surfacing conflicts between
two competing orders: the scientific order brought about by technological innovations and the social order brought about by prevalent
sociocultural practices within the community. In our findings, several underlying conflicts surfaced between the scientific and social
orders (section 4.2). In one instance, practitioners had to choose
between socially impactful but less explored RAI values (social
order) and less applicable but established values with measurable
scientific benchmarks (scientific order). In another instance of tension, an RAI value occupied competing social spaces (e.g., equity).
The underlying issue was not the conflict itself but the lack of systematic structures that enabled positive resolution around it.
Such implicit conflicts were often met with deprioritization,
and conversations ended unresolved. There is an urgent need to
transform such implicit conflicts into explicit, positive deliberations.
Organizations aiming for successful RAI co-production need to
be more reflexive [59, ch 9] [111], mobilize resources to create safe
spaces, encourage explicit disagreements among practitioners in a positive manner, and enable them to constantly question RAI values or
the co-production procedures that help embed them. While we saw
some instances of these explicit practices in the form of value lever
strategies, such instances were sporadic and localized to very few
teams.
Acknowledging Differences Safely. Our findings around challenges within RAI value discourse also showcase the politically
charged space that RAI values occupy in an organization. In their
critical piece around the implementation of values in organizations,
Borning and Muller [12] bring out a few value characteristics that
are applicable to our study. First, values are not universal. Values are
recognized, prioritized, and embedded differently based on several
factors, such as practitioners' roles, organizational priorities, and
business motivations, which make RAI a complex space. In our
findings, roles prioritized those RAI values that were incentivized
by the organizations and the AI/ML community through computational benchmarks focused on RAI value outcomes. This is
problematic, as the RAI values that have computational benchmarks
and implicit organizational incentives might not map onto
the RAI issues that are pertinent in the community. One way to
resolve this mismatch is by rethinking the definition of benchmarks
from value outcomes to the value processes taken up by different roles
(or teams) [33]. For example, organizations can encourage teams to
document their co-production journeys around lesser-known RAI
values, and these journeys can act as RAI benchmarks.
The second issue that Borning and Muller [12] bring out is whose
value interpretations should be prioritized and considered. In our
findings, tensions emerged as the same values were interpreted and
prioritized differently by different stakeholders. End-users viewed
RAI values as immutable and uncompromisable, whereas practitioners viewed them as flexible and iterable. Similar tensions were
also