To Engage or Not to Engage with AI for Critical Judgments:
How Professionals Deal with Opacity When Using AI for
Medical Diagnosis
Sarah Lebovitz,ᵃ Hila Lifshitz-Assaf,ᵇ Natalia Levinaᵇ
ᵃ McIntire School of Commerce, University of Virginia, Charlottesville, Virginia 22903; ᵇ Stern School of Business, New York University, New York, New York 10012
Contact: sl5xv@comm.virginia.edu, https://orcid.org/0000-0003-4853-3220 (SL); hlassaf@stern.nyu.edu, https://orcid.org/0000-0002-3461-003X (HL-A); nlevina@stern.nyu.edu (NL)
Received: January 15, 2020
Revised: December 29, 2020; June 19, 2021;
September 13, 2021
Accepted: October 2, 2021
Published Online in Articles in Advance: January 10, 2022
https://doi.org/10.1287/orsc.2021.1549
Copyright: © 2022 INFORMS
Abstract. Artificial intelligence (AI) technologies promise to transform how professionals conduct knowledge work by augmenting their capabilities for making professional judgments. We know little, however, about how human-AI augmentation takes place in practice. Yet, gaining this understanding is particularly important when professionals use AI tools to form judgments on critical decisions. We conducted an in-depth field study in a major U.S. hospital where AI tools were used in three departments by diagnostic radiologists making breast cancer, lung cancer, and bone age determinations. The study illustrates the hindering effects of opacity that professionals experienced when using AI tools and explores how these professionals grappled with it in practice. In all three departments, this opacity resulted in professionals experiencing increased uncertainty because AI tool results often diverged from their initial judgment without providing underlying reasoning. Only in one department (of the three) did professionals consistently incorporate AI results into their final judgments, achieving what we call engaged augmentation. These professionals invested in AI interrogation practices, that is, practices enacted by human experts to relate their own knowledge claims to AI knowledge claims. Professionals in the other two departments did not enact such practices and did not incorporate AI inputs into their final decisions, which we call unengaged "augmentation." Our study unpacks the challenges involved in augmenting professional judgment with powerful, yet opaque, technologies and contributes to literature on AI adoption in knowledge work.
History: This paper has been accepted for the Organization Science Special Issue on Emerging Technologies and Organizing.
Keywords: artificial intelligence • opacity • explainability • transparency • augmentation • technology adoption and use • uncertainty • innovation • professional judgment • expertise • decision making • medical diagnosis
Introduction
Artificial intelligence (AI) technologies are edging closer to human capabilities and are often positioned as a revolutionary resource promising continuous improvements in problem-solving, perception, and reasoning (Rai et al. 2019). These technologies are seen as enablers of a fundamental organizational transformation (Faraj et al. 2018, von Krogh 2018, Kellogg et al. 2019), especially when it comes to professional work (Barley et al. 2017, Erickson et al. 2018). Heated debates are emerging around whether, over time, AI technologies are more likely to "automate" professional work on certain tasks by fully replacing human input or to "augment" it by keeping human experts in the loop (e.g., Brynjolfsson and Mitchell 2017, Kellogg et al. 2019, Seamans and Furman 2019). Private and public organizations increasingly opt for human-AI augmentation, assuming it will generate value through the synergistic integration of the diverse expertise that AI and experts each offer. In this paper, we study how human-AI augmentation for critical decisions unfolds in practice by closely investigating how professionals use AI tools to form three different medical diagnosis judgments.
Human-AI augmentation is increasingly depicted as "human-AI collaboration" (e.g., Wilson and Daugherty 2018, Puranam 2021, Raisch and Krakowski 2021), emphasizing the need to integrate potentially divergent viewpoints. Drawing on the organizational literature on collaboration, we know that such integration involves transforming knowledge, a process that requires both understanding the meaning behind others' inputs and being willing to change one's initial position (Carlile 2004, Maguire et al. 2004, Hardy et al. 2005, Levina 2005). It is well known that achieving effective collaboration in knowledge work is difficult, as experts cannot always explain their reasoning because of the tacit nature of knowledge (Polanyi 1958, 1966), and their collaborators may not be willing to listen to unfamiliar viewpoints (Carlile 2004, Maguire et al. 2004, Levina 2005).
The problems of establishing an understanding across diverse bases of expertise and being open to alternative viewpoints are exacerbated in situations when the reasoning behind them is inaccessible. This is particularly likely to occur when humans face a divergent viewpoint expressed by an AI tool, the so-called "opaque AI" problem. Modern AI tools, such as deep learning algorithms, often appear as "black boxes" to users because it may be very difficult or even impossible to examine how the algorithm arrived at a particular output (Pasquale 2015, Christin 2020, Diakopoulos 2020). Although experiencing opacity and using "black box" technologies (e.g., cars or computers) is ubiquitous (Anthony 2021), problems arise when there is a need to integrate diverse knowledge claims into a single decision that a human expert can stand behind. This is the case for many scenarios of AI use for critical decisions, such as in medicine, human resource management, and criminal justice, where opacity associated with AI use is particularly problematic (Waardenburg et al. 2018, Christin 2020, Van Den Broek et al. 2021).
In professional collaboration, human experts integrate diverse knowledge by developing joint practices based on shared interests and common understandings (Bechky 2003b). This enables them to engage in dialogue, at least partially uncovering one another's reasoning in order to arrive at a joint decision. What would it take for human experts to be able to transform their knowledge based on inputs from black box machines? We set out to explore how experts using AI tools are dealing with opacity and considering whether to alter their initial knowledge claims based on the AI input.
Following a rich tradition of organizational studies investigating technology in work practices (e.g., Orlikowski 1992, Leonardi and Bailey 2008, Barrett et al. 2012, Mazmanian et al. 2013, Lifshitz-Assaf 2018), we conducted an ethnographic field study within a major tertiary hospital in the United States that is using AI technologies for diagnostic radiology. Medical diagnosis in general and diagnostic radiology in particular represent some of the premier examples of professional work that is expected to undergo dramatic transformation as AI technologies continue advancing.¹ We investigate radiologists' use of AI tools for diagnostic processes in three different departments, focusing on their work practices in diagnosing lung cancer, breast cancer, and bone age.
We show how radiologists invested their efforts into reducing uncertainty when forming their diagnosis judgments and how the opacity they experienced when using AI tools initially increased this uncertainty in all three settings. Of the three departments we studied, only in one (when diagnosing lung cancer) were the professionals able to use AI results to enhance their own expertise, the stated goal of the human-AI augmentation. This was a case of what we call engaged augmentation, where professionals were regularly integrating the AI knowledge claims with their own. They were able to relate AI results to their initial judgment and reconcile divergent knowledge claims by enacting "AI interrogation practices," which required a significant resource investment on behalf of the professionals who were already highly overextended in their daily work. In the other two departments (when diagnosing breast cancer and bone age), professionals enacted what we call "unengaged augmentation," where they were either regularly ignoring AI's input or accepting it without much reflection. Our study contributes to the nascent understanding of human-AI augmentation practices by unpacking how humans experience and deal with opacity when using AI tools.
Background Literature
Augmenting Professional Expertise with AI
Two scenarios of AI use, either through automation or augmentation, are increasingly debated across academic, practitioner, and policy communities (e.g., Brynjolfsson and Mitchell 2017, Benbya et al. 2021, Cremer and Kasparov 2021, Raisch and Krakowski 2021). In this study, we concentrated on the augmentation scenario, which the literature largely equates with "human in the loop" AI use, whereby human experts and AI technologies work together to accomplish a task. The word augmentation is defined as a process of enlargement or making something grander or more superior. Indeed, scholars describe human-AI augmentation as an expansion of expertise or knowledge where humans and machines "combine their complementary strengths" and are "multiplying their capabilities" (Raisch and Krakowski 2021, p. 193). Through this expansion of expertise, human-AI augmentation is expected to positively impact organizations through superior performance or improved efficiency (e.g., Brynjolfsson and McAfee 2014, Davenport and Kirby 2016, Daugherty and Wilson 2018).
Embracing the vision of multiplying diverse expertise, many scholars describe human-AI augmentation as humans and machines collaborating together (e.g., Wilson and Daugherty 2018, Boyaci et al. 2020, Khadpe et al. 2020, Gao et al. 2021, Puranam 2021). Prior organizational literature on effective collaboration among diverse human experts shows how experts learn ways of working together to leverage and combine their complementary capabilities (Maguire et al. 2004, Hardy et al. 2005). Effective collaboration in knowledge work involves transforming and integrating knowledge through a process of relating the knowledge of others to one's own knowledge (Carlile 2004, Levina 2005, Levina and Vaast 2005). This requires collaborators to be willing and able to understand the meaning behind others' input as well as to potentially change one's knowledge claims (Carlile 2004, Levina 2005). A collaboration that effectively integrates divergent knowledge results in individuals not only "adding to" but also "challenging" one another's input, which is distinguished from merely "ignoring" input without reflection (Levina 2005). Extending this literature to AI use, the expectation is that human experts "collaborating" with AI tools are transforming their knowledge by integrating AI results in a way that potentially challenges an expert's initial judgment. Indeed, Raisch and Krakowski (2021, p. 202) assert this expectation when describing augmentation as a tight coupling of human experts and machines influencing one another, wherein machine outputs are used to challenge humans, and human inputs to challenge machines.
Transforming knowledge is challenging when collaborators are unable to interrogate the other's knowledge claims. Human experts develop collaboration practices based on their shared interests and common understandings that allow them to deliberate each other's knowledge claims (Carlile 2004, Maguire et al. 2004, Levina 2005), despite their inability to fully explicate their reasoning (Polanyi 1958, 1966). Although we have been investigating how knowledge workers deal with tacit knowledge over the last three decades of organizational scholarship (e.g., Kogut and Zander 1992), we know relatively little about dealing with the opacity of modern technologies.
Opacity and AI Technologies
Issues of opacity, or the antithesis of transparency, associated with organizational adoption of modern technologies have increasingly been a topic of discussion and concern in many research and practitioner communities (e.g., Zuboff 2015, Turco 2016, Albu and Flyverbom 2019, Leonardi and Treem 2020). Opacity refers to the difficulty of understanding the reasoning behind a given outcome when such reasoning is obscured or hidden from view (Stohl et al. 2016). Although initially researchers argued that the use of information technology would lead to increased transparency, as more information about activities and decision making was captured digitally and could potentially be accessed and examined by third parties, recent writings have pointed out the fallacy of this thinking (Hansen and Flyverbom 2015, Stohl et al. 2016, Leonardi and Treem 2020). Studying social media platforms as an example, Stohl et al. (2016, p. 125) identify a "transparency paradox," arguing that, although increased use of information technology may increase how visible information may be, in certain cases it may actually reduce transparency. This line of argument may be extended to the adoption and use of modern AI tools. Today, such tools are developed with the aim of transforming the glut of "big data" into a digestible piece of highly relevant information: the algorithmic output. These outputs are often presented to users with minimal transparency into how the AI tool generated them. Yet, because of constraints of limited time and bounded rationality, even if all the data and logic underlying an algorithmic output became accessible, transparency may still not be likely (Leonardi and Treem 2020).
The concept of opacity has gained prominence in the context of organizational adoption of AI tools (e.g., Burrell 2016, Faraj et al. 2018, Christin 2020), especially those tools that use deep learning methodologies. These methods often rely on numerous algorithms calculating weighted probabilities that are transferred and transformed through complex multilayered networks before a given output is generated for users. AI tools using such methods are often referred to as "black boxes" because they may generate unexpected or surprising outputs that end users and even AI developers are unable to explain or understand (Pasquale 2015, Dourish 2016, Diakopoulos 2020). In the current literature, opacity of AI tools typically describes the lack of explanations provided as to why a specific decision was made that are "understandable to users, even when they have little technical knowledge" (Glikson and Woolley 2020, p. 631). This focuses on enacted moments of AI use, whereby individuals lack the practical ability to know the reasoning behind a specific AI result presented to them, which is distinct from how individuals may lack the ability to evaluate a particular AI tool when examining its technical methodology, training and validation data, and performance measures (Lebovitz et al. 2021).
Although the goal of achieving transparency in AI tools seems more necessary than ever, as more and more critical judgments are involving AI tools, the ability to achieve this goal seems more elusive than ever. Scholars, including some computer scientists, are now discussing AI's fundamental opacity, arguing that transparency may be technically infeasible (e.g., Ananny and Crawford 2016, Xu et al. 2019). Supporters of this view argue that, given the growing complexity of methods and input data sets, there may be "something, in the end, impenetrable about algorithms" (Gillespie 2014, p. 192). Some scholars go so far as to say that achieving transparency in the use of AI is so difficult that it may be necessary "to avoid using machine learning algorithms in certain critical domains of application" (Burrell 2016, p. 9, emphasis added). Not only computer scientists but also scholars from a wide range of fields including law, ethics, political science, information sciences, and management are arguing that using AI for judgments with serious individual or societal consequences may be problematic. This challenge has led to the creation of multidisciplinary research communities focused on issues of transparency, ethics, and fairness in technology (e.g., Caplan et al. 2018, Crawford et al. 2019). The research in this community broadly covers three areas.
The first area explores how the design of algorithmic models can be more transparent to help address issues of fairness and social justice (e.g., Barocas et al. 2020, Bird et al. 2020, Kaur et al. 2020). For instance, some scholars in this area are focused on developing models that can show how unjust outcomes produced by machine learning models are highly related to bias that exists in the training data. Despite this community's progress using advanced computational methods to improve transparency toward fairness and equality (e.g., Hooker et al. 2019, Samek et al. 2019, Fernández-Loría et al. 2020), most AI tools (and the potential impact of their results on such issues) are still perceived as largely opaque by their users. This is in part because of the growing computational complexity of deep learning models and the "curse of dimensionality" when attempting to assert what features from massive sets of input data are yielding specific predictions (Domingos 2015).
The second area of research explores the relationship between algorithmic transparency and professional accountability (Pasquale 2015, Diakopoulos 2020). This work is based on the reasoning that a system can be better governed if its inner workings are more transparent and known (Ananny and Crawford 2016). This is critical because introducing AI tools into a professional work setting may transform existing distributions of responsibility and accountability without providing the ability to view or understand the underlying logic (e.g., Scott and Orlikowski 2012, Ananny and Crawford 2016, Caplan et al. 2018). Related questions are also being raised about the impact of opacity on new forms of algorithmic management and control, as workers are often unaware of how algorithms are directing and evaluating their work (Kellogg et al. 2019, Watkins 2020).
The third area of research focuses on classifying and characterizing the types and sources of transparency and opacity associated with AI systems. Some work in this area has focused on distinguishing, for example, between the transparency of a system's training data sets and the transparency about the specific features and weights that led an algorithm to a given outcome (e.g., von Krogh 2018, Diakopoulos 2020). Another area within this topic has investigated the reasons behind opacity of AI tools, such as intentional organizational or managerial secrecy, technical complexity of the tools, and structural factors that preexisted the AI system, among other reasons (Burrell 2016, Christin 2020).
Today, despite the enduring challenges of opacity, AI tools are increasingly being implemented in contexts where professionals are expected to integrate their own knowledge with AI results when forming judgments in critical contexts (Nunn 2018, Razorthink Inc. 2019). Prior research has shown knowledge workers attempting to examine the underlying logic as they encounter new technologies, such as digital simulation technology in manufacturing (Bailey et al. 2012) and engineering (Dodgson et al. 2007). However, in modern contexts of human-AI augmentation, professionals are expected to "collaborate" and transform knowledge without the practical ability to examine or evaluate AI knowledge claims. Thus, our study focuses on the following question: How do professionals experience and deal with opacity when using AI tools to form critical judgments?
Investigating Opacity of AI-in-Use Through
Sociomaterial Practices of Knowledge Work
To investigate this question, we focus theoretically on the sociomaterial practices of knowledge work that AI tools are involved in. We adopt a relational ontology that assumes the entangled nature of actors and materials and foregrounds the performativity of practices (Barad 2003, Suchman 2007). This perspective emphasizes the way in which technologies and actors are inseparable and continually (re-)produce one another through practices situated within particular social and historical contexts (Orlikowski 2007, Suchman 2007, Orlikowski and Scott 2008, Leonardi 2011). This lens has been used to uncover important insights when studying organizational uses and impacts of other technologies, such as enterprise integration platforms (Wagner et al. 2010, 2011), social media tools (Scott and Orlikowski 2014), online community platforms (Barrett et al. 2016), and robotic tools (Barrett et al. 2012, Beane and Orlikowski 2015). We follow the argument of Suchman (2007, p. 1) to shift from "categorical debates," in our case around AI and opacity, to empirical investigations of "concrete practices," in which individuals and technologies act together.
Adopting this view means focusing on the generative materiality of technical infrastructures and treating the technologies-in-use (AI and otherwise) as part of the sociomaterial configuration (Mazmanian et al. 2014, Scott and Orlikowski 2014, Barrett et al. 2016). In particular, focusing on situated configurations emphasizes that individuals' understandings about a given technology vary across local meaning systems (Pinch and Bijker 1987, Mol 2003, Leonardi and Barley 2010). This means leaving opacity to be realized in practice depending on the actor's "situatedness" (Haraway 1988). Therefore, instead of conceptualizing opacity as an inherent or fixed feature of AI tools, we view opacity as something produced and enacted through practices situated in specific organizational configurations (Orlikowski 2000, Leonardi 2011). Using this lens, we set out to examine how opacity of AI-in-use is experienced and dealt with when professionals use AI when forming judgments.
Methods
Research Setting
We conducted an in-depth field study within three different departments in a large diagnostic radiology organization at Urbanside, a teaching hospital in a major U.S. city. Diagnostic radiology is a specialized medical field in which medical imaging is analyzed to diagnose and treat diseases, and it has been at the forefront of adopting cutting-edge technologies (AI and non-AI) for decades (e.g., Barley 1986). Recently, a great debate has been unfolding as to the impact of AI tools on professionals in this field and how AI may entirely replace the radiology profession (Mukherjee 2017, Recht and Bryan 2017, Grady 2019). We designed our study, following the tradition of field studies of technologies, work, and organizations (Barley 1990, Orlikowski 2000, Lifshitz-Assaf 2018, Bechky 2020), to investigate three radiology departments within the same organization and enable us to deepen our investigation of professionals' work with AI tools.
Data Collection
Starting in late 2018, we immersed ourselves in the field of diagnostic radiology, attending professional conferences, symposia, and vendor events, to understand the opportunities and challenges on the professional field's horizon. Ethnographic field work began in January of 2019 and studied 40 radiologists (licensed doctors or senior fellows offered positions upon completing their fellowship) across three departments actively using AI tools: breast imaging, chest imaging, and pediatric imaging.
Observation. The primary source of data for this study is ten months (over 500 hours) of ethnographic observation (Van Maanen 1988). We documented over 1,000 cases of radiologists forming diagnoses in detailed written observational notes, which were transcribed and supplemented upon leaving Urbanside facilities each day. Because Urbanside radiologists trained medical students and residents, we often captured radiologists verbally articulating their diagnostic reasoning, drawing on past experiences and research, describing common errors and strategies to avoid them, and so forth. Radiologists often quizzed trainees about important diagnostic practices and philosophies (e.g., "What might hypoinflation indicate in a newborn?" or "What might indicate stroke on MRI?") and then offered their own thoughts. During periods of observation, we paid close attention to the technologies-in-use, capturing the role of the tools in the diagnostic process, the results they produced, what meanings emerged around the tools, and so forth. Over the course of our field work, we observed diagnostic cases involving and not involving AI tools. Observing cases not involving AI tools strengthened our understanding of radiologists' analytical practices. Even for diagnosis scenarios typically involving AI tools, we also observed cases of radiologists not using the tools, such as during technical outages or when working for satellite locations with different technical infrastructures.
Interviews. Observational data were enriched through 33 semistructured interviews (Spradley 1979). Twenty-one informal interviews took place as radiologists conducted their work or during short breaks, covering questions about unclear aspects of diagnoses for recent patient cases, interactions with their colleagues or patients, or specific moments of using or not using various technologies. Twelve formal interviews allowed us to deepen our understanding of what it means to be a radiologist, how they go about their diagnostic work, their perceptions of various technologies, and so forth. All formal interviews and some informal interviews were recorded (with informants' consent) and transcribed.
Documentation and Artifacts. Finally, we collected documentation and artifact data, which served multiple purposes in our study. First, we captured artifacts produced and used by radiologists in their daily work, including medical notes and photographs or drawings of medical images they were referencing. These materials supplemented observational notes and strengthened our analysis when reconstructing their diagnosis process. Next, we collected technical research papers, regulatory filings, and vendor documentation to study the three focal AI tools and the nature of their outputs. In the United States, after regulators approve a clinical AI system, it can no longer change (or actively learn). Vendors can request additional approval for updated software versions, which can then be deployed in clinical settings. Thus, we observed the use of unchanging technologies throughout our study.
Data Analysis
In keeping with the principles of grounded theory development, we engaged in iterative rounds of data analysis during and throughout our data collection (Glaser and Strauss 1967, Charmaz 2014). In the early stages, we conducted open coding to capture a broad range of emerging themes. Within the first few months, the prominence of radiologists expressing doubt, asking questions, double checking, and conducting deep analytical practices was striking in the data. We were also struck by the frequency of questions and confusion surrounding the AI results radiologists viewed. We, therefore, conducted targeted rounds of data collection and analysis to deepen our understanding of these themes.
Although all radiologists appeared to be using the AI tools (clicking to display its results after forming their initial judgment), we noticed different patterns in the degree that AI results were influencing radiologists' final judgments (e.g., "pausing to consider AI results," "updating original diagnosis," or "quickly disregarding AI results"). In all three departments, the AI results and the radiologists' opinions often diverged, and confusion and frustration often followed. Deeper analysis led us to relate their frustration to the lack of understanding of why a given AI result was produced (e.g., "questioning what the AI is looking at" or "guessing factors behind AI output"). When we investigated the three AI tools and the nature of their output, we found many similarities; each reported high-performance metrics, used neural network classification methods, and offered no explanation of its results to users. Yet, despite similarities in the tools and radiologists' consistent frustration, only radiologists diagnosing lung cancer were regularly incorporating AI results, whereas the other radiologists mostly ignored the tools' results.
Next, we set out to understand what was behind these divergent patterns. We mapped step by step how radiologists formed each different type of diagnosis and analyzed their process along multiple dimensions, such as what aspects of the diagnosis prompted doubt, how evidence was analyzed, perceptions of the AI tool and its results, and so forth. We studied radiologists' similarities and differences among the diagnostic settings and their analytical practices and saw noteworthy differences in the materialities of the imaging technologies-in-use (computed tomography (CT) scans, mammography, and x-ray) and the breadth and depth of analysis that were afforded. Iterating with the literature on professional adoption of technology led us to analyze how senior and junior radiologists used the tools similarly and how all radiologists held similar attitudes about AI adoption. Further analysis led us to focus on a key difference in how radiologists integrated the AI result (or not) using what is called "AI interrogation practices." We continued to sharpen our analysis by consulting literatures on epistemic uncertainty and opacity, which further enhanced our formal theory development; we describe this in the following section.
Findings
Diagnosing patients is a critical process that requires the extensive expertise, training, and judgment of diagnostic radiology professionals. Radiologists develop deep expertise in diagnosis through at least six years of intense, immersive education after medical school. In their daily work, they strive to provide the best possible care to their patients and take their role in patients' health outcomes very seriously. They work under resource constraints and time pressure, as healthcare facilities respond to intense pressures to increase patient volumes and reduce costs. In recent years, powerful diagnostic AI tools have captured the attention of radiologists and healthcare leadership. We present how radiologists in three departments at Urbanside worked with AI tools to provide three critical types of medical diagnoses.
Producing Lung Cancer Diagnoses Using
AI Tools
Diagnosing lung cancer was a key focus of Urbanside radiologists specializing in chest imaging. Like others across the field, these radiologists were committed to producing the most accurate diagnoses possible and positively impacting patients' treatment and health outcomes. As in other Urbanside departments, they faced high workload demands and felt strong pressure to work quickly. At the same time, they provided thorough analyses, requiring intense concentration and careful deliberation. When diagnosing lung cancer, radiologists faced the challenging task of identifying difficult to detect lung "nodules" and characterizing their likelihood of malignancy. Radiologists were deeply aware of the significant consequences of their diagnoses, both the cost of falsely diagnosing a healthy patient and the cost of missing signs of cancer, and worked with high diligence.
Forming Critical Judgments (Without AI): Experiencing High Uncertainty. While forming a lung cancer diagnosis, radiologists experienced three main sources of uncertainty when reviewing their primary source of evidence, CT imaging: multiple series of high-resolution images (in five- and one-millimeter (mm) slices) that digitally reconstructed three-dimensional cross-sections of a patient's upper body and supported numerous settings and projections (e.g., from side or overhead views with varying degrees of contrast). First, they experienced great uncertainty while discerning lung "nodules" from the healthy lung tissue. This involved searching for small white-appearing circles within the varying shades of white to dark gray lung tissue on the CT images. However, hundreds of small white circular areas may be visible on a given CT that represent normal tissue or bone (see Figure A.1), and radiologists often wavered considerably while deciding whether a particular area was a nodule or not. One afternoon, a physician called Dr. E's phone, requesting her opinion about a potential nodule on her patient's CT. After several moments of searching and deliberating over the phone, Dr. E asked the physician, "Do you mind if I look more closely and figure it out and call you back?" Hanging up, she leaned closer to the monitors and continued her analysis before finally returning the call: "It's very low density. It's looking almost fat-like [which appears more gray than a typical nodule]. But it actually does look like a nodule. Sometimes I'm like, 'Am I going crazy?'"
The second source of uncertainty emerged from radiologists' task of identifying each and every nodule in the patient's lung tissue. Very frequently, they expressed concern about the possibility of missing a nodule, fearful of making consequential errors of omission: "I don't see anything major jumping out. Hopefully I'm not missing anything" (Dr. Y). This struggle was related to the CT imaging not always clearly capturing every region of a patient's lung tissue where nodules may be positioned, as in the following case of Dr. E deliberating aloud: "Am I hallucinating a nodule? I think it's there, but it's hard to see. It's in a bad location. It's behind two ribs, so it's impossible to get a good look there." Dr. J explained how "there's all this lung tucked in front and behind right there that you just don't see [on CT imaging]." Radiologists' difficulty examining these "impossible" areas of lung tissue using CT imaging raised their uncertainty. In fact, they often concluded that a seemingly nodule-free CT scan was not definitive: "If you don't see the nodule on one image, that doesn't mean it's not there. A lot of missed cancers, like ten percent, were seen only from one view and not the other" (Dr. J). The CT, like other imaging technologies, may also be difficult to analyze when patients shift or fail to inhale deeply during the scan, as in the case of Dr. S struggling to discern a particularly blurry CT image: "It's hard to tell because it's such a crappy study. He did not take a deep breath, did he?"
Radiologists worked to address these first two sources of uncertainty by investing in various analytical practices during the "nodule search." They methodically combed through the CT images numerous times, starting with the less granular set of five-mm images and then the more granular one-mm set, as Dr. J explained to medical students observing her work: "There is so much volume of data on the images to deal with. We scroll faster at first. It's good to get a general overview first, then we go to the smaller ones for deeper investigation." Then, their focus turned to further evaluating each potential nodule they identified, scrolling slowly through the neighboring slices to assess if it appeared to "flow" in a continuous path (indicating normal blood vessels) or disappear abruptly (indicating a nodule): "You have to follow the vessels. If it's something you're able to follow, then it's probably just a vessel you're catching, not a nodule." They increased their confidence using a technique called "windowing," or assessing the different properties of the tissue by adjusting the settings of the CT image or changing its grayscale contrast: "Oh, I think it's a vessel. Yeah, I don't think it's a nodule. Ah, yeah, I'm pretty sure. Windowing really helps" (Dr. E). As a final measure to address lingering concern or confusion, they may request additional imaging, as Dr. J explained to onlooking medical students: "This area looks ill-defined. So, somebody could call it a nodule. We try to make a firm guess but sometimes we call for follow-up imaging because we really can't decide."
Finally, radiologists faced a third, and relatively less acute, source of uncertainty during the task of characterizing each nodule's likelihood of malignancy. Radiologists applied fairly explicit criteria and standards to each nodule they had identified: "Almost everyone has nodules, but some of them can be cancer. You go through each nodule and make sure it's solid. Then with the prior [CT images] that you're comparing to, you actually look at each nodule and visually make sure they look the same" (Dr. Y). They first gauged the patient's overall risk level by reviewing their medical details (e.g., clinical symptoms, age, history of illness). Next, they scrolled through the CT images several times to explore the nodule. They used digital tools to precisely measure its dimensions and noted whether it was larger than the five-mm standard associated with malignancy. They analyzed prior CT imaging (if available), looking for changes or stability in the nodule's appearance or size over time. When Dr. J noted that a three-mm nodule was present on a CT scan from five years earlier where it also measured three mm, she felt highly certain in characterizing the nodule as benign: "There it is [in the CT from 2014]! So it's there [not new]. Oh, that's stable [noting the consistent 3-mm measurement]. There it is. Okay, now I'm good."
Experiencing Opacity of AI-in-Use (and Increasing Uncertainty). After completing their initial analysis, radiologists then viewed the results of an AI tool implemented to aid their lung cancer diagnosis. The Urbanside chest imaging department purchased an AI tool several years prior, which we refer to as the "CT AI tool," as an add-on to the CT digital imaging technology from a leading healthcare technology provider. Over the years, the tool was updated numerous times to improve its technical sophistication and performance. At the time of this study's observation, the tool performed imaging processing, segmentation, and classification tasks utilizing artificial neural networks that were trained and validated using large data sets of long-term radiological outcomes. Published research showed these AI tools' ability to identify and classify nodules was similar to radiologists' cancer detection rates. Following regulatory guidelines, the CT AI tool was deployed as an "aid" to radiologists, designated to be used after the radiologist first formed his or her independent judgment.
Clicking an icon on the digital workstation, instantaneously the display jumped to the first AI result, a circle annotation placed on a precise location of the CT image (Figure A.2). In the intermittent cases we observed where the AI result and the radiologist's judgment converged, they quickly moved on to complete the final report. Radiologists expressed delight and relief when the AI results confirmed their previously uncertain assessment that no nodules were present: "This time, [CT AI] found nothing. Any time that happens, it puts a big smile on my face" (Dr. F). They experienced a boost in confidence and certainty after viewing the AI results, as Dr. W expressed: "If I don't see any lung nodules, and [CT AI] doesn't see any lung nodules, then okay, we're good! I now feel very comfortable saying there's no lung nodules."
However, in the majority of cases we observed, the AI tool's results presented a divergent view from the radiologist's initial view. Regularly, the CT AI tool did not mark a nodule the radiologist had identified. Even more frequently, the tool flagged additional areas that the radiologist had not identified. Radiologists began experiencing opacity, as they were unable to understand these divergent AI results. They questioned what features of the underlying lung tissue were relevant to the tool's decision: "How does [the AI tool] know that this is a nodule, but this isn't?" (Dr. V). Radiologists were deeply committed to providing judgments with maximum certainty, but they expressed difficulty feeling certain given the opacity they experienced when considering divergent AI results: "I just don't know of any radiologist who's not looking closely at the case because they have AI. Because at the end of the day, you're still responsible. How can you trust the machine that much?" (Dr. E).
Dealing with AI Opacity: Enacting AI Interrogation Practices and Incorporating AI Results. On the surface, it may seem that using the AI tools (and experiencing opacity) increased the overall uncertainty these radiologists experienced; however, in fact, using the AI tool resulted in radiologists experiencing less uncertainty making their final judgments. They achieved this by using "AI interrogation practices," or practices that human experts enact to relate their own knowledge claims to the AI knowledge claims. For these radiologists, enacting AI interrogation practices involved building an understanding of the AI result and then reconciling the divergent viewpoints. They examined the suspected area in question, zooming in on that region of the CT image and scrolling forward and backward to assess the tissue surrounding the AI-marked region. They changed the contrast settings on the CT to analyze the area's size, shape, and density and reviewed prior CT images to understand how those features may have changed over time. They were examining and probing the AI results in order to understand them and, ultimately, integrate them with their own viewpoint.
Enacting AI interrogation practices led radiologists to consistently integrate the AI results into their final judgments. Radiologists regularly updated their initial opinion after interrogating the AI results, either through synthesizing the divergent opinions into a new insight or through reflectively agreeing with the AI result, as in the following case. After completing his initial analysis, Dr. T was puzzled by three AI results suggesting nodules he had not initially flagged. He began interrogating each area marked by the AI tool, analyzing the CT imaging to try to understand the AI result and how it related to his own view. He decided to overrule one AI result and expand his original opinion to include the two new additional ones. Even when radiologists decided to overrule the AI results, they experienced higher confidence reporting that final diagnosis. This was the case after Dr. F swiftly interrogated two unexpected AI-marked areas and related them to his own analysis: "This is what [CT AI] picked up: there and there. It's just normal stuff, parts of the bones protruding from the chest which sometimes looks like it could be a nodule."
Enacting AI interrogation practices required radiologists to invest additional time and analysis. They were willing to make that investment time and time again, which reflected their positive views of the AI results' value, as expressed by Dr. F: "I know my limitations and I know this [CT AI] is going to help them [nodules] stand out a little better. It's worth the extra time in my mind." They viewed the AI results as distinct and complementary to their own capabilities and expressed strong positive opinions about the tool's value in their work. This was vividly expressed by Dr. W, a senior radiologist who moved from another hospital where they did not have a CT AI tool: "I actually think [CT AI] is mission-critical. For me to read cases, I absolutely love having the [CT AI]. I used to not have it in my prior place [hospital]. I thought it was the worst thing ever. And then when I came here, I was amazed."
Indeed, the practice of interrogating and integrating the AI results had become a critical step in how these experts formed their final judgments. This was reflected in Dr. V's response one afternoon when the AI results were unexpectedly unable to load for a CT she was assessing. She instant messaged the CT technician, requesting the AI results for that study, and followed up with a phone call when the technician did not respond. She minimized that CT and began analyzing another case while she waited; a few minutes later, she learned of technical issues disrupting the AI services. Flustered, she returned to the minimized case, scrolled through the CT several more times, and reluctantly wrote the diagnosis report without AI input: "Once in a while, it definitely picks up things that you looked at yourself and you totally ignored, that you just couldn't see. Knowing that every now and then it picks up something real makes you always want to go back to it."
Producing Breast Cancer Diagnoses Using AI Tools
As breast cancer is prevalent and highly dangerous, diagnosing it at the earliest and most treatable stage was a great priority for radiologists specializing in breast imaging. On a typical day, each Urbanside breast radiologist evaluated over 100 patients, a highly demanding workload, and was providing life-or-death judgments in every case: "We have to give our full attention to make the right call, but we have so much volume we're supposed to get through. It's a conflicting thing" (Dr. Q). On average, they spent less than three minutes evaluating a case, an amount of time that did not allow extensive deliberations. The consequences of making these evaluations were extremely high, as patients were either informed they were not currently at risk or recommended to undergo additional testing, biopsy, or treatment, which resulted in patients bearing significant physical, emotional, and financial costs.
Forming Critical Judgments (Without AI): Experiencing High Uncertainty. While making critical judgments about breast cancer, radiologists experienced two main sources of uncertainty. First, like radiologists conducting the lung nodule search, breast radiologists wrestled with identifying abnormal areas within the complex breast tissue anatomy. The main source of evidence is mammography imaging: digital x-ray imaging that provided four two-dimensional images and four three-dimensional images (side and overhead views of each breast). For certain patient scenarios, targeted ultrasound imaging was also used.
Breast radiologists worked to detect every potential abnormality in the patient's breast imaging and knew that overlooking a single abnormality carried extremely high consequences. On mammography, abnormalities typically appear as small bright white patches amidst normal tissue ranging from white to dark gray (see Figure A.3). Because of the subtle differences in tissue appearance, and the difficulty of interpreting mammogram imaging, radiologists frequently expressed concern about missing critical findings. Dr. C explained: "The [abnormalities] we worry about are really faint and tiny ones: those are signs of early cancer. They're the ones you can barely see. A[n abnormality] is going to be really, really masked; you just can't see the cancer. It's like looking for a snowball in a snowstorm." In some cases, radiologists requested additional imaging to be more sure, especially when the mammogram did not capture areas of the patient's body (often near the armpit): "If I could see clearly in this area [pointing just outside the border of the image], I wouldn't be so concerned" (Dr. L).
To increase their confidence that they identified all abnormalities, radiologists used careful analytical practices. They combed over mammogram images, zooming in closely on each region and scrolling through each three-dimensional view multiple times. They were searching for unusual patterns in the tissue that may indicate "masses, calcifications, skin thickening, changes to the tissue, axillary lymph nodes, or distortion" (Dr. K). They examined asymmetries in the appearance of the left and right breast tissue, as Dr. P described, referring to images on her screen: "This is an area that caught my eye. This is the right side, and this is the left. The right looks obviously different than the left. This is one of the things that our eyes are trained to look for." Using careful systematic analysis that provided diverse views and evidence helped ward off radiologists' uncertainty, as Dr. G explained: "I zoom in even more, so I'm going to see even the tiniest finding. I zoom in, like, a lot until I'm pretty sure I see all of them."
The second, and more intense, source of uncertainty was characterizing each abnormality's likelihood of being malignant or benign. Making this distinction for breast cancer diagnosis was challenging. Radiologists described breast cancer as a complex disease that may develop in unexpected ways that often varied from patient to patient. Breast tissue anatomy is complex, and often, malignant breast tissue may closely resemble healthy tissue on mammography. Numerous pieces of evidence needed to be analyzed and synthesized: the size, shape, edges, and density of the abnormality on mammography, ultrasound, and MRI (magnetic resonance imaging) (if available); the degree of change across prior imaging; a patient's genetic makeup; prior history of disease; and lifestyle choices, as well as clinical symptoms and physical examinations. Occasionally, the evidence would overwhelmingly support a benign judgment, as in this case: "Those calcifications are really big and chunky, so once I look at it closely, then I can immediately ignore it [because it is likely benign]" (Dr. C). More frequently, however, some factors would suggest benign, whereas others did not, and radiologists struggled to reduce the acute uncertainty: "If I know something is fine or I know something is bad, then I know right away. But there are really a lot of cases I waffle on" (Dr. B). This is illustrated by the following case. During her analysis, Dr. C noted a small gray oval abnormality on a patient's mammogram. Through her analysis, she noted the oval's small size and sharply defined edges, that the area appeared stable for several years, and that the medical history did not suggest increased risk (all suggesting benign). Yet, she felt uncertain and exhaled deeply in frustration before ultimately recommending the patient undergo additional testing: "It's probably normal tissue, but it looks so oval. I've gotta call her back [for additional testing]. I just can't ignore that spot."
They expressed deeper anguish and deliberation when judging the malignancy of abnormalities than when searching for them, as Dr. Z explained: "Deciding what to do with an abnormal finding [deciding malignant versus benign], as opposed to detecting a finding in the mammogram, that takes much more discerning." Colleagues often disagreed about an area's likelihood of malignancy, especially because the mammogram imaging and its various features were open to multiple interpretations. Even after completing their full analysis, radiologists often second guessed their final judgment, as portrayed by Dr. G's continued wavering: "Do I do a follow up or do I just return to routine screening? That's really the difference between being a cyst [benign] and something being a solid mass [malignant]. And we can't always tell the difference."
To build certainty in this judgment, radiologists used a variety of analytical practices. They zoomed in on the mammogram to examine the appearance of the abnormality and its density, size, shape, and edge clarity. They gauged whether the abnormality had changed or remained stable across prior years' mammograms. They also studied the patients' health records (e.g., physical symptoms, personal and family history, pathology and surgical records) to gauge the patient's overall risk level and inform their emerging judgment. In one case, Dr. L decided to recommend a biopsy after considering a patient's elevated risk factors, despite the area's otherwise benign appearance: "It's not overtly suspicious: it's fairly circumscribed and it's not very oval [both suggesting benign diagnosis]. But this patient is here because she just found out from a genetic risk screen that she is at increased risk for breast cancer."
Experiencing Opacity of AI-in-Use (and Increasing Uncertainty). After forming their initial judgment, radiologists then reviewed the results of an AI tool implemented to aid their diagnosis process. Several years ago, Urbanside purchased an AI tool, which we call the "Mammo AI tool," as an add-on product to the mammography software from the imaging technology vendor, one of the leading U.S. healthcare technology providers. Since its implementation at Urbanside, the vendor provided numerous updates improving the tool. During this study's observations, the Mammo AI tool performed image processing, segmentation, and classification tasks utilizing artificial neural networks trained and validated using large-scale data sets with long-term radiological outcomes. Published research reported that the tool could identify malignancies at similar rates as trained radiologists and showed some indication of increasing radiologists' overall cancer detection rates.² Following regulatory guidelines, the tools were deployed as an aid to radiologists, who were required to only view AI results after forming their independent evaluation. The tool was designed so that a single mouse click displayed the AI tool results: a series of shapes³ marking the specific location on the mammogram that was classified as malignant, with no further information (Figure A.4).
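To make concrete how little information the display carried, the following is a minimal sketch of the kind of output a radiologist received: a location and a coarse class per flagged region, rendered as a shape on the image (see Endnote 3), with no score, features, or rationale attached. The data structure and field names here are hypothetical illustrations, not the vendor's actual format.

```python
# Minimal sketch (not the vendor's actual API) of the information the Mammo AI
# tool exposed to radiologists: a location and a coarse class per finding,
# rendered as a shape on the mammogram, with no score or reasoning attached.
from dataclasses import dataclass
from typing import List

@dataclass
class MammoAIMark:
    x: int       # pixel column of the flagged region (hypothetical field)
    y: int       # pixel row of the flagged region (hypothetical field)
    kind: str    # "mass", "calcification", or "mass+calcification"

SHAPE_FOR_KIND = {
    "mass": "star",
    "calcification": "triangle",
    "mass+calcification": "plus",   # co-occurrence, per the tool's legend
}

def overlay_marks(marks: List[MammoAIMark]) -> List[str]:
    """Return the display-level view a radiologist sees: shape and position only."""
    return [f"{SHAPE_FOR_KIND[m.kind]} at ({m.x}, {m.y})" for m in marks]

if __name__ == "__main__":
    marks = [MammoAIMark(412, 980, "calcification"), MammoAIMark(1530, 640, "mass")]
    for line in overlay_marks(marks):
        print(line)   # e.g., "triangle at (412, 980)" -- no features, no rationale
```

The point of the sketch is that nothing in such an output gives the radiologist a handle on why a region was flagged, which is precisely the opacity described next.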
Clicking the designated button, the AI results appeared on the mammogram image, and the radiologist compared them with her initial judgment. In the infrequent cases we observed where the two opinions converged, radiologists swiftly proceeded to the final diagnostic report. However, in the large majority of cases we observed, the AI results and the radiologist's judgment diverged. On occasion, the AI tool did not flag an area that was initially judged as abnormal, and far more frequently, the AI tool flagged additional areas that the radiologist had not.
Radiologists experienced opacity as they encountered the AI tool's unexplained results. They were unable to see what aspects of that tissue were causing the AI tool to produce a given result: "I don't know why they marked these calcifications, what about all these other calcifications (that the tool did not mark)?
They all look identical to me" (Dr. C). They expressed frustration in their inability to understand the divergent AI results: "What is it telling me to look at? At this tissue? It looks just like the tissue over here, which is perfectly normal … I have no idea what it's thinking" (Dr. K). Radiologists had no practical means of knowing the underlying reasoning of a given AI result and experienced the opacity of AI-in-use, as Dr. H explained: "[The AI tool] just points an area out and leaves you to figure it out. It's like it's saying, 'This is weird; what do you want to do with it?'"
Dealing with AI Opacity: Not Enacting AI Interrogation Practices and Not Incorporating AI Results. Like chest radiologists' use of the CT AI tool, in this department, breast radiologists using the Mammo AI tool experienced opacity and a surge in their level of uncertainty. However, in this department, radiologists did not enact AI interrogation practices and ultimately, did not regularly incorporate AI results into their final judgments. Instead, when faced with divergent opinions, the radiologists tended to review the image underlying the AI result in a perfunctory way before ignoring it, or "blowing it off," as Dr. G described: "I blow so many things [AI results] off. Like if there's normal scarring or stable calcs [benign tissue], it's [AI tool] going to pick up everything." They quickly dismissed AI-marked areas that they previously deemed normal without deeper inspection, writing them off as "false positives." "I already knew that stuff it marked didn't matter. I saw the mass was there a couple of years ago [in prior imaging]" (Dr. Z). It was also common for them to ignore AI results when the AI tool did not flag an area they initially considered abnormal: "If there's something that's concerning to you, based on your initial interpretation, that the [AI tool] is saying, 'Oh, this looks normal,' you couldn't use that information and say, 'We're not going to biopsy it'" (Dr. I).
Radiologists already faced extreme uncertainty and intense time pressure, which suddenly multiplied when they had to reconcile the (frequently) divergent opinions of the Mammo AI tool: "So many different factors are standing out to you all at once and giving you conflicting information, and then there's the result from the software [Mammo AI]" (Dr. L). They expressed strong opinions that the Mammo AI results did not add value to their process based on years of repeatedly spending valuable time reviewing divergent and unexplained AI results: "It isn't helping anybody. It's actually just another step for me to do" (Dr. K). Radiologists expressed negative views of having to tediously check and ultimately, "blow off" AI results for every patient's case, especially given the high time pressure they faced: "It's not worth my time to evaluate it" (Dr. L). Only under specific conditions (when analyzing highly dense breast tissue) did some radiologists comment on the potentially complementary nature of the tool's results: "Calcifications can be really little and sometimes hard to see. It [Mammo AI tool] sees those calcifications better than I do. But it also sees all kinds of calcifications that are neither here nor there" (Dr. B).⁴ Yet, in the same breath, Dr. B conveyed her view (shared by most of her colleagues) that the AI results were often useless when making her final judgments (they were "neither here nor there").
In the end, because of the lack of full feedback on patients' health over time, it is unclear whether radiologists' decisions to not incorporate AI results led to more effective treatment or not. It is possible that for some cases, had an AI result been incorporated, additional patient testing may have been avoided. For instance, Dr. L was examining new images for a patient who had been recommended for additional imaging by Dr. L's colleague the week before. Dr. L opened the patient's original mammogram (from the prior week) and reviewed the AI output, which had not flagged the area that prompted her colleague's concern: "[Mammo AI] didn't mark anything on this one [the prior week's image]. It didn't even mark the lesion that caught the radiologist's attention!" (Dr. L). Interestingly, after Dr. L reviewed the patient's new images, she recorded her opinion that the area was benign. This pattern was not uncommon; radiologists often recorded benign judgments after reviewing additional imaging. In this case, the original AI result was consistent with the radiologist's ultimate benign diagnosis; however, its accuracy is unclear without long-term patient health outcomes.
Producing Bone Age Diagnoses Using AI Tools
In "bone age" evaluation, radiologists specializing in pediatric imaging assess the skeletal maturity of children experiencing delays in growth or puberty. This important diagnosis factors into considering whether to treat the child with daily growth hormone injections for a period of time. The diagnosis involves comparing a child's bone development with established pediatric standards to determine whether it falls within a "normal" or "abnormal" range for the patient's age. A pediatric radiologist at Urbanside may perform seven or eight bone age evaluations on a given day, among the variety of 40–50 other diagnoses they provide (e.g., evaluating lung disease on CT scans, gastrointestinal issues on ultrasound, or scoliosis on x-ray). Like in the previous departments, these radiologists faced acute pressure to work quickly and provide high-quality, time-sensitive assessments to physician teams caring for young patients and their concerned parents.
Forming Critical Judgments (Without AI): Experiencing Lower Uncertainty. Unlike the previous two specialties, pediatric radiologists viewed this evaluation as a straightforward comparison task and did not experience particularly high uncertainty, or as Dr. O described, "I don't think it's a very sophisticated thing." After first quickly noting the patient's age and gender, they reviewed the sole source of evidence for this evaluation: a single digital x-ray representing the patient's hand, fingers, and wrist (see Figure A.5). They studied the size, shape, and appearance of specific bones visible on the x-ray and drew on their knowledge of how certain parts of the hand develop differently over time: "I use the phalanges [fingers] as the gold standard, but there's also carpal bones [wrist] and the radius ulna [forearm]. But they're more variable, so I don't look at that as much" (Dr. R). Dr. D explained how she considers and weighs multiple bone areas to build a more certain judgment: "I give more credence to distal bones [closer to fingertips], although endocrinologists like the proximal [lower fingers or wrist], which is probably more representative of overall height growth … If there's variation, or if there's discordance between different bones, I mentally give more weight to some than others."
Then, they compared the patient's bone development with the curated set of x-ray images in the textbook of standards used across pediatric radiology. A single x-ray image was used to depict a child's expected bone development at each one-year increment. Radiologists compared the appearance of the patient's hand x-ray with the standard images in the book, searching for the closest match: "I'm looking at the different shapes and seeing these are bigger than seven years" (Dr. D). In the following assessment, Dr. N went back and forth between the 18- and 19-year standards,⁵ noting slight differences in the bone development: "You see here, the bones are all fissured [pointing to patient's x-ray]. And here [in the 18-year standard image], there's still a tiny physis." A faint white line (the "tiny physis") ran horizontally between the knuckle and fingertip in the standard 18-year-old image, but no horizontal line appeared on the patient's x-ray (it was all "fissured" or no gap between the bones). Dr. N interpreted this to mean the patient's bone age was greater than 18 and thus, reported his judgment of 19 years.
Lastly, radiologists performed a calculation of the "normal" range of bone ages using a data table of standard deviations for each consecutive age printed in the textbook and reported whether their judgment of the patient's bone age fell within or outside that range.
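As an illustration of that last step, here is a minimal sketch of the range check, assuming an illustrative standards table (the means and standard deviations below are placeholders, not the textbook's values) and the common convention of treating roughly two standard deviations around the mean as the "normal" range; the department's exact cutoff may differ.

```python
# Minimal sketch of the "normal range" check, assuming an illustrative standards
# table and a two-standard-deviation cutoff; the actual textbook values and the
# department's exact convention may differ.
STANDARDS = {
    # chronological age (years) -> (mean bone age in years, standard deviation in years)
    8: (8.0, 0.9),    # illustrative numbers only
    9: (9.0, 1.0),
    10: (10.0, 1.1),
}

def normal_range(chronological_age: int, n_sd: float = 2.0) -> tuple[float, float]:
    """Lower and upper bounds of the 'normal' bone age range for a given age."""
    mean, sd = STANDARDS[chronological_age]
    return (mean - n_sd * sd, mean + n_sd * sd)

def classify(bone_age: float, chronological_age: int) -> str:
    """Report whether a judged bone age falls within or outside the normal range."""
    low, high = normal_range(chronological_age)
    return "normal" if low <= bone_age <= high else "abnormal"

if __name__ == "__main__":
    # A 9-year-old judged to have a bone age of 11.5 years falls outside the range.
    print(normal_range(9))      # (7.0, 11.0)
    print(classify(11.5, 9))    # "abnormal"
```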
Experiencing Opacity of AI-in-Use (and Increasing Uncertainty). After forming their initial judgment, the radiologist then viewed the result of the AI tool. In 2018, the Urbanside pediatric department implemented a cutting-edge tool, which we refer to as the "x-ray AI tool," to aid in bone age diagnosis. Citing the fairly straightforward comparison or "pattern recognition" nature of this task, Urbanside pediatric radiologists expressed high enthusiasm for using the x-ray AI tool, as Dr. N explained: "I think [the AI tool] can be very useful … You have to look very finely and carefully at a bunch of different images. It's visually overwhelming. But I think it's something a computer is really good at … It's just pattern recognition." The tool was developed by a reputable research institution and used deep learning methods at the forefront of diagnostic AI development at the time, involving multiple stages of convolutional neural networks performing image processing, segmentation, and classification tasks. Published studies reported that the tool's results matched the "normal" versus "abnormal" judgments of pediatric radiologists in over 95% of test cases. Urbanside radiologists eagerly agreed to participate in a multi-institution effort to further study the tool in settings of clinical use.
After implementation, every bone age evaluation was automatically processed and analyzed by the x-ray AI tool before entering the radiology work queue. Upon opening a bone age case, the digital x-ray displayed on the center monitor, and the diagnostic report software loaded on the side monitor. The x-ray AI tool automatically populated the diagnostic report with the AI result, a specific bone age measurement, and its corresponding "normal" or "abnormal" evaluation. Like in the previous two cases, radiologists first formed their initial opinion, and then, they viewed the AI result and decided how to use it.
Viewing the AI results, all of a sudden, radiologists experienced a new surge of uncertainty, rooted in their inability to understand or explain the AI result. In about a third of the cases, the AI tool's bone age roughly converged with their initial judgment. However, in the majority of cases, the bone age opinions diverged, and radiologists faced uncertainty in how to respond: "It [x-ray AI] would give me bone ages that would make me rethink what I said … I find that I'm often disagreeing with the model. Maybe it's just me and I don't know how to read bone ages" (Dr. D). Radiologists were troubled by the discrepancies, which led them to question their own judgments as well as the AI tool's, as in Dr. R remarking, "Sometimes I felt that the algorithm was a little inaccurate, either too old or too young … I couldn't put my finger on what it was that was off. Or maybe I was off, maybe the algorithm was more accurate, and I wasn't looking at it right."
Lacking the ability to understand or examine the tool's result left radiologists frustrated: "I have no idea, I really don't. I would be curious to know. I
don't really know how it's working" (Dr. M). They were often questioning what the AI tool was considering and guessing at the image features the AI tool may be weighing: "I'd be curious to find out what parts of the image the algorithm actually uses … I felt it was probably looking at (I wasn't sure) but I felt like it was probably looking at more of the hand than I was … I don't know how much weight the AI gives to the different bones" (Dr. R). One afternoon, a spirited discussion broke out as Dr. D attempted to reason about the tool's underlying logic: "Is there a way to tell what the algorithm used on an individual case to come to its determination? … If it's looking at the wrist bones, we would maybe disagree with it." Although Dr. A agreed, she questioned Dr. D's assumption about how the tool was forming its judgment: "Yeah, but is it looking at the wrist bones?" Experiencing opacity of AI-in-use, Dr. D shrugged: "I don't know. I don't know how it works!" Dr. A sighed in frustration, agreeing that "[i]t's a mystery."
In particular, they were baffled at how the AI tool was producing bone age measurements at a level of precision far greater than they were capable of producing. Pediatric radiologists report bone age results using the one-year increment standards, but the AI tool reported more granular results using combinations of years and months (e.g., 6 years 4 months), which Dr. R explained, saying "[i]t [x-ray AI tool] doesn't always give you an exact number [of years]. It gives you a kind of interpolation between standards. We don't typically do that." They struggled to understand or interpret how the AI tool was able to discern these precise results that did not correspond with their accepted language or approach: "How is it coming up with this granular of a bone age? How does this make sense? How does it know?" (Dr. A).
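One way to see why such granular outputs can arise is sketched below, under the assumption (which the radiologists could not verify) that the tool regresses a continuous bone age rather than selecting a one-year standard: a decimal prediction converts naturally into years and months, whereas the radiologists' convention rounds to the nearest standard. The prediction value and helper names here are purely illustrative.

```python
# Minimal sketch of why a model output can land "between standards": if the tool
# regresses a continuous bone age (an assumption; its internals were not visible),
# a decimal prediction converts naturally into years and months, unlike the
# textbook's one-year increments.
def to_years_months(bone_age_years: float) -> str:
    years = int(bone_age_years)
    months = round((bone_age_years - years) * 12)
    if months == 12:            # handle rounding up to the next full year
        years, months = years + 1, 0
    return f"{years} years {months} months"

def nearest_standard(bone_age_years: float) -> int:
    """The closest one-year textbook standard a radiologist would report."""
    return round(bone_age_years)

if __name__ == "__main__":
    prediction = 11.67                      # hypothetical continuous model output
    print(to_years_months(prediction))      # "11 years 8 months"
    print(nearest_standard(prediction))     # 12
```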
Dealing with AI Opacity: Not Enacting AI Interrogation Practices and Not Incorporating AI Results. As in the previous cases, pediatric radiologists encountered a sudden surge of uncertainty as they experienced opacity of AI-in-use. They struggled in the process of relating the AI tool's results to their own expert knowledge, as Dr. O remarked: "I don't really know how to gauge the results from that software; I'm not sure how it's working." Ultimately, in the cases we observed, pediatric radiologists did not enact AI interrogation practices and thus, rarely incorporated AI results into their final judgments.
These radiologists faced a sudden increase in uncertainty when viewing the AI results, despite the (previously) straightforward nature of the task. They were unable to integrate the tool's unfamiliar way of communicating bone age opinions with their own knowledge about pediatric bone development: "It [x-ray AI tool] gives me things like '11 years 8 months.' How does it get that? If someone was going to ask me, 'How do you know it was 11 years 8 months?' I'd be like, 'I don't really know'" (Dr. D). Moreover, they did not enact a rich range of analytical practices to help them interrogate the AI result and relate it to their own opinions. When they viewed a divergent AI bone age opinion, they resorted to rereviewing the same images from the x-ray and textbook and rarely transformed their initial opinion as a result. This is illustrated in the following case. Dr. D's eyes flicked back and forth between the standard images and the patient's x-ray as she formed her initial assessment: "I'm looking at how wide is this area here [the areas separating the bones of the fingers]. Looking at the different shapes. This is bigger. This is the same. I think he's between 8 and 9. The machine says between 9 and 10. Closer to 10 actually!" Reacting to the divergent AI opinion, Dr. D cocked her head to the side and exhaled in frustration: "Now I'm going to try to find why it said that." She continued reviewing the same image on her screen and the textbook again, which yielded no new insights that would change her original view: "I feel he's not that close to that [10 years]. I think the machine's overestimating. To me, it's 8 or 9."
Discussion
Summary of Findings
This study brings to light a process of how professional knowledge workers experienced and dealt with opacity of AI-in-use when forming critical judgments. In all three departments we studied, professionals' key practice was producing knowledge claims with the highest level of certainty possible. Professionals in two departments faced intense uncertainty (during lung cancer and breast cancer diagnoses) and worked hard to reduce it using varied analytical practices. In the third department (when evaluating bone age), they experienced lower uncertainty and drew on fewer analytical practices. In all three departments, professionals first formed initial knowledge claims and then considered the AI knowledge claim, which frequently conflicted with their initial claim. In all three departments, professionals experienced opacity of AI-in-use because they had no insight into the underlying reasoning of a given AI result, which in turn, heightened their experience of uncertainty.
Interestingly, the three departments had divergent patterns of the degree to which they transformed their own knowledge as a result of considering AI tool results. Only one department consistently integrated the AI results, despite the opacity of AI-in-use (when diagnosing lung cancer), whereas professionals in two other departments did not integrate the AI results (when diagnosing breast cancer and bone age). Upon closer analysis, we found that it was critical that professionals
were enacting "AI interrogation practices," that is, practices humans enact to relate their own knowledge claims to AI's knowledge claims. Enacting AI interrogation practices enabled professionals to reconcile the two knowledge claims (by overruling the AI claim, reflectively agreeing with it, or synthesizing the claims synergistically) and reduce their overall uncertainty. Professionals who did not enact such practices struggled to incorporate AI results because of the opacity and consequently formed their final judgments by either blindly accepting or ignoring AI claims. We differentiate these paths of human-AI use as "engaged augmentation" and "unengaged 'augmentation.'" Figure 1 summarizes this process and the two paths.
Theoretical and Practical Implications
Drawing on our conceptualization, we now outline the implications of our study for two key areas of focus for organizational scholars of AI: AI opacity and human-AI augmentation.
Opacity of AI-in-Use and the Importance of AI Interrogation Practices. Opacity associated with AI tools has become a fiercely debated topic in academic and societal conversations (Pasquale 2015, von Krogh 2018, Christin 2020, Diakopoulos 2020). Our study brings issues of opacity to the center stage in studying how professionals use AI tools to form critical judgments. Most of the existing literature on opacity conceptualizes opacity as a property of AI tools, especially of tools that use deep learning methods (Domingos 2015, Burrell 2016, Pearl and Mackenzie 2018, Kellogg et al. 2019). Our study shifts the analytical focus from what appears as an innate and fixed property of technology to the broader sociomaterial practice that produces opacity as a specific technology is used in a particular context. This enables us to focus on the process of how AI opacity emerges in practice and how, in some cases, professionals can deal with it.
A growing community focusing on issues of AI opacity proposes two approaches for dealing with it. The first focuses on limiting the use of AI tools for critical decisions if transparency is unattainable (e.g., Gillespie 2014, Domingos 2015, Burrell 2016, Teodorescu et al. 2021). The second approach is designing "explainable" or "interpretable" AI tools that provide greater transparency toward explaining AI outputs. Although this work is critical (as we discuss), our work uncovers a third approach. We illuminate a path where professionals deal with opacity of AI-in-use by enacting AI interrogation practices. These practices provided professionals a way of validating AI results, despite experiencing opacity, and resulted in an engaged mode of human-AI augmentation.
Although many researchers are focused on developing "explainable AI" or "interpretable AI" (e.g., Guidotti et al. 2018, Hooker et al. 2019, Rudin 2019, Samek et al. 2019, Barredo Arrieta et al. 2020, Fernández-Loría et al. 2020, Bauer et al. 2021, Teodorescu et al. 2021), some leading scholars (Simonite 2018, Cukier et al. 2021) and AI designers believe there is no need for explanations. They argue that an AI tool's evidence-based performance results should motivate experts to rely on the tools' results with confidence. This assumption was expressed by a leader of AI research at Urbanside: "People talk about explainability in AI a lot. My personal opinion is I don't think you need to do any explaining. As long as you show users that the tool performs well. When it performs well, I think people are really okay working under that uncertainty." Our study shows how that point of view is disconnected from the reality of how professionals wrestle with opacity when using AI in practice. Based on our study's findings, explainable or interpretable AI may enable, but does not guarantee, that professionals are able to integrate AI knowledge (i.e., engaged augmentation). Our study showed that despite the AI tools' high performance documented in published literature, some professionals chose to invest their valuable time in AI interrogation practices rather than simply taking unexplained AI claims at face value.

Figure 1. Experts Using AI Tools for Critical Judgments
If new generation AI tools provide explanations or become more interpretable, this should impact experts' ability to engage with AI but not necessarily their motivation or willingness to do so. Such willingness is influenced by many factors, such as professional norms, organizational and financial incentives, and societal expectations. Making sense of AI explanations requires an investment of time and resources. Not only is this challenging given intense organizational constraints (e.g., time, knowledge), but such investment does not align with widely held expectations that AI will make work faster and more efficient. In medical practice, professionals solicit opinions from their colleagues, investing in collaboration only when they experience particularly high doubt or uncertainty (on a regular but infrequent basis). In contrast, in our study (as in many leading U.S. hospitals), the AI tool provides opinions on every case, regardless of the professionals' degree of uncertainty. Thus, professionals were spending additional time coping with the heightened uncertainty, even for simple and routine cases (where promises of AI efficiency are strongest). We hope future research will further unpack the relationship between AI and time: the push for accelerating the pace of work is increasing (Lifshitz-Assaf et al. 2021), yet its implications for the nature and quality of work remain underexplored.
Our study also contributes to the debates and conversations on opacity by uncovering an important relationship between opacity of AI-in-use and epistemic uncertainty (Griffin and Grote 2020, Packard and Clark 2020, Rindova and Courtney 2020). In many knowledge fields, experts are keenly focused on producing high-quality judgments, and they are willing to invest resources to obtain additional evidence and reduce their epistemic uncertainty, or ignorance of unknown but knowable information (Packard and Clark 2020). Contrary to prior literature, when professionals in our study obtained additional "evidence" from AI tools, which often diverged from their prior judgment, their epistemic uncertainty increased because of their experience of opacity. In our study, professionals would regularly integrate conflicting knowledge provided by their colleagues by probing one another and building on their common ground and participation in a shared field (Carlile 2004, Maguire et al. 2004, Levina 2005). However, when professionals' opinions diverged from AI tools' opinions, no common ground or shared field existed or could be created (as tools are designed today). Enacting AI interrogation practices was the only way some professionals were able to overcome the opacity of AI-in-use and reduce their uncertainty enough to integrate the AI knowledge into their own.
Future research is warranted to explain why some professionals enact AI interrogation practices, whereas others do not. Our study suggests three main potential factors: the AI tool's ability to reduce professionals' uncertainty, the presence of time pressure (and other resource constraints) on professionals' work, and the richness of professionals' complementary technologies-in-use. Motivation to invest in AI interrogation practices may be lower if professionals view the AI expertise as similar to (or worse than) their own. In such cases, investing additional time brings only more pressure without the benefit of reduced uncertainty (as in the breast and pediatric departments in our study). Moreover, the time required to enact AI interrogation practices may further deter professionals from investing in them (as in the breast department, where time pressure was extremely high). It is also possible that professionals may still develop AI interrogation practices as they continue using the AI tool over a longer period of time (as in the pediatric department); on the other hand, it may be difficult to develop such practices when the complementary technologies are limited and lack richness (e.g., when analyzing x-ray images). Future research should investigate other motivators or deterrents that were not apparent in our context, such as the impact of regulation or perceived legal and institutional risks. It could be, for example, that regulatory or authority bodies that require professionals to articulate why they overrule an AI result may motivate investment in AI interrogation practices.
Importantly, we do not wish to suggest that AI interrogation practices are a substitute for explainability or interpretability. On the contrary, we urge continued dedicated attention and resources toward designing AI tools that enable professionals to more readily integrate AI knowledge claims in practice. For instance, when investigating the x-ray AI tool for determining bone age, we as academic researchers learned from reading archival published materials that it is possible to produce salience maps showing what areas on the x-ray were most relevant
for producing a given AI result. Although this technology is available for the algorithm underlying the x-ray tool, it was not implemented at Urbanside. By highlighting the critical importance of AI interrogation practices, we hope that managers, practitioners, and researchers will focus on designing and adopting more transparent and interpretable AI tools. This should help professionals to more easily develop AI interrogation practices that build on these explanations. In this study, all AI tools had a similar degree of opacity of AI-in-use. Future studies can explore a variety of interrogation practices that may emerge in response to different degrees of opacity. Moreover, AI researchers can proactively design new features that ease engagement.
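For readers unfamiliar with the technique, the following is a minimal sketch of a gradient-based salience map over a generic image classifier; it illustrates the general idea of highlighting the pixels most relevant to a given output and is not the x-ray AI tool's actual implementation (the model, weights, and input below are placeholders).

```python
# Minimal sketch of a gradient-based salience map over a generic image model,
# illustrating the kind of "what mattered where" view discussed above; this is
# not the x-ray tool's implementation, and the model/weights here are placeholders.
import torch
import torchvision.models as models

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel importance map: |d(score)/d(pixel)|, max over channels."""
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)   # add batch dim, track gradients
    scores = model(x)                              # forward pass
    top_score = scores.max()                       # score of the predicted class
    top_score.backward()                           # gradients w.r.t. input pixels
    return x.grad.abs().squeeze(0).max(dim=0).values  # (H, W) importance map

if __name__ == "__main__":
    model = models.resnet18(weights=None)          # placeholder model, untrained
    fake_xray = torch.rand(3, 224, 224)            # placeholder image tensor
    heat = saliency_map(model, fake_xray)
    print(heat.shape)                              # torch.Size([224, 224])
```

Displaying such a map alongside the AI result is one concrete design feature that could give interrogation practices something to build on.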
We also wish to highlight our focus on professionals' judgments for critical decisions, those with particularly high consequences or costs of errors (medical diagnoses in our study). Our findings are relevant to contexts where experts make critical decisions that require knowledge integration and transformation, such as judges rendering verdicts and sentencing, human resource managers evaluating employees, or military experts carrying out targeted attacks. Our study speaks to such contexts, where engaged augmentation is necessary, as opposed to those where experts may defer to AI results even when they are opaque. We do not suggest that our findings apply to decisions that do not require knowledge transformation or where the cost of errors is substantially lower, such as using AI for supply chain logistics, marketing and advertising, grammatical editing, or call center prioritization.
Future research is needed to explore potential differences in how professionals in other contexts experience opacity and enact AI interrogation practices. This is a study of a specific profession (physicians) within a highly resourced U.S. organization (a teaching hospital), and we believe experts in other legal and professional environments are important to investigate. The organization we studied has world-leading experts with high standards of quality and strong professional accountability. In the past few years, there has been a gold rush toward purchasing AI tools, especially in hospitals with fewer resources and lower standards of care (Moran 2018, Gkeredakis et al. 2021, Roberts et al. 2021). Based on this study, we suggest that such a gold rush may give rise to unengaged "augmentation," which is highly risky from a learning and knowledge perspective for experts, AI companies, and consumers. In addition, our study is based in the U.S. legal system, where hospitals must adhere to strict regulation and oversight, which is not the case in many countries currently adopting AI tools. Future research is warranted on the role of regulation in the adoption of and engagement with AI tools for critical decisions. When such regulation is missing and fewer checks and validations are in place, engaged augmentation may be even less likely and yet even more important.
Challenging the Taken-for-Granted Concept of Augmentation. Professional work is currently being disrupted by AI technologies, as modern AI increasingly pertains to processes of producing and evaluating knowledge claims (Anthony 2018, Faraj et al. 2018, von Krogh 2018, Pachidi et al. 2021). Debates are emerging around the degree of automation or augmentation that may result as AI tools are adopted into professional work settings (e.g., Autor 2015, Seamans and Furman 2019, Zhang et al. 2021). Our study speaks to this important debate by problematizing the taken-for-granted concept of augmentation and its implications for the future of work and human expertise.
Within the current literature, augmentation generally refers to human-in-the-loop scenarios where experts and AI tools "collaborate" so as to "multiply and combine their complementary strengths" (Raisch and Krakowski 2021, p. 193). The results of our study challenge the taken-for-granted equivalency of augmentation with collaboration. Instead, we suggest differentiating "engaged augmentation" from "unengaged 'augmentation.'" In engaged augmentation, experts integrated AI knowledge claims with their own, which requires both building an understanding of the AI claim and the ability and willingness to transform one's own knowledge based on the AI claim (this took place in lung cancer diagnosis in our study). By enacting AI interrogation practices, professionals were able to understand the AI result despite the opacity they experienced and demonstrated their willingness to change their initial judgment by reflectively agreeing with the AI claim, overruling it, or synthesizing the two claims. From a learning and knowledge perspective, engaged augmentation scenarios could be productive and beneficial, including cases of reflectively overruling the AI results. Future research is needed to investigate the learning that professionals (and AI tools) experience when involved in engaged augmentation over extended periods of time. For example, if engaged professionals routinely change their judgment by endorsing AI results, they may be reproducing AI's shortcomings over time (e.g., AI errors or biases in judgment).
In contrast, "unengaged augmentation" involved professionals not relating the AI knowledge claims to their own claims (this took place during breast cancer and bone age diagnosis in our study). These professionals appeared to be using the AI tool as they went through the act of opening the AI results. However, they were not integrating knowledge claims and were mostly blindly accepting or blindly ignoring the AI results. This path does not offer strong opportunities or benefits from a learning and knowledge perspective. We argue that human-in-the-loop scenarios of "unengaged augmentation" are essentially cases of automation; without the ability to relate AI knowledge claims to experts' own claims (through AI interrogation practices or through explainable or interpretable AI tools that enable interrogation), what looks like augmentation on paper is much closer to automation.
Moreover, there is an assumption that augmentation will help organizations achieve positive outcomes, usually through improving humans' knowledge insights, efficiency, or both (Davenport and Kirby 2016, Brynjolfsson and Mitchell 2017, Daugherty and Wilson 2018, Raisch and Krakowski 2021). Researchers have already raised issues of AI accuracy claims and "superior to human" knowledge performance (Lebovitz et al. 2021). Using AI tools may have reduced some human error but could have introduced other errors into the humans' judgment (e.g., because of biases or poor training data). To truly understand the impact on accuracy, we must be able to compare AI outputs and experts' judgments. However, in many professional contexts, such evaluations are limited because the knowledge is highly uncertain, and many "ground truth" measures are based on knowledge claims that lack strong external validation (Lebovitz et al. 2021).
Our study adds to these concerns by calling into question the assumption of increased efficiency. In all three cases we studied, experts using AI tools spent additional time even on "simple" cases as they experienced opacity and additional uncertainty. Within engaged augmentation, experts invested additional time to reconcile the AI knowledge claims by enacting AI interrogation practices. This additional time may be justified by improvements to care quality, but it is unclear whether healthcare systems are willing to commit that additional time. AI vendors tend to promote their tools using promises of efficiency. If managers implement these tools based on such claims, they may pressure experts to reduce time spent per judgment, which is likely to encourage "unengaged augmentation" and potentially lead to a decline in quality (e.g., patient health outcomes).
Another perspective on our study that warrants future research is the impact of AI on an overall professional field and its knowledge work over time. Leading medical professionals have been claiming that AI tools will eliminate the need for professional radiologists, explicitly citing the rise of diagnostic AI tools as a case for automation. On the other hand, leading radiologists are arguing that AI can enhance their professional roles and abilities. In our study, we did not find significant differences in attitudes toward AI across departments, which all took positive approaches toward adopting new technologies (including AI). Future research may explore the impact of AI on the broader professional field of radiology and on other professions experiencing massive disruption because of AI tools (e.g., human resource management, criminal justice). It could be that "engaged augmentation" and "unengaged augmentation" are reactive responses of professionals dealing with the potential disruption posed by AI and automation. It will be important to investigate how professionals respond to a new technological force that is challenging the professional jurisdiction and knowledge boundaries of an existing profession: for instance, how professionals enact professional identity work (Tripsas 2009, Lifshitz-Assaf 2018) or knowledge boundary work (Bechky 2003a, Levina and Vaast 2005, Barrett et al. 2012), or the strategies and responses that impact the professional field (Nelson and Irwin 2014, Howard-Grenville et al. 2017, Bechky 2020).
To conclude, we do not wish to convey that dealing with the opacity related to AI tools, even by using AI interrogation practices, should be viewed as the desired or optimal path forward. From knowledge and learning perspectives, the opacity of AI tools can be seen as inhibiting knowledge workers' full feedback and reflective cycles (Schön 1983, Gherardi 2000). When professionals cannot analyze the reasoning behind AI decisions, they miss out on the learning process (Beer 2017), lacking opportunities to reflect on, deepen, or update their expertise (Beane 2019). Ultimately, AI technologies are designed to create new sorts of expertise to enable professionals, organizations, and even society to better address hard problems such as medical diagnosis. However, the opacity experienced when professionals are using AI tools is an increasingly critical problem in its own right. We urge researchers and policy makers to tackle this problem further, across domains and disciplines, to ensure that the path of new technological development meets the needs of humanity and society.
Acknowledgments
The authors thank the special issue editors and the anonymous reviewers for their invaluable insights throughout the review process. This research benefited from the helpful feedback provided by Beth Bechky and Foster Provost as well as constructive comments from researchers at the New York University (NYU) Qualitative Research Seminar, the NYU Future of Work Seminar, the Stanford Changing Nature of Work Workshop, and the International Conference of Information Systems 2020 AI in Practice Professional Development Workshop and in the Work in the Age of Intelligent Machines community. Finally, the authors thank the individuals at "Urbanside" who graciously allowed them to study their daily work.
Appendix
Figure A.1. Single Image from a CT Scan Showing Various Lung Structures
Figure A.2. CT AI Tool Outputs
Figure A.3. Typical Display of the Digital Mammogram Images Used for Breast Cancer Diagnosis
Figure A.4. Mammo AI Tool Outputs
Figure A.5. Typical Display of the Digital Mammogram Images Used for Breast Cancer Diagnosis
Endnotes
1. See the Radiological Society of North America's journal Radiology: Artificial Intelligence (https://pubs.rsna.org/journal/ai) for an overview of the state of the field when it comes to AI use and other resources curated by the American College of Radiology (https://www.acrdsi.org/).
2. This research prompted the U.S. government in 2003 to mandate that insurance providers must reimburse the use of AI tools for breast cancer screening, leading to wide purchasing of such tools across U.S. breast imaging centers.
3. Three shapes were used to indicate the type of classification the tool generated: a star indicated "mass," a triangle indicated "calcification," and a plus sign indicated the co-occurrence of mass and calcification.
4. Calcifications are tiny flecks of calcium that can sometimes indicate early signs of cancer. They are usually unable to be felt by a physical examination. Large calcifications are not usually associated with cancer. Clusters of small calcifications indicate extra breast cell activity, which can be related to early cancer development but may also be related to normal breast cell activity.
5. A physis is a growth plate located between bones. Over time, the physis becomes thinner until eventually disappearing as one nears full growth.
References
Albu OB, Flyverbom M (2019) Organizational transparency: Conceptualizations, conditions, and consequences. Bus. Soc. 58(2):268–297.
Ananny M, Crawford K (2016) Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc. 20(3):973–989.
Anthony C (2018) To question or accept? How status differences influence responses to new epistemic technologies in knowledge work. Acad. Management Rev. 43(4):661–679.
Anthony C (2021) When knowledge work and analytical technologies collide: The practices and consequences of black boxing algorithmic technologies. Admin. Sci. Quart. ePub ahead of print June 4, https://doi.org/10.1177/00018392211016755.
Autor DH (2015) Why are there still so many jobs? The history and future of workplace automation. J. Econom. Perspect. 29(3):3–30.
Bailey D, Leonardi P, Barley S (2012) The lure of the virtual. Organ. Sci. 23(5):1485–1504.
Barad K (2003) Posthumanist performativity: Toward an understanding of how matter comes to matter. Signs 28(3):801–831.
Barley S (1986) Technology as an occasion for structuring: Technically induced change in the temporal organization of radiological work. Admin. Sci. Quart. 31(1):78–108.
Barley S (1990) The alignment of technology and structure through roles and networks. Admin. Sci. Quart. 35(1):61–103.
Barley SR, Bechky BA, Milliken FJ (2017) The changing nature of work: Careers, identities, and work lives in the 21st century. Acad. Management Discoveries 3(2):111–115.
Barocas S, Selbst AD, Raghavan M (2020) The hidden assumptions behind counterfactual explanations and principal reasons. Proc. 2020 Conf. Fairness, Accountability, Transparency (Association for Computing Machinery, New York), 80–89.
Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, Garcia S, et al. (2020) Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inform. Fusion 58(2020):82–115.
Barrett M, Oborn E, Orlikowski W (2016) Creating value in online communities: The sociomaterial configuring of strategy, platform, and stakeholder engagement. Inform. Systems Res. 27(4):704–723.
Barrett M, Oborn E, Orlikowski WJ, Yates J (2012) Reconfiguring boundary relations: Robotic innovations in pharmacy work. Organ. Sci. 23(5):1448–1466.
Bauer K, Hinz O, van der Aalst W, Weinhardt C (2021) Expl(AI)n it to me – explainable AI and information systems research. Bus. Inform. Systems Engrg. 63(2):79–82.
Beane M (2019) Shadow learning: Building robotic surgical skill when approved means fail. Admin. Sci. Quart. 64(1):87–123.
Beane M, Orlikowski WJ (2015) What difference does a robot make? The material enactment of distributed coordination. Organ. Sci. 26(6):1553–1573.
Bechky B (2003a) Object lessons: Workplace artifacts as representations of occupational jurisdiction. Amer. J. Sociol. 109(3):720–752.
Bechky B (2003b) Sharing meaning across occupational communities: The transformation of understanding on a production floor. Organ. Sci. 14(3):312–330.
Bechky BA (2020) Evaluative spillovers from technological change: The effects of "DNA Envy" on occupational practices in forensic science. Admin. Sci. Quart. 65(3):606–643.
Beer D (2017) The social power of algorithms. Inform. Comm. Soc. 20(1):1–13.
Benbya H, Pachidi S, Jarvenpaa S (2021) Special issue editorial: Artificial intelligence in organizations: Implications for information systems research. J. Assoc. Inform. Systems, https://aisel.aisnet.org/jais/vol22/iss2/10.
Bird S, Dudík M, Edgar R, Horn B, Lutz R, Milan V, Sameki M, Wallach H, Walker K (2020) Fairlearn: A toolkit for assessing and improving fairness in AI. Report, Microsoft, Redmond, WA.
Boyaci T, Canyakmaz C, de Vericourt F (2020) Human and Machine: The Impact of Machine Input on Decision-Making Under Cognitive Limitations (Social Science Research Network, Rochester, NY).
Brynjolfsson E, McAfee A (2014) The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies (W. W. Norton & Company, New York).
Brynjolfsson E, Mitchell T (2017) What can machine learning do? Workforce implications. Science 358(6370):1530–1534.
Burrell J (2016) How the machine "thinks": Understanding opacity in machine learning algorithms. Big Data Soc., https://doi.org/10.1177/2053951715622512.
Caplan R, Donovan J, Hanson L, Matthews J (2018) Algorithmic accountability: A primer (Data & Society). Accessed January 10, 2020, https://datasociety.net/library/algorithmic-accountability-a-primer/.
Carlile PR (2004) Transferring, translating, and transforming: An integrative framework for managing knowledge across boundaries. Organ. Sci. 15(5):555–568.
Charmaz K (2014) Constructing Grounded Theory (Sage, Thousand Oaks, CA).
Christin A (2020) The ethnographer and the algorithm: Beyond the black box. Theory Soc. 49(5):897–918.
Crawford K, Dobbe R, Dyer T, Fried G, Green B, Kaziunas E, Kak A, et al. (2019) AI Now 2019 Report (AI Now Institute, New York).
Cremer DD, Kasparov G (2021) AI should augment human intelligence, not replace it. Harvard Bus. Rev. (March 18), https://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it.
Cukier K, Mayer-Schonberger V, De Vericourt F (2021) Framers: Human Advantage in an Age of Technology and Turmoil (Dutton, New York).
Daugherty PR, Wilson HJ (2018) Human + Machine: Reimagining Work in the Age of AI (Harvard Business Press, Cambridge, MA).
Davenport TH, Kirby J (2016) Only Humans Need Apply: Winners and Losers in the Age of Smart Machines (HarperBusiness, New York).
Diakopoulos N (2020) Transparency. Dubber M, Pasquale F, Das S, eds. The Oxford Handbook of Ethics in AI (Oxford University Press, Oxford, United Kingdom), 197–214.
Dodgson M, Gann DM, Salter A (2007) "In case of fire, please use the elevator": Simulation technology and organization in fire engineering. Organ. Sci. 18(5):849–864.
Domingos P (2015) The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, 1st ed. (Basic Books, New York).
Dourish P (2016) Algorithms and their others: Algorithmic culture in context. Big Data Soc., https://doi.org/10.1177/2053951716665128.
Erickson I, Robert L, Crowston K, Nickerson J (2018) Workshop: Work in the Age of Intelligent Machines. GROUP '18 Proc. 20th ACM Internat. Conf. Supporting Groupwork (Sundial Island, FL), 359–361.
Faraj S, Pachidi S, Sayegh K (2018) Working and organizing in the age of the learning algorithm. Inform. Organ. 28(1):62–70.
Fernández-Loría C, Provost F, Han X (2020) Explaining data-driven decisions made by AI systems: The counterfactual approach. Preprint, submitted January 21, https://arxiv.org/abs/2001.07417v1.
Gao R, Saar-Tsechansky M, De-Arteaga M, Han L, Lee MK, Lease M (2021) Human-AI collaboration with bandit feedback. Preprint, submitted May 22, https://arxiv.org/abs/2105.10614.
Gherardi S (2000) Practice-based theorizing on learning and knowing in organizations. Organization 7(2):211–223.
Gillespie T (2014) The relevance of algorithms. Gillespie T, Boczkowski PJ, Foot KA, eds. Media Technologies: Essays on Communication, Materiality, and Society (MIT Press, Cambridge, MA), 167–194.
Gkeredakis M, Lifshitz-Assaf H, Barrett M (2021) Crisis as opportunity, disruption and exposure: Exploring emergent responses to crisis through digital technology. Inform. Organ. 31(1):100344.
Glaser B, Strauss A (1967) Discovering Grounded Theory (Aldine Publishing Company, Chicago).
Glikson E, Woolley AW (2020) Human trust in artificial intelligence: Review of empirical research. Acad. Management Ann. 14(2):627–660.
Grady D (2019) A.I. took a test to detect lung cancer. It got an A. New York Times (May 20), https://www.nytimes.com/2019/05/20/health/cancer-artificial-intelligence-ct-scans.html.
Griffin M, Grote G (2020) When is more uncertainty better? A model of uncertainty regulation and effectiveness. Acad. Management Rev. 45(4):745–765.
Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2018) A survey of methods for explaining black box models. ACM Comput. Surveys 51(5):93:1–93:42.
Hansen HK, Flyverbom M (2015) The politics of transparency and the calibration of knowledge in the digital age. Organization 22(6):872–889.
Haraway D (1988) Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Stud. 14(3):575–599.
Hardy C, Lawrence TB, Grant D (2005) Discourse and collaboration: The role of conversations and collective identity. Acad. Management Rev. 30(1):58–77.
Hooker S, Erhan D, Kindermans PJ, Kim B (2019) A benchmark for interpretability methods in deep neural networks. Preprint, submitted November 5, https://arxiv.org/abs/1806.10758.
Howard-Grenville J, Nelson AJ, Earle A, Haack J, Young D (2017) "If chemists don't do it, who is going to?" Peer-driven occupational change and the emergence of green chemistry. Admin. Sci. Quart. 62(3):524–560.
Kaur H, Nori H, Jenkins S, Caruana R, Wallach H, Wortman Vaughan J (2020) Interpreting interpretability: Understanding data scientists' use of interpretability tools for machine learning. Proc. 2020 CHI Conf. Human Factors Comput. Systems, Honolulu (Association for Computing Machinery, New York), 1–14.
Kellogg K, Valentine M, Christin A (2019) Algorithms at work: The new contested terrain of control. Acad. Management Ann. 14(1):366–410.
Khadpe P, Krishna R, Fei-Fei L, Hancock JT, Bernstein MS (2020) Conceptual metaphors impact perceptions of human-AI collaboration. Proc. ACM Human Comput. Interactions, 163:1–163:26.
Kogut B, Zander U (1992) Knowledge of the firm, combinative capabilities, and the replication of technology. Organ. Sci. 3(3):383–397.
Lebovitz S, Levina N, Lifshitz-Assaf H (2021) Is AI ground truth really true? The dangers of training and evaluating AI tools based on experts' know-what. Management Inform. Systems Quart. 45(3):1501–1525.
Leonardi P (2011) When exible routines meet exible technologies: Af-
fordance, constraint, and the imbrication of human and material
agencies. Management Inform. Systems Quart. 35(1):147167.
Leonardi PM, Bailey DE (2008) Transformational technologies and
the creation of new work practices: Making implicit knowledge
explicit in task-based offshoring. Management Inform. Systems
Quart. 32(2):411436.
Leonardi P, Barley S (2010) Whats under construction here? Social
action, materiality, and power in constructivist studies of tech-
nology and organizing. Acad. Management Ann. 4(1):151.
Leonardi PM, Treem JW (2020) Behavioral visibility: A new para-
digm for organization studies in the age of digitization, digitali-
zation, and datacation. Organ. Stud. 41(12):16011625.
Levina N (2005) Collaborating on multiparty information systems
development projects: A collective reection-in-action view. In-
form. Systems Res. 16(2):109130.
Levina N, Vaast E (2005) The emergence of boundary spanning
competence in practice: Implications for implementation and
use of information systems. Management Inform. Systems Quart.
29(2):335363.
Lifshitz-Assaf H (2018) Dismantling knowledge boundaries at
NASA: The critical role of professional identity in open innova-
tion. Admin. Sci. Quart. 63(4):746782.
Lifshitz-Assaf H, Lebovitz S, Zalmanson L (2021) Minimal and
adaptive coordination: How hackathonsprojects accelerate in-
novation without killing it. Acad. Management J. 64(3):684715.
Maguire S, Hardy C, Lawrence TB (2004) institutional entrepre-
neurship in emerging elds: HIV/AIDS treatment advocacy
in Canada. Acad. Management J. 47(5):657679.
Mazmanian M, Cohn M, Dourish P (2014) Dynamic reconguration
in planetary exploration: A sociomaterial ethnography. Manage-
ment Inform. Systems Quart. 38(3):831848.
Mazmanian M, Orlikowski W, Yates J (2013) The autonomy para-
dox: The implications of mobile email devices for knowledge
professionals. Organ. Sci. 24(5):13371357.
Mol A (2003) The Body Multiple (Duke University Press, Durham,
NC).
Moran G (2018) This articial intelligence wonttakeyourjob,it
will help you do it better. Fast Company (October 24), https://
www.fastcompany.com/90253977/this-articial-intelligence-
wont-take-your-job-it-will-help-you-do-it-better.
Mukherjee S (2017) A.I. Vs. M.D. New Yorker (March 27), https://
www.newyorker.com/magazine/2017/04/03/ai-versus-md.
NelsonAJ,IrwinJ(2014)Dening what we doall over again:
Occupational Identity, technological change, and the librari-
an/internet-search relationship. Acad. Management J. 57(3):
892928.
Nunn J (2018) How AI Is Transforming HR Departments. Forbes
(May 9), https://www.forbes.com/sites/forbestechcouncil/
2018/05/09/how-ai-is-transforming-hr-departments/.
Orlikowski W (1992) The duality of technology: Rethinking the con-
cept of technology in organizations. Organ. Sci. 3(3):398427.
Orlikowski W (2000) Using technology and constituting structures:
A practice lens for studying technology in organizations. Organ.
Sci. 11(4):404428.
Orlikowski W (2007) Sociomaterial practices: Exploring technology
at work. Organ. Stud. 28(9):14351448.
Orlikowski W, Scott S (2008) Sociomateriality: Challenging the sepa-
ration of technology, work and organization. Acad. Management
Ann. 2(1):433474.
Pachidi S, Berends H, Faraj S, Huysman M (2021) Make way for the
algorithms: Symbolic actions and change in a regime of know-
ing. Organ. Sci. 32(1):1841.
Packard MD, Clark BB (2020) On the mitigability of uncertainty and
the choice between predictive and nonpredictive strategy. Acad.
Management Rev. 45(4):766786.
Pasquale F (2015) The Black Box Society: The Secret Algorithms That
Control Money and Information (Harvard University Press,
Cambridge, MA).
Pearl J, Mackenzie D (2018) The Book of Why: The New Science of
Cause and Effect, 1st ed. (Basic Books, New York)
Pinch T, Bijker W (1987) The social construction of facts and artifacts: Or how the sociology of science and the sociology of technology might benefit each other. Hughes TP, Bijker W, Pinch T, eds. The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology (MIT Press, Cambridge, MA), 17–50.
Polanyi M (1958) Personal Knowledge: Toward a Post-Critical Philosophy (University of Chicago Press, Chicago).
Polanyi M (1966) The Tacit Dimension (University of Chicago Press, Chicago).
Puranam P (2021) Human–AI collaborative decision-making as an organization design problem. J. Organ. Design 10(2021):75–80.
Rai A, Constantinides P, Sarker S (2019) Editors' comments: Next-generation digital platforms: Toward human–AI hybrids. Management Inform. Systems Quart. 43(1):iii–ix.
Raisch S, Krakowski S (2021) Artificial intelligence and management: The automation–augmentation paradox. Acad. Management Rev. 46(1):192–210.
Razorthink Inc. (2019) 4 major challenges facing fraud detection; ways to resolve them using machine learning. Medium (April 25), https://medium.com/razorthink-ai/4-major-challenges-facing-fraud-detection-ways-to-resolve-them-using-machine-learning-cf6ed1b176dd.
Recht M, Bryan RN (2017) Artificial intelligence: Threat or boon to radiologists? J. Amer. College Radiology 14(11):1476–1480.
Rindova V, Courtney H (2020) To shape or adapt: Knowledge problems, epistemologies, and strategic postures under Knightian uncertainty. Acad. Management Rev. 45(4):787–807.
Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S, Aviles-Rivero AI, et al. (2021) Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence 3(3):199–217.
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5):206–215.
Samek W, Montavon G, Vedaldi A, Hansen LK, Müller KR, eds. (2019) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, Cham, Switzerland).
Schön DA (1983) The Reflective Practitioner: How Professionals Think in Action (Basic Books, New York).
Scott SV, Orlikowski WJ (2012) Reconfiguring relations of accountability: Materialization of social media in the travel sector. Accounting Organ. Soc. 37(1):26–40.
Scott S, Orlikowski W (2014) Entanglements in practice: Performing anonymity through social media. Management Inform. Systems Quart. 38(3):873–893.
Seamans R, Furman J (2019) AI and the economy. Innovation Policy Econom. 19(1):161–191.
Simonite T (2018) Google's AI guru wants computers to think more like brains. Wired Magazine (December 12), https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/.
Spradley JP (1979) The Ethnographic Interview (Holt, Rinehart and Winston, New York).
Stohl C, Stohl M, Leonardi PM (2016) Managing opacity: Information visibility and the paradox of transparency in the digital age. Internat. J. Comm. 10(2016):123–137.
Suchman L (2007) Human-Machine Reconfigurations: Plans and Situated Actions (Cambridge University Press, Cambridge, United Kingdom).
Teodorescu M, Morse L, Awwad Y, Kane G (2021) Failures of fairness in automation require a deeper understanding of human–ML augmentation. Management Inform. Systems Quart. 45(3b):1483–1499.
Tripsas M (2009) Technology, identity, and inertia through the lens of "the digital photography company." Organ. Sci. 20(2):441–460.
Turco CJ (2016) The Conversational Firm: Rethinking Bureaucracy in the Age of Social Media (Columbia University Press, New York).
Van Den Broek E, Sergeeva A, Huysman M (2021) When the machine meets the expert: An ethnography of developing AI for hiring. Management Inform. Systems Quart. 45(3):1557–1580.
Van Maanen J (1988) Tales of the Field: On Writing Ethnography, 2nd ed. (University of Chicago Press, Chicago).
von Krogh G (2018) Artificial intelligence in organizations: New opportunities for phenomenon-based theorizing. Acad. Management Discoveries 4(4):404–409.
Waardenburg L, Sergeeva A, Huysman M (2018) Hotspots and blind spots. Schultze U, Aanestad M, Mähring M, Østerlund C, Riemer K, eds. Living with Monsters? Social Implications of Algorithmic Phenomena, Hybrid Agency, and the Performativity of Technology, IFIP Advances in Information and Communication Technology (Springer International Publishing, Cham, Switzerland), 96–109.
Wagner EL, Moll J, Newell S (2011) Accounting logics, reconfiguration of ERP systems and the emergence of new accounting practices: A sociomaterial perspective. Management Accounting Res. 22(3):181–197.
Wagner E, Newell S, Piccoli G (2010) Understanding project survival in an ES environment: A sociomaterial practice perspective. J. Assoc. Inform. Systems 11(5):276–297.
Watkins EA (2020) "Took a pic and got declined, vexed and perplexed": Facial recognition in algorithmic management. 2020 Comput. Supported Cooperative Work Social Comput. (Association for Computing Machinery, New York), 177–182.
Wilson HJ, Daugherty PR (2018) Collaborative intelligence: Humans and AI are joining forces. Harvard Bus. Rev. (July 1), https://hbr.org/2018/07/collaborative-intelligence-humans-and-ai-are-joining-forces.
Xu F, Uszkoreit H, Du Y, Fan W, Zhao D, Zhu J (2019) Explainable AI: A brief survey on history, research areas, approaches and challenges. Tang J, Kan MY, Zhao D, Li S, Zan H, eds. Natural Language Processing and Chinese Computing, Lecture Notes in Computer Science (Springer International Publishing, Cham, Switzerland), 563–574.
Zhang D, Mishra S, Brynjolfsson E, Etchemendy J, Ganguli D, Grosz B, Lyons T, et al. (2021) The AI index 2021 annual report. Report, AI Index Steering Committee, Human-Centered AI Institute, Stanford University, Stanford, CA.
Zuboff S (2015) Big other: Surveillance capitalism and the prospects of an information civilization. J. Inform. Tech. 30(1):75–89.
Sarah Lebovitz received her PhD from New York University's Stern School of Business and is an assistant professor at the University of Virginia's McIntire School of Commerce. Her current research investigates how new technologies are adopted in organizations and how they impact professionals and their knowledge work practices. She studies how AI tools are evaluated and used in consequential decision making and how accelerating technologies are transforming innovation processes.

Hila Lifshitz-Assaf is an associate professor at New York University's Stern School of Business and a faculty associate at Harvard University's Laboratory for Innovation Science. Her research focuses on the microfoundations of scientific and technological innovation and knowledge creation processes in the digital age. She earned her doctorate from Harvard Business School. She won a prestigious award from the National Science Foundation for inspiring cross-disciplinary research.

Natalia Levina is a Toyota Motors Corp Term Professor of Information Systems at New York University's Stern School of Business and a part-time research environment professor at the Warwick Business School. Her main research interests focus on how people span organizational, professional, cultural, and other boundaries while developing and using new technologies. Her current work explores AI adoption in professional work, open innovation, blockchain, and firm-crowd relationships.