Abstract
Introduction: Whilst general artificial intelligence (AI) is yet to appear, today’s narrow AI is already good enough to transform much of healthcare over the next two decades. Objective: There is much discussion of the potential benefits of AI in healthcare and this paper reviews the cost that may need to be paid for these benefits, including changes in the way healthcare is practiced, patients are engaged, medical records are created, and work is reimbursed. Results: Whilst AI will be applied to classic pattern recognition tasks like diagnosis or treatment recommendation, it is likely to be as disruptive to clinical work as it is to care delivery. Digital scribe systems that use AI to automatically create electronic health records promise great efficiency for clinicians but may lead to potentially very different types of clinical records and workflows. In disciplines like radiology, AI is likely to see image interpretation become an automated process with diminishing human engagement. Primary care is also being disrupted by AI-enabled services that automate triage, along with services such as telemedical consultations. This altered future may necessarily see an economic change where clinicians are increasingly reimbursed for value, and AI is reimbursed at a much lower cost for volume. Conclusion: AI is likely to be associated with some of the biggest changes we will see in healthcare in our lifetime. To fully engage with this change brings promise of the greatest reward. To not engage is to pay the highest price.
IMIA Yearbook of Medical Informatics 2019
© 2019 IMIA and Georg Thieme Verlag KG
The Price of Artificial Intelligence
Enrico Coiera
Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
We are not ready for what is about to come.
It is not that healthcare will soon be run
by a web of artificial intelligences (AIs) that
are smarter than humans. Such general AI
does not appear anywhere near the horizon.
Rather, the narrow AI that we already have,
with all its flaws and limitations, is already
good enough to transform much of what we
do, if applied carefully.
Amara’s Law tells us that we tend to
overestimate the impact of a technology in
the short run, but underestimate its impact
in the long run [1]. There is no doubt that AI has
gone through another boom cycle of inflated
expectations, and that some will be disap-
pointed that promised breakthroughs have
not materialized. Yet, despite this, the next
decade will see a steadily growing stream
of AI applications across healthcare. Many
of these applications may initially be niche,
but eventually they will become mainstream.
Eventually they will lead to substantial
change in the business of healthcare. In
twenty years’ time, there is every prospect
that the changes we find will be transformational.
Such transformation however comes with
a price. For all the benefits that will come
through improved efficiency, safety, and
clinical outcomes, there will be costs [2]. The
nature of change is that it often seems to appear
suddenly. While we are all distracted daily, try-
ing to make our unyielding health system bend
to our needs using traditional approaches,
disruptive change surprises us because it comes
from the places we least expect, and in ways we
never quite imagined.
In linguistics, the Sapir-Whorf hypothesis says
that we can only imagine what we can speak
of [3]. Our cognition is limited by the concepts
we have words for. It is much the same in the
world of health informatics. We have devel-
oped strict conceptual structures that corral AI
into solving classic pattern recognition tasks
like diagnosis or treatment recommendation.
We think of AI automating image interpreta-
tion, or sifting electronic health record data
for personalized treatment recommendations.
Few of us think about AI automating
foundational business processes. Yet AI is
likely to be more disruptive to clinical work
in the short run than it will be to care delivery.
Digital scribes, for example, will steadily
take on more of the clinical documentation task
[4]. Scribes are digital assistants that listen to
clinical talk such as patient consultations. They
may undertake a range of tasks from simple
transcription through to the summarization of
key speech elements into the electronic record,
as well as providing information retrieval and
question-answering services. The promise of
digital scribes is a reduction in human docu-
mentation burden. The price for this help will
be a re-engineering of the clinical encounter.
The technology to recognize and interpret
clinical speech from multiple speakers, and
to transform that speech into accurate clinical
summaries is not yet here. However, if humans
are willing to change how they speak, for
example by giving an AI commands and hints,
then much can be done today. It is easier for
a human to say “Scribe, I’d like to prescribe
some medication” than for the AI to be trained
to accurately recognize whether the speech it
is listening to is past history, present history,
or prescription talk.
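The command-hinting idea above can be pictured as a simple rule-based router that maps an explicit spoken cue to a section of the record. This is a minimal sketch only, not any production scribe system; the command phrases, section names, and the `route_utterance` helper are all hypothetical, and a real scribe would combine such rules with trained speech and language models.

```python
import re

# Hypothetical command phrases a clinician might use to hint at the
# record section an utterance belongs to.
COMMANDS = {
    r"\bscribe,?\s+(i'd like to )?prescribe\b": "medications",
    r"\bscribe,?\s+past history\b": "past_history",
    r"\bscribe,?\s+examination\b": "examination",
}

def route_utterance(utterance: str) -> str:
    """Return the record section hinted at by an explicit command,
    or 'unclassified' when the clinician gave no hint and a trained
    model would have to infer the section from context."""
    text = utterance.lower()
    for pattern, section in COMMANDS.items():
        if re.search(pattern, text):
            return section
    return "unclassified"
```

For example, `route_utterance("Scribe, I'd like to prescribe some medication")` routes the utterance to the medications section, while unhinted speech falls through to the much harder inference problem described in the text.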
The price for using a scribe might also be
an even more obvious intrusion of technol-
ogy between patient and clinician, and new
risks to patient privacy because speech data
contains even more private information than
clinician-generated records. Clinicians might
simply replace today’s effort in creating
records, where they have control over con-
tent, with new work in reviewing and editing
automated records, where content reflects
the design of the AI. There are also subtler
risks. Automation bias might mean that many
clinicians cease to worry about what should
go into a clinical document, and simply
accept whatever a machine has generated [5].
Given the widespread use of copy and paste
in current-day electronic records [6], such an
outcome seems a distinct possibility.

Keywords
Artificial intelligence, electronic health record, radiology, primary care, value-based care
Yearb Med Inform 2019:14-5
http://dx.doi.org/10.1055/s-0039-1677892
Published online: 25.04.2019
At this moment, narrow AI, predominantly
in the form of deep learning, is making
great inroads into pattern recognition tasks
such as diagnostic radiological image inter-
pretation [7]. The sheer volume of training
data now available, along with access to
cheap computational resources, has allowed
previously impractical neural network archi-
tectures to come into their own. When a price
for deep learning is discussed, it is often in
terms of the end of clinical professions such
as radiology or dermatology [8]. Human
expertise is to be rendered redundant by
super-human automation.
The reality is much more nuanced. Firstly,
there remain great challenges to generalizing
narrow AI methods. A well-trained deep
network typically does better on data sets
that resemble its training population [9]. The
appearance of unexpected new edge cases,
or implicit learning of features such as clin-
ical workflow or image quality [10], can all
degrade performance. One remedy for this
limitation is transfer learning [11], retraining
an algorithm on new data taken from the
local context in which it will operate. So, just
as we have seen with electronic records, the
prospect of cheap and generalizable technol-
ogy might be a fantasy, and expensive system
localization and optimization may become
the lived AI reality.
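One way to picture this localization step is transfer learning's common recipe: keep a pretrained feature extractor frozen and retrain only a small output head on locally collected examples. The sketch below uses a toy logistic head in pure Python; the frozen feature function, the "local" data, and the learning rate are invented for illustration and stand in for a real imaging network and a real hospital dataset.

```python
import math

def frozen_features(x):
    # Stand-in for a pretrained, frozen feature extractor
    # (e.g. the convolutional layers of an imaging network).
    return [x, x * x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def finetune_head(data, epochs=500, lr=0.5):
    """Retrain only the output layer (weights + bias) on local data.
    `data` is a list of (raw_input, label) pairs with labels 0 or 1;
    the feature extractor itself is never updated."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for i in range(len(w)):
                w[i] -= lr * err * f[i]
            b -= lr * err
    return w, b

def predict(w, b, x):
    f = frozen_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Hypothetical local data: in this clinic's population the decision
# boundary sits near x = 0.5, so the head must be re-fit locally.
local_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
w, b = finetune_head(local_data)
```

The point of the sketch is the cost structure: even when the expensive pretrained component is reused, each deployment site still pays for local data collection, retraining, and validation.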
Secondly, the radiological community
has reacted early, and proactively, to these
challenges. Rather than resisting change,
there is strong evidence not just that AI is
being actively embraced within the world
of radiology, but also that there is an under-
standing that change brings not just risks, but
opportunities. In the future, radiologists might
be freed from working in darkened reading
rooms, and emerge to become highly visible
participants in clinical care. Indeed, in the
future, the idea of being an expert in just a
single modality such as image interpretation
may seem quaint, as radiologists transform
into diagnostic experts, integrating data from
multiple modalities from the genetic through
to the radiologic.
The highly interconnected nature of
healthcare means that changes in one part
of the system will require different changes
elsewhere. Radiologists in many parts of the
world are paid for each image they read. With
the arrival of cheap bulk AI image interpre-
tation, that payment model must change. The
price of reading must surely drop, and expert
humans must instead be paid for the value
they create, not the volume they process.
The same kind of business pressure is
being felt in other clinical specialties. In
primary care, for example, the arrival of
new, sometimes aggressive, players who base
their business model on AI patient triage and
telemedicine is already problematic [12, 13].
Patients might love the convenience of such
services, especially when they are technolog-
ically literate, young, and in good health, but
they may not always be so well served if they
are older, or have complex comorbidities [14].
Thus, AI-based primary care services might
end up caring for profitable low-cost and low-
risk patients, and leave the remainder to be
managed by a financially diminished existing
primary care system. One remedy to such a
risk is again to move away from reimburse-
ment for volume, to reimbursement for value.
Indeed, value-based healthcare might arrive
not as the product of government policy, but
as a necessary side effect of AI automation.
There are thus early lessons in the different
reactions to AI between primary care and
radiology. One sector is being caught by sur-
prise and playing catch up to new commercial
realities that have come more quickly than
expected; the other has begun to reimagine
itself in anticipation of becoming the one
that crafts the new reality. The price each
sector pays is different. Proactive preparation
requires investment in reshaping workforce,
and actively engaging with industry, con-
sumers, and government. It requires serious
consideration of new safety and ethical risks
[15]. In contrast, reactive resistance takes a toll
on clinical professionals who rightly wish to
defend their patients’ interests, as much as their
own right to have a stake in them. Unexpected
change may end up eroding or even destroying
important parts of the existing health system
before there is a chance to modernize them.
So, the fate of medicine, and indeed of
all of healthcare, is to change [15]. As change
makers go, AI is likely to be among the
biggest we will see in our time. Its tendrils
will touch everything from basic biomedical
discovery science through to the way we each
make our daily personal health decisions. For
such change we must expect to pay a price.
What is paid, by whom, and who benefits, all
depend very much on how we engage with
this profound act of reinvention. To fully
engage brings promise of the greatest reward.
To not engage is to pay the highest price.
References
1. Roy Amara 1925–2007, American futurologist. In: Ratcliffe S, editor. Oxford Essential Quotations. 4th ed; 2016.
2. Schwartz WB. Medicine and the Computer. The Promise and Problems of Change. N Engl J Med 1970;283(23):1257-64.
3. Kay P, Kempton W. What is the Sapir-Whorf hypothesis? Am Anthropol 1984;86(1):65-79.
4. Coiera E, Kocaballi B, Halamka J, Laranjo L. The digital scribe. NPJ Digit Med 2018;1:58.
5. Lyell D, Coiera E. Automation bias and verification complexity: a systematic review. J Am Med Inform Assoc 2017;24(2):423-31.
6. Siegler EL, Adelman R. Copy and paste: a remediable hazard of electronic health records. Am J Med 2009 Jun;122(6):495-96.
7. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017 Dec;42:60-88.
8. Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. JAMA 2016;315(6):551-2.
9. Chen JH, Asch SM. Machine Learning and Prediction in Medicine - Beyond the Peak of Inflated Expectations. N Engl J Med 2017;376(26):2507-09.
10. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 2018 Nov 6;15(11):e1002683.
11. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345-59.
12. McCartney M. General practice can’t just exclude sick people. BMJ 2017;359:j5190.
13. Fraser H, Coiera E, Wong D. Safety of patient-facing digital symptom checkers. Lancet 2018 Nov 24;392(10161):2263-4.
14. Marshall M, Shah R, Stokes-Lampard H. Online consulting in general practice: making the move from disruptive innovation to mainstream service. BMJ 2018 Mar 26;360:k1195.
15. Coiera E. The fate of medicine in the time of AI. Lancet 2018;392(10162):2331-2.
Correspondence to:
Enrico Coiera
Australian Institute of Health Innovation
Macquarie University
Level 6 75 Talavera Rd
Sydney, NSW 2109, Australia
E-mail: enrico.coiera@mq.edu.au
... Using technologies like computerized provider order entry (CPOE) often translates to more or new work, changes in workflow, and clinician overdependence on technology [8]. Similarly, we may see unintended consequences of AI applications including changes to roles and work processes of health care professionals [4,9], factual errors or the perpetuation of existing practice biases by codifying societal biases and inequities reflected in training sets [10]. The Technological Literacy Framework described herewith addresses these concerns and can guide educational initiatives to prepare clinicians for the digital era of health care. ...
Article
With increasing use of information and communications technologies (ICTs) in health, and rapid technological changes, there is a pressing need to prepare current and future health professionals to use ICTs as an integral part of their practice. We propose the Technological Literacy Framework, which includes 3 interlinked elements—knowledge, capabilities, and critical thinking and decision making—as an overarching structure for organizing and designing competencies, learning objectives, and educational interventions for health professions education in the digital era. We provide examples of EHR and telehealth educational interventions and how they map to the framework.
... Esto no solo libera tiempo para que los profesionales de la salud se concentren en la atención directa al paciente, sino que también reduce los errores administrativos y mejora la precisión del flujo de trabajo clínico . (43,44) LatIA. 2023; 1:3 4 ...
Article
Full-text available
The integration of artificial intelligence (AI) in telemedicine is revolutionizing the provision of healthcare services, especially in rural areas. These technologies enable the overcoming of geographical and resource barriers, facilitating precise diagnoses, personalized recommendations, and continuous monitoring through portable devices. AI systems analyze patient data and suggest the most appropriate care options based on their health profile, thus optimizing the efficiency of the healthcare system and improving patient satisfaction. In addition, the automation of administrative tasks through AI frees up time for healthcare professionals to concentrate on direct care. To ensure trust and effectiveness in these technologies, it is essential to implement clinically validated and unbiased algorithms, while fostering transparency and collaboration among developers, healthcare professionals, and regulators. Therefore, AI applied to telemedicine offers a revolutionary opportunity to improve the accessibility and quality of healthcare in rural areas by promoting more equitable and efficient care.
... Healthcare education in the era of increasing AI integration will be a major topic for research in the coming years. Breast surgeons should be updated with the recent advances and applications of AI in their field to provide the best care for their patients (191,192). ...
Article
Full-text available
Background and Objective We have witnessed tremendous advances in artificial intelligence (AI) technologies. Breast surgery, a subspecialty of general surgery, has notably benefited from AI technologies. This review aims to evaluate how AI has been integrated into breast surgery practices, to assess its effectiveness in improving surgical outcomes and operational efficiency, and to identify potential areas for future research and application. Methods Two authors independently conducted a comprehensive search of PubMed, Google Scholar, EMBASE, and Cochrane CENTRAL databases from January 1, 1950, to September 4, 2023, employing keywords pertinent to AI in conjunction with breast surgery or cancer. The search focused on English language publications, where relevance was determined through meticulous screening of titles, abstracts, and full-texts, followed by an additional review of references within these articles. The review covered a range of studies illustrating the applications of AI in breast surgery encompassing lesion diagnosis to postoperative follow-up. Publications focusing specifically on breast reconstruction were excluded. Key Content and Findings AI models have preoperative, intraoperative, and postoperative applications in the field of breast surgery. Using breast imaging scans and patient data, AI models have been designed to predict the risk of breast cancer and determine the need for breast cancer surgery. In addition, using breast imaging scans and histopathological slides, models were used for detecting, classifying, segmenting, grading, and staging breast tumors. Preoperative applications included patient education and the display of expected aesthetic outcomes. Models were also designed to provide intraoperative assistance for precise tumor resection and margin status assessment. As well, AI was used to predict postoperative complications, survival, and cancer recurrence. 
Conclusions Extra research is required to move AI models from the experimental stage to actual implementation in healthcare. With the rapid evolution of AI, further applications are expected in the coming years including direct performance of breast surgery. Breast surgeons should be updated with the advances in AI applications in breast surgery to provide the best care for their patients.
... Digital scribes have the potential to ease the strain of manual documentation. The redesign of the clinical encounter will be the cost of this assistance [32]. But India now has a space to enhance smart humanistic services and patient satisfaction as described in [34]. ...
Article
Full-text available
Artificial intelligence (AI) has immense power to set up an ideal health ecosystem through "intelligent medicine" i.e., a combination of human and machine intelligence. However, the application of AI in healthcare is still unclear. Currently, India is facing huge challenges such as the scarcity of medical resources and the uneven distribution of medical services. This also highlights the opportunities linked to challenges and risks. The most recent pandemic has accelerated this process by acknowledging that medicine stands on the brink of an AI revolution. Incorporating the evidence on the role of precision medicine, cost-effective healthcare, and expanding humanistic and medical services, this paper demonstrates the digital health interventions for the “enhancement” of capabilities, “efficiency,” “extension of services” and upgrading “experience” in the health sector. Through thorough literature searches from PubMed, Google Scholar, and other reliable sources, this study aims to understand the evolving needs, and greater control and to bridge gaps in access to healthcare through AI. Also, India is currently developing the potential to automate multiple tasks and calling for more human interventions. The future of AI in healthcare looks promising with digital health interventions that eventually offer flexibility and convenience to both the patient and the provider. This paper will help public health professionals address ethical considerations and policy-making where AI plays a significant role in setting up an ideal health ecosystem.
... Despite commercial market approval of multiple AI products, there are very few examples of insurance reimbursement for AI. In order to establish added value for government and insurance agencies, larger clinical trials and real-life observational studies are required to demonstrate how the information is actually used by clinicians and how it impacts patient outcomes [65][66][67][68][69][70]. ...
Article
Full-text available
Over the past decade, there has been a dramatic rise in the interest relating to the application of artificial intelligence (AI) in radiology. Originally only ‘narrow’ AI tasks were possible; however, with increasing availability of data, teamed with ease of access to powerful computer processing capabilities, we are becoming more able to generate complex and nuanced prediction models and elaborate solutions for healthcare. Nevertheless, these AI models are not without their failings, and sometimes the intended use for these solutions may not lead to predictable impacts for patients, society or those working within the healthcare profession. In this article, we provide an overview of the latest opinions regarding AI ethics, bias, limitations, challenges and considerations that we should all contemplate in this exciting and expanding field, with a special attention to how this applies to the unique aspects of a paediatric population. By embracing AI technology and fostering a multidisciplinary approach, it is hoped that we can harness the power AI brings whilst minimising harm and ensuring a beneficial impact on radiology practice.
... Another troublesome requirement is that in order to achieve high prediction accuracy, ergo maximum effectiveness, AI models require large amounts of curated and labeled patient data for training [17]. This ensures that, under all circumstances, they will be able to handle the complexity of comorbidities that are frequently seen in the population [19,20]. It is essential for the future welfare of healthcare systems, and medical professionals to evaluate the patients' condition not only from the results of the digital triage but also from the patients' clinical condition. ...
Article
Full-text available
Purpose: In the Emergency Departments (ED) the current triage systems that are been implemented are based completely on medical education and the perception of each health professional who is in charge. On the other hand, cutting-edge technology, Artificial Intelligence (AI) can be incorporated into healthcare systems, supporting the healthcare professionals' decisions, and augmenting the performance of triage systems. The aim of the study is to investigate the efficiency of AI to support triage in ED. Patients–methods: The study included 332 patients from whom 23 different variables related to their condition were collected. From the processing of patient data for input variables, it emerged that the average age was 56.4 ± 21.1 years and 50.6% were male. The waiting time had an average of 59.7 ± 56.3 minutes while 3.9% ± 0.1% entered the Intensive Care Unit (ICU). In addition, qualitative variables related to the patient's history and admission clinics were used. As target variables were taken the days of stay in the hospital, which were on average 1.8 ± 5.9, and the Emergency Severity Index (ESI) for which the following distribution applies: ESI: 1, patients: 2; ESI: 2, patients: 18; ESI: 3, patients: 197; ESI: 4, patients: 73; ESI: 5, patients: 42. Results: To create an automatic patient screening classifier, a neural network was developed, which was trained based on the data, so that it could predict each patient's ESI based on input variables.The classifier achieved an overall accuracy (F1 score) of 72.2% even though there was an imbalance in the classes. Conclusions: The creation and implementation of an AI model for the automatic prediction of ESI, highlighted the possibility of systems capable of supporting healthcare professionals in the decision-making process. The accuracy of the classifier has not reached satisfactory levels of certainty, however, the performance of similar models can increase sharply with the collection of more data.
... • were not triage related (n = 2) [40,41]. ...
Article
Full-text available
Introduction Patient-operated digital triage systems with AI components are becoming increasingly common. However, previous reviews have found a limited amount of research on such systems’ accuracy. This systematic review of the literature aimed to identify the main challenges in determining the accuracy of patient-operated digital AI-based triage systems. Methods A systematic review was designed and conducted in accordance with PRISMA guidelines in October 2021 using PubMed, Scopus and Web of Science. Articles were included if they assessed the accuracy of a patient-operated digital triage system that had an AI-component and could triage a general primary care population. Limitations and other pertinent data were extracted, synthesized and analysed. Risk of bias was not analysed as this review studied the included articles’ limitations (rather than results). Results were synthesized qualitatively using a thematic analysis. Results The search generated 76 articles and following exclusion 8 articles (6 primary articles and 2 reviews) were included in the analysis. Articles’ limitations were synthesized into three groups: epistemological, ontological and methodological limitations. Limitations varied with regards to intractability and the level to which they can be addressed through methodological choices. Certain methodological limitations related to testing triage systems using vignettes can be addressed through methodological adjustments, whereas epistemological and ontological limitations require that readers of such studies appraise the studies with limitations in mind. Discussion The reviewed literature highlights recurring limitations and challenges in studying the accuracy of patient-operated digital triage systems with AI components. Some of these challenges can be addressed through methodology whereas others are intrinsic to the area of inquiry and involve unavoidable trade-offs. 
Future studies should take these limitations in consideration in order to better address the current knowledge gaps in the literature.
... The price for this help will be a re-engineering of the clinical encounter. 52 Unconstrained clinical conversation between patient and doctor is non-linear, with the appearance of new information (e.g., a new clinical symptom or finding) triggering a re-exploration of a previously completed task such as an enquiry about family history of disease. 53 While a fully automated method to transform conversation into complete and accurate clinical records in such a dynamic setting is beyond the state of the art, it is possible to use AI methods to undertake subtasks in this process and still meaningfully reduce clinician documentation effort. ...
Article
Full-text available
Healthcare has well-known challenges with safety, quality, and effectiveness, and many see artificial intelligence (AI) as essential to any solution. Emerging applications include the automated synthesis of best-practice research evidence including systematic reviews, which would ultimately see all clinical trial data published in a computational form for immediate synthesis. Digital scribes embed themselves in the process of care to detect, record, and summarize events and conversations for the electronic record. However, three persistent translational challenges must be addressed before AI is widely deployed. First, little effort is spent replicating AI trials, exposing patients to risks of methodological error and biases. Next, there is little reporting of patient harms from trials. Finally, AI built using machine learning may perform less effectively in different clinical settings.
Article
During the pandemic, artificial intelligence was employed and utilized by students around the globe. Students' conduct changed in a variety of ways when schooling returned to regular instruction. This study aimed to analyze the student's behavioral intention and actual academic use of communicational AI (CAI) as an educational tool. This study identified the variables by utilizing an integrated framework based on the Unified Theory of Acceptance and Use of Technology (UTAUT2) and self-determination theory. Through the use of an online survey and Structural Equation Modeling, data from 533 respondents were analyzed. The results showed that perceived relatedness has the most significant effect on the behavioral intention of students in using CAI as an educational tool, followed by perceived autonomy. It showed that students use CAI based on the objective and the possibility of increasing their productivity, rather than any other purpose in the education setting. Among the UTAUT2 domains, only facilitating conditions, habit, and performance expectancy provided a significant direct effect on behavioral intention and an indirect effect on actual academic use. Further implications were presented. Moreover, the methodology and framework of this study could be extended and applied to educational technology-related studies. Lastly, the outcome of this study may be considered in analyzing the behavioral intention of the students as the teaching-learning environment is still continuously expanding and developing.
Article
Full-text available
Mis-diagnosis by physicians is a common problem affecting 5% of outpatients. There is a growth in interest in computerised diagnostic decision support systems for physicians, and increasingly for direct use by patients on mobile phones, termed Symptom Checkers(SC). These have the potential to improve the way in which health care is delivered and reduce the burden on GP services. However claims have been made that SC from Babylon Health is more accurate at diagnosis than physicians. Evaluations to date have primarily been conducted in controlled environments using clinician-generated scenarios, and surrogate outcomes such as diagnostic performance in lieu of clinical outcomes. Such results are unlikely to reflect real-world use and can be unrealistically optimistic. Patients use risks missing important diagnoses and/or may increasing the burden on the health system. To avoid this, we advocate the use of multi-stage evaluation, building on many years of experience in health informatics and reflecting best practice in other areas of medicine.
Article
Full-text available
Background There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task. Methods and findings A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had a mean (SD) age of 46.9 (16.6), 63.2 (16.5), and 49.6 (17) years, with female percentages of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong’s test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855–0.866) on the joint MSH–NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training data; P values both <0.001).
The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927–0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745–0.885, P = 0.001). To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH–NIH cohorts that only differed in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly detect hospital system of a radiograph for 99.95% NIH (22,050/22,062) and 99.98% MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system–specific biases. Conclusion Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.
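The prevalence confound described above can be illustrated with a toy simulation: when two pooled sites differ strongly in disease prevalence, a score that encodes nothing but the site of origin already separates cases from controls. The sketch below is hypothetical and numpy-only, not the study's actual pipeline; the sample sizes and prevalences are invented to loosely mirror the MSH (34.2%) and NIH (1.2%) figures reported in the abstract.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (tie-aware)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(0)

# Two hypothetical sites with sample sizes and pneumonia prevalences
# loosely mirroring the MSH (34.2%) and NIH (1.2%) figures above.
n_msh, n_nih = 4240, 11212
y_msh = (rng.random(n_msh) < 0.342).astype(int)
y_nih = (rng.random(n_nih) < 0.012).astype(int)

labels = np.concatenate([y_msh, y_nih])
# A degenerate "classifier" that only encodes which site a case came from:
site_score = np.concatenate([np.ones(n_msh), np.zeros(n_nih)])

auc_value = auc(site_score, labels)
print(f"site-only AUC: {auc_value:.3f}")  # lands in the vicinity of the reported 0.861
```

A model free to exploit site-identifying artifacts in the images can therefore score well on pooled internal data without learning anything about pneumonia, which is why the external IU performance drops.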
Article
Full-text available
Current generation electronic health records suffer a number of problems that make them inefficient and associated with poor clinical satisfaction. Digital scribes, or intelligent documentation support systems, take advantage of advances in speech recognition, natural language processing, and artificial intelligence to automate the clinical documentation task currently conducted by humans. Whilst in their infancy, digital scribes are likely to evolve through three broad stages. Human-led systems task clinicians with creating documentation, but provide tools to make the task simpler and more effective, for example with dictation support, semantic checking, and templates. Mixed-initiative systems are delegated part of the documentation task, converting the conversations in a clinical encounter into summaries suitable for the electronic record. Computer-led systems are delegated full control of documentation and only request human interaction when exceptions are encountered. Intelligent clinical environments permit such augmented clinical encounters to occur in a fully digitised space where the environment becomes the computer. Data from clinical instruments can be automatically transmitted, interpreted using AI, and entered directly into the record. Digital scribes raise many issues for clinical practice, including new patient safety risks. Automation bias may see clinicians accept scribe documents without checking them. The electronic record also shifts from a human-created summary of events to potentially a full audio, video, and sensor record of the clinical encounter. Digital scribes also promise a gateway into the clinical workflow for more advanced support for diagnostic, prognostic, and therapeutic tasks.
Article
Full-text available
Big data, we have all heard, promise to transform health care. But in the “hype cycle” of emerging technologies, machine learning now rides atop the “peak of inflated expectations,” and we need to better appreciate the technology’s capabilities and limitations.
Article
Full-text available
Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.
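The convolution operation at the heart of the networks this review surveys can be sketched in a few lines of numpy. This is an illustrative valid-mode 2D cross-correlation (the operation most deep learning frameworks call "convolution"), run on an invented toy image with a hand-crafted kernel; it is not code from any of the surveyed systems, where kernels are learned from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation in a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy "image": dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-crafted vertical edge detector; in a trained CNN such
# kernels are learned from data rather than specified by hand.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # responses peak along the vertical edge
```

Stacking many such learned filters, interleaved with nonlinearities and pooling, yields the classification, detection, and segmentation architectures the review describes.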
Article
Introduction: While potentially reducing decision errors, decision support systems can introduce new types of errors. Automation bias (AB) happens when users become overreliant on decision support, which reduces vigilance in information seeking and processing. Most research originates from the human factors literature, where the prevailing view is that AB occurs only in multitasking environments. Objectives: This review seeks to compare the human factors and health care literature, focusing on the apparent association of AB with multitasking and task complexity. Data sources: EMBASE, Medline, Compendex, Inspec, IEEE Xplore, Scopus, Web of Science, PsycINFO, and Business Source Premier from 1983 to 2015. Study selection: Evaluation studies where task execution was assisted by automation and resulted in errors were included. Participants needed to be able to verify automation correctness and perform the task manually. Methods: Tasks were identified and grouped. Task and automation type and presence of multitasking were noted. Each task was rated for its verification complexity. Results: Of 890 papers identified, 40 met the inclusion criteria; 6 were in health care. Contrary to the prevailing human factors view, AB was found in single tasks, typically involving diagnosis rather than monitoring, and with high verification complexity. Limitations: The literature is fragmented, with large discrepancies in how AB is reported. Few studies reported the statistical significance of AB compared to a control condition. Conclusion: AB appears to be associated with the degree of cognitive load experienced in decision tasks, and does not appear to be uniquely associated with multitasking. Strategies to minimize AB might focus on cognitive load reduction.
Article
This Viewpoint discusses the opportunities and ethical implications of using machine learning technologies, which can rapidly collect and learn from large amounts of personal data, to provide individualized patient care. Must a physician be human? A new computer, “Ellie,” developed at the Institute for Creative Technologies, asks questions as a clinician might, such as “How easy is it for you to get a good night’s sleep?” Ellie then analyzes the patient’s verbal responses, facial expressions, and vocal intonations, possibly detecting signs of posttraumatic stress disorder, depression, or other medical conditions. In a randomized study, 239 probands were told that Ellie was “controlled by a human” or “a computer program.” Those believing the latter revealed more personal material to Ellie, based on blind ratings and self-reports.1 In China, millions of people turn to Microsoft’s chatbot, “Xiaoice,”2 when they need a “sympathetic ear,” despite knowing that Xiaoice is not human. Xiaoice develops a specially attuned personality and sense of humor by methodically mining the Internet for real text conversations. Xiaoice also learns about users from their reactions over time and becomes sensitive to their emotions, modifying responses accordingly, all without human instruction. Ellie and Xiaoice are the result of machine learning technology.