Using Artificial Intelligence to Improve Administrative Process in Medicaid

Ted Cho (1), Brian J. Miller (2,3,4,*)
(1) Department of Pediatrics, UCSF Benioff Children’s Hospital, San Francisco, CA, 94158, United States
(2) Division of Hospital Medicine, Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, 21287, United States
(3) The Johns Hopkins Carey Business School, Baltimore, MD, 21202, United States
(4) American Enterprise Institute, Washington, DC, 20025, United States
*Corresponding author: Division of Hospital Medicine, Department of Medicine, The Johns Hopkins Hospital, Baltimore, MD 21287, United States.
Email: brian@brianjmillermd.com
Abstract
Administrative burden across state–federal benefits programs is unsustainable, and artificial intelligence (AI) and associated technologies have emerged as possible solutions, attracting significant interest. While early in development, AI has significant potential to reduce administrative waste and increase efficiency, with many government agencies and state legislators eager to adopt the new technology. Turning to existing frameworks defining what functions are considered “inherently governmental” can help determine where more autonomous implementation could be not only appropriate but also uniquely advantageous. Such areas could include the initial determination and redetermination of Medicaid eligibility as well as the prevention of improper Medicaid payments. However, while AI is promising, this technology may not be ready for fully autonomous implementation and instead could be deployed to augment human capabilities with robust safeguards until it has proven to be more reliable. In the meantime, the Centers for Medicare and Medicaid Services should release clear guidance around the use of AI by state Medicaid programs, and policymakers must work together to harness AI technologies in order to improve the efficiency and effectiveness of the Medicaid program.
Key words: Medicaid; administration; artificial intelligence; enrollment; eligibility; redetermination.
Received: November 1, 2023; Revised: January 10, 2024; Accepted: January 26, 2024
© The Author(s) 2024. Published by Oxford University Press on behalf of Project HOPE - The People-To-People Health Foundation, Inc.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Administrative burden in public benefits programs is generally high, and joint state–federal programs add further complexity. Estimates of administrative spending range from 15% to 30% of total health care spending, half of which has been characterized as wasteful [1,2], with recent estimates of annual administrative spending reaching $1 trillion [3]. Definitions of administrative spending vary; broadly speaking, however, it is characterized as spending that is nonclinical in nature, including categories such as billing and insurance [1,4]. While estimates of its exact share diverge, there is no doubt that administrative spending represents a significant portion of overall health care spending.
High administrative burden in health care is typically attributed to the large number of nonclinical staff, many of whom perform routine or repetitive tasks that could readily be automated [5]. The recent boom in artificial intelligence (AI), which includes technologies such as machine learning, natural language processing, and large language models, has promised tools to automate administrative tasks and reduce administrative burden [6,7], with some estimating that existing technologies could save as much as $200–$360 billion in health care spending within the next 5 years [6]. Despite this potential, many are skeptical of AI, and some even see the potential for profound risks to society [8,9], given recent advances that have come in leaps and bounds, sometimes referred to as emergent abilities. However, more recent evidence calls into question the claims of emergent abilities [10-12] and supports others’ claims that such concerns may be exaggerated [13,14]. While it is important to create commonsense safeguards to prevent unethical or even dangerous applications of these technologies with unintended consequences, the potential uses of AI to reduce administrative waste and increase efficiency will be pivotal to making the US health care system more sustainable. In this commentary, we review governmental interest in adopting AI technologies, opportunities to improve administrative efficiency and operations in the Medicaid program, and associated targeted policy recommendations.
Health Affairs Scholar, 2024, 2(2), 1–4
https://doi.org/10.1093/haschl/qxae008
Advance access publication: January 29, 2024
Commentary
Downloaded from https://academic.oup.com/healthaffairsscholar/article/2/2/qxae008/7591560 by guest on 21 February 2024
Government is eager to adopt AI
Although government at both the state and federal levels often lags in adopting new technology, AI technologies are already being used throughout the federal government: in a 2020 survey, 45% of the agencies surveyed expressed interest in AI, with many having planned, piloted, or implemented such technologies [15]. As mandated by Executive Order 13960, “Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government,” [16] the Department of Health and Human Services [17] reported 23 use cases applicable to the Centers for Medicare and Medicaid Services (CMS), including fraud detection [18,19], payment forecasting, and drug-cost anomaly detection [17]. At the state level, policymakers have begun discussing how to set boundaries for and implement certain AI applications, with at least 25 states, Puerto Rico, and the District of Columbia introducing AI bills, and 14 states and Puerto Rico adopting resolutions or enacting legislation in the 2023 legislative session [20].
Given the current and impending waves of interest in and adoption of AI technologies, it is important to understand where these technologies will thrive and maximize their potential. Certain functions within government are designated as “inherently governmental functions.” Per definitions in the Federal Activities Inventory Reform (FAIR) Act of 1998, the Office of Management and Budget Circular A-76, and, most recently, the Office of Federal Procurement Policy (OFPP) Policy Letter 11-01, these functions are “so intimately related to the public interest as to mandate performance by government personnel” [21]. Functions that are not considered “inherently governmental” are designated “commercial functions” that can be performed by contractors. While a litany of statutory, regulatory, and policy authorities designate specific functions as either inherently governmental or commercial, the most recent guidance, OFPP Policy Letter 11-01, gives 2 tests that agencies are required to use to identify inherently governmental functions (see Figure 1), as follows:
1. “Nature of the function” test: functions involving the exercise of US sovereign power are inherently governmental.
2. “Exercise of discretion” test: a function should be categorized as inherently governmental when it allows for the exercise of discretion that “commit[s] the government to a course of action where two or more alternative courses of action exist and decision making is not already limited or guided by existing policies, procedures, directions, orders, and other guidance that: (I) identify specified ranges of acceptable decisions or conduct concerning the overall policy or direction of the action; and (II) subject the discretionary authority to final approval or regular oversight by agency officials” [22].
At present, whether a function performed by AI is “inherently governmental” or “commercial” is, in many cases, a moot point: most AI applications are currently implemented as tools that help users perform their duties more easily or more proficiently rather than as separate entities performing a function autonomously. However, these 2 tests become more important as AI advances and begins to function more autonomously. Functions that satisfy the “nature of the function” test and involve the exercise of US sovereign power are too important to allow for any possible error due to malfunction or otherwise unintended outputs from AI. It could also be argued that AI performs best in functions with discrete and specific parameters, which would preclude those functions from being categorized as “inherently governmental” per the “exercise of discretion” test. Together, these tests should help identify functions that are not “inherently governmental” and for which the application of AI technologies would be not only appropriate but perhaps even uniquely beneficial.
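
The classification logic described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not an official procedure: the names `Mode`, `Guidance`, `is_inherently_governmental`, and `classify` are hypothetical, and the sketch assumes that satisfying either of the 2 tests is enough to treat a function as inherently governmental.

```python
from enum import Enum


class Mode(Enum):
    TOOL = "used by a person to perform their function"
    AUTONOMOUS = "acts with little or no human oversight"


class Guidance(Enum):
    EXTENSION_OF_USER = "treat as an extension of the user, with safeguards"
    NOT_YET = "likely too premature to deploy safely"
    CANDIDATE = "codified function, a plausible fit for AI"


def is_inherently_governmental(sovereign_power: bool,
                               unbounded_discretion: bool) -> bool:
    # Assumption: meeting either test is enough to classify the function
    # as inherently governmental.
    return sovereign_power or unbounded_discretion


def classify(mode: Mode, sovereign_power: bool,
             unbounded_discretion: bool) -> Guidance:
    # AI used as a tool inherits the role of the person using it.
    if mode is Mode.TOOL:
        return Guidance.EXTENSION_OF_USER
    # Autonomous AI is screened with the 2 tests.
    if is_inherently_governmental(sovereign_power, unbounded_discretion):
        return Guidance.NOT_YET
    return Guidance.CANDIDATE
```

For example, an autonomous system applying fully codified rules with no sovereign-power element would be passed as `classify(Mode.AUTONOMOUS, False, False)` and come back as a candidate for AI, whereas any use as a tool is handled as an extension of its user regardless of the function.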
Figure 1. Flowchart for evaluation and classifications of uses of artificial intelligence (AI) functions (created by authors). AI technology is classified as either a tool (used by someone to perform their function) or autonomous (allowed to perform functions without or with very little oversight). A tool should be treated as an extension of whoever is using it, while also being cognizant to build in appropriate safeguards. For an autonomous AI performing an inherently governmental function, AI should likely not be utilized at this time, as it is likely too premature to be implemented safely. For an autonomous AI performing a commercial function, AI could be utilized, as these functions are likely codified with little room for discretion, which is ideal for this technology.
Looking specifically at CMS oversight of state Medicaid agencies and their actions, 1 area often acknowledged as more difficult and complicated than it needs to be is the oversight and procedural determination of Medicaid eligibility, both the initial determination and subsequent redetermination processes. Millions of Americans are unable to access benefits due to administrative holdups, prompting ongoing efforts to streamline and simplify the process [23], while, simultaneously, millions of other Americans remain inappropriately enrolled. Redetermination is a critical administrative process: an estimated 17 of the 20 million Medicaid beneficiaries added during the pandemic may lose Medicaid coverage with the unwinding of the continuous enrollment requirement of the Families First Coronavirus Response Act [24]. Medicaid eligibility requirements are codified by law, allowing for little to no discretion, so the initial determination and redetermination of eligibility would represent areas of immediate administrative need for states where AI could be used to help bridge the gap. Additionally, AI could be used to augment and digitize communications with beneficiaries, conveying status updates on their determination process as well as other important information outside of that process.
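
Because eligibility requirements are codified, a determination step can be pictured as a rule check that hands incomplete cases to a person rather than deciding them. The sketch below is purely illustrative: the `Applicant` fields, the `determine` function, and the `income_limit_pct_fpl` parameter are hypothetical and are not drawn from statute or from any state system, which would involve many more inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # Hypothetical fields; None means the information is missing.
    state_resident: Optional[bool]
    income_pct_fpl: Optional[float]  # income as a % of the federal poverty level


def determine(applicant: Applicant, income_limit_pct_fpl: float) -> str:
    """Return 'eligible', 'ineligible', or 'needs_human_review'."""
    # Missing data is never decided automatically; a caseworker follows up.
    if applicant.state_resident is None or applicant.income_pct_fpl is None:
        return "needs_human_review"
    if not applicant.state_resident:
        return "ineligible"
    if applicant.income_pct_fpl <= income_limit_pct_fpl:
        return "eligible"
    return "ineligible"
```

A more cautious deployment might also route every "ineligible" result to human review before any notice is sent, in keeping with the safeguards discussed later in this commentary.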
Another notable area that may benefit from AI intervention is the prevention of improper Medicaid payments. Improper Medicaid payments are quantified and tracked as mandated by the Payment Integrity Information Act of 2019 [25]. Improper payments do not necessarily indicate fraud; rather, they are payments that did not meet statutory, regulatory, or administrative requirements, most often due to missing information. They do, however, still create a significant financial burden, with improper Medicaid payments reaching $80.57 billion in 2022 [26]. Although Managed Care represents 72% of the Medicaid marketplace [27], just 0.1% of improper payments are attributed to Managed Care [26]. In contrast, eligibility is estimated to account for 73.7% (>$61 billion) of improper payments and Fee for Service Medicaid for 26.2% (nearly $22 billion) in 2022 [26]. These improper payments are made for a variety of reasons, including insufficient or no documentation, coding errors, unbundling, and other errors [26]. Whether a Medicaid payment is improper is again a codified determination with little to no room for discretion, making the prevention of improper Medicaid payments another area ripe for AI intervention.
Avoiding potential pitfalls
Despite their promise, AI technologies are still works in progress. Previous attempts to apply data-mining and algorithmic technologies to various areas of government, including the administration of welfare programs, have not been without controversies or failures [28]. This is not particularly surprising given that even the more advanced technologies underlying AI are prone to inaccuracies, some of which are referred to as “hallucinations” [29], and improperly built (ie, using incomplete, skewed, or otherwise poor-quality training data) or improperly implemented AI tools can also be prone to algorithmic bias [30], which, if introduced into government functions, could prove disastrous. A prominent example is facial recognition algorithms, whose performance is often heavily influenced by demographics due to training on incomplete datasets [31].
Given these limitations, for now, AI technologies should be used to augment human capabilities until they have proven to be more stable and can be built upon more robust datasets. While AI will likely eventually be implemented in many truly autonomous fashions, it should initially be implemented cautiously, with liberal use of human review, because of the potential for severe consequences should AI fail to behave as expected in government functions. Further safeguards include building in auditing functions to support human review of outputs and implementing easy ways to roll back or otherwise “undo” AI-driven actions.
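
As a rough illustration of the auditing and “undo” safeguards just described, the Python sketch below logs each AI-driven action together with a way to reverse it, lists the actions still awaiting human review, and lets a reviewer roll one back. All names (`LoggedAction`, `AuditLog`, the `case-001` record) are hypothetical; a production system would also need persistence, access control, and far richer records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoggedAction:
    description: str
    undo: Callable[[], None]   # how to reverse the action
    reviewed: bool = False
    rolled_back: bool = False


@dataclass
class AuditLog:
    entries: List[LoggedAction] = field(default_factory=list)

    def record(self, description: str, apply: Callable[[], None],
               undo: Callable[[], None]) -> LoggedAction:
        """Apply an AI-driven action and keep what is needed to undo it."""
        apply()
        entry = LoggedAction(description, undo)
        self.entries.append(entry)
        return entry

    def pending_review(self) -> List[LoggedAction]:
        """Actions that no human reviewer has looked at yet."""
        return [e for e in self.entries if not e.reviewed]

    def approve(self, entry: LoggedAction) -> None:
        entry.reviewed = True

    def roll_back(self, entry: LoggedAction) -> None:
        """Reverse an action once, and mark it as reviewed."""
        if not entry.rolled_back:
            entry.undo()
            entry.rolled_back = True
        entry.reviewed = True


# Hypothetical usage: an AI-driven status change that a reviewer reverses.
status: Dict[str, str] = {"case-001": "pending"}
log = AuditLog()
entry = log.record(
    "mark case-001 eligible",
    apply=lambda: status.update({"case-001": "eligible"}),
    undo=lambda: status.update({"case-001": "pending"}),
)
log.roll_back(entry)
```

Keeping the undo step alongside each logged action is one simple design choice among several; the point is only that reversal is planned for at the moment the action is taken rather than reconstructed afterward.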
Recommendations
To help advance the adoption of AI, CMS should release clear guidance detailing the functions for which state Medicaid programs would be permitted, or even encouraged, to use AI tools to reduce administrative burden. This guidance could be maintained in a public database that state Medicaid programs could then use to signal their desire to purchase, acquire, or otherwise work with vendors offering services, as well as to set out technical specifications and requirements. An improvement over the status quo of a CMS webpage listing state Medicaid IT procurement websites [32], federal guidance would create regulatory clarity for state Medicaid programs and signal to the private sector where the need for AI tools to improve administrative processes is greatest. Additionally, to maintain this momentum, policymakers should require CMS to update this guidance yearly, given the rapidly evolving nature of AI technologies. Together, these recommendations will help create a solid regulatory framework within which states and the private sector can collaborate to bring the fruits of AI to beneficiaries.
While perhaps not entirely ready to be unleashed without supervision, AI continues to demonstrate rapid development and the potential to significantly improve administrative processes in Medicaid and other health benefits programs, public and private. Policymakers must work together to harness and implement AI technologies in order to improve the efficiency and effectiveness of the Medicaid program. With over $1 trillion in administrative spending nationwide, now is certainly a good time to try.
Supplementary material
Supplementary material is available at Health Affairs Scholar
online.
Conflicts of interest
Please see ICMJE form(s) for author conflicts of interest. These have been provided as supplementary materials.
Notes
1. The role of administrative waste in excess US health spending. Health Affairs Res Brief. October 6, 2022. https://doi.org/10.1377/hpb20220909.830296. Accessed October 30, 2023. https://www.healthaffairs.org/do/10.1377/hpb20220909.830296/
2. Tollen L, Keating E, Weil A. How administrative spending contributes to excess US health spending. Health Affairs Forefront. 2020. Accessed October 30, 2023. https://www.healthaffairs.org/content/forefront/administrative-spending-contributes-excess-us-health-spending
3. Sahni N, Mishra P, Carrus B, Cutler DM. Administrative simplification: how to save a quarter-trillion dollars in US healthcare. McKinsey Center for US Health System Reform; 2021. Accessed October 30, 2023. https://www.mckinsey.com/industries/healthcare/our-insights/administrative-simplification-how-to-save-a-quarter-trillion-dollars-in-us-healthcare#/
4. Cutler D. Reducing administrative costs in U.S. health care. 2020. The Hamilton Project. Accessed October 30, 2023. https://www.hamiltonproject.org/assets/files/Cutler_PP_LO.pdf
5. Sahni N, Kumar P, Levine E, Singhal S. The productivity imperative for healthcare delivery in the United States. McKinsey Center for US Health System Reform; 2019. Accessed October 30, 2023. https://www.mckinsey.com/industries/healthcare/our-insights/the-productivity-imperative-for-healthcare-delivery-in-the-united-states#/
6. Sahni N, Stein G, Zemmel R, Cutler DM. The potential impact of artificial intelligence on healthcare spending. National Bureau of Economic Research; 2023. Accessed October 30, 2023. https://www.nber.org/system/files/working_papers/w30857/w30857.pdf
7. Kalis B, Collier M, Fu R. 10 Promising AI applications in health care. Harvard Business Review. 2018. Accessed October 30, 2023. https://hbr.org/2018/05/10-promising-ai-applications-in-health-care
8. Metz C, Schmidt G. Elon Musk and others call for pause on A.I., citing “profound risks to society”. The New York Times. Accessed October 30, 2023. https://www.nytimes.com/2023/03/29/technology/ai-artificial-intelligence-musk-risks.html
9. Roose K. A.I. poses “risk of extinction,” industry leaders warn. The New York Times. Accessed October 30, 2023. https://www.nytimes.com/2023/05/30/technology/ai-threat-warning.html
10. Schaeffer R, Miranda B, Koyejo S. Are emergent abilities of large language models a mirage? arXiv 230415004. https://doi.org/10.48550/arXiv.2304.15004, May 22, 2023, preprint: not peer reviewed.
11. Miller K. AI’s ostensible emergent abilities are a mirage. Stanford University Institute for Human-Centered Artificial Intelligence; 2023. Accessed October 30, 2023. https://hai.stanford.edu/news/ais-ostensible-emergent-abilities-are-mirage
12. Srivastava A, Rastogi A, Rao A, et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. arXiv 220604615. https://doi.org/10.48550/arXiv.2206.04615, June 12, 2023, preprint: not peer reviewed.
13. Grady P, Castro D. Tech panics, generative AI, and the need for regulatory caution. 2023. Accessed October 30, 2023. https://www2.datainnovation.org/2023-ai-panic-cycle.pdf
14. Eisikovits N. AI is an existential threat, just not the way you think. Scientific American. 2023. Accessed October 30, 2023. https://www.scientificamerican.com/article/ai-is-an-existential-threat-just-not-the-way-you-think/
15. Engstrom DF, Ho DE, Sharkey CM, Cuellar M-F. Government by algorithm: artificial intelligence in federal administrative agencies. 2020. Accessed October 30, 2023. https://law.stanford.edu/wp-content/uploads/2020/02/ACUS-AI-Report.pdf
16. Executive Office of the President. Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government. Federal Register; 2020.
17. US Department of Health and Human Services. Department of Health and Human Services: Artificial Intelligence Use Cases Inventory. Accessed October 30, 2023. https://www.hhs.gov/about/agencies/asa/ocio/ai/use-cases/index.html
18. Whitfield J. How health tech leaders use AI to combat fraud. 2023. Accessed October 30, 2023. https://governmentciomedia.com/how-health-tech-leaders-use-ai-combat-fraud
19. Krishan N. HHS CIO Mathias says tree-based AI models helping to combat Medicare fraud. Fedscoop; 2023. Accessed October 30, 2023. https://fedscoop.com/hhs-cio-mathias-says-tree-based-ai-models-helping-to-combat-medicare-fraud/
20. National Conference of State Legislatures. Artificial intelligence 2023 legislation. Accessed October 30, 2023. https://www.ncsl.org/technology-and-communication/artificial-intelligence-2023-legislation
21. Congressional Research Service. Definitions of “Inherently Governmental Function” in Federal Procurement Law and Guidance. 2014. Accessed October 30, 2023. https://www.everycrsreport.com/files/20141223_R42325_ba76864808b1cfc5b92720461b225702a81ac71d.pdf
22. Office of Management and Budget; Office of Federal Procurement Policy. Publication of the Office of Federal Procurement Policy (OFPP) Policy Letter 11-01, Performance of Inherently Governmental and Critical Functions. Federal Register; 2011. Accessed October 30, 2023. https://www.federalregister.gov/documents/2011/09/12/2011-23165/publication-of-the-office-of-federal-procurement-policy-ofpp-policy-letter-11-01-performance-of
23. Centers for Medicare and Medicaid Services. Streamlining eligibility & enrollment. Notice of Propose Rulemaking (NPRM). Accessed October 30, 2023. https://www.cms.gov/newsroom/fact-sheets/streamlining-eligibility-enrollment-notice-propose-rulemaking-nprm
24. Burns A, Williams E, Corallo B, Rudowitz R. How many people might lose Medicaid when states unwind continuous enrollment? Kaiser Family Foundation; 2023. Accessed October 30, 2023. https://www.kff.org/medicaid/issue-brief/how-many-people-might-lose-medicaid-when-states-unwind-continuous-enrollment/
25. Payment Integrity Information Act of 2019, S 375, 116th Cong, Sess (2019).
26. Centers for Medicare and Medicaid Services. 2022 Medicaid & CHIP supplemental improper payment data. 2022. Accessed October 30, 2023. https://www.cms.gov/files/document/2022-medicaid-chip-supplemental-improper-payment-data.pdf-0
27. Kaiser Family Foundation. Total Medicaid MCO enrollment. Accessed October 30, 2023. https://www.kff.org/other/state-indicator/total-medicaid-mco-enrollment/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D
28. Quach K. US government use of AI is shoddy and failing citizens, because no one knows how it works. The Register; 2018. Accessed October 30, 2023. https://www.theregister.com/2018/09/26/us_government_algorithms/
29. O’Brien M. Chatbots sometimes make things up. Is AI’s hallucination problem fixable? Associated Press; 2023. Accessed October 30, 2023. https://apnews.com/article/artificial-intelligence-hallucination-chatbots-chatgpt-falsehoods-ac4672c5b06e6f91050aa46ee731bcf4
30. Bousquette I. Rise of AI puts spotlight on bias in algorithms. The Wall Street Journal. Accessed October 30, 2023. https://www.wsj.com/articles/rise-of-ai-puts-spotlight-on-bias-in-algorithms-26ee6cc9
31. Buolamwini J, Gebru T. Gender shades: intersectional accuracy disparities in commercial gender classification. Proc Mach Learn Res. 2018;81:1-15.
32. Centers for Medicare and Medicaid Services. State Medicaid IT procurement opportunities. Accessed October 30, 2023. https://www.medicaid.gov/medicaid/data-systems/state-medicaid-it-procurement-opportunities/index.html#NC