The global geography of artificial intelligence in life science research

Springer Nature
Nature Communications
Artificial intelligence (AI) promises to transform medicine, but the geographic concentration of AI expertize may hinder its equitable application. We analyze 397,967 AI life science research publications from 2000 to 2022 and 14.5 million associated citations, creating a global atlas that distinguishes productivity (i.e., publications), quality-adjusted productivity (i.e., publications stratified by field-normalized rankings of publishing outlets), and relevance (i.e., citations). While Asia leads in total publications, Northern America and Europe contribute most of the AI research appearing in high-ranking outlets, generating up to 50% more citations than other regions. At the global level, international collaborations produce more impactful research, but have stagnated relative to national research efforts. Our findings suggest that greater integration of global expertize could help AI deliver on its promise and contribute to better global health.
Article https://doi.org/10.1038/s41467-024-51714-x
The global geography of artificial intelligence in life science research
Leo Schmallenbach 1, Till W. Bärnighausen 2,3,4 & Marc J. Lerchenmueller 1,5
Artificial intelligence (AI) promises to transform the life sciences and, ultimately, medical care [1]. Broadly defined, AI refers to the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings [2]. In the life sciences, AI is already widely used, for example, when computers analyze large amounts of patient data to aid in initial diagnoses, or when algorithms optimize patient enrollment in clinical trials for drug development [3–5]. The high hopes for the growing use of AI technology are reflected in estimates that the global market for AI-based medical care will grow eightfold by 2027 [6].
Against this backdrop, the geography of the AI life science research enterprise, i.e., research that incorporates AI in a life science context, is important for at least three reasons. First, a longstanding line of research has documented that scientific advancement benefits from collaboration [7], especially across borders [8]. Research ideas are rarely confined to national boundaries, the talent needed to conduct research is geographically dispersed, and the challenges of a globalized world require the collaboration of international scientists to derive integrated insights [9,10]. Second, and more specific to AI in the life sciences, geographically concentrated research runs the risk of creating biased data foundations that distort inferences and, possibly, lead to biased medical care [11]. Recent research has already documented biases showing, for example, that the underrepresentation of ethnicities in training data can lead to distortions in prognosis, diagnosis, and treatment [12,13]. As the AI research agenda in the life sciences rapidly accelerates, fueled by national funding and possibly concomitant interests, questions about effectiveness and equity have grown. Third, AI applications in healthcare promise to deliver high-quality medical care without relying on the expensive and complex machinery traditionally required [14,15]. AI-driven diagnostics and treatment plans can be implemented using more accessible and affordable technologies, such as smartphones and simple medical devices. Such democratization of healthcare technology could enable remote and underserved regions to access advanced medical care that was previously out of reach. These regions must, however, partake in AI-powered life science research to ensure that newly developed technologies meet local needs and to build the capabilities and trust needed for application. In short, the geography of AI research matters for harnessing AI's promises to the benefit of global patient populations.
Received: 8 June 2023
Accepted: 15 August 2024
1 University of Mannheim, Mannheim, Germany.
2 Heidelberg Institute of Global Health (HIGH), Medical School, Heidelberg University, Heidelberg, Germany.
3 Harvard Center for Population and Development Studies, Harvard University, Cambridge, USA.
4 Africa Health Research Institute (AHRI), Durban, South Africa.
5 Leibniz Center for European Economic Research (ZEW), Mannheim, Germany.
e-mail: schmallenbach@uni-mannheim.de
Nature Communications | (2024) 15:7527
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Existing studies on the geography of AI research, both across scientific disciplines and specific to the life sciences, describe a geographically concentrated enterprise. Studies have shown that China and the United States (US) have come to dominate the AI research system in terms of funding, active scientists and, consequently, the number of publications [16–18]. A recent meta-analysis at the intersection of general and healthcare-specific AI, which reviewed 288 studies across the disciplines of accounting and management, decision sciences, and health professions, documented a rapidly growing body of AI research, with the US and China contributing the most publications [19]. The study that, to our knowledge, comes closest to our focus on the life sciences analyzed 3529 scientific AI publications between 2000 and 2021, and again found the US and China to be the most productive geographies based on the number of publications [20]. We provide a summary of our literature review in the Supplementary Material (S1).
We extend this emerging and productivity-focused line of research by analyzing the geography of AI research in the life sciences using three dimensions:
1. Productivity, i.e., publication counts at the country level as well as at the level of world regions, with additional stratification of publications by field of AI application.
2. Quality-adjusted productivity, i.e., publications stratified by field-normalized quality rankings of the publishing outlets.
3. Relevance, i.e., forward citations received by a focal piece of research, additionally stratifying citations into those accruing from general research and from clinical research.
We apply the three dimensions to a sample of 397,967 AI life science publications and 14.5 million associated citations, creating a multidimensional global atlas spanning over two decades of research (2000–2022).
A detailed sampling protocol, variable descriptions, econometric techniques, and sensitivity analyses are outlined in the Methods. In brief, we use keyword-based text mining and machine learning techniques to identify and classify AI research at the intersection of the life sciences and computer science. We use the standard bibliographic reference for life science research, the PubMed database, to retrieve 374,501 AI-relevant publications from life science journals. To cover computer science, we use the OpenAlex database with its comprehensive indexing and identify 23,466 AI-relevant conference proceedings publications with a life science focus. For constructing our global atlas of AI life science research, we pool the datasets and henceforth use the terms “articles” or “publications” to refer to both journal and conference proceedings publications. To proxy the accuracy of our identification approach, we manually inspect a random sample of 300 articles for AI and life science relevance and test our obtained article coverage against a set of AI special issues in life science journals, obtaining corroborating results. We then stratify the obtained 397,967 AI life science publications by the country of affiliation of the lead author of the articles, i.e., the last author where available, and the first author otherwise, reflecting common authorship norms [21,22].
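The lead-author rule described above (last author where available, first author otherwise) can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function name, the input layout (one country per author in byline order), and the reading of "where available" as "the last author's country is known" are all assumptions made for the example.

```python
from typing import Optional, Sequence


def lead_author_country(countries: Sequence[Optional[str]]) -> Optional[str]:
    """Return the affiliation country used to place an article on the map.

    `countries` holds one affiliation country per author in byline order;
    None marks an author whose country is unknown (an assumed encoding).
    """
    if not countries:
        return None
    last, first = countries[-1], countries[0]
    # Prefer the last author; fall back to the first author otherwise.
    return last if last is not None else first


# Hypothetical bylines, for illustration only.
example_known_last = lead_author_country(["DE", "ZA", "US"])
example_missing_last = lead_author_country(["DE", "ZA", None])
```

With these hypothetical inputs, the first article is attributed to the last author's country and the second falls back to the first author's country.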
For the first dimension of our atlas, we analyze the geography of production both at the country level as well as at the level of world regions, according to the six world regions defined by the United Nations: Africa, Asia, Europe, Latin America, Northern America, and Oceania [23]. We also stratify productivity by field of AI life science application, employing the OpenAlex content classification algorithm and keyword-based identification of clinical research. To assess the second dimension, we adjust productivity with a field-normalized approximation for quality, distinguishing articles published in the top three ranked journals and conference proceedings publications for a given field. Finally, we assess the relevance of published research by linking 14.5 million forward citations, distinguishing citations arising from general versus clinical research. To analyze the geography of the first two dimensions (productivity and quality-adjusted productivity), we use descriptive data visualization. To assess the geographic variance in the relevance of the research produced, we use negative binomial regression models. This class of models can accurately estimate the influence of geography, content, and quality of research on relevance (i.e., citations) by also accounting for the skewed distributional properties of citations as the dependent variable.
The three-dimensional assessment provides a nuanced geography of the AI life science research enterprise. Asia leads the global production of AI life science research in absolute terms, with China accounting for over 50% of the region's publications. Examining the content of publications reveals that many countries contribute to core AI research areas in the life sciences (dimension 1). When productivity is adjusted for quality (dimension 2), the regions of Northern America and Europe contribute most publications in high-ranking outlets. We also find that the dimensions of quality and relevance are strongly correlated, with research from Northern America and Europe receiving a substantial citation premium relative to other regions (dimension 3). This citation premium appears to be mostly explained by our approximation of underlying research quality. We complement the three-dimensional assessment of geography by examining international (versus national) collaborations, defined as articles with at least two authors on the author byline who are affiliated with different countries. We present evidence for greater relevance of research conducted in international collaborations as opposed to national collaborations. Despite creating research of greater relevance, the share of international collaborations stagnates and the propensity to collaborate internationally differs between world regions.
Results
Dimension one: productivity
We begin our assessment of productivity by documenting an exponential increase in global AI life science publications (Fig. 1), quantified by a 20% annual growth rate since 2010.
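The growth figure is a compound annual growth rate (CAGR) over 2010–2022. As a minimal sketch of how such a rate is computed, the snippet below uses the standard CAGR formula; the yearly counts are hypothetical placeholders chosen to reproduce a 20% rate, not values from the paper's source data.

```python
def cagr(first_value: float, last_value: float, periods: int) -> float:
    """Compound annual growth rate between two observations `periods` years apart."""
    if first_value <= 0 or periods <= 0:
        raise ValueError("first_value and periods must be positive")
    return (last_value / first_value) ** (1 / periods) - 1


# Hypothetical yearly article counts (NOT taken from the paper's source data).
count_2010 = 1_000
count_2022 = count_2010 * 1.2 ** 12  # what 20% yearly growth would yield

growth = cagr(count_2010, count_2022, periods=2022 - 2010)
fold_increase = count_2022 / count_2010
```

Under this formula, sustained 20% annual growth over the twelve years from 2010 to 2022 corresponds to roughly a ninefold increase in yearly output (1.2^12 ≈ 8.9).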
Fig. 1 | Evolution of the AI research enterprise in the life sciences. Yearly counts of articles (n = 397,967) with AI-related keywords in titles or abstracts from 2000 to 2022. Growth refers to the compound annual growth rate (CAGR 2010–2022). Source data are provided as a Source Data file.
Continuing with the first dimension of our atlas, we show a geographical concentration of AI life science research in the US (101,195 articles), followed by China (73,129 articles), together accounting for about 44% of cumulative productivity between 2000 and 2022 (Fig. 2). Of note, 2020 marks the first year in which China surpassed the US in the number of publications per year in our dataset (see dynamic online graph for details). In terms of cumulative productivity, there is a marked gap between the US, China, and the next tier of countries, which is led by the United Kingdom (21,215 articles), Germany (18,759 articles), Japan (15,263 articles), Canada (12,578 articles), India (12,560 articles), and South Korea (12,264 articles). Select countries, like India, show differences between their productivity in life science journal publications versus computer science conference publications with a life science focus. We provide a table showing all countries' individual productivity statistics in the Supplementary Material S2. While the regions of Asia, Europe, Northern America, and Oceania all tangibly contribute research, countries in Africa and Latin America show moderate-to-low involvement in the AI life science research enterprise. These data underscore two concerns: an almost bipolar geographic concentration of AI research productivity, led by the US and China, and the limited involvement of countries from Africa and Latin America in AI life science research.
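The headline shares follow directly from the article counts reported in the text. The short check below recomputes them; the counts are those stated in the article, while the variable and function names are illustrative.

```python
# Article counts as reported in the text (2000-2022, cumulative).
TOTAL_ARTICLES = 397_967
JOURNAL_ARTICLES = 374_501      # retrieved via PubMed
CONFERENCE_ARTICLES = 23_466    # retrieved via OpenAlex

COUNTRY_COUNTS = {
    "United States": 101_195,
    "China": 73_129,
    "United Kingdom": 21_215,
    "Germany": 18_759,
    "Japan": 15_263,
    "Canada": 12_578,
    "India": 12_560,
    "South Korea": 12_264,
}


def share_of_total(countries: list[str]) -> float:
    """Fraction of all articles attributed to the listed countries."""
    return sum(COUNTRY_COUNTS[name] for name in countries) / TOTAL_ARTICLES


us_china_share = share_of_total(["United States", "China"])
next_tier_share = share_of_total(
    [name for name in COUNTRY_COUNTS if name not in ("United States", "China")]
)
print(f"US + China: {us_china_share:.1%}, next six countries: {next_tier_share:.1%}")
```

The two pooled sources add up to the full sample, and the US and China together come to about 44% of it, matching the figure quoted above.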
We next consider whether the observed geographic concentration goes hand in hand with a concentration in research topics and underlying capabilities, which may cater to productivity advantages of some countries over others. In a first step, we assign articles to content categories available from the OpenAlex database. We provide further details on the categorization in the Methods and in the Supplementary Material (S3). We focus our analysis on the 40 most frequent content categories in our dataset, representing, on average, two-thirds of AI life science research across the 40 most productive countries. These 40 countries collectively account for 96% of global productivity in our data. To examine the resulting content-by-country (40 × 40) data matrix, we create a heatmap visualization in Fig. 3. The individual cells of the heatmap contain the share of a country's publications for a specific content category relative to all publications by the same country. This share, expressing nations' research foci, also defines the heatmap's color, with darker shading representing less focus and lighter shading greater focus. The heatmap first indicates that there are many fields that yet stand to gain from further AI applications, indicated by the broad space covered by darker coloring across world regions. Looking at the most productive AI life science research categories, such as computer vision, computational biology, neuroscience, internal medicine, statistics, radiology, and surgery, there is a global focus rather than geographic specialization. Thus, topic specialization does not appear to be driving the concentration of productivity visible in Fig. 2.
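Each heatmap cell described above is a within-country share: publications in one content category divided by all publications from the same country. A minimal sketch of that normalization follows. It assumes, for simplicity, one content category per article and a flat list of (country, category) pairs; the actual categorization is described in the Methods and may differ, and the records shown are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable


def category_shares(records: Iterable[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Map country -> category -> share of that country's articles.

    `records` yields one (country, category) pair per article (an assumed layout).
    """
    totals: Counter[str] = Counter()
    cells: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for country, category in records:
        totals[country] += 1
        cells[country][category] += 1
    return {
        country: {cat: n / totals[country] for cat, n in cats.items()}
        for country, cats in cells.items()
    }


# Hypothetical toy records, for illustration only.
toy_records = [
    ("Country A", "computer vision"),
    ("Country A", "computer vision"),
    ("Country A", "radiology"),
    ("Country B", "radiology"),
]
shares = category_shares(toy_records)
```

Because every row is normalized by the country's own output, a small and a large producer can show the same research focus, which is what allows the heatmap to separate topic specialization from sheer volume.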
Extending our productivity stratification for content, we assess the extent to which countries generally conduct clinical research with the application of AI. Clinical research is of particular interest because it reflects research with potential applications that more directly benefit human health. To identify clinical research, we rely on a search strategy proposed by Haynes and colleagues [24,25], further described in the Methods. Overall, AI-focused clinical research accounts for about 20% of the articles included in our sample. Figure 4 depicts the geographic distribution across the 30 most productive countries, together accounting for 94% of global production of clinical AI research. The primary vertical axis shows the share of a country's clinical research articles relative to all clinical research articles globally (blue bars), while the secondary vertical axis shows the share of a country's clinical research articles relative to all AI life science articles published by that country (orange bars). Comparable to general productivity, we observe that the US and China account for about 45% of AI clinical research, with several countries from all world regions, except for Africa and Latin America, contributing tangibly to the clinical AI research enterprise. Consistent with the content analysis presented in Fig. 3, we also find that many countries devote 15–20% of their research efforts to AI clinical research.
Dimension two: quality-adjusted productivity
Next, we examine whether the geographic concentration we observe in the number of publications is accompanied by a concentration in quality. Scientific progress tends to be driven by research of unusual rather than average quality [26,27], traditionally motivating dedicated examinations of the right-hand tail of the research quality distribution [28].
Fig. 2 | Geography of the AI life science research enterprise in terms of productivity. Counts of AI-focused life science articles by country, cumulated for the years 2000 to 2022 (n = 397,967). Source data are provided as a Source Data file.
To adjust for quality with a field-normalized approach, we use external rankings of journals and conferences. For journal articles, we consider articles published in one of the top three journals within a given journal category according to Clarivate's Journal Citation Report. For conference proceedings, we consider articles published in a proceedings publication of conferences ranked A*, according to the CORE conference ranking [29]. For journal publications, this approach classifies about 8% of the research as appearing in high-quality outlets, and for conference proceedings publications about 6% (S4).
We find that the US, Australia, and several European countries contribute the largest shares of research in high-quality outlets over the period 2000–2022 (Fig. 5). Compared to general productivity, China and other Asian countries, as well as countries in Latin America, rank from the midfield to the lower end of the quality-adjusted productivity distribution. Africa, meanwhile, remains largely absent from this mapping due to overall low productivity, including in top-ranked outlets. A notable exception is Kenya, which has international collaborators on two-thirds of its publications placed in high-ranking outlets, while, for example, one-third of South African publications have international collaborators. We discuss the role of internationally collaborated research in a separate section below.
Moving the analysis from the country level to the level of world regions, we seek to examine the consistency with which regions can contribute to AI-focused life science research published in high-ranking outlets. Figure 6 depicts relatively stable proportions of research that separate the regions into two groups. On the one hand, there is the group of Northern America, Europe, and Oceania, which consistently places about 10% of its published research in high-ranking outlets. On the other hand, there is a group consisting of Asia, Latin America, and Africa, which publishes about 5% of its papers in these top-ranked outlets. Europe and Asia have shown opposite trends in recent years, with Europe gradually decreasing and Asia gradually increasing their respective shares of publications in high-quality outlets.
Fig. 3 | Heatmap of relative country focus with respect to publication topics. The horizontal axis lists the 40 most productive countries grouped by geographic region. The vertical axis depicts the underlying publication topics in descending order (computer vision being the most frequently researched topic). The color scheme of the heatmap reflects the percentage share of country-specific productivity for a given publication topic (n = 397,967). Source data are provided as a Source Data file.
Dimension three: relevance
To assess the third dimension of the atlas, we examine geographic variance in the relevance of the produced research. We conceptualize relevance as the extent to which focal publications inform (a) scientific progress (scientific relevance) and (b) clinical application (clinical relevance). We operationalize relevance via forward citations to the AI life science articles in our sample. As our econometric approach, we employ negative binomial regression models to account for the overdispersion of citation measures. We regress citation counts on dummy variables representing the six geographic regions, setting the most productive region, Asia, as the base category. We control for the publication year to account for the time a given article had to accrue citations. Figure 7 shows incidence rate ratios (IRRs) obtained from the negative binomial regression models. These ratios can be interpreted as percentage changes in the dependent variable, citations, relative to the base category: an IRR above 1 indicates more citations, and an IRR below 1 fewer citations, for articles from a given world region than for articles from Asia.
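The reading of incidence rate ratios as percentage changes rests on a standard identity for count models with a log link: the IRR is the exponential of the regression coefficient, and the percentage change is (IRR − 1) × 100. The sketch below illustrates the conversion; the helper names are illustrative, and the coefficient is back-computed from the 40% citation premium the article reports for Northern America rather than taken from the regression output.

```python
import math


def irr_from_coefficient(beta: float) -> float:
    """Incidence rate ratio implied by a log-link regression coefficient."""
    return math.exp(beta)


def percent_change(irr: float) -> float:
    """Percentage difference in expected counts relative to the base category."""
    return (irr - 1.0) * 100.0


def coefficient_from_percent(pct: float) -> float:
    """Inverse mapping: coefficient that corresponds to a given percentage change."""
    return math.log(1.0 + pct / 100.0)


# Back-computed from the reported 40% premium (illustrative, not model output).
beta_northern_america = coefficient_from_percent(40.0)
irr_northern_america = irr_from_coefficient(beta_northern_america)
premium = percent_change(irr_northern_america)
print(f"beta={beta_northern_america:.3f}, IRR={irr_northern_america:.2f}, +{premium:.0f}%")
```

An IRR of 1 therefore means no difference from the base category (Asia), and confidence intervals on the IRR scale that include 1 indicate no statistically detectable difference.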
Scientific relevance
We assess an article's scientific relevance as the number of forward citations an article receives from general life science research articles. We find that AI-focused life science research produced in the world regions of Africa, Oceania, Europe and Northern America receives about 10% (95% confidence interval (CI) 6%–15%), 26% (95% CI 23%–29%), 20% (95% CI 19%–22%), and 40% (95% CI 38%–42%) more forward citations in general life science articles, respectively, than research created in Asia (Fig. 7A). Research produced in Latin America, in comparison, receives fewer forward citations than research from Asia.
Fig. 4 | Clinical AI research across countries. The share of a country's clinical research relative to global clinical research production (primary y-axis) and relative to all publications within the same country (secondary y-axis) for the 30 most productive countries in terms of clinical articles (n = 67,167). Source data are provided as a Source Data file.
Fig. 5 | Geography of the AI life science research enterprise in terms of quality-adjusted productivity. Percentage shares of AI-focused life science articles published in high-ranked outlets by country, cumulated for the years 2000 to 2022 (n = 31,837). The analysis is limited to countries with at least 100 publications. Source data are provided as a Source Data file.
To adjust for the quality of the underlying research, we next include dummy variables for each journal and conference proceedings outlet in our regression model (i.e., outlet fixed effects). The inclusion of fixed effects adjusts for any geographical variance in research tied to the outlet, including quality ranking and subject matter published. In this adjusted model, with the exception of Africa and Latin America, world regions are no longer statistically different in terms of forward citations in downstream life sciences research (Fig. 7D). In other words, the citation differences between geographic regions appear to be largely explained by regional differences in research quality, which is consistent with the geographic variance in research quality shown in Fig. 5.
Clinical relevance
Ultimately, AI is expected to transform medicine. We therefore seek to analyze the influence of AI life science research on clinically applied research. Figure 7B shows that the regions of Oceania, Europe and Northern America receive a citation premium from downstream clinical research articles (about 13% (95% CI 8%–17%), 26% (95% CI 24%–29%), and 55% (95% CI 52%–57%), respectively), compared to research generated in Asia, analogous to the scientific relevance dimension (Fig. 7A). The greater number of clinical citations to AI life science articles from these three regions again appears to be explained by our approximation of the underlying research quality (Fig. 7C).
Overall, the findings in Fig. 7 indicate that the differences in scientific and clinical relevance are driven by differences in quality rather than geographic bias in citation patterns. In other words, the cumulative knowledge-building process in the AI research enterprise appears to be largely unbiased with respect to the geographic location of the knowledge-creating researchers.
[Figure 6 plot: "Share of High-Quality Publications within World Region by publication years (2000–2022)"; x-axis: publication year; y-axis: share of publications in high-quality outlets; series: Africa, Asia, Europe, Northern America, Latin America, Oceania.]
Fig. 6 | Geography of the AI life science research enterprise in terms of quality-adjusted productivity. Percentage shares of AI life science articles published in high-ranked outlets by geographic region and per year (n = 32,010). Source data are provided as a Source Data file.
[Figure 7 panels: A. Scientific relevance; B. Clinical relevance; C. Quality-adjusted clinical relevance; D. Quality-adjusted scientific relevance. Each panel lists Asia, Africa, Europe, Latin America, Northern America, and Oceania.]
Fig. 7 | Geography of the relevance of AI life science articles in terms of forward citations in life science research (Figs. 7A, D) and clinical research (Figs. 7B, C). All panels depict incidence rate ratios (IRRs) with error bars for 95% confidence intervals obtained from negative binomial regressions of citations on dummy variables for the geography of the research, with the most productive region, Asia, serving as the base category. A (n = 397,965) and B (n = 397,965) show unadjusted estimates (only accounting for publication year), whereas C (n = 393,722) and D (n = 375,033) also include controls for quality variation across publishing outlets. As publishing outlets with all zero outcomes for the dependent variable (i.e., publishing research that is not cited) get automatically dropped from the analyses with quality controls, the sample sizes are smaller in C and D. Source data are provided as a Source Data file.
International collaborations
Lastly, we return to the argument that scientific progress is driven by collaborating on the best ideas, irrespective of the ideas' geography. We analyze international collaborations in our dataset and define articles as international if at least two authors on the author byline are affiliated with institutions from different countries. We focus this analysis on the relevance dimension, because it is the best proxy for what kind of research informs the advancement of the global research enterprise.
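The definition of an international article given above (at least two byline authors affiliated with institutions in different countries) translates into a simple check. The sketch below is illustrative only: it assumes one country per author and treats missing affiliations as unknown rather than as a separate country, which is an assumption and not something the article specifies.

```python
from typing import Optional, Sequence


def is_international(author_countries: Sequence[Optional[str]]) -> bool:
    """True if at least two authors are affiliated with different countries.

    Authors with an unknown country (None or empty string) are ignored,
    which is an assumption made for this sketch.
    """
    known = {country for country in author_countries if country}
    return len(known) >= 2


# Hypothetical bylines, for illustration only.
national_article = is_international(["DE", "DE", "DE"])
international_article = is_international(["KE", "GB", "KE"])
```

Under this flag, a paper with three authors from one country counts as a national collaboration, while a single co-author abroad is enough to make an article international.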
We again estimate negative binomial regression models with citations as dependent variables and a dummy variable for international collaboration as the core independent variable. In the analysis of a potential citation differential between research from international versus national collaborations, we control for three factors. We include dummy variables for the lead author country to account for regional variance in international collaboration. We control for the number of co-authors because larger author teams are more likely to include a co-author from another country and team size has been shown to correlate with citations [7]. Additionally, we control for the publication year of a focal article to account for the time it had to accrue citations.
We find that articles stemming from international rather than national collaborations receive, on average, 21% (95% CI 20%–22%) more citations by general life science articles and 7% (95% CI 6%–8%) more citations by clinical life science articles (Fig. 8A). Of note, international collaborations also tend to publish 35% more frequently in high-ranking research outlets than national collaborations, on average.
Despite the apparent benefits of collaborating across borders, the share of internationally collaborated research has remained relatively low over time, at less than 20%, and has come to stagnate in proportion (Fig. 8B). However, the extent to which regions engage in international collaboration varies. Figure 8C shows the share of publications that stem from international collaboration by region of the lead author. While African lead authors coauthor 36% of their publications with at least one collaborator from a different country, Asian lead authors do so for only 16% of their articles. Oceania (32%), Europe (27%), and Latin America (23%) range in between, whereas Northern America also tends to emphasize national over international collaborations (18%).
Fig. 8 | Characteristics of international collaborations. The effect of international collaboration on scientific and clinical relevance (A); share of international collaborations over time (B); share of international collaborations by region (C). Incidence rate ratios (IRRs) with error bars for 95% confidence intervals obtained from negative binomial regressions of citations (n = 397,949) and clinical citations (n = 397,887) on a dummy variable for international collaboration, accounting for country of lead author, team size, and publication year (A). Percentage share of articles with at least two authors affiliated in different countries (n = 397,965) (B). Percentage share of articles with at least two authors affiliated in different countries by geographic region (n = 397,965) (C). Source data are provided as a Source Data file.
Fig. 9 | Alluvial diagram of international collaborations. Number of dyadic collaborations between authors from different countries, aggregated to the regional level. Dyadic collaborations are counted as co-authorships between a publication's lead author (last author or first author otherwise) and any other author on the author byline that is from a different country. Only international dyads are considered (n = 105,258 dyads). Source data are provided as a Source Data file.
To further contextualize this cross-regional variance in international collaborations, our final analysis characterizes the dyadic relationships between regions that engage in international collaborations. Figure 9 presents an alluvial diagram to show patterns of international collaboration, including, by construction, only the articles identified as international. We count each occurrence of a difference in geographic location separately and sum international collaborations to the regional level. In other words, if a lead author's country of affiliation is the US and the lead author collaborates with co-authors from China and Germany, then we depict two lines in the alluvial diagram, one from Northern America to Asia and one from Northern America to Europe. The vertical bars on the left depict the sum of outgoing international collaborations from lead authors affiliated in the respective region, while the right vertical bars depict the incoming collaborations for non-lead authors from the respective region.
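The dyad counting described above can be sketched as follows, using the article's own example of a US lead author with co-authors from China and Germany. The country-to-region lookup is a hypothetical four-entry stand-in for the United Nations regional classification, and counting one dyad per co-author (rather than per distinct country) is an assumption about how "each occurrence" is tallied.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence

# Hypothetical excerpt of a country-to-region lookup (illustration only).
REGION_OF = {"US": "Northern America", "CN": "Asia", "DE": "Europe", "FR": "Europe"}


def regional_dyads(
    byline: Sequence[Optional[str]], region_of: Mapping[str, str]
) -> Counter:
    """Count (lead region, co-author region) pairs for one article.

    `byline` lists author countries in order; the lead author is the last
    author if known, else the first. Only co-authors from a different
    country than the lead author form a dyad.
    """
    flows: Counter = Counter()
    if not byline:
        return flows
    lead_index = len(byline) - 1 if byline[-1] is not None else 0
    lead = byline[lead_index]
    if lead is None:
        return flows
    for index, country in enumerate(byline):
        if index == lead_index or country is None or country == lead:
            continue
        flows[(region_of[lead], region_of[country])] += 1
    return flows


# The article's example: US lead author (last), co-authors from China and Germany.
example_flows = regional_dyads(["CN", "DE", "US"], REGION_OF)
```

Note that dyads are defined by country, not region, so a French co-author on a German-led paper still yields a Europe-to-Europe flow; this is why within-region collaboration can appear in the diagram.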
Overall, Fig. 9shows that Europe engages most frequently in
international collaborations, both from an outgoing perspective (lead
authors) as well as an incoming perspective (other authors), repre-
sented in the blue vertical bars on both sides of the diagram. European
researchers most frequently collaborate with colleagues from the
same geographic region, followed by Northern America. But Europe
also appears to play an important role in partnering with African and
Latin American researchers. Oceanias international collaborations
appear most pronounced withAsia. Africa collaborates frequently with
European and Asian researchers. Latin Americaappears more varied in
its international collaboration patterns, but also appears collaborating
most frequently with researchers based in Europe. Northern American
lead authors tend to mostly co-author with colleagues from the same
region, followed by collaborations with Asia and Europe.
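The dyad counting that underlies the alluvial diagram can be illustrated with a minimal Python sketch. It is not the authors' code: the `REGION` lookup is a hypothetical four-country excerpt of the UN classification, the input format (lead country plus byline countries) is assumed, and counting each distinct partner country once per article is our reading of the US/China/Germany example above.

```python
from collections import Counter

# Hypothetical excerpt of a country-to-region lookup (UN regions in the paper).
REGION = {
    "US": "Northern America",
    "CN": "Asia",
    "DE": "Europe",
    "KE": "Africa",
}


def count_regional_dyads(articles, max_authors=10):
    """Count (lead region, partner region) dyads over a list of articles.

    Each article is a tuple (lead_country, byline_countries), where
    byline_countries lists the countries of all authors in byline order.
    Assumption: each distinct foreign country among the first
    `max_authors` byline positions yields one dyad per article.
    """
    dyads = Counter()
    for lead_country, byline_countries in articles:
        # Distinct partner countries that differ from the lead author's country.
        partners = set(byline_countries[:max_authors]) - {lead_country}
        for partner in partners:
            dyads[(REGION[lead_country], REGION[partner])] += 1
    return dyads


example = [
    ("US", ["CN", "DE", "US"]),  # two international dyads
    ("KE", ["KE", "DE"]),        # one international dyad
    ("US", ["US", "US"]),        # purely national, no dyad
]
dyads = count_regional_dyads(example)
```

In this toy input the US-led article contributes one line to Asia and one to Europe, matching the two-line example in the text, while the purely national article contributes nothing.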
Discussion
Prior research with a focus on productivity has identified the US and
China as the main contributors to AI research. In this study, we first test
whether this general finding holds true in the specific context of AI
applications in life science research and document that the US and
China produced almost half of the global AI life science research
between 2000 and 2022. Taking a regional perspective, Asia leads
global production. We then extend this one-dimensional perspective
on research output by considering two additional dimensions:
quality-adjusted productivity and relevance. We show that the geography of
global AI life science research changes depending on the dimension
under consideration. For example, we show that the world regions of
Northern America and Europe produce most life science articles
published in high-ranking outlets and, alongside Oceania, produce
work that most advances the AI life science research enterprise.
Meanwhile, the world regions of Latin America and Africa markedly lag
as contributors to the AI life science research enterprise. We show that
exceptions to this pattern, like select African countries,
disproportionally engage in international collaborations to produce
research in high-ranking outlets that is also of high relevance.
The productivity-focused analysis of AI research in prior literature
has contributed to concerns about national research agendas potentially
undermining the effective and equitable advancement of AI research
across science fields. In the wake of rising nationalism and protectionism,
researchers have come to summarize these bipolar geographic dynamics
as a China-US "arms race" in AI30. The public discourse feeds this
conception. For example, in 2017, China announced a program for the
domestic development of AI with the objective of becoming the world's
leading AI region by 2030 and has recently underscored its ambitions by
pledging a record investment in AI-enabling infrastructure31. As AI
research uniquely requires large-scale investments, including scaled
computing resources, trained human capital, and encompassing data,
China successfully accelerates its AI research program32. These
geopolitical dynamics invigorate the "arms race" perspective on AI research,
which is to be appreciated in light of evidence that governments' AI
investment can also be politically motivated33. Our assessment of
productivity mirrors the US-China duopoly perspective also for the life
sciences, and it remains subject to further research on how this geographic
concentration in productivity influences the advancement of AI in the life
sciences and elsewhere longer term.
However, considering quality-adjusted productivity and relevance
as two further dimensions of evaluation provides additional,
different perspectives. Other world regions, home to many countries,
produce AI life science research that is disproportionally used in
advancing the AI research enterprise. Research from the regions of
Europe, Oceania, and Northern America gets cited at 1.2 to 1.5 times
higher rates in general and clinical research when compared to other
world regions. This citation premium appears explained by the quality
of the underlying research. Some of this quality stratification, proxied
by the field-specific ranking of the outlets publishing the research, may
stem from differential access to journals or conferences across
geographies. Not all researchers may be equally comfortable publishing
their findings in English, generally the standard language of academic
communication in international journals, for example. Still, journal and
conference proceedings publications remain the central avenue for
cumulative knowledge building in the sciences34. Overall, our results
show that the regions of Northern America, Oceania, and Europe are
key regions producing relevant research to advance the AI life science
research enterprise.
Geographic differences in research quantity and quality may further
hold implications for how scholars model the evolution of the AI
research enterprise more generally. Studies predicated on quantitative
productivity have mostly equated investments in inputs with superior
outputs. For example, cities able to attract the largest number of AI
scientists have been found to emerge as the cities accomplishing the
largest number of AI publications16. Similarly, national research
funding has been correlated with publication output17. These findings
notwithstanding, our study argues and shows that output can be
conceptualized along multiple dimensions. While countries in our
study each individually operate with a given set of inputs for producing
AI research, there is considerable variance in output when comparing
quantity to quality and ensuing relevance. As such, we submit that
further research is needed that examines the input-output relationship
in AI research, in the life sciences and possibly other disciplines, to
better understand research trajectories.
Beyond different types of geographic concentrations, we find that
international collaborations produce more relevant research than
national collaborations. Consistent with previous research highlighting
the importance of scientists collaborating across borders8–10, we find a
citation premium of more than 20% for international versus national
collaborations, specifically in the AI life sciences context. Despite this
apparent importance of internationally conducted AI research for
cumulative knowledge building in the life sciences, the rate at which
scientists collaborate internationally appears to stagnate. Exceptions to
this pattern emerge in select countries that successfully use international
collaborations to disproportionally produce research in high-ranking
outlets and of high relevance. While the average rate of international
collaborations hovers around 20% in our data, countries like Kenya
collaborate internationally on over 40% of their publications and even on
two-thirds of the publications placed in high-ranked outlets.
International collaborations may thus prove instrumental for broadening
geographic participation in the AI life science research enterprise.
We also find, however, that the proclivity to international
collaboration varies. The most productive world regions of Northern
America and Asia team across borders at the lowest rates, while
scientists located in Europe collaborate internationally at higher rates and
with more geographically distributed partners. Europe collaborating
internationally may thus particularly cross-pollinate research conducted in the world
regions of Africa and Latin America, which overall create a tangible
share of their productivity through international collaborations.
Lastly and importantly, our global atlas shows many world regions
remain moderately or little involved in the AI life science research
enterprise. Countries in Africa and Latin America account for less than
5% of global AI research in the life sciences. These two world regions
are home to more than 25% of the world population and experience
more than half of the global disease burden35. That is not to say that the
existing research does not tackle research questions that are also
germane to these regions. In fact, the possibility of scaling AI
applications across world regions may lead to marked benefits for
many countries, even if countries are not all involved to a similar extent
in the creation of the research. Still, our findings add to a concern that
may be especially applicable to the life sciences. The prowess of AI
often depends on the data foundation fed to learning-based models. If
research remains geographically concentrated, it stands to reason that
data foundations evolve in an unbalanced fashion. In turn, the
imbalance could lead to biased AI models producing biased
recommendations. Patient populations are diverse in terms of gender, race, and
ethnicity, as well as other attributes, like socio-demographic status or
access to healthcare systems. To mitigate the risk of AI-informed
medical care being biased towards certain demographics, straddling
these different characteristics requires more and accelerated research
and building the necessary capabilities and training datasets globally.
Parts of the life science community have voiced these concerns, and
studies have begun to selectively expose such biases36. Our global atlas
may be viewed as underscoring the geographic magnitude of these
concerns and points to examining the desirability and design of
potential countermeasures.
Our study is not without limitations. First, we use the affiliation
country of the lead author (last author where available and first author
otherwise) to determine the geography of a focal article. Although this
approach is in line with characterizing research according to the
characteristics of lead authors21,37, it still focuses on the academic
creators of research. Future work may enhance the global atlas of AI
life science research by, for example, considering the location of
supporting funding institutions or the geography of academy-industry
collaborations. Second, we rely on a keyword-based identification of
clinical research to distinguish the nature of forward citations. Future
research examining the nature and detailed content of clinical studies
seems warranted. For example, scholars might more qualitatively
examine the kind of clinical research that draws on AI techniques, as
well as characterize the medical fields most poised to benefit clinically
from AI. Finally, the goal of our study is to provide an atlas of AI life
science research. As a corollary, we focused on the "supply side" of
research. Bridging this supply-side perspective to a demand-side
perspective seems a fruitful research area, addressing questions like what
patient populations stand to gain (or lose) from AI advances.
In addition, our research, which covers AI applications in the life
sciences through 2022, largely misses the very recent surge in studies
using large language models (LLMs). These models are poised to have
a major impact on a range of biomedical applications, from
synthesizing expert literature to improving patient communication and
medical education. The rapid development and integration of LLMs
highlight the need for ongoing research to better understand their
capabilities and ensure equitable benefits. Consideration of the
geography of LLM applications is likely critical to addressing access
disparities, understanding local implementation challenges, and
promoting global health equity.
In conclusion, our study offers a global atlas of AI life science
research published between 2000 and 2022 along three dimensions:
productivity, quality-adjusted productivity, and relevance. We show
that geographic gravity changes across these dimensions. Overall, the
productivity dimension shows Northern America and Asia to
dominate, led by the US and China respectively. By contrast, the world
regions of Northern America, Oceania, and Europe, with several
countries contributing to publications in high-quality outlets, produce
research most relevant for advancing the AI life science enterprise. The
world regions of Latin America and Africa remain largely absent from
the global atlas of AI life science research. Beyond this differentiating
geographical view, we show that integrating international
collaborations is instrumental for the creation of relevant research. Yet, the
internationality of the AI life science research enterprise stagnates. To
best advance AI research, concerted international efforts may be
warranted.
Methods
To identify AI-focused life science articles, we take a two-pronged
approach that reflects the interdisciplinary nature of this research. On
the one hand, we start from the life sciences perspective, turning to the
PubMed XML database as the world's most comprehensive inventory
of biomedical literature, with more than 35 million articles linked to a
range of supporting information. On the other hand, we start from the
computer science perspective, turning to articles published in
conference proceedings indexed in OpenAlex, a database successor of
Microsoft Academic Graph (MAG), containing detailed bibliometric
information on more than 250 million scholarly works. Recent
research documents OpenAlex to have the widest coverage of
academic publications, especially for non-journal publications38. We
adopt the search strategy established by Baruffaldi et al.29, which relies
on an encompassing keyword search for articles containing AI terms in
their title or abstract, and we apply the approach to both data
foundations. Baruffaldi and colleagues followed a three-step approach to
create a set of query terms for bibliometric databases that accurately
retrieve documents focused on AI. In the first step, the authors
identified articles published in AI-tagged journals and conference
proceedings according to the All Science Journal Classification (ASJC). In
the next step, the authors identified keywords listed in these
documents and performed a co-occurrence analysis of these keywords
based on the titles and abstracts of the AI-tagged documents. Only
keywords that appeared at least 100 times and belonged to the top
60% in terms of relevance were kept. Finally, this list of keywords was
presented to and approved by a group of AI experts from academia
and industry. We provide the final list of 214 AI-related keywords in the
Supplementary Material (S5).
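The two keyword retention thresholds described above (at least 100 occurrences, top 60% by relevance) can be illustrated with a short Python sketch. This is not the Baruffaldi et al. implementation: the relevance scores are taken as given, the example keywords and numbers are invented, and applying the 60% cut to the full candidate list before the frequency filter is an assumption.

```python
def select_keywords(stats, min_count=100, top_percent=60):
    """Return keywords passing both the frequency and the relevance cut.

    stats maps keyword -> (occurrence_count, relevance_score).
    Assumption: the relevance cut keeps the top `top_percent` percent of
    all candidate keywords, ranked by descending relevance score.
    """
    ranked = sorted(stats, key=lambda kw: stats[kw][1], reverse=True)
    # Integer arithmetic avoids floating-point surprises in the cutoff.
    n_keep = len(ranked) * top_percent // 100
    relevant = set(ranked[:n_keep])
    return sorted(kw for kw in relevant if stats[kw][0] >= min_count)


# Invented candidate keywords with (count, relevance) pairs.
candidates = {
    "neural network": (500, 0.9),
    "deep learning": (300, 0.8),
    "fuzzy logic": (50, 0.7),             # relevant but too rare
    "regression": (1000, 0.2),            # frequent but low relevance
    "support vector machine": (150, 0.1),
}
kept = select_keywords(candidates)
```

With five candidates, the top 60% corresponds to the three most relevant keywords, of which two also meet the frequency threshold. In the paper, the resulting list was additionally reviewed by experts, a step that code cannot replace.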
For our identification of AI-focused life science research indexed
in PubMed, we use these 214 keywords to identify publications
containing any of them in either the title or abstract of articles published
between 2000 and 2022. In total, we identified 388,633 AI-related life
science articles indexed in PubMed. To identify AI-focused life science
research in conference proceedings, we first searched for the 214 AI
keywords in the title and abstract of 2.4 million documents linked to all
10,794 conferences listed in OpenAlex. In the next step, we apply the
content classification embedded in the OpenAlex database to identify
AI research that also addresses concepts relevant to the life sciences.
OpenAlex tags articles with multiple concepts representing their
topical focus, using an automated state-of-the-art machine learning
classifier based on titles and abstracts with confidence scores
indicating relevance39. These scientific concepts are organized
hierarchically, with 19 root-level concepts branching into six levels of
specific topics. When a lower-level concept is mapped, all its parent
concepts are mapped as well, ensuring comprehensive coverage39,40.
We consider articles that have been assigned at least one of the
following four top-level concepts (defined as level 0 in OpenAlex) related
to life science research: Biology, Chemistry, Medicine, or Psychology.
We cross-verify the representativeness of these terms for life science
research in our sample of PubMed articles. Here, these four terms
represent more than 80% of the indexed life science research. This
approach gives us 28,848 conference proceedings publications at the
intersection of AI and the life sciences.
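The two filters described above, a keyword match on title or abstract and a check for life science root concepts, can be sketched in Python as follows. The record layout (dicts with `display_name` and `level` keys) is assumed for illustration, the whole-phrase, case-insensitive matching rule is our choice rather than a detail given in the text, and the keyword list is a made-up two-item stand-in for the 214 terms.

```python
import re

# The four level-0 concepts named in the text.
LIFE_SCIENCE_ROOTS = {"Biology", "Chemistry", "Medicine", "Psychology"}


def mentions_ai(title, abstract, keywords):
    """True if any keyword occurs as a whole phrase in title or abstract."""
    text = f"{title} {abstract}".lower()
    return any(
        re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
        for kw in keywords
    )


def is_life_science(concepts):
    """True if at least one level-0 concept is a life science root.

    concepts: list of dicts with 'display_name' and 'level' keys
    (an assumed layout for the concept tags attached to an article).
    """
    return any(
        c["level"] == 0 and c["display_name"] in LIFE_SCIENCE_ROOTS
        for c in concepts
    )


ai_keywords = ["deep learning", "neural network"]  # illustrative stand-in
paper = {
    "title": "Deep learning of field-based rapid tests",
    "abstract": "We train a classifier on test images.",
    "concepts": [
        {"display_name": "Computer science", "level": 0},
        {"display_name": "Medicine", "level": 0},
    ],
}
selected = (
    mentions_ai(paper["title"], paper["abstract"], ai_keywords)
    and is_life_science(paper["concepts"])
)
```

Only the root-level check is needed because, as noted above, parent concepts are mapped whenever a lower-level concept is assigned.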
We evaluate the accuracy of our approach, summarize the results
here, and provide further details in the Supplementary Material (S6).
To test precision, we examine a random sample of 150 PubMed articles
and 150 conference articles from our final dataset, employing two
independent reviewers to rate the PubMed articles for AI focus and the
conference articles for life science relevance. In an iterative process,
90% of the PubMed articles were designated as containing a certain
type of AI application and 93% of the conference articles as having a life
science focus. In addition, we evaluate the comprehensiveness of our
approach in identifying AI life science research by looking at the
coverage of articles published in AI special issues of life science journals. In
a sample of 15 special issues published in 2022, we find that 92% of the
articles published in these special issues are also part of our sample.
We also compare the chosen search strategy with a second search
strategy for AI-related publications proposed by Liu and colleagues41.
We find that while the Liu et al. approach is slightly more precise, the
approach proposed by Baruffaldi and colleagues yields more than
twice as many articles, making it more comprehensive.
We map the 417,481 articles identified with the above search
approach to countries using the affiliation of the lead author as
recorded in OpenAlex, if available, and otherwise as recorded in
PubMed. To link the corpus of PubMed articles to OpenAlex, we
leverage unique article identifiers, i.e., PubMed IDs (PMID) and/or
Digital Object Identifiers (DOI), and query the OpenAlex application
programming interface. To identify the lead author of an article, we
use long-established authorship norms21,22 that reserve the first and
last author positions for the lead authors of an article. Since the
last author is usually the more senior author, who typically sponsors
the necessary research infrastructure (e.g., laboratory and
office space), we designate the geographic location of an article based
on the country of affiliation associated with the last author when
available. Otherwise, we use the affiliation information for the first
author. For approximately 90% of the articles in our sample, the
affiliation of the first and last author is linked to the same country. We
identify a country for 397,967 (95%) articles that make up the final
sample of our analysis. We assess the reliability of our country
assignment for a random sample of 300 articles with the help of two
independent raters and obtain a correct country assignment for 99% of
the observations.
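The lead-author rule described above (last author's country when available, otherwise the first author's) can be written as a small Python function. The input layout, a byline-ordered list of dicts with an optional `country` key, is assumed for illustration and is not the schema of either database.

```python
def lead_author_country(authors):
    """Return the country used to locate an article, or None.

    authors: byline-ordered list of dicts with an optional 'country' key.
    Rule from the text: prefer the last author's affiliation country and
    fall back to the first author's when it is missing.
    """
    if not authors:
        return None
    last_country = authors[-1].get("country")
    first_country = authors[0].get("country")
    return last_country or first_country


byline = [
    {"name": "First Author", "country": "DE"},
    {"name": "Middle Author", "country": "FR"},
    {"name": "Last Author", "country": "US"},
]
country = lead_author_country(byline)
```

Articles for which the function returns `None` correspond to the roughly 5% of records without an assignable country, which are excluded from the final sample.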
Using our final sample, we then map the individual countries to
geographic regions according to the United Nations classification:
Africa, Asia, Europe, Latin America (including the Caribbean), Northern
America, and Oceania23. We also collect information on the affiliation
of interior authors and use their location to identify international
collaborations, which we define as articles where at least one author is
affiliated with a country that is different from that of the lead author.
Figure 10 summarizes the steps of our sample creation approach.
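The definition of an international collaboration given above translates into a simple check, sketched here in Python. The tuple-based input and the choice to ignore co-authors with unknown countries are assumptions made for the example.

```python
def is_international(lead_country, other_countries):
    """True if any co-author's country differs from the lead author's.

    Co-authors with an unknown country (None) are ignored, which is an
    assumption of this sketch.
    """
    return any(c is not None and c != lead_country for c in other_countries)


def international_share(articles):
    """Share of articles that are international collaborations.

    articles: list of (lead_country, other_countries) tuples.
    """
    if not articles:
        return 0.0
    flags = [is_international(lead, others) for lead, others in articles]
    return sum(flags) / len(flags)


sample = [
    ("US", ["US", "CN"]),   # international
    ("DE", ["DE"]),         # national
    ("KE", ["GB", None]),   # international
    ("CN", []),             # single-author article
]
share = international_share(sample)
```

Computing this share separately by publication year or by the lead author's region gives the kind of series summarized in Fig. 8.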
To enrich our data with information about the content of
individual articles, we again make use of the concepts provided in the
OpenAlex database. Specifically, we assign each article to the level 1
concepts with the highest relevance score. If this highest scoring
concept is a purely AI-related term, we assign the next highest scoring
concept to ensure that the assigned concept reflects the context of life
science applications. In addition, we create a subset of articles related
to clinical research by performing a keyword search for clinical
keywords in the title or abstract24,25.
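The concept assignment rule above can be sketched in Python as follows. The set of "purely AI-related" concept names is hypothetical (the text does not list them), the record layout is assumed, and the sketch generalizes the rule slightly by skipping every AI-only concept rather than just the top-ranked one.

```python
# Hypothetical list of concepts treated as purely AI-related.
AI_ONLY = {"Artificial intelligence", "Machine learning"}


def assign_context_concept(concepts, ai_only=AI_ONLY):
    """Return the highest-scoring level-1 concept that is not AI-only.

    concepts: list of dicts with 'display_name', 'level', and 'score'.
    Returns None if no suitable level-1 concept exists.
    """
    level1 = sorted(
        (c for c in concepts if c["level"] == 1),
        key=lambda c: c["score"],
        reverse=True,
    )
    for concept in level1:
        if concept["display_name"] not in ai_only:
            return concept["display_name"]
    return None


tags = [
    {"display_name": "Medicine", "level": 0, "score": 0.9},
    {"display_name": "Machine learning", "level": 1, "score": 0.8},
    {"display_name": "Radiology", "level": 1, "score": 0.6},
]
context = assign_context_concept(tags)
```

Here the top level-1 concept is AI-only, so the article is assigned to the next highest scoring concept, which captures its application context.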
We expand the dataset for our quality-adjusted productivity
analysis with journal-level information from Clarivate's Journal Citation
Report (JCR 2020). We link our data to this report using the unique
International Standard Serial Number (ISSN) of the publishing journal.
In total, 328,062 (88%) articles of our PubMed corpus were published
in a journal indexed by the JCR. Of relevance to our analysis is the
journal's rank within the same journal category based on the journal's
impact factor. The Pearson correlation between journal impact factors
from different vintages of the JCR is generally greater than 0.9,
indicating little temporal variance in the scaling of the metric37. We
consider any publication published in one of the three highest-ranked
journals within the same journal category to be of high quality. Because journals can be
assigned to multiple categories, we consider the journal category in
which a focal journal ranks highest. PubMed articles not published in
indexed journals were not considered as Clarivate sets a quality
threshold for journal inclusion in the index. Using this approach, we
identify 29,510 (8%) articles in our PubMed corpus as being published
in high-ranked journals.
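The top-three rule, including the treatment of journals listed in several categories, can be expressed in a few lines of Python. The input format (a mapping from category name to the journal's rank in that category) is an assumption for illustration and does not mirror the structure of the JCR files.

```python
def is_high_ranked(category_ranks, top_n=3):
    """True if the journal is among the top `top_n` in its best category.

    category_ranks: dict mapping a journal category to the journal's
    rank within that category (1 = highest impact factor). An empty
    dict stands for a journal that is not indexed and is therefore
    not considered high-ranked.
    """
    if not category_ranks:
        return False
    # Use the category in which the journal ranks highest (smallest rank).
    best_rank = min(category_ranks.values())
    return best_rank <= top_n


journal = {"Oncology": 12, "Medical Informatics": 2}
high_ranked = is_high_ranked(journal)
```

Taking the minimum rank across categories mirrors the statement that the category in which a focal journal ranks highest is the one considered.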
We further identify high-quality conference proceedings by
making use of an external conference ranking, the so-called CORE
ranking, provided by the Computing Research and Education Asso-
ciation of Australasia. CORE provides expert-based assessments of all
major conferences in the computing disciplines with information on
their research subfield and is a standard resource for ranking
computer science conferences29. We consider all publications in A*-rated
conference proceedings to be of high quality, resulting in 1349 articles
(6% of our total sample). We differentiate our analyses between life
science and computer science articles in S7.
To characterize the scientific and clinical relevance of an article,
we leverage detailed forward citation data from OpenAlex. The
database provides not only detailed bibliometrics and metadata but also
more than 1 billion citation links between publications. We identify
these citation links for the articles in our sample by their PMIDs, DOIs,
and OpenAlex IDs. We further distinguish citations into those accruing
from any type of publication and those from clinical research only. For
this distinction, we again use the keyword-based approach of Haynes
and colleagues to identify clinical research24,25.
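The split of forward citations into all citations and clinical citations can be illustrated with the following Python sketch. The clinical terms shown are placeholders chosen for the example, not the search filter of Haynes and colleagues, and the citing-work records are invented.

```python
# Placeholder terms for illustration only; the actual clinical search
# filter is not reproduced in this section.
CLINICAL_TERMS = ("randomized controlled trial", "clinical trial", "placebo")


def is_clinical(title, abstract, terms=CLINICAL_TERMS):
    """True if any clinical term appears in the title or abstract."""
    text = f"{title} {abstract}".lower()
    return any(term in text for term in terms)


def citation_counts(citing_works):
    """Return (all citations, clinical citations) for one cited article.

    citing_works: list of dicts with 'title' and 'abstract' keys, one
    per publication that cites the focal article.
    """
    total = len(citing_works)
    clinical = sum(
        is_clinical(work["title"], work["abstract"]) for work in citing_works
    )
    return total, clinical


citing = [
    {"title": "A randomized controlled trial of an AI triage tool", "abstract": ""},
    {"title": "Graph networks for protein folding", "abstract": "Benchmark results."},
]
total, clinical = citation_counts(citing)
```

The two counts correspond to the two dependent variables used later: citations from any publication and citations from clinical research only.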
Fig. 10 | Sample creation. Overview of sample creation using PubMed and OpenAlex as main databases.
[Flow diagram; recoverable labels: biomedical literature, 35 million articles; 388,633 articles, AI-related life science research; 10,794 conferences, 2.4 million articles; 231,000 articles, AI-related research; 28,848 articles, AI-related life science research; 397,967 articles with assigned countries.]
Analysis
We analyze the geography of AI life science research according to the
defined three dimensions, namely productivity, quality-adjusted
productivity, and relevance, employing data visualization and regression
models. More specifically, we visualize descriptive statistics,
including publication counts (Figs. 1 and 2) and geographic percentage
shares stratified by content and quality of underlying research
(Figs. 3–6). To gauge the scientific and clinical relevance of AI life
science research by geographic region, we use negative binomial
regression models with the number of citations as the dependent
variable and dummy variables for the geographic regions as the main
independent variables (Fig. 7). Additionally, we account for publication
years using dummy variables in all our regression analyses to
normalize for the time an article was at risk of being cited. To estimate the
scientific and clinical relevance of AI-related articles conditional on the
underlying quality and content, we run additional models with more
than 9000 publication outlet fixed effects. These fixed effects (i.e.,
dummy variables for each outlet) absorb any confounding effects of
time-stationary outlet characteristics on citations, including journals'
and conference proceedings' published content and quality. As
journals and conference proceedings with all-zero outcomes for the
dependent variable (i.e., publishing research that is not cited) get
automatically dropped from the fixed effects analyses, the sample
sizes are smaller in the corresponding regression models. Our results
remain consistent when estimating all models on the smaller samples.
We present our results as incidence rate ratios (IRRs) relative to the
baseline geographic region (Asia). Finally, we track the share of
international collaborations over time as the share of research papers
featuring at least two authors with affiliations from different countries
(Fig. 8). We estimate the scientific and clinical relevance of these
international collaborations in negative binomial regression models,
again controlling for publication year dummy variables, the locale of
the lead author, and the total number of authors on the author byline.
We characterize the geography and direction of international
collaborations by considering the country of the lead author as the
outgoing country and counting all dyadic connections between the
country of the lead author and any other country of the first 10 authors
listed on the focal publication (Fig. 9). Importantly, 99% of the
publications in our sample list 10 or fewer authors. Analyses are conducted
in Stata. Data visualizations are created with Python and Prism.
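For readers less familiar with count models, the regional regression described above can be written out as follows. This is a sketch under assumptions: the log link and the quadratic (NB2) variance function are a common parameterization that we assume here, and the symbols are ours rather than the authors'. Here $y_i$ is the number of (general or clinical) citations of article $i$, $\mathrm{Region}_{ir}$ and $\mathrm{Year}_{it}$ are dummy variables, $\delta_{j(i)}$ is the fixed effect of the outlet $j$ publishing article $i$ (included only in the outlet fixed effects models), and $\kappa$ is the overdispersion parameter.

```latex
\begin{align}
\log \mathbb{E}[y_i \mid \mathbf{x}_i] = \log \mu_i
  &= \alpha
   + \sum_{r \neq \mathrm{Asia}} \beta_r \,\mathrm{Region}_{ir}
   + \sum_{t} \gamma_t \,\mathrm{Year}_{it}
   \;\bigl(+\, \delta_{j(i)}\bigr), \\
\operatorname{Var}(y_i \mid \mathbf{x}_i)
  &= \mu_i + \kappa\,\mu_i^{2}, \\
\mathrm{IRR}_r &= \exp(\beta_r).
\end{align}
```

Under this reading, an IRR of 1.2 for a region means that its articles are expected to receive 20% more citations than articles from the baseline region (Asia), holding publication year (and, where included, the outlet) constant. The collaboration models follow the same form, with an international collaboration dummy as the main regressor and lead-author locale and team size as additional controls.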
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
The publication data assembled in this study have been deposited in
the Figshare database [https://doi.org/10.6084/m9.figshare.24412099].
Source data are provided with this paper.
Code availability
The computer code to perform the analyses of this study has been
deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.24412099).
References
1. Matheny, M. E., Whicher, D. & Israni, S. T. Artificial intelligence in
health care: a report from the National Academy of Medicine. JAMA
323, 509–510 (2020).
2. Copeland, B. Artificial Intelligence. In: Encyclopedia
Britannica (2024).
3. Turbé, V. et al. Deep learning of HIV field-based rapid tests. Nat.
Med. 27, 1165–1170 (2021).
4. Leite, M. L. et al. Artificial intelligence and the future of life sciences.
Drug Discov. Today 26, 2515–2526 (2021).
5. Noorbakhsh-Sabet, N., Zand, R., Zhang, Y. & Abedi, V. Artificial
intelligence transforms the future of health care. Am. J. Med. 132,
795–801 (2019).
6. Bohr, A. & Memarzadeh, K. The rise of artificial intelligence in
healthcare applications. Artificial Intelligence in Healthcare, 25–60
https://doi.org/10.1016/B978-0-12-818438-7.00002-2 (2020).
7. Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of
teams in production of knowledge. Science 316, 1036–1039 (2007).
8. Adams, J. The fourth age of research. Nature 497, 557–560 (2013).
9. Jones, B. F., Wuchty, S. & Uzzi, B. Multi-university research teams:
Shifting impact, geography, and stratification in science. Science
322, 1259–1262 (2008).
10. Coccia, M. & Wang, L. Evolution and convergence of the patterns of
international scientific collaboration. Proc. Natl. Acad. Sci. 113,
2057–2061 (2016).
11. Beam, A. L. et al. Artificial intelligence in medicine. N. Engl. J. Med.
388, 1220–1221 (2023).
12. Seyyed-Kalantari, L., Zhang, H., McDermott, M. B., Chen, I. Y. &
Ghassemi, M. Underdiagnosis bias of artificial intelligence
algorithms applied to chest radiographs in under-served patient
populations. Nat. Med. 27, 2176–2182 (2021).
13. Ricci Lara, M. A., Echeveste, R. & Ferrante, E. Addressing fairness in
artificial intelligence for medical imaging. Nat. Commun. 13,
4581 (2022).
14. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and
medicine. Nat. Med. 28, 31–38 (2022).
15. Wahl, B., Cossy-Gantner, A., Germann, S. & Schwalbe, N. R. Artificial
intelligence (AI) and global health: how can AI contribute to health
in resource-poor settings? BMJ Glob. Health 3, e000798
(2018).
16. AlShebli, B. et al. Beijing's central role in global artificial intelligence
research. Sci. Rep. 12, 21461 (2022).
17. Abadi, H. H. N., He, Z. & Pecht, M. Artificial intelligence-related
research funding by the US national science foundation and the
national natural science foundation of China. IEEE Access 8,
183448–183459 (2020).
18. Klinger, J., Mateos-Garcia, J. & Stathoulopoulos, K. Deep learning,
deep change? Mapping the evolution and geography of a general
purpose technology. Scientometrics 126, 5589–5621 (2021).
19. Secinaro, S., Calandra, D., Secinaro, A., Muthurangu, V. & Biancone,
P. The role of artificial intelligence in healthcare: a structured
literature review. BMC Med. Inform. Decis. Mak. 21, 1–23 (2021).
20. Xu, D., Liu, B., Wang, J. & Zhang, Z. Bibliometric analysis of artificial
intelligence for biotechnology and applied microbiology: Exploring
research hotspots and frontiers. Front. Bioeng. Biotechnol. 10,
998298 (2022).
21. Fernandes, J. M., Costa, A. & Cortez, P. Author placement in
computer science: a study based on the careers of ACM Fellows.
Scientometrics 127, 351–368 (2022).
22. Lerchenmüller, C., Lerchenmueller, M. J. & Sorenson, O. Long-term
analysis of sex differences in prestigious authorships in
cardiovascular research supported by the national institutes of health.
Circulation 137, 880–882 (2018).
23. UN. Definition of World Regions. (ed Affairs DoEaS). United
Nations (2022).
24. Haynes, R. B., McKibbon, K. A., Wilczynski, N. L., Walter, S. D. &
Werre, S. R. Optimal search strategies for retrieving scientifically
strong studies of treatment from Medline: analytical survey. BMJ
330, 1179 (2005).
25. Del Fiol, G., Michelson, M., Iorio, A., Cotoi, C. & Haynes, R. B. A deep
learning method to automatically identify reports of scientifically
rigorous clinical research from the biomedical literature:
comparative analytic study. J. Med. Internet Res. 20, e10281 (2018).
26. Merton, R. K. The Matthew effect in science. The reward and
communication systems of science are considered. Science 159,
56–63 (1968).
27. Fortunato, S. et al. Science of science. Science 359,
eaao0185 (2018).
28. Singh, J. & Fleming, L. Lone inventors as sources of breakthroughs:
myth or reality? Manag. Sci. 56,4156 (2010).
29. Baruffaldi, S. et al. Identifying and measuring developments in
articial intelligence: Making the impossible possible. OECD Sci-
ence, Technology and Industry Working Papers, No. 2020/05,
168 (2020).
30. Allison, G. & Schmidt, E. Is China Beating the US to AI Supremacy?
Harvard Kennedy School, Belfer Center for Science and International
Affairs,124 (2020).
31. Ye,J.Chinatargets50%growthincomputingpowerinraceagainst
the U.S. Reuters, 9 October. Availabl e at: https://www.reuters.com/
technology/china-targets-30-growth-computing-power-race-
against-us-2023-10-09/ (2023).
32. Lundvall, B.-Å. & Rikap, C. Chinas catching-up in articial intelli-
gence seen as a co-evolution of corporate and national innovation
systems. Res. Policy 51, 104395 (2022).
33. Beraja, M., Kao, A., Yang, D. Y. & Yuchtman, N. AI-tocracy. Q. J. Econ.
138,13491402 (2023).
34. Lerchenmueller, M. J. & Sorenson, O. The gender gap in early
career transitions in the life sciences. Res. Policy 47,10071017
(2018).
35. Vos, T. et al. Global, regional, and national incidence, prevalence,
and years lived with disability for 310 diseases and injuries,
19902015: a systematic analysis for the global burden of disease
study 2015. Lancet 388,15451602 (2016).
36. Tat, E., Bhatt, D. L. & Rabbat, M. G. Addressing bias: articial intel-
ligence in cardiovascular medicine. Lancet Digital Health 2,
e635e636 (2020).
37. Lerchenmueller, M. J., Sorenson, O. & Jena, A. B. Gender differences
in how scientists present the importance of their research:
observational study. BMJ 367, l6573 (2019).
38. Alperin, J. P., Portenoy, J., Demes, K., Larivière, V. & Haustein, S. An
analysis of the suitability of OpenAlex for bibliometric analyses.
arXiv preprint arXiv:2404.17663 (2024).
39. Wang, K. et al. Microsoft academic graph: when experts are not
enough. Quant. Sci. Stud. 1, 396–413 (2020).
40. Priem, J., Piwowar, H., & Orr, R. OpenAlex: A fully-open index of
scholarly works, authors, venues, institutions, and concepts. arXiv
preprint arXiv:2205.01833 (2022).
41. Liu, N., Shapira, P. & Yue, X. Tracking developments in artificial
intelligence research: constructing and applying a new search
strategy. Scientometrics 126, 3153–3192 (2021).
Acknowledgements
We thank Jennifer Hahn and Luca Caprari for outstanding research
support. LS is supported by the Joachim Herz Foundation. MJL received
financial support through the FAIR@UMA program of the University of
Mannheim. MJL and TWB received financial support in scope of the
Helmholtz Information and Data Science School for Health
(HIDSS4Health). The publication of this article was partially funded by
the University of Mannheim.
Author contributions
LS, MJL, and TWB devised the original idea. LS and MJL assembled and
analyzed the data. LS and MJL wrote the manuscript. TWB edited the
manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Competing interests
MJL is a co-founder and shareholder of AaviGen GmbH, a cardiovascular
gene therapy company. The present study is not related to the company.
The remaining authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-024-51714-x.
Correspondence and requests for materials should be addressed to
Leo Schmallenbach.
Peer review information Nature Communications thanks Ravi Parikh,
and the other, anonymous, reviewer(s) for their contribution to the peer
review of this work. A peer review file is available.
Reprints and permissions information is available at
http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2024