Native diversity buffers against severity of non-native tree invasions



Determining the drivers of non-native plant invasions is critical for managing native ecosystems and limiting the spread of invasive species1,2. Tree invasions in particular have been relatively overlooked, even though they have the potential to transform ecosystems and economies3,4. Here, leveraging global tree databases5-7, we explore how the phylogenetic and functional diversity of native tree communities, human pressure and the environment influence the establishment of non-native tree species and the subsequent invasion severity. We find that anthropogenic factors are key to predicting whether a location is invaded, but that invasion severity is underpinned by native diversity, with higher diversity predicting lower invasion severity. Temperature and precipitation emerge as strong predictors of invasion strategy, with non-native species invading successfully when they are similar to the native community in cold or dry extremes. Yet, despite the influence of these ecological forces in determining invasion strategy, we find evidence that these patterns can be obscured by human activity, with lower ecological signal in areas with higher proximity to shipping ports. Our global perspective of non-native tree invasion highlights that human drivers influence non-native tree presence, and that native phylogenetic and functional diversity have a critical role in the establishment and spread of subsequent invasions.
Plant invasions have multifaceted impacts on ecosystems and human
wellbeing across the globe. It is expected that plant invasions will
continue to increase in the coming decades owing to human-assisted
introduction and naturalization of these species, with ever-growing
impacts on biodiversity within native forest ecosystems1,9,10. These
invasions will undoubtedly also have considerable economic impacts
in managed landscapes by disrupting timber production, agriculture
and human livelihoods. In particular, non-native trees represent an
important and increasing concern globally, as they are often actively
planted far outside their native ranges for forestry, reforestation,
residential or ornamental purposes4,18. Along with the passive spread of
non-native species, the active propagation of trees by humans can often
increase their potential to become problematic invaders4,19–21.
Given the prominent roles of trees in shaping the structure and functioning
of ecosystems, such tree invasions have the capacity to alter plant
composition, productivity, biodiversity and the services provided to
humans. Previous research in invasion ecology has expanded our
understanding of community-level properties that influence ecosystem
susceptibility to invasion23–25, as well as traits that make plant species
more likely to become invasive26–30. However, most work has been
restricted to local and regional scales, with contrasting ecological
mechanisms affecting invasion success in different regions. We thus
lack a global unified theory of the human and ecological drivers of tree
species invasions. Developing an integrated global understanding of
ecological and anthropogenic forces that drive non-native tree invasions
is critical to improve decision making in conservation and management.
Countless ecological mechanisms have been proposed to explain the
susceptibility of different ecosystems to invasion by non-native species
in different locations. Traditionally, more diverse or ecologically complex
systems are thought to exhibit ‘biotic resistance’ to invasion.
This hypothesis is based on the assumption that greater diversity in the
native community fills the available ecological niches and reduces avail-
able resources, limiting niche space to novel species. However, most
work has focused on testing this hypothesis using species richness as
an indicator of niche filling23,35, which may not fully capture the propor-
tion of niches that are filled in the native community. Instead, more
informative metrics for niche filling may be phylogenetic or functional
diversity. Phylogenetic diversity accounts for evolutionary similarity
and represents a reasonable proxy for similarity between taxa, whereas
functional diversity directly addresses the underlying mechanism of
biotic resistance (that is, the breadth of ecological niches filled), but
may be more difficult to measure. Conversely, there is also evidence for
the opposite pattern in some ecosystems, whereby a more diverse com-
munity is indicative of a more favourable habitat, where a wide range of
invasive species might survive. This ‘biotic acceptance’
leads to the expectation that highly diverse sites are optimal for many
plant species and could promote invasion of non-native species. None-
theless, we still lack a unified understanding of the relative importance
of these two competing processes, and their variation across the globe,
leading to ongoing calls to resolve this ‘invasion paradox’25.
Invasion success is also likely to depend on the ecological strategy
of the invading species relative to the recipient native community. One
school of thought is that environmental constraints are the primary
drivers of plant species distributions. Therefore, to be successful, inva-
sive species ought to be similar to native species that are adapted for
that region, especially in extreme environments. Under this ‘environmental
filtering hypothesis’ (or ‘preadaptation hypothesis’),
invasive species will be more successful if their traits mirror those of
the native community. For example, to be successful in a harsh desert
environment, non-native plants would need to be ecologically similar
to native plants to survive, possessing traits that protect them against
high heat and water loss. By contrast, the ‘limiting similarity hypothesis’
(also known as ‘Darwin’s naturalization hypothesis’) postulates that
invasive species need to be ecologically distinct from native species to
avoid niche overlap. Here, invaders are thought to be more successful
if they can fill unique niche spaces that are not already used by the
native community, reducing competition and enabling their establishment.
These two processes suggest contrasting mechanisms for how
species invade: by being either similar or dissimilar to
the native community (Darwin’s naturalization conundrum). It is
possible that the relative importance of these opposing ecological
mechanisms varies under different environmental conditions, with
greater importance of environmental filtering in harsh conditions
and greater niche differentiation in more moderate environments.
Such regional variation in the relative importance of these mechanisms
might help to explain the opposing responses observed across studies.
However, until now, we have lacked a broad-scale analysis of these different
invasion mechanisms that can help us to see past the idiosyncrasy of
local-scale observations to identify unifying trends.
Received: 2 November 2022
Accepted: 14 July 2023
Published online: 23 August 2023
Open access
A list of authors and their affiliations appears at the end of the paper.
A key challenge hindering a global consensus of the ecological pat-
terns and mechanisms underpinning plant invasion is that these pro-
cesses are likely strongly influenced by anthropogenic activity, which
may dampen the signal of ecological drivers. Humans drive contempo-
rary plant invasions through highly efficient transport—both intentional
and accidental—of non-native plants, with proximity to ports and air-
ports being associated with increased invasion11,53,54. A constant influx
of non-native species may override a native community’s ability to resist
(biotic resistance) and obscure the impacts and importance
of specific ecological drivers, such as native diversity, particularly at
early stages of invasion. That is, with increased propagule pressure
of non-native species exerted by humans, the relative importance of
ecological drivers may be reduced. Moreover, sites with high levels of
non-native propagule pressure due to human activity are also likely to be
heavily disturbed, compounding this anthropogenic influence. Account-
ing for human global change drivers may be particularly important
when considering the role of invasion strategy, with the potential for
anthropogenic drivers and human propagule pressure to overwhelm the
impact of ecological drivers. This could occur through an increase in the
frequency and magnitude of introductions, which would be expected
to increase stochastic variation and dampen ecological signals. So far,
these hypotheses have been tested only at local and regional scales,
with few studies integrating ecological and anthropogenic drivers of
invasion at the global scale to disentangle the relative importance of
human activity, environmental conditions and biological diversity33.
Here, by combining global datasets of local-scale forest inventories,
native status, environmental climate variables and anthropogenic
drivers, we test for the relative importance of ecological and anthro-
pogenic influence on non-native tree invasion. Using this large-scale
approach, we search for a unifying perspective of the environmental
and anthropogenic contexts driving non-native invasion and invasion
severity, via both relative richness and abundance of non-natives, as
well as invasion strategy. We consider three hypotheses: (H1) greater
native diversity reduces non-native invasion23; (H2) high levels of
environmental filtering in extreme environmental conditions lead
to similarity of non-natives with the surrounding natives, whereas moderate
conditions are associated with greater levels of niche differentiation
and dissimilarity24; and (H3) human drivers, specifically proximity to
ports and areas of high human population density, will mediate and
potentially override these ecological relationships. We explore these
hypotheses through the lens of different biodiversity metrics (phylogenetic
diversity, functional diversity and species richness), providing a
comprehensive view of the interactions between ecological processes
and human influence on invasion. Addressing these hypotheses is
important to highlight generalizations in the field for prevention and
management of non-native tree invasions, which is key to mitigating the
potential severe ecological and socio-economic toll of these invasions.
Using the Global Forest Biodiversity Initiative database, we determined
native tree status (native or non-native) according to the Global
Naturalized Alien Flora and the KEW Plants of the World databases.
This dataset encompassed 471,888 plots, of which 4.9% were
invaded, that is, contained at least one non-native tree species (Fig. 1 and
Supplementary Table 1a). Moreover, this dataset contained a larger
proportion of invaded plots in tropical (15.2%) than in temperate
systems (5.2%).
Fig. 1 | Distribution of the study data. Distribution of the full study dataset,
coded for non-native severity (n = 471,888 plots). The map shows average per
cent invasion across a 1-degree hexagonal grid, from non-invaded (0%) pixels
in green to completely invaded (100%) pixels in purple (colour scale: per cent
invaded). Plots are considered invaded if there is any non-native tree present.
Overall, 249 individual non-native tree species were
identified, with the most frequent being Robinia pseudoacacia, Pinus
sylvestris, Maclura pomifera, Picea abies and Ailanthus altissima,
labelled as non-native in 3,976, 2,603, 2,493, 2,468 and 1,597 plots,
respectively (Supplementary Table 2). Regions with the greatest likelihood
of being invaded include North America, Europe and East Asia
(Extended Data Fig. 1), consistent with previous findings (but see
ref. 58). To test for drivers of non-native tree invasion and invasion
strategy, we used a down-sampled version of the dataset consisting
of 17,738 forest plots, distributed across 14 biomes proportional to
their global land cover.
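As an illustration of down-sampling in proportion to land cover, the following minimal Python sketch gives each biome a sampling quota proportional to its land-cover share, capped by the number of plots available. The function name, the toy biome shares and the plot identifiers are hypothetical and are not taken from the study, whose sampling code is not described here.

```python
import random
from collections import defaultdict


def downsample_by_biome(plots, land_cover_share, target_total, seed=0):
    """Sample plots so each biome's share of the sample tracks its land-cover share.

    plots: list of (plot_id, biome) tuples.
    land_cover_share: dict mapping biome -> fraction of global land cover.
    """
    rng = random.Random(seed)
    by_biome = defaultdict(list)
    for plot_id, biome in plots:
        by_biome[biome].append(plot_id)

    sample = {}
    for biome, share in land_cover_share.items():
        available = by_biome.get(biome, [])
        # Quota proportional to land cover, capped by the plots that exist.
        quota = min(len(available), round(target_total * share))
        sample[biome] = rng.sample(available, quota)
    return sample


# Hypothetical toy data: three biomes with very uneven plot counts.
plots = (
    [(f"t{i}", "temperate") for i in range(900)]
    + [(f"r{i}", "tropical") for i in range(80)]
    + [(f"b{i}", "boreal") for i in range(20)]
)
shares = {"temperate": 0.3, "tropical": 0.5, "boreal": 0.2}
sample = downsample_by_biome(plots, shares, target_total=100)
counts = {biome: len(ids) for biome, ids in sample.items()}
```

In this toy case the heavily sampled temperate biome is reduced to 30 plots, so the sample no longer mirrors the uneven sampling effort of the full inventory.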
We calculated three metrics of invasion: (1) presence of non-natives
in the plot (‘non-native presence’); (2) relative proportion of non-native
species richness to total tree richness (‘non-native richness’); and
(3) relative proportion of non-native species basal area to total tree
basal area (‘non-native abundance’). The first metric (non-native
presence) is simply a measure of the presence or absence of invasion,
whereas the latter two metrics (relative abundance and richness)
provide insight into the subsequent severity of the invasion.
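The three metrics can be made concrete with a small Python sketch for a single plot. The tuple layout, the function name and the example trees (including their native labels) are illustrative assumptions, and the sketch assumes a non-empty plot with positive total basal area.

```python
def invasion_metrics(trees):
    """Compute the three invasion metrics for one plot.

    trees: list of (species, basal_area_m2, is_native) tuples, one per stem.
    Assumes at least one tree and a positive total basal area.
    """
    all_species = {species for species, _, _ in trees}
    non_native_species = {species for species, _, native in trees if not native}
    total_basal_area = sum(area for _, area, _ in trees)
    non_native_basal_area = sum(area for _, area, native in trees if not native)
    return {
        # (1) Is any non-native tree present?
        "non_native_presence": bool(non_native_species),
        # (2) Non-native species richness relative to total richness.
        "non_native_richness": len(non_native_species) / len(all_species),
        # (3) Non-native basal area relative to total basal area.
        "non_native_abundance": non_native_basal_area / total_basal_area,
    }


# Hypothetical plot: three native stems and one non-native stem.
plot = [
    ("Quercus robur", 0.30, True),
    ("Quercus robur", 0.20, True),
    ("Fagus sylvatica", 0.25, True),
    ("Robinia pseudoacacia", 0.25, False),
]
metrics = invasion_metrics(plot)
```

Here one of three species is non-native (relative richness of one third), while the non-native stem holds a quarter of the basal area, showing that the two severity metrics can differ for the same plot.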
To test how hypothesized human and environmental drivers affected
the probability that a forest plot was invaded or the invasion severity within
invaded plots, we built generalized linear models (GLMs) and random
forest models using either phylogenetic or functional diversity metrics
(both as richness and redundancy) as predictor variables (Extended
Data Fig. 3). For both functional and phylogenetic diversity, we used
random forest models to determine variable importance and for visu-
alization purposes, whereas GLMs were used to test for significance and
directionality of relationships. Our models also included human drivers
(distance to shipping ports (hereafter referred to as ports) and popula-
tion density) and accounted for several additional soil chemical and
climate variables. Next, to test whether non-native tree species invade
by being similar or dissimilar to the native community (termed ‘invasion
strategy’), we again built models predicting non-native similarity from
either native phylogenetic or functional diversity metrics, along with
the same environmental and human impact variables. The non-native
invasion strategy was defined as the change in redundancy due to
addition of non-native trees, with values below zero and values above
zero indicating invasion via similarity and dissimilarity, respectively,
to the native community.
Diversity limits invasion severity
We found that anthropogenic drivers were more important than local
native tree diversity in determining non-native invasion (presence)
globally (H3), whereas native diversity, both phylogenetic and
functional, was most important in determining invasion severity
(H1; Fig. 2 and Supplementary Tables 3 and 4; phylogenetic diversity
random forest area under the curve (AUC) = 0.634, functional diversity
random forest AUC = 0.631). These results indicate the importance of
human-induced propagule pressure in initiating invasion of forests
and of native biodiversity in moderating the severity of the invasion. We
found that forest plots closer to ports are more likely to be invaded
(Supplementary Tables 3 and 4; linear model P < 0.001). Notably, these
results are consistent whether we analyse all data together at the global
level or separate data into the temperate and tropical bioclimatic
zones (Supplementary Tables 3 and 4). By contrast, we did not find that
human population density was consistently related to non-native presence,
with results being variable across diversity metrics and bioclimatic
zones considered (Supplementary Tables 3 and 4). However, population
density was always positively correlated with invasion probability;
population density may be a weaker predictor as it only measures human
presence, which is not necessarily related to propagule pressure.
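To help interpret the AUC values reported above, the following Python sketch computes AUC directly from its probabilistic definition: the chance that a randomly chosen invaded plot receives a higher predicted score than a randomly chosen non-invaded plot, with ties counted as one half. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The function name and the toy labels and scores are hypothetical; this is not the pipeline used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for invaded plots, 0 for non-invaded plots.
    scores: predicted probability (or any ranking score) of invasion.
    """
    invaded = [s for y, s in zip(labels, scores) if y == 1]
    not_invaded = [s for y, s in zip(labels, scores) if y == 0]
    if not invaded or not not_invaded:
        raise ValueError("need at least one invaded and one non-invaded plot")

    wins = 0.0
    for p in invaded:
        for n in not_invaded:
            if p > n:
                wins += 1.0  # invaded plot ranked above non-invaded plot
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(invaded) * len(not_invaded))


# Hypothetical toy predictions for five plots.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.4]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of plots, which is acceptable for a small illustration; a rank-based formula gives the same value more efficiently on large datasets.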
Proximity to ports has long been known to influence invasion11,53,54, with
locations closer to a port being likely to experience greater propagule
pressure. Moreover, proximity to ports may serve as a proxy for residence
time, where plots closer to ports are more likely to have longer exposure
to non-native propagule pressure, thus increasing the likelihood of inva-
sion56. Yet, at far enough distances, stochastic processes and historical
land-use patterns may begin to weaken the role of ports (Fig. 3, distances
greater than 500 km). For example, the third most frequent non-native
tree in our dataset, M. pomifera, is widely naturalized throughout the
interior of North America, where it has been used for various agricultural
purposes dating back to the 1850s59.
[Fig. 2 panels: variable importance (mean absolute SHAP value) for the phylogenetic and functional diversity models; influence on non-native presence plotted against distance to ports (km), native richness and native redundancy.]
Fig. 2 | Anthropogenic drivers are more important than native diversity
in determining invasion occurrence. a,b, Importance (Shapley additive
explanations (SHAP) values) of all variables included in random forest models
ordered from greatest to least important, alongside influence of distance to
ports, native richness and native redundancy on non-native presence (whether
a plot is invaded or not) for global models of phylogenetic (a) and functional
(b) diversity (phylogenetic diversity, n = 17,640 plots; functional diversity,
n = 17,271 plots). All results shown are from random forest models. Note that
y-axis ranges differ among panels, with the variable importance plots
representing the corresponding magnitude. Error bands represent 95%
confidence intervals.
Such results highlight the
idiosyncratic use of trees across the globe, leading to unique invasion
trends relative to herbaceous plants. Nevertheless, at more local scales,
this strong signal of anthropogenic activity and associated propagule
pressure relative to native diversity driving non-native presence is in
agreement with previous work that considers invasion across stages56 and
recent assessments of regional and global tree invasion57,60, and highlights
the prominent role of humans in reshaping biological communities.
Although proximity to ports determined the probability that a forest
plot was invaded, native tree communities with higher phylogenetic
and functional diversity exhibited lower invasion severity (Fig. 3,
Extended Data Fig. 4 and Supplementary Tables 3 and 4; phylogenetic
diversity random forest non-native richness R² = 0.68, phylogenetic
diversity random forest non-native abundance R² = 0.14,
functional diversity random forest non-native richness R² = 0.69 and
functional diversity random forest non-native abundance R² = 0.07;
GLM phylogenetic and functional diversity P < 0.001). Additionally,
distance to ports was no longer significant in linear models
predicting invasion severity (Supplementary Tables 3 and 4) for
both phylogenetic (P = 0.16 and 0.28 for non-native richness and
abundance, respectively) and functional diversity models (P = 0.63
and 0.86 for non-native richness and abundance, respectively), and
showed reduced variable importance in the random forest models
(Fig. 3 and Extended Data Fig. 4). When investigating these patterns
using conventionally analysed species richness instead of phylogenetic
or functional richness, we find qualitatively similar results
(Supplementary Table 5, random forest non-native richness R² = 0.71
and random forest non-native abundance R² = 0.14), suggesting that
species diversity may be a useful proxy for projecting invasion severity
in the absence of functional and phylogenetic information. Our
results are consistent with the hypothesis of biotic resistance (H1),
where increased native diversity reduces invasion success, which is
probably driven by the native community utilizing more available
niche spaces23,34–36,61. These results are also consistent with work
investigating tree migration drivers that suggests that migration is
slower into more diverse communities owing to greater resource use
(fewer available niches) in these systems57.
Overall, these results show that anthropogenic drivers, particu-
larly distance to shipping centres (ports), are more important in
determining which locations will experience non-native invasions
compared with traditionally studied native diversity (H3). However,
it is the intrinsic ecological drivers, including native tree community
phylogenetic and functional diversity (richness and redundancy), that
are more important in determining invasion severity (H1). Repeated
human introduction of plant species has a more important role in the
initial invasion process, but invasion severity is predominantly a result
of native intrinsic diversity. Notably, both distance to ports and native
diversity show patterns of saturation of effects, suggesting a thresh-
old at which plots that are far enough from ports, or high enough in
native diversity, will not benefit from further distance or diversity with
regard to reduced invasion or invasion severity. Although our focus
here is on the relative importance of human versus biotic drivers of
introduction, we find that environmental variables—especially mean
annual temperature—correlate strongly with patterns of non-native
invasion, which may reflect resource availability26, belowground
microorganism composition30 or potential climate compatibility
between donor and recipient ranges. Together, our results suggest
that locations near human activity are more likely to experience
non-native invasions in part due to increased propagule pressure,
whereas those with lower diversity are more likely to experience more
severe non-native invasions once non-natives are present. These
results suggest that managing forests to maintain high native
tree diversity may be a good strategy to buffer communities against
invasion, particularly for locations that are far from human activity.
[Fig. 3 panels: variable importance (mean absolute SHAP value) for the phylogenetic and functional diversity models; influence on non-native abundance plotted against distance to ports (km), native richness and native redundancy.]
Fig. 3 | Native diversity is the most important driver of invasion severity.
a,b, Importance (Shapley additive explanations (SHAP) values) of all variables
included in random forest models ordered from greatest to least important,
alongside influence of distance to ports, native richness and native redundancy
on invasion severity for global models of phylogenetic (a) and functional
(b) diversity (phylogenetic diversity, n = 3,498 plots; functional diversity,
n = 3,368 plots). Plots are shown for the severity of invasion measured as
non-native species abundance (proportion of basal area with non-native plant
species); plots for non-native species richness (proportion of non-native plant
species) are shown in Extended Data Fig. 4. All results shown are from random
forest models. Note that the y-axis ranges differ among panels, with the
variable importance plots representing the corresponding magnitude. Error
bands represent 95% confidence intervals.
Evidence for environmental filtering
When considering a range of climate, soil and anthropogenic variables,
we find evidence for environmental filtering as a driver of invasion
strategy, in particular, with respect to mean annual temperature and
precipitation. In all global models, temperature was important for
predicting tree invasion strategy (Fig. 4, Extended Data Fig. 5 and
Supplementary Table 6; phylogenetic diversity random forest R² = 0.084,
functional diversity random forest R² = 0.099; H2), with our global
analysis indicating that non-native trees were more similar to the native
community in environments at cold and hot temperature extremes (Fig. 5
and Supplementary Table 6, P < 0.001). That is, to invade a
cold or hot environment, non-native plants are more successful if they
share traits with native plants that allow survival in these harsher
temperature conditions. By contrast, at locations with moderate temperatures,
non-natives are neither more nor less similar to native communities,
potentially because these less harsh environmental conditions allow a
wider range of life strategies to coexist. For functional diversity,
invasion strategy at high temperatures is relatively neutral, with the line
approaching a value of zero, suggesting that although phylogenetically
similar, these communities show some level of functional divergence,
highlighting the importance of including functional diversity in future
studies. When separating the data into temperate and tropical systems,
we found divergent temperature patterns (Supplementary Table 6;
temperate P < 0.001, tropical P = 0.01). In temperate systems, non-native
trees were more likely to be similar to the native tree community in
colder environments relative to hot environments, in line with previous
results in temperate North America. In tropical systems, we found the
opposite pattern, with non-native trees being more likely to be similar
to the native tree community in hotter tropical environments. At the
lowest temperatures, non-natives invading through similarity were primarily
gymnosperms (fir, spruce and pine species) invading into native
communities containing species in the same genus; by contrast, at the
highest temperatures, non-natives invading through similarity were
angiosperms, with a high prevalence of palms and legumes. Further,
we detect a similar pattern of environmental filtering for mean annual
precipitation when analysing phylogenetic and functional diversity with
random forest models, where lower or higher precipitation is associated
with non-native invasion through similarity (Extended Data Fig. 5). This
suggests that the most likely invaders at low or high temperature or
precipitation may be ecologically similar to the host communities,
which could inform invasion risk checklists at ports.
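A pattern of similarity at both temperature extremes can be expressed as a quadratic relationship between invasion strategy and temperature. The Python sketch below fits invasion strategy as a function of temperature and temperature squared by ordinary least squares, which here stands in for the GLMs used in the study. Given the sign convention above (values below zero indicate invasion through similarity), similarity at both extremes corresponds to a negative coefficient on the squared term. The function name and the toy data are hypothetical.

```python
def fit_quadratic(x, y):
    """Ordinary least squares for y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Power sums give the 3x3 system (X^T X) b = X^T y.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Hypothetical data: strategy is negative (similarity) at cold and hot extremes
# and zero (neutral) at a moderate temperature of 10 degrees C.
temps = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
strategy = [-0.02 * (temp - 10.0) ** 2 for temp in temps]
b0, b1, b2 = fit_quadratic(temps, strategy)

# Temperature at which the fitted curve peaks (most neutral strategy).
peak_temperature = -b1 / (2.0 * b2)
```

With real data the residual scatter would be large (the reported random forest R² values are below 0.1), so the sign and significance of the squared term matter more than the fitted curve itself.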
Within the temperate bioclimatic zone, we found evidence that anthro-
pogenic activity weakened the environmental filtering pattern for phylo-
genetic and functional diversity seen for temperature and precipitation,
respectively (H3). In particular, proximity to ports modified the signal of
environmental filtering due to temperature, weakening the influence of
temperature on invasion strategy with respect to phylogenetic similarity
(Fig.5 and Supplementary Table6; P < 0.001). Colder ecosystems show
evidence of environmental filtering of invasion; however, increased
proximity to ports reduces the prevalence of this strategy. We suggest
that this may be due to increased introductions around shipping ports,
which would increase stochastic variation and dampen ecological strat
egies. However, we did not detect a similar interaction governing the
tropical bioclimatic zone, potentially owing to relatively lower human
pressure, and particularly lower ship traffic64, compared to temperate
systems. Alternatively, this pattern may also reflect the fact that some
temperate plots occur at greater distances to ports than tropical sites
(95th percentile of 784 km versus 311 km for temperate and tropical,
respectively), increasing statistical power for detecting this trend in
temperate regions.
[Fig. 4 panels: model estimates for the phylogenetic and functional diversity models by bioclimatic zone (temperate, tropical, other), shown for absolute bedrock depth, soil pH, population density, distance to ports, native diversity, mean annual precipitation and mean annual temperature; invasion strategy (similar to dissimilar) plotted against mean annual temperature (°C).]
Fig. 4 | Environmental filtering at temperature extremes. a,c, Estimates of
overlapping variables included in temperate and tropical GLM models (forest
plot) for phylogenetic (a) and functional (c) diversity models (phylogenetic
diversity, n = 3,498; functional diversity, n = 3,368). Values to the left of the zero
line indicate negative model estimates, and those to the right indicate positive
estimates. b,d, Relationship between mean annual temperature and invasion
strategy for phylogenetic (b) and functional (d) diversity models, showing that
at extreme temperatures invasion occurs through similarity (Supplementary
Table 7; phylogenetic diversity: P(1) = 9.69 × 10^−14, P(2) = 2.13 × 10^−11; functional
diversity: P(1) < 2 × 10^−16, P(2) = 1.07 × 10^−4, where P(1) and P(2) represent the
P values for temperature and temperature squared, respectively). Note for
functional diversity, this pattern only holds at low temperatures. Error bars
and bands represent standard error.
Furthermore, proximity to ports also marginally
weakened the signal of environmental filtering due to precipitation for
functional invasion strategy (Supplementary Table 6; P = 0.07). These
results illustrate that human influence can override the ecological factors
driving invasion, suggesting that at high enough propagule pressure,
the phylogenetic and functional similarity of a non-native becomes less
important in predicting its ability to invade a native community.
Nevertheless, as our analyses are not causal, these results could also reflect
correlations between port locations and invasion strategy. However,
when we investigated the same effect with human population density,
we did not see this weakening effect, suggesting that distance to
ports is a particularly relevant mediator of these patterns.
These results suggest that human activity may overwhelm ecological
drivers of non-native invasion strategies and reduce the influence of
ecological processes, making inclusion of human impacts critical for
studying global invasion strategies.
Collectively, our work integrates biotic and anthropogenic fac-
tors across phylogenetic and functional diversity for both invasion
presenceand invasion severity of non-native tree species worldwide.
Although non-native trees have been relatively overlooked relative
to herbaceous plants, their large size, long lifespans and impor-
tant history in forestry, food, reforestation and city landscaping
exposes trees to unique ecological and anthropogenic factors that
shape their worldwide distributions. Moreover, given that many
tree invasions are in their infancy, with substantial ‘invasion debts’
of recent tree plantings
, understanding the ecological drivers pro-
moting spread has the potential to provide real-time feedback for
the preventativemanagement of invasive trees. However, there are
important considerations when interpreting these findings, many of
which could be addressed with increased data resolution and increased
sampling within under-sampled geographic regions. First, our analy-
sis is largely observational, whereas community composition would
ideally be compared before and after invasion to better understand
the causality of the trends observed here. We can gain some insight
into this question by conducting a sensitivity analysis on the subset of
invaded plots that were measured at multiple time points and that had
no initial invasion. Doing so reveals that the reduction in native diversity
due to invasion can potentially account for as much as 10.4% (mean of
6.7%) of the observed biotic resistance (Supplementary Table9), but
that the remainder of this effect is attributable to differences in native
diversity (that is, biotic resistance) across plots. Additional long-term
data on plots that are uninvaded and become invaded will be useful in
further addressing the influence of invasion on native diversity. Second,
many tree species in our analysis were only identified to genus level
or were not present in the master plant phylogeny, which may lead to
an underestimation of native diversity or invasive species richness in
some plots, particularly in species-rich forests. Indeed, a key challenge
in global analyses such as ours is the underrepresentation of certain
ecosystems, for example, tropical ecosystems. This is addressed to
some extent by our down-sampling approach, as well as our spatial
cross-validation approach (Methods), but ongoing efforts to fund and
develop open-access and fair tropical forest inventory data are critical
for gaining better insight into these ecologically and socially important
Fig. 5 | Proximity to ports weakens environmental filtering in the
temperate bioclimate zone. a,b, In temperate plots far from ports,
temperature is positively correlated with an invasion strategy of increasing
dissimilarity for phylogenetic (a) and functional (b) diversity (phylogenetic
diversity: n = 2,710 plots, P = 6.37 × 10^−6; functional diversity: n = 2,603,
P < 2 × 10^−16). c,d, This relationship between temperature and invasion strategy
weakens for phylogenetic (c) and functional (d) diversity with proximity to
ports (Supplementary Table 7; phylogenetic diversity: P = 0.0001; functional
diversity: P = 2.71 × 10^−13). Lines and points represent the lowest (c,d) and
highest (a,b) 10% of data. Error bands represent standard error.
Many tree species are intentionally introduced for forestry or wood
products and may be managed, generating variation in the drivers
underpinning invasion that are unique to trees. To minimize the
influence of heavily managed forests, we included only plots with a
minimum of three species and thus our dataset does not include mono-
culture forestry plantations. In addition, when restricting our analysis
to the subset of global plots that occur in protected areas with minimal
human footprint, our core results and inferences remain unchanged
(Supplementary Table7). Having additional high-quality data on the
human role in invasion, including the type and time of management,
and overall level in disturbance regime
, would refine our results and
better separate ecological versus human drivers. Future work should
also focus on drivers of tree invasion and invasion strategies across
scales25,63,67, as patterns may differ at scales larger than the local plot
level that we include here, which may be important for regional versus
local management of non-native trees. Finally, emerging work shows
that the consideration of native range size and change in environment
and/or disturbance from donor to recipient community may be more
helpful in understanding introduction and invasion success than
simply quantifying these variables in the novel, recipient range.
Therefore, including the change in environmental and human impact
variables would also be a fruitful avenue for future research.
Together, these results provide important unifying insights into
the global drivers of non-native tree invasions and the ecological
strategies that might be most successful in different regions. The
trends and ecological mechanisms identified here can provide tan-
gible guidelines to support forest management of non-native tree
invasions around the globe. However, because non-native trees are
introduced purposefully for forestry or to support local livelihoods,
which can lead to differences in forest management objectives and
strategies4, it is critical that local stakeholders are included when
making decisions about how to best manage these introductions.
Ultimately, this emerging understanding of global tree invasions pro-
vides fundamental insights that are needed to understand how forest
composition is being reshaped under global change, and for forest
management practices to limit the spread and impacts of non-native
tree invasions worldwide.
Online content
Any methods, additional references, Nature Portfolio reporting summa-
ries, source data, extended data, supplementary information, acknowl-
edgements, peer review information; details of author contributions
and competing interests; and statements of data and code availability
are available at
Tree inventory and non-native status
For tree inventory data, we used the Global Forest Biodiversity Initiative
(GFBI) database7, which contains tree-level abundance data for more
than 1.2 million forest plots on all continents across the globe, contain-
ing more than 31 million unique georeferenced records of tree size and
density dating from 1958. Each observation in the dataset consists of a
unique tree ID, plot ID, plot coordinates, tree diameter at breast height
(DBH), tree-per-hectare expansion factors, year of measurement, and
binomial species names. In this study, we applied several filters to these
data before analyses. First, where plots had multiple years of data,
we kept only the most recent year of census data. We then subset the
data to include only plots with at least three species as required for
our phylogenetic metrics, excluding monoculture forest plantations
from the study.
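As an illustration only, the two filtering steps described above (keep the most recent census per plot, then keep plots with at least three species) can be sketched in Python. This is a minimal sketch and not the code used in the study; the record keys `plot_id`, `year` and `species` are hypothetical names chosen for the example.

```python
from collections import defaultdict


def filter_plots(records, min_species=3):
    """Keep the latest census per plot, then drop species-poor plots.

    `records` is a list of dicts, one per measured tree, with the
    hypothetical keys 'plot_id', 'year' and 'species'.
    """
    # Find the most recent census year for every plot.
    latest = {}
    for r in records:
        pid = r["plot_id"]
        latest[pid] = max(latest.get(pid, r["year"]), r["year"])

    # Group the trees from that latest census by plot.
    by_plot = defaultdict(list)
    for r in records:
        if r["year"] == latest[r["plot_id"]]:
            by_plot[r["plot_id"]].append(r)

    # Retain only plots with at least `min_species` distinct species.
    return {
        pid: trees
        for pid, trees in by_plot.items()
        if len({t["species"] for t in trees}) >= min_species
    }
```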
To assign native status to each tree species (native or non-native,
representing naturalized and invasive), we established a consensus
status between the Global Naturalized Alien Flora (GloNAF) and the
KEW Plants of the World databases. All databases were standardized
to The Plant List taxonomy71. The GloNAF database contains detailed,
georeferenced information on the naturalized status of more than
10,000 plant species in each of 1,029 regions across the globe represent-
ing countries or federal states; the KEW database outlines native ranges
of vascular plant species for over 1.2 million plant species70. The GFBI
and GloNAF datasets were joined by matching each unique species by
location in GFBI to a GloNAF region polygon and species status. Then,
for each GFBI plot, we extracted the GloNAF region identifier using
Google Earth Engine72. This process was then repeated for the KEW
database. We then filtered out plots that included any species with
disagreement between GloNAF and KEW databases (that is, conflicting
native status), and only included trees with a minimum diameter of 5 cm
and a minimum height of 1.3 m to allow for DBH measurements. All trees
identified as ‘non-native’ were verified to be listed in the BGCI Tree
List, which defines a tree as, “A woody plant with usually a single stem
growing to a height of at least two metres, or if multi-stemmed, then at
least one vertical stem five centimetres in diameter at breast height”.
Note that this is an inclusive definition which includes monocots and
tree ferns, as well as species that can occur both as tall single-stem and
shrub-like multi-stem phenotypes.
To account for unequal representation of plots across biomes (Fig. 1),
we used a reduced version of this database, down-sampled to a number
of plots proportional to the land area covered by each of 14 biomes
(Supplementary Table 1), while conserving as many tropical plots as
possible. This ensured that we were not overrepresenting historically
oversampled biomes, particularly in temperate regions. In addition,
we preferentially retained invaded plots during this down-sampling to
ensure adequate representation of invaded plots in the final dataset,
with a maximum of half of the plots within a biome being invaded. This
oversampling of invaded plots allowed for adequate representation of
invaded and non-invaded plots in our analyses of non-native presence,
and allowed sufficient data for our analyses of invasion severity, as
these analyses only used data from plots that had non-native species
invasions. Results were not qualitatively different if we did not pref-
erentially retain invaded plots in our down-sampling (Extended Data
Fig. 6 and Supplementary Table 8). Note also that the global mapping
used the full dataset, with no subsampling. Prior to analyses, we also
collapsed locations with multiple replicate plots and removed plots
where phylogenetic or functional diversity could not be calculated for
both native and full communities due to fewer than three species being
present (see below).
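The down-sampling logic can be sketched as follows. This is a hedged illustration under stated assumptions, not the procedure used in the study: the function names, the `invaded` key, and the rule that invaded plots never outnumber uninvaded plots in a biome's sample are assumptions made for the example, and the special handling of tropical plots is not reproduced.

```python
import random


def biome_targets(area_by_biome, total_plots):
    """Allocate a plot budget to biomes in proportion to land area."""
    total_area = sum(area_by_biome.values())
    return {
        biome: round(total_plots * area / total_area)
        for biome, area in area_by_biome.items()
    }


def downsample_biome(plots, target_n, rng):
    """Sample up to `target_n` plots, favouring invaded plots.

    Invaded plots are retained preferentially but capped at half of the
    target (assumption: and never more than the uninvaded plots available).
    `plots` is a list of dicts with a hypothetical boolean key 'invaded'.
    """
    invaded = [p for p in plots if p["invaded"]]
    uninvaded = [p for p in plots if not p["invaded"]]
    n_invaded = min(len(invaded), target_n // 2, len(uninvaded))
    n_uninvaded = min(len(uninvaded), target_n - n_invaded)
    return rng.sample(invaded, n_invaded) + rng.sample(uninvaded, n_uninvaded)
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the sketch reproducible.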
Non-native invasion metrics
We split our invasion metrics into the two broad categories of ‘non-
native invasion’ (presence) and ‘invasion severity’. Specifically, using
our data, we were able to determine for each plot (1) whether any
non-native tree species were present (non-native presence); (2) the
proportion of tree species that were non-native relative to total tree
species (invasion severity, assessed via non-native richness)23; and
(3) the proportion basal area of non-native tree species relative to
total tree species basal area (invasion severity, assessed via non-native
abundance). These metrics are congruent with recently proposed
frameworks for measuring and reporting invasive plant species.
The metric of relative introduced species richness may be hypoth-
esized to lead to a bias in detection of biotic resistance, with greater
biotic resistance falsely detected in diverse communities, as these
communities will have a lower proportion of non-native trees due to
the higher denominator (total site diversity). However, modelling this
proportion with a binomial GLM, as opposed to modelling the proportion
directly, overcomes this limitation, as it uses the raw counts underlying
the proportion, effectively weighting observations by the total species
number in the community23.
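The three plot-level metrics can be written out in a few lines of Python. This is a minimal sketch, assuming each tree is supplied as a `(species, is_non_native, basal_area)` tuple; the tuple layout and the dictionary keys are illustrative and do not come from the study's code. The native and non-native species counts are returned alongside the proportion because a binomial GLM is fitted to such counts rather than to the ratio itself.

```python
def invasion_metrics(trees):
    """Compute presence and severity metrics for one plot.

    `trees` is a list of (species, is_non_native, basal_area) tuples,
    a hypothetical layout used only for this example.
    """
    species = {s for s, _, _ in trees}
    non_native = {s for s, is_nn, _ in trees if is_nn}
    total_ba = sum(ba for _, _, ba in trees)
    non_native_ba = sum(ba for _, is_nn, ba in trees if is_nn)
    return {
        # (1) non-native presence
        "non_native_present": bool(non_native),
        # counts that a binomial model would use as successes and failures
        "n_non_native": len(non_native),
        "n_native": len(species) - len(non_native),
        # (2) invasion severity via non-native richness
        "richness_proportion": len(non_native) / len(species),
        # (3) invasion severity via non-native abundance (basal area)
        "basal_area_proportion": non_native_ba / total_ba,
    }
```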
Climatic and anthropogenic variables
For climatic and anthropogenic variables, we relied on the Global
Environmental Composite. This global database contains spatially
explicit geographic information system (GIS) layers of more than 260
unique environmental variables, encompassing climate, soil, land cover
and land use, plant biomass, topography, human footprint, and distur-
bance78,79. Climate variables were extracted from the CHELSA (clima-
tologies at high resolution for the earth’s land surface areas) dataset78,
whereas soil variables were from the SoilGrids80 dataset. In addition,
we created distance measures by calculating the spherical distance
to shipping ports and airports. All layers were standardized to a
30 arcsec resolution (~1 km at the equator), a resolution at which these
variables have been shown to have an influence on plant biogeography
and assembly patterns. We chose model variables to represent both
climate and soil properties that exhibited low collinearity for each of
three datasets: global (all 14 biomes from Supplementary Table 1),
temperate (temperate broadleaf, coniferous, grassland biomes) and
tropical (tropical moist broadleaf, deciduous broadleaf, coniferous, and
grassland biomes). We chose to use distinct variables rather than trans-
forming them into principal component analysis axes for increased
interpretability of these variables and their effects. Because variables
exhibiting collinearity varied between the three datasets, the resulting
models include different variable combinations. For all models, we used
mean annual temperature (MAT), mean annual precipitation (MAP),
distance to shipping ports (hereafter ‘ports’) and human population
density. For the global models, we used the following additional envi-
ronmental variables: absolute depth to bedrock, coarse fragments,
sand content and soil pH. For temperate models, we used absolute
depth to bedrock, clay content, and soil pH as additional variables;
for tropical models we used absolute depth to bedrock, soil organic
content, and soil pH as additional variables. All soil variables used were
determined at a depth of 0 cm, or the top layer of soil.
Diversity metrics
We analysed data using either phylogenetic or functional diversity;
these two approaches were chosen to be as analogous as possible.
Phylogenetic alpha diversity explains the genetic relatedness of species
within a community and is often assumed to represent a proxy for func-
tional similarity across species within a community assemblage. Yet,
congruency between these two metrics remains under debate and
their role in invasion patterns remains untested; therefore, we focused
on two major axes of diversity, explaining richness and divergence in the
community across both phylogenetic and functional space, capturing
both evolutionary and ecological processes. For each native and entire
tree community (native and non-native species), we calculated Faith’s
phylogenetic diversity (phylogenetic richness) and mean nearest taxon
distance (MNTD, phylogenetic redundancy; Extended Data Fig. 2).
Entire tree community metrics were calculated on all species, whether
they were matched to GloNAF and KEW or not; this included tree species
which were identified to genus level. Faith’s phylogenetic diversity was
calculated as the sum of the branch lengths on the phylogenetic tree
of the species in the community; MNTD was calculated as the average
distance to the nearest neighbour across the community. These metrics
were calculated based on tree placement of taxa in a recently published
reference backbone tree for plants89. Out of 13,345 starting taxa, a total
of 12,325 were placed on the reference tree, with 4,960 placed at the
species level and 7,365 placed at the genus level. We chose MNTD over
other available metrics describing community divergence because
we were interested in redundancy of the community, and this metric
captures this best. To enable a more intuitive understanding of this
metric, we transformed each community-level value of MNTD to the
maximum MNTD across all communities minus the calculated MNTD. This
transformed the maximum value to zero and all smaller values to
increasingly larger numbers, with higher transformed MNTD values indi-
cating a greater native redundancy, similar to the expected increased
redundancy with greater phylogenetic richness (Faith’s phylogenetic
diversity). To determine the non-native invasion strategy, or impact
of non-natives on native MNTD, we calculated the difference between
the native and non-native community relative to the native community
alone. We used the following formula for non-native invasion strategy:
(entire community MNTD – native community MNTD)/native commu-
nity MNTD. When non-native invasion strategy was greater than zero,
this indicated that the addition of the non-native species resulted in
a more dissimilar community, whereas a non-native invasion strategy
less than zero corresponded to the opposite.
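The MNTD calculation, the redundancy transformation and the invasion strategy formula can be illustrated with a short Python sketch. It assumes pairwise distances are supplied as a nested dictionary and that the invasion strategy formula is applied to untransformed MNTD (so that positive values mean greater dissimilarity, as stated above); both are assumptions of this example, and the helper names are hypothetical.

```python
def symmetric(pairs):
    """Build a nested distance dict from {(a, b): distance} pairs."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def mntd(dist, members):
    """Mean distance from each taxon to its nearest neighbour."""
    nearest = [min(dist[a][b] for b in members if b != a) for a in members]
    return sum(nearest) / len(nearest)


def to_redundancy(mntd_values):
    """Transform MNTD to (maximum MNTD - MNTD) across communities."""
    top = max(mntd_values)
    return [top - v for v in mntd_values]


def invasion_strategy(dist, natives, non_natives):
    """(entire community MNTD - native MNTD) / native MNTD."""
    native_mntd = mntd(dist, natives)
    entire_mntd = mntd(dist, natives + non_natives)
    return (entire_mntd - native_mntd) / native_mntd
```

For example, with three natives that are all 10 units apart, a non-native sitting 2 units from one native lowers the community MNTD from 10 to 6 (strategy −0.4, similarity), whereas a non-native 20 units from every native raises it to 12.5 (strategy 0.25, dissimilarity).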
For functional diversity, we calculated the analogous metrics using
trait distance matrices instead of phylogenetic tree-based distances.
We selected eight traits extracted from Maynard et al. that represented
the major clusters of functional trait diversity, thereby cap-
turing the full spectrum of tree form and function while minimizing
correlation between traits. Maynard et al.83 used data from the TRY
plant trait database to parametrize machine learning models to esti-
mate the expression of 18 traits as a function of the local environment
and/or phylogeny. The observed trait data underlying these models
encompassed 491,001 unique observations across 13,189 species from
2,313 genera, with consistent representation across taxonomic orders.
The resulting models were then used to generate trait estimates for
52,255 tree species, capturing approximately 80% of documented tree
species. Using this trait database, we were able to assign trait values
to 81% of the tree species in GFBI reported to the species level. The
eight traits we included in our metrics were chosen to include traits
typically associated with plant invasion28,92 including those associated
with dispersal, establishment, resource acquisition and competitive
ability that represent the major trait clusters encompassing the full
dimensionality of trait space from Maynard et al.83. The eight traits
included in our study were the following: wood density, root depth,
leaf nitrogen, leaf phosphorus, leaf area, tree height, seed dry mass,
and bark thickness. All traits were log-transformed and normalized to
allow for statistically valid comparisons83. To obtain functional diversity
metrics analogous to those used for phylogenetic diversity, we used
the dendrogram approach of Petchey and Gaston. Specifically, for
every plot we calculated the species-by-species trait distance matrix
encompassing all eight traits, and then used hierarchical clustering to
create a functional dendrogram. This dendrogram was subsequently
used to calculate ‘functional richness’ (analogous to Faith’s phylogenetic
diversity) and ‘functional redundancy’ (MNTD); we use this
terminology for functional diversity to keep variable names consistent
between phylogenetic and functional diversity analyses. Metrics were
calculated in R using packages ape94, tidyverse95, abdiv96, doParallel97,
foreach98 and pez99.
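The first step of the functional workflow, turning traits into a species-by-species distance matrix, might look like the sketch below. It is an illustration under stated assumptions: ‘normalized’ is taken to mean z-scoring each log-transformed trait, and the distance is taken to be Euclidean, neither of which is specified above. The hierarchical clustering that produces the functional dendrogram is not reproduced here.

```python
import math
import statistics


def standardize_log_traits(traits):
    """Log-transform and z-score each trait across species.

    `traits` maps species name -> list of positive trait values,
    with traits in the same order for every species.
    """
    species = list(traits)
    logged = {s: [math.log(v) for v in traits[s]] for s in species}
    n_traits = len(logged[species[0]])
    means, sds = [], []
    for j in range(n_traits):
        column = [logged[s][j] for s in species]
        means.append(statistics.mean(column))
        sds.append(statistics.stdev(column))  # assumes the trait varies
    return {
        s: [(logged[s][j] - means[j]) / sds[j] for j in range(n_traits)]
        for s in species
    }


def trait_distances(scaled):
    """Pairwise Euclidean distances in standardized trait space."""
    return {a: {b: math.dist(scaled[a], scaled[b]) for b in scaled} for a in scaled}
```

The nested dictionary returned by `trait_distances` has the same shape as a phylogenetic distance matrix, so nearest-neighbour metrics such as MNTD can be computed from it in the same way.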
Because both functional and phylogenetic diversity metrics have
unique limitations, we considered them both here so as to obtain a
more robust view of underlying patterns and processes. The benefit of
phylogenetic diversity is that it does not rely on imputed data, and thus
it provides more consistent results with lower uncertainty. However,
phylogenetic diversity is only a loose proxy for functioning, depending
on the degree to which the functional traits of interest are phylogeneti-
cally conserved. Thus, as a complement to this, we also use imputed trait
values to estimate functional diversity, which should better capture
underlying functional differences across species, but which is subject
to higher uncertainty relative to phylogeny (or measured trait values),
and may omit rare and potentially functionally unique species. Thus, by
simultaneously considering both functional and phylogenetic diversity
and showing that these metrics yield consistent global trends, our
approach provides consistent evidence that these patterns are robust
to the limitations of either approach taken individually.
Statistical analyses
We combined random forest and GLM approaches to answer our
focal questions. Specifically, we used random forest models to visualize
patterns and determine variable importance, while GLMs were used to
assess statistical significance and directionality of patterns. We first
tested for environmental and anthropogenic drivers of non-native inva-
sion, including non-native presence and invasion severity (non-native
richness, non-native abundance). Our independent variables included
either phylogenetic or functional metrics, climate and soil variables,
and human impact variables. Next, we tested the impact of these
variables on non-native invasion strategy (difference in MNTD due to
non-natives). We focused on addressing specific hypotheses related to
drivers of non-native invasion and invasion strategy. We acknowledge
the importance of other variables, and therefore included them in our
models, but do not interpret each variable.
Random forest models and GLMs used the same model designs.
Models predicting non-native presence as well as invasion severity,
for both non-native richness and abundance, included independent
predictor variables of native diversity and native redundancy, as well as
climate and human driver variables detailed in ‘Climatic and anthropo-
genic variables’. For comparison, we repeated these models with native
tree species richness in place of both diversity variables (richness and
redundancy), as species richness is commonly used in the invasion
literature when testing for biotic resistance. Finally, we used an
adapted version of the random forest models, removing diversity vari-
ables, to assess probability of locations with non-native trees globally
and generate an associated map (Extended Data Fig. 1).
To account for spatial autocorrelation in the modelling step, we used
residual autocovariates (RACs). First, we used simple linear regres-
sion to determine the range of spatial autocorrelation for the models
with continuous outcomes (invasion severity and invasion strategy).
We then assessed residual spatial autocorrelation using correlogram
plots from the ncf package in R, which showed that residual
correlation was consistently negligible beyond 250 km, which was
also applied to the models with binary outcomes (non-native pres-
ence). Using this buffer distance, we generated RAC values using the
autocov_dist() function in the spdep package70,104, which determines
an inverse distance weighted residual value for each data point in the
250 km neighbourhood. RAC incorporates the spatial signature of the
model residuals, relative to the model without any spatial autocorrela-
tion correction, into a variable that is included in each model. The
result is an inverse distance weighted residual value for each data point
in the 250 km neighbourhood, which we used as continuous predictors
in both the linear and random forest models.
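To make the idea of a residual autocovariate concrete, the sketch below computes, for each point, an inverse-distance-weighted mean of the residuals of all other points within 250 km. It is a simplified stand-in written for illustration, not a reimplementation of autocov_dist(): the use of a weighted mean (rather than another weighting style), the haversine great-circle distance and the value 0.0 for points without neighbours are all assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def residual_autocovariate(coords, residuals, radius_km=250.0):
    """Inverse-distance-weighted mean residual of neighbours per point.

    `coords` is a list of (lat, lon) tuples; `residuals` holds the
    residuals of a model fitted without any spatial correction.
    """
    rac = []
    for i, (lat_i, lon_i) in enumerate(coords):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, (lat_j, lon_j) in enumerate(coords):
            if i == j:
                continue
            d = haversine_km(lat_i, lon_i, lat_j, lon_j)
            if 0.0 < d <= radius_km:
                weight = 1.0 / d
                weighted_sum += weight * residuals[j]
                weight_total += weight
        # Points with no neighbour inside the radius get 0.0 (assumption).
        rac.append(weighted_sum / weight_total if weight_total else 0.0)
    return rac
```

The returned list would then be appended to the predictor set as one additional continuous covariate before the model is refitted.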
Random forest models were used primarily to assess variable impor-
tance and influence. Specifically, we used Shapley additive explanations
(SHAP) values to infer variable importance in the model outcome.
SHAP values are a machine learning analogue of partial regression,
quantifying the relative importance of each variable on the outcome,
accounting for all other variables in the model. To estimate the SHAP
values, random forest models were fit in R using the ranger package
with default hyperparameters (500 trees, observations sampled
with replacement, number of variables per split equal to the square
root of the number of predictors, a minimum of 5 observations per
node). We then used the fastshap package to estimate approximate
SHAP values for each predictor, using n = 100 simulations. The overall
variable importance was taken as the sum of the absolute value of the
SHAP values, and the marginal effect of each variable was visualized
by plotting the covariate versus the corresponding SHAP value for
each observation.
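The sketch below shows one common way to approximate Shapley values by Monte Carlo sampling of feature orderings, followed by the importance summary described above (sum of absolute SHAP values per variable). It is a generic illustration in Python and does not reproduce the internals of any R package; `predict`, `background` and the function names are hypothetical.

```python
import random


def approx_shap(predict, x, background, n_sim=100, rng=None):
    """Monte Carlo Shapley estimate for one observation `x`.

    `predict` maps a feature list to a number; `background` is a list of
    reference rows from which replacement values are drawn.
    """
    rng = rng or random.Random(0)
    n_features = len(x)
    phi = []
    for j in range(n_features):
        total = 0.0
        for _ in range(n_sim):
            w = rng.choice(background)          # random reference row
            order = list(range(n_features))
            rng.shuffle(order)                  # random feature ordering
            before = set(order[: order.index(j)])
            # Features ahead of j in the ordering keep x's values.
            with_j = [x[k] if (k in before or k == j) else w[k] for k in range(n_features)]
            without_j = [x[k] if k in before else w[k] for k in range(n_features)]
            total += predict(with_j) - predict(without_j)
        phi.append(total / n_sim)
    return phi


def shap_importance(shap_rows, names):
    """Overall importance: sum of absolute SHAP values per variable."""
    totals = {name: 0.0 for name in names}
    for row in shap_rows:
        for name, value in zip(names, row):
            totals[name] += abs(value)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Plotting each covariate against its per-observation SHAP value then gives the marginal-effect view described above.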
To account for spatial autocorrelation in the accuracy assessment
of random forest models, we implemented spatially-buffered leave-
one-out cross-validation (LOO-CV) to obtain conservative lower-bound
accuracy measures. To do this, we first randomly selected a focal
observation as the test data, and then we omitted all observations
within a 250 km buffer distance around this observation. The remain-
ing data were used to train the model, and the resulting fit was used to
predict outcome for the withheld focal observation. This was repeated
500 times for each model, each time selecting a new focal point and
predicting its outcome using the 250 km spatially-buffered training
set. The resulting accuracy measures were calculated on the set of 500
out-of-fit predictions. For continuous variables, we estimated accuracy
using the cross-validated coefficient of determination relative to the
one-to-one line (termed VEcv), denoted simply R² here, and for binary
outcomes we used area under the ROC curve (AUC), which quantifies
the ability of the classifier to distinguish between classes, and serves
as an assessment of model performance.
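A compact Python sketch of spatially buffered leave-one-out cross-validation and of the one-to-one-line coefficient of determination follows. It is illustrative only: the `fit` callback interface (takes training indices, returns a predictor for an index) is a hypothetical design for the example, and focal points are drawn without replacement, which is an assumption.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buffered_loo(coords, y, fit, n_rep=500, buffer_km=250.0, rng=None):
    """Return (observed, predicted) lists from buffered LOO-CV.

    `fit(train_indices)` must return a function mapping an observation
    index to a prediction (hypothetical interface for this sketch).
    """
    rng = rng or random.Random(0)
    n = len(y)
    focal_points = rng.sample(range(n), min(n_rep, n))
    observed, predicted = [], []
    for i in focal_points:
        # Training set: everything outside the buffer around the focal point.
        train = [
            j for j in range(n)
            if j != i and haversine_km(*coords[i], *coords[j]) > buffer_km
        ]
        if not train:
            continue
        model = fit(train)
        observed.append(y[i])
        predicted.append(model(i))
    return observed, predicted


def vecv(observed, predicted):
    """Variance explained relative to the one-to-one line."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Because nearby observations are withheld from training, this estimate is typically lower than a conventional out-of-bag or random k-fold estimate, which is why it serves as a conservative bound.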
To create a global map of invasion probability and its local uncer-
tainty, we used a repeated prediction approach in Google Earth Engine60
(Extended Data Fig.1a; AUC of spatial cross-validation = 0.84 ± 0.04,
mean F1 score of non-native presence = 0.36). This repeated prediction
approach used the full dataset without any down-sampling. To our
knowledge, no global maps on phylogenetic or functional diversity
metrics exist, so we were unable to include these diversity metrics in
the random forest model for mapping; therefore, these models include
the same covariates as the other models except diversity metrics. We
thought it reasonable to exclude diversity metrics in this analysis as dis-
tance to ports is the most important driver of invasion probability, while
native diversity is less important. After aggregating samples within the
30-arcsec pixels, 368,030 data points remained for our repeated predic-
tion approach. We first trained 50 random forest models on stratified
bootstrapped samples with a total of 10,000 data points each, using
biome as stratification category; this allowed us to repeatedly predict
the probability of non-native presence for each terrestrial pixel on
Earth. The resulting 50 predictions were used to create per-pixel mean
and coefficient of variation maps of the probability of non-native pres-
ence, with probabilities calibrated using Platt scaling. These two
maps allow us to investigate the patterns of invasion and the regions
of uncertainty in the predictions. Next, the extrapolation extent was
estimated as a per-pixel percentage of predictor variables, and interac-
tions of predictor variables, outside of the training range, in univariate
and multivariate space, respectively (Extended Data Fig. 1b). In addi-
tion, to account for gaps in predictor space, we estimated the Area of
Applicability113, used to mark regions of extrapolation in this map. All
maps are restricted to regions with a minimum of 10% forest cover114.
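The per-pixel summary of the repeated predictions can be sketched in a few lines of Python. This is a minimal illustration, assuming the predictions are held as one list of per-pixel probabilities per model and that the sample standard deviation is used for the coefficient of variation; the actual maps were produced in Google Earth Engine, and calibration and the extrapolation analysis are not reproduced here.

```python
import statistics


def summarize_predictions(pred_by_model):
    """Per-pixel mean and coefficient of variation across models.

    `pred_by_model` is a list with one entry per bootstrapped model,
    each entry being a list of predicted probabilities per pixel.
    """
    n_pixels = len(pred_by_model[0])
    means, cvs = [], []
    for p in range(n_pixels):
        values = [model_preds[p] for model_preds in pred_by_model]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (assumption)
        means.append(mean)
        # Coefficient of variation is undefined when the mean is zero.
        cvs.append(sd / mean if mean > 0 else float("nan"))
    return means, cvs
```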
GLM models were used to estimate statistical parameters and con-
duct statistical tests. All GLM models included the same variables as
those in the random forest models. In the models predicting non-native
presence, we used a binomial distribution and logit link. For non-native
abundance, we used a beta regression approach to predict the propor-
tion of non-native basal area, as a method of modelling proportions
between 0 and 1. We could not use a binomial GLM analogous to that
used for non-native richness because basal area measurements were
not whole numbers and we wanted to retain all information in the data.
Finally, to account for spatial autocorrelation and non-independently
distributed residuals, we employed the inclusion of RACs as described
above. These models were repeated separately for temperate and tropi-
cal bioclimatic zones, but results were qualitatively similar to the global
model, so we report only global results here. All GLM results can be
found in Supplementary Tables 3–5. GLMs were run in R (v. 4.2.2)
using lme4, lmerTest and betareg, while visualizations for these
models used ggplot2 (ref. 119); tidyverse95 was used throughout as well.
Because invasion of non-native species may alter the native diversity
of the site into which they invade, we conducted a sensitivity test using
plots where we had data across two time points to incorporate this
effect. We first took all plots for which we had two time points, where
the first time point represented a fully native community (that is, no
presence of non-natives; n = 8,221 plots). We then modelled the per
cent change of species richness in each plot from this uninvaded first
time point to a later time point. Our predictor variables included final
invasion status (non-natives present or not) to determine the impact of
invasion on per cent change of species richness, along with all climate,
soil, and anthropogenic impact variables we included in other global
models. We extracted the coefficient of final invasion status (along
with upper and lower confidence ranges), which quantifies the per
cent change in richness due to invasion, and we used this to update
the native species richness of the full global dataset. We then used
these coefficients to estimate the pre-invasion native diversity for each
plot in the global dataset by adding the corresponding species change
resulting from invasion. Finally, we reran our global analysis with this
updated pre-invasion native diversity. The relative contribution of
native species loss to biotic resistance was calculated by comparing
the relative change in the richness coefficient for each of the updated
models relative to the original model (Supplementary Table 9).
Non-native invasion strategy was predicted using the difference
in redundancy (MNTD) in the tree community due to invasion. We
included the same variables as in the previous set of models, except
native redundancy, as this is integrated in our response variable and
therefore would exhibit high collinearity. In GLM models, we tested for
the interaction between MAP and MAT to detect potential non-additive
environmental filtering effects of these two dominant climate vari-
ables. In addition, we tested for the interaction between each MAP
and MAT with distance to ports, to examine whether this important
anthropogenic driver modified main ecological relationships. Final
reported models are those resulting from a process of first creating a
full model with all interactions, and subsequently removing nonsignifi-
cant interactions. All GLM results for invasion strategy can be found
in Supplementary Table7.
Reporting summary
Further information on research design is available in the Nature Port-
folio Reporting Summary linked to this article.
Data availability
Data used in this study can be found in cited references for the Global
Naturalized Alien Flora (GloNAF) database (non-native status), the
KEW Plants of the World database5 (native ranges) and the Global
Environmental Composite63,77 (environmental data layers). Plant trait
data were extracted from Maynard et al. Data from the Global Forest
Biodiversity Initiative (GFBI) database57 are not available due to data
privacy and sharing restrictions, but can be obtained upon request
via Science-I or GFBI with approval from data contributors.
Code availability
All code used to complete analyses for the manuscript is available at the
following link:
Data analyses were conducted and visualizations were generated
in R (v. 4.2.2), Python (v. 3.9.7), Google Earth Engine (earthengine-api
0.1.306) and QGIS-LTR (v. 3.16.7), and on the ETH Zurich Euler cluster.
70. Bivand, R. & Piras, G. Comparing implementations of estimation methods for spatial
econometrics. J. Stat. Softw. 63, 1–36 (2015).
71. Kalwij, J. M. Review of ‘The Plant List, a working list of all plant species’. J. Veg. Sci. 23,
998–1002 (2012).
72. Gorelick, N. etal. Google Earth Engine: planetary-scale geospatial analysis for everyone.
Remote Sens. Environ. 202, 18–27 (2017).
73. Beech, E., Rivers, M., Oldield, S. & Smith, P. P. GlobalTreeSearch: the irst complete
global database of tree species and country distributions. J. Sustain. For. 36, 454–489
74. Catford, J. A., Vesk, P. A., Richardson, D. M. & Pyšek, P. Quantifying levels of biological
invasion: towards the objective classification of invaded and invasible ecosystems.
Glob. Change Biol. 18, 44–62 (2012).
75. Guo, Q. et al. A unified approach for quantifying invasibility and degree of invasion.
Ecology 96, 2613–2621 (2015).
76. Van Den Hoogen, J. et al. Soil nematode abundance and functional group composition at
a global scale. Nature 572, 194–198 (2019).
77. Bastin, J.-F. et al. The global tree restoration potential. Science 365, 76–79 (2019).
78. Karger, D. N. et al. Climatologies at high resolution for the earth’s land surface areas.
Sci. Data 4, 170122 (2017).
79. Stewart, S. et al. Climate extreme variables generated using monthly timeseries data
improve predicted distributions of plant species. Ecography 44, 626–639 (2021).
80. Hengl, T. et al. SoilGrids250m: Global gridded soil information based on machine learning.
PLoS ONE 12, e0169748 (2017).
81. Global Ports. All Layers and Tables (GLOBAL/GlobalPorts)
services/GLOBAL/GlobalPorts/MapServer/layers (2021).
82. Global Airports. The World Bank Data Catalog
dataset/0038117 (2021).
83. Maynard, D. S. etal. Global relationships in tree functional traits. Nat. Commun. 13, 3185
84. Joswig, J. S. etal. Climatic and soil factors explain the two-dimensional spectrum of
global plant trait variation. Nat. Ecol. Evol. 6, 36–50 (2022).
85. Florczyk, A. J. etal. GHSL Data Package 2019: Public Release GHS P2019 (European
Commission, Joint Research Centre, 2019).
86. Owen, N. R., Gumbs, R., Gray, C. L. & Faith, D. P. Global conservation of phylogenetic
diversity captures more than just functional diversity. Nat. Commun. 10, 859 (2019).
87. Mazel, F. etal. Prioritizing phylogenetic diversity captures functional diversity unreliably.
Nat. Commun. 9, 2888 (2018).
88. Tucker, C. M. etal. A guide to phylogenetic metrics for conservation, community ecology
and macroecology. Biol. Rev. 92, 698–715 ( 2017).
89. Jin, Y. & Qian, H. V.PhyloMaker: an R package that can generate very large phylogenies
for vascular plants. Ecography 42, 1353–1359 (2019).
90. Strauss, S. Y., Webb, C. O. & Salamin, N. Exotic taxa less related to native species are more
invasive. Proc. Natl Acad. Sci. USA 103, 5841–5845 (2006).
91. Cazzolla Gatti, R. etal. The number of tree species on Earth. Proc. Natl Acad. Sci. USA 119,
e2115329119 (2022).
92. Pyšek, P. & Richardson, D. M. in Biological Invasions (ed. Nentwig, W.) 97–125 (Springer,
93. Petchey, O. L. & Gaston, K. J. Functional diversity (FD), species richness and community
composition. Ecol. Lett. 5, 402–411 (2002).
94. Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and
evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
95. Wickham, H. etal. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).
96. Bittinger, K. Abdiv: Alpha and beta diversity measures. R package version 0.2.0 https:// (2020).
97. Calaway, R., Analytics, R., Weston, S., Tenenbaum, D. & Calaway, M. doParallel: Foreach
parallel adaptor for the ‘parallel’ package. R package version 1.0.17 https://cran.r-project.
org/web/packages/doParallel/index.html (2015).
98. Microsoft & Weston, S. foreach: Provides foreach looping construct. R package version
1.5.1. (2020).
99. Pearse, W. D. etal. Pez: phylogenetics for the environmental sciences. Bioinformatics 31,
2888–2890 (2015).
100. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
101. Crase, B., Liedloff, A. C. & Wintle, B. A. A new method for dealing with residual spatial
autocorrelation in species distribution models. Ecography 35, 879–888 (2012).
102. Portier, J., Gauthier, S., Robitaille, A. & Bergeron, Y. Accounting for spatial autocorrelation
improves the estimation of climate, physical environment and vegetation’s effects on
boreal forest’s burn rates. Landsc. Ecol. 33, 19–34 (2018).
103. Bjornstad, O. N. ncf: Spatial covariance functions. R package version 1.3-2 https://cran. (2022).
104. Bivand, R. S., Pebesma, E. J. & Gómez-Rubio, V. Applied Spatial Data
Analysis with R (Springer, 2008).
105. Lundberg, S. M. etal. From local explanations to global understanding with explainable
AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
106. Lundberg, S. M. & Lee, S.-I. in Advances in Neural Information Processing Systems 30
(eds Guyon, I. etal.) (NIPS, 2017).
107. Wright, M. N., Wager, S. & Probst, P. Ranger: A fast implementation of random forests.
R package version 0.12 (2020).
108. Greenwell, B. fastshap: Fast approximate Shapley values. R package version 0.0.7 https:// (2020).
109. Roberts, D. R. etal. Crossvalidation strategies for data with temporal, spatial, hierarchical,
or phylogenetic structure. Ecography 40, 913–929 (2017).
110. Li, J. Assessing the accuracy of predictive models for numerical data: not r nor r2, why
not? Then what? PLoS ONE 12, e0183250 (2017).
111. Platt, J. in Advances in Large Margin Classifiers, Vol. 10 (eds Smola, A. J. et al.) 61–74
(MIT Press, 1999).
112. Niculescu-Mizil, A. & Caruana, R. in Proc. 22nd International Conference on Machine
Learning 625–632 (Association for Computing Machinery, 2005).
113. Meyer, H. & Pebesma, E. Predicting into unknown space? Estimating the area of
applicability of spatial prediction models. Methods Ecol. Evol. 12, 1620–1633 (2021).
114. FAO. Global Forest Resources Assessment 2020: Main Report (Food and Agriculture
Organization of the United Nations, 2020).
115. R Core Team. R: A Language and Environment for Statistical Computing. http://www. (R Foundation for Statistical Computing, 2019).
116. Bates, D., Maechler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using
lme4. J. Stat. Softw. (2015).
117. Kuznetsova, A., Brockhoff, P. & Christensen, R. lmerTest package: tests in linear mixed
effect models. J. Stat. Softw. (2017).
118. Zeileis, A. etal. betareg: Beta regression. R package version 3.1-4 https://cran.r-project.
org/web/packages/betareg/index.html (2021).
119. Wickham, H., Chang, W. & Wickham, M. H. ggplot2: Create elegant data visualisations
using the grammar of graphics. R package version 2
refmans/ggplot2/html/ggplot2-package.html (2016).
Acknowledgements The authors acknowledge the Bernina Foundation and DOB Ecology
for inancial support. C.S.D. thanks the Swiss National Science Foundation (Postdoctoral
Fellowship #TMPFP3_209925). C.S.D. also acknowledges funding from the Marc R. Benioff
Revocable Trust, which, in collaboration with the World Economic Forum, also made this
work possible. D.S.M. thanks the Swiss National Science Foundation (Ambizione Grant
#PZ00P3_193612). The authors thank L. Mo for assistance in compiling the author list and
G. Smith for early discussions about invasion severity. J.C.S. considers this work a contribution
to the Center for Ecological Dynamics in a Novel Biosphere (ECONOVO), funded by the Danish
National Research Foundation (grant DNRF173) and his VILLUM Investigator project ‘Biodiversity
Dynamics in a Changing World’, funded by VILLUM FONDEN (grant 16549). P. Schall thanks
the Deutsche Forschungsgemeinschaft (DFG) Priority Program 1374 Biodiversity Exploratories.
G.A. thanks the French National Forest Inventory and the Italian Forest Inventory; G.A. was
supported by the Italian National Recovery Plan through the National Biodiversity Future
Center. Financial support for the Monafor network in Mexico was provided by the National Forestry
Commission (CONAFOR), the Council of Science and Technology of the State of Durango
(COCYTED) and the Natural Environment Research Council, UK (NERC; NE/T011084/1), with local
support from Ejidos and Comunidades.
Author contributions C.S.D., T.W.C., D.S.M. and C.M.Z. contributed the conceptualization
of the project. C.S.D. and D.S.M. contributed methodology, investigation and project
administration. All authors contributed to data collection and/or curation. C.S.D., D.S.M.,
N.M.R. and T.L. contributed visualization. T.W.C., D.S.M. and C.S.D. obtained funding and
provided supervision. Writing was led by C.S.D. and D.S.M., with review and editing contributed
by all other co-authors.
Funding Open access funding provided by Swiss Federal Institute of Technology Zurich.
Competing interests The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at
Correspondence and requests for materials should be addressed to Camille S. Delavaux.
Peer review information Nature thanks Blas Benito, Kevin Potter, Marcel Rejmánek and the
other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at
Extended Data Fig. 1 | Map of non-native invasion probability. Map showing
probability of non-native tree presence based on the probability output
of the random forest classifier (A, total n = 368,030 plots, n per iteration =
10,000 plots) alongside maps showing uncertainty in predictions (B) including
local uncertainty of invasion probability via bootstrapped coefficient of
variation (i) and extent of extrapolation as percentage of bands outside
univariate (ii) and multivariate (iii) training range. Regions outside the Area of
Applicability are indicated with dots.
Extended Data Fig. 2 | Map of non-native invasion probability inside the
area of applicability. Map showing probability of non-native tree presence
based on the probability output of the random forest classifier (A, total
n = 368,030 plots, n per iteration = 10,000 plots) alongside maps showing
uncertainty in predictions (B) including local uncertainty of invasion
probability via bootstrapped coefficient of variation (i) and extent of
extrapolation as percentage of bands outside univariate (ii) and multivariate
(iii) training range. Regions outside the Area of Applicability are masked.
Extended Data Fig. 3 | Mean nearest taxon distance (MNTD). Mean nearest
taxon distance is the average distance to the nearest neighbor by branch length on
the tree, which represents redundancy in the community (A). For each species i,
the shortest distance d to any other taxon j is found; these values
are then averaged across all species in the tree (N). If invasion occurs via
non-natives being similar to the native community, this would lead to the
expectation that MNTD decreases, increasing redundancy (B). Conversely,
if non-native invasion occurs via non-natives being dissimilar to the native
community, this would lead to the expectation that MNTD increases, reducing
redundancy (C). Taxon D represents a non-native addition to the community.
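The MNTD calculation described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The taxon names follow the caption, but all pairwise distances are made up, and the helper names `mntd` and `as_dist` are hypothetical.

```python
def mntd(taxa, dist):
    """Mean nearest taxon distance for a community.

    taxa: list of taxon names.
    dist: dict mapping frozenset({a, b}) to the branch-length distance
          between taxa a and b.
    """
    if len(taxa) < 2:
        raise ValueError("MNTD needs at least two taxa")
    # For each taxon i, find the distance to its nearest other taxon j.
    nearest = [
        min(dist[frozenset((i, j))] for j in taxa if j != i)
        for i in taxa
    ]
    # Average the nearest-neighbor distances over all N taxa.
    return sum(nearest) / len(taxa)


def as_dist(pairs):
    """Convert {(a, b): d} into an order-independent lookup."""
    return {frozenset(pair): d for pair, d in pairs.items()}


# Made-up distances among three native taxa (illustrative only).
native_pairs = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
natives = ["A", "B", "C"]
mntd_native = mntd(natives, as_dist(native_pairs))

# Scenario B: non-native D is closely related to native C.
similar = as_dist(
    {**native_pairs, ("A", "D"): 6.0, ("B", "D"): 6.0, ("C", "D"): 1.0}
)
mntd_similar = mntd(natives + ["D"], similar)

# Scenario C: non-native D is distant from every native taxon.
dissimilar = as_dist(
    {**native_pairs, ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
)
mntd_dissimilar = mntd(natives + ["D"], dissimilar)

print(f"native MNTD: {mntd_native:.2f}")
print(f"with similar D: {mntd_similar:.2f}")
print(f"with dissimilar D: {mntd_dissimilar:.2f}")
```

With these made-up distances, adding a similar taxon D lowers MNTD (higher redundancy), while adding a dissimilar taxon D raises it (lower redundancy), matching panels B and C.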
Extended Data Fig. 4 | Native diversity mediates degree of non-native
invasion. Variable importance (SHAP values) of all variables included in
random forest models, ordered from greatest to least importance alongside
influence of distance to ports, native richness and native redundancy on
invasion severity (proportion of non-native plant species) for (A) phylogenetic
diversity and (B) functional diversity global models (phylogenetic n = 3,498
plots; functional n = 3,368 plots). All results shown are from random forest
models. Note that y-axis ranges differ among panels, with the variable
importance plots representing the corresponding magnitude.
Extended Data Fig. 5 | Variable importance for non-native invasion strategy.
Variable importance (SHAP values) of all variables included in random forest
models, ordered from greatest to least importance alongside influence of
native richness, mean annual temperature and mean annual precipitation on
invasion strategy for (A) phylogenetic diversity and (B) functional diversity
global models (phylogenetic n = 3,498 plots; functional n = 3,368 plots).
All results shown are from random forest models. Note that y-axis ranges
differ among panels, with the variable importance plots representing the
corresponding magnitude. Error bands represent 95% confidence intervals.
Extended Data Fig. 6 | Variable importance for analyses using data
down-sampled without preferentially retaining invaded plots. Variable
importance (SHAP values) for all variables included in random forest models,
ordered from greatest to least importance for (A) non-native presence,
(B) richness, and (C) abundance, each for (i) phylogenetic diversity and (ii)
functional diversity global models (presence: phylogenetic n = 18,898;
functional n = 18,611, richness: phylogenetic n = 840 plots; functional
n = 823 plots, abundance: phylogenetic n = 840 plots; functional n = 823 plots).
All results shown are from random forest models with down-sampled data, but
without preferentially retaining invaded plots. Error bands represent 95%
confidence intervals.
