
Species delimitation with gene flow: A methodological comparison and population genomics approach to elucidate cryptic species boundaries in Malaysian Torrent Frogs



Accurately delimiting species boundaries is a non-trivial undertaking that can have significant effects on downstream inferences. We compared the efficacy of commonly-used species delimitation methods (SDMs) and a population genomics approach based on genome-wide single nucleotide polymorphisms (SNPs) to assess lineage separation in the Malaysian Torrent Frog Complex currently recognized as a single species (Amolops larutensis). First, we used morphological, mitochondrial DNA and genome-wide SNPs to identify putative species boundaries by implementing non-coalescent and coalescent-based SDMs (mPTP, iBPP, BFD*). We then tested the validity of putative boundaries by estimating spatiotemporal gene flow (fastsimcoal2, ABBA-BABA) to assess the extent of genetic isolation among putative species. Our results show that the A. larutensis complex runs the gamut of the speciation continuum from highly divergent, genetically isolated lineages (mean Fst = 0.9) to differentiating populations involving recent gene flow (mean Fst = 0.05; Nm > 5). As expected, SDMs were effective at delimiting divergent lineages in the absence of gene flow but overestimated species in the presence of marked population structure and gene flow. However, using a population genomics approach and the concept of species as separately evolving metapopulation lineages as the only necessary property of a species, we were able to objectively elucidate cryptic species boundaries in the presence of past and present gene flow. This study does not discount the utility of SDMs but highlights the danger of violating model assumptions and the importance of carefully considering methods that appropriately fit the diversification history of a particular system.
Kin Onn Chan
Alana M. Alexander
L. Lee Grismer
Yong-Chao Su
Jesse L. Grismer
Evan S. H. Quah
Rafe M. Brown
Biodiversity Institute and Department of
Ecology and Evolutionary Biology,
University of Kansas, Lawrence, KS, USA
Department of Biology, La Sierra
University, Riverside, CA, USA
Department of Biomedical Science and
Environmental Biology, Kaohsiung Medical
University, Kaohsiung City, Taiwan
Department of Biological Sciences, Auburn
University, Auburn, AL, USA
La Kretz Center for Californian
Conservation Science, Institute of the
Environment and Sustainability, University
of California Los Angeles, Los Angeles, CA,
School of Biological Sciences, Universiti
Sains Malaysia, Penang, Malaysia
Kin Onn Chan, University of Kansas,
Lawrence, KS, USA.
Funding information
National Geographic Society, Grant/Award
Number: 9722-15
Keywords
Amolops, FASTSIMCOAL2, gene flow, migration rate, single-nucleotide polymorphism, site frequency spectrum
Received: 3 March 2017
Revised: 12 June 2017
Accepted: 1 August 2017
DOI: 10.1111/mec.14296
Molecular Ecology. 2017;1–16. ©2017 John Wiley & Sons Ltd
Delimiting species boundaries is a fundamental component of systematic biology that forms the framework for understanding the evolutionary processes that generate biodiversity (Mayr, 1968). As such, accurately delimiting species boundaries is a nontrivial step that can have cascading ramifications (Veach, Di Minin, Pouzols, & Moilanen, 2017). Species delimitation is usually performed using certain properties of a species as criteria for assessing lineage independence. The most commonly used criteria include phenotypic distinctiveness, molecular divergence and phylogenetic placement (Brown & Stuart, 2012; Leavitt, Moreau, & Lumbsch, 2015; Leliaert et al., 2014; de Queiroz, 2007; Tobias et al., 2010; Wiens & Penkrot, 2002). These "traditional" properties can be useful in delimiting allopatric (Chan, Grismer, & Grismer, 2011), phenotypically distinct (Grismer et al., 2010) and genetically distant lineages where barriers to
gene flow or sufficient time have passed for fixed character differ-
ences to accumulate (Chan, Grismer, Zachariah, Brown, & Abraham,
2016; Grismer et al., 2013). However, cryptic lineages that occur in
sympatry, have similar niches and are not readily distinguishable phe-
notypically such as those characterized by recent/rapid radiations
can be harder to diagnose because divergent lineages no longer con-
nected by gene flow cannot be easily distinguished from the local
population structure within such lineages, forming a hierarchy of
genetic differentiation and divergence (Barley, White, Diesmos, &
Brown, 2013; Carstens, Pelletier, Reid, & Satler, 2013; Rannala,
2015; Sukumaran & Knowles, 2017). In such cases, traditional crite-
ria are limited in utility for assessing lineage separation (de Queiroz,
2005) and, if not implemented with caution, can lead to erroneous
results (Carstens et al., 2013).
Advances in genomic sequencing and bioinformatics have led to
the ability to detect population structure between closely related
populations at unprecedented resolution (Benestan et al., 2015;
Candy et al., 2015; Larson et al., 2014; Leslie et al., 2015). One of
the challenges with such data sets is to distinguish between struc-
ture that is associated with intraspecific variation from that resulting
from speciation (Sukumaran & Knowles, 2017). Model-based meth-
ods make simplifying assumptions about certain parameters (e.g.,
gene flow, population size) during the speciation process, and range
in complexity from noncoalescent, sequence-based methods that
model speciation in terms of number of substitutions (Zhang, Kapli,
Pavlidis, & Stamatakis, 2013) to highly parameterized Bayesian mod-
els based on the multispecies coalescent (Yang & Rannala, 2010) that
allow for the integration of multiple data types into a single model-based framework (Solís-Lemus, Knowles, & Ané, 2015). The efficacy
of each method depends on how well the model fits the data, and
processes that violate model assumptions such as gene flow (Bur-
brink & Guiher, 2015; Sousa & Hey, 2013; Streicher et al., 2014) or
spatial autocorrelation (Meirmans, 2012; Reeves & Richards, 2011)
can yield inaccurate species delimitations.
Implicit within most species definitions (that species are separately evolving metapopulation lineages; de Queiroz, 2007; Simpson, 1961; Wiley, 1978) is the expectation that populations within a
metapopulation lineage are connected by gene flow, but remain dis-
tinct from other such lineages (Frost & Hillis, 1990; Petit & Excoffier,
2009; de Queiroz, 2005). Levels of gene flow among populations are
not only influenced by intrinsic traits (e.g., dispersal ability) but also
extrinsic spatial and temporal processes that shape genetic patterns
across a landscape (Richardson, Brady, Wang, & Spear, 2016). If
these processes are overlooked, inferences of lineage boundaries
may fail to recognize historical population associations (Knowles &
Carstens, 2007) or may be unable to distinguish true discontinuities
(i.e., lineage separation) from variation that occurs within a species
as a result of other phenomena such as continuous geographic clines
or isolation by distance (Medrano, López-Perea, & Herrera, 2014; de
Queiroz, 2007; Sexton, Hangartner, & Hoffmann, 2014). Moreover,
standard phylogenetic estimation methods have been shown to pro-
duce highly supported, erroneous topologies when gene flow is pre-
sent, thereby invalidating downstream inferences that are based on
these phylogenies, such as the identification of terminal mono-
phyletic groups (Reeves & Richards, 2007; Rosenberg, 2007).
Here, we compared a wide range of commonly used species
delimitation methods and a population genomics approach to eluci-
date cryptic species boundaries in an understudied South-East Asian
frog complex. The South-East Asian Sundaland Biodiversity Hotspot
harbours one of the highest concentrations of endemic plants and
vertebrates on the planet (Mittermeier, Myers, Thomsen, da Fonseca,
& Olivieri, 1998; Myers, Mittermeier, Mittermeier, da Fonseca, &
Kent, 2000). Unfortunately, only 7.8% of Sundaland's original primary forest remains and some estimates suggest that up to 42% of
its biodiversity could be lost by 2100 (Myers et al., 2000; Sodhi,
Koh, Brook, & Ng, 2004). Consequently, systematic research in this
region has focused heavily on discovering and describing new spe-
cies before they are lost. This is epitomized by the rapid surge of
new amphibian and reptile descriptions over the last 15 years,
resulting in more than a 20% increase in species richness (Brown &
Stuart, 2012; Giam et al., 2012; Grismer, 2011). In virtually every
one of these descriptions, species boundaries were delimited based
on morphology and/or mitochondrial DNA (mtDNA) (e.g., Chan,
Brown, Lim, & Grismer, 2014; Chan & Grismer, 2010; Chan, Grismer,
& Brown, 2014; Grismer et al., 2012, 2014). Given the present
South-East Asian biodiversity crisis (Miettinen, Shi, & Liew, 2011;
Sodhi et al., 2004; Wilcove, Giam, Edwards, Fisher, & Koh, 2013)
and the need to rapidly inventory the region's diversity, it is important to tackle this problem using all available methods, including not only traditional morphology and mtDNA-based approaches but also species delimitation approaches that are better suited to elucidating cryptic lineage diversity. Such methods, which importantly can take gene flow into consideration, are best implemented using genomic-scale data, which are increasingly available even for nonmodel organisms.
Torrent frogs of the genus Amolops are represented by 51 species that
collectively range from Tibet, northeastern India, southern China,
southward throughout Indochina and the Thai-Malay Peninsula
(Frost, 2015). The bulk of species diversity lies in southern China
and Indochina, yet only one species, Amolops larutensis, occurs in
extreme southern Thailand and Peninsular Malaysia. This species has
never been studied outside the context of higher-level phylogenetics
where it was represented by samples from only two named localities
(Hasan et al., 2014; Matsui et al., 2006; Stuart, 2008), and as a
result, little is known of its intraspecific phenotypic and genetic variation.
Our sampling of A. larutensis from new localities throughout
Peninsular Malaysia revealed subtle phenotypic variation among geo-
graphic populations, leading us to hypothesize that A. larutensis
constitutes a complex of cryptic lineages. Because no prior data
were available, we used a two-step approach to species delimitation
by applying widely used species delimitation analyses (Carstens
et al., 2013; Rannala, 2015) using a variety of data types including
morphology, mtDNA and genomewide single-nucleotide polymor-
phisms (SNPs) to form preliminary hypotheses of species boundaries.
We then tested these putative boundaries using a rigorous popula-
tion genomics framework. Specifically, we diagnose lineage separa-
tion by assessing spatiotemporal gene flow under the general
concept of species as a separately evolving metapopulation lineage
and treat this as the only necessary property of a species (de
Queiroz, 2007). Therefore, the objectives of this study are twofold:
(i) evaluate the efficacy of commonly used species delimitation meth-
ods in assessing lineage separation in cryptic species; (ii) determine
whether population genomics can be an effective tool in elucidating
cryptic species boundaries.
Sampling, data collection and accessibility
Our total data set consisted of 225 samples for which some combi-
nation of morphological, mtDNA and SNP data was available
(Tables S1 and S2). Morphological data were obtained from a subset
of 141 vouchered museum specimens examined from collections at
La Sierra University Herpetological Collection, Riverside, California;
Zoological Reference Collection, Lee Kong Chian Natural History
Museum, Singapore; and University of Kansas Natural History
Museum (KU), Lawrence, Kansas (Table S1).
DNA for mitochondrial and genomic sequencing was extracted
from liver tissue using the Qiagen DNeasy Blood & Tissue Kit. A
total of 117 samples (including 79 of the 141 samples scored for
morphology) were sequenced for mtDNA and genomewide SNPs
(Table S2). These samples were chosen from populations that maxi-
mized geographic coverage and altitudinal variation across all major
mountain ranges. For mtDNA, we sequenced the 16S rRNA-encod-
ing gene using primers from Evans et al. (2003) and sequencing pro-
tocol from McLeod (2010). Raw sequence data were aligned using
the MUSCLE algorithm, and resulting alignments were subsequently
refined by eye in GENEIOUS PRO version 5.3 (Kearse et al., 2012). In
addition to these samples, 49 16S rRNA Amolops sequences were
obtained from GenBank to assess the monophyly and phylogenetic
placement of Peninsular Malaysian populations. Samples and corre-
sponding GenBank Accession nos are listed in Table S2.
A subset of 95 samples (including 67 samples scored for mor-
phology, 18 with mtDNA data; Table S2) were selected for genomic
sequencing of nuclear DNA in the form of genomewide SNPs using
a single-end multiplexed shotgun genotyping protocol (Andolfatto
et al., 2011). Briefly, 500 ng of DNA from each sample was digested
with NdeI (New England Biosystems), ligated with a sample-specific
barcode and then pooled in sets of 48 samples and run through a
Pippin Prep (Sage) to select fragments between 400 and 500 bp.
Genomic samples were sequenced in one lane of the Illumina Hiseq
2500 platform at the Genome Sequencing Core Facility at the
University of Kansas. Loci were subsequently assembled and filtered
using the program PYRAD v.3.0.5 (Eaton, 2014). The maximum number
of low quality, undetermined sites (N) in filtered sequences was set
to 4, and proportion of shared polymorphic sites in a locus was set
at 10% (Eaton, 2014). Because overly stringent similarity thresholds
have been shown to cause "oversplitting" by splitting orthologous
reads into multiple loci (Catchen, Hohenlohe, Bassham, Amores, &
Cresko, 2013; Harvey et al., 2015; Ilut, Nydam, & Hare, 2014),
whereas more liberal thresholds were found to have minimal bias
effects on inference (Ilut et al., 2014; Rubin, Ree, & Moreau, 2012),
we employed a relatively relaxed threshold of 88% similarity
between reads when clustering loci. We then used two different set-
tings for minimum depth of coverage (min. read depth =5 and 10)
and minimum taxon coverage (30% and 50% missing samples per
locus) to produce four SNP data sets (Table 1). To avoid linkage
across sites within the same locus, the single SNP with the highest
sample coverage was selected from each locus.
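The thinning step described above (one SNP per locus, chosen by sample coverage) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name, the `(locus_id, position, genotypes)` tuple layout, the use of `None` for a missing genotype and the first-seen tie-breaking rule are all assumptions made for the example.

```python
def pick_one_snp_per_locus(snps):
    """Keep, for each locus, the SNP genotyped in the most samples.

    snps: iterable of (locus_id, position, genotypes) tuples, where
    genotypes is a list with None marking a missing call (hypothetical layout).
    Ties are resolved in favour of the SNP encountered first (an assumption).
    """
    best = {}
    for locus_id, position, genotypes in snps:
        # Sample coverage = number of samples with a non-missing genotype.
        coverage = sum(g is not None for g in genotypes)
        if locus_id not in best or coverage > best[locus_id][0]:
            best[locus_id] = (coverage, position, genotypes)
    # Drop the coverage count from the returned records.
    return {locus: (pos, gts) for locus, (_, pos, gts) in best.items()}


# Hypothetical example: two SNPs on loc1, one on loc2.
example = [
    ("loc1", 12, [0, 1, None, 2]),
    ("loc1", 40, [0, 1, 1, 2]),
    ("loc2", 7, [None, None, 1, 0]),
]
thinned = pick_one_snp_per_locus(example)
print(thinned["loc1"][0])  # position of the retained SNP on loc1
```

Keeping a single site per locus trades information for approximate independence among sites, which is what the downstream SNP-based analyses assume.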
Establishing putative species boundaries
Putative species boundaries were established using the following species delimitation framework based on traditional and widely used methods:
1. We estimated mtDNA phylogenies and identified monophyletic
lineages. Each monophyletic lineage that corresponded to a dis-
crete or recognizable geographic region was defined as an opera-
tional taxonomic unit (OTU).
2. We calculated uncorrected genetic distance between OTUs
based on the mtDNA sequence alignment.
3. We assessed morphological variation and distinctiveness between
OTUs using multivariate analyses.
4. Finally, given that geographic data, clades recovered in our concatenated SNP phylogenies and the sNMF population structure assignments (methods and results discussed below) supported the OTUs defined using mtDNA, we performed noncoalescent and coalescent-based species delimitation analyses to establish putative species boundaries for downstream hypothesis testing.
TABLE 1 Summaries of the four different SNP data sets generated using different values for minimum read depth coverage and percentage of missing data

Min. read depth   Missing data (%)   Total variable sites   PIS      Min. # loci   Max. # loci   (header not recovered)
5                 50                 94,313                 70,833   1,761         17,831        17,123
5                 25                 53,208                 43,191   1,572         7,572         7,544
10                50                 65,541                 49,821   112           12,478        11,951
10                25                 32,253                 26,122   97            4,826         4,744

PIS, Parsimony Informative sites.
Min. # and Max. # loci give the minimum and maximum number of loci observed within an individual for each data set.
Phylogenetic estimation
Bayesian and maximum-likelihood (ML) methods were used to infer
phylogenetic relationships from mtDNA. Bayesian inference was
implemented in the program MRBAYES 3.2.6 (Ronquist et al., 2012)
using a reversible-jump MCMC + Γ substitution model and default
priors. Four independent MCMC runs (10,000,000 generations and
four chains per run) were combined and assessed for convergence
using the program TRACER v1.6 (Rambaut, Suchard, Xie, & Drummond,
2014), and a 50% majority rule consensus tree was produced by
excluding the first 25% of sampled trees as burn-in. The program IQ-
TREE (Nguyen, Schmidt, von Haeseler, & Minh, 2014) was used to
perform ML analyses. The Bayesian information criterion was used
to select the most appropriate substitution model, and branch sup-
port was assessed using 10,000 ultrafast bootstrap approximation
replicates (Minh, Nguyen, & von Haeseler, 2013).
For SNP data, a ML phylogeny was also estimated using IQ-TREE.
We applied an ascertainment bias correction using the ASC model
(Lewis, 2001), and branch support was assessed using 10,000 ultra-
fast bootstrap approximation replicates. Additionally, we estimated a
species tree under the Bayesian multispecies coalescent framework
implemented in the program SNAPP (Bryant, Bouckaert, Felsenstein,
Rosenberg, & Roychoudhury, 2012) through BEAST v.2.3.1 (Drum-
mond & Bouckaert, 2015). We used the previously identified OTUs
as a priori species assignments and the following parameter settings:
mutation rates (u and v) and the shape parameter for the gamma distribution prior on population sizes (alpha) were set at 1.0; the beta
scale parameter was set at 333 (calculated from the mean value of
the total number of polymorphic sites); and the speciation rate prior
(lambda) was sampled from a broad gamma distribution of alpha =2
and beta =200. The MCMC chain was run for 10,000,000 genera-
tions, sampling every 1,000 states, and stationarity was assessed in
the program TRACER v1.6. The posterior distribution was considered
adequately sampled when effective sample size values for parame-
ters were >200.
Morphological variation and species delimitation
Nine continuous morphological characters were measured from adult
specimens: snout-vent-length (SVL), head length, head width,
internarial distance, snout length, forearm length, femur length, tibia
length and third finger disc width following Chan et al. (2016). Due
to pronounced sexual size dimorphism, male and female measure-
ments were analysed separately. Characters were adjusted for allometric growth using the following equation: $X_{\mathrm{adj}} = X - \beta(\mathrm{SVL} - \mathrm{SVL}_{\mathrm{mean}})$, where $X_{\mathrm{adj}}$ = adjusted value; $X$ = measured value; $\beta$ = unstandardized regression coefficient for each OTU; $\mathrm{SVL}$ = measured snout-vent-length; $\mathrm{SVL}_{\mathrm{mean}}$ = overall average SVL of all OTUs (Lleonart, Salat, & Torres, 2000; Thorpe, 1975, 1983; Turan, 1999).
Adjusted variables were then log-transformed prior to downstream
analyses. We performed a principal components analysis (PCA) on
this adjusted morphological data set to find the best low-dimensional
representation of variation in the data. Components with eigenvalues
above 1.0 were retained in accordance with Kaiser's criterion (Kaiser,
1960). To further characterize population clustering, a discriminant
analysis (DA) of principal components (DAPC) was performed to find
the linear combinations of variables that have the largest between-
group variance and the smallest within-group variance. The DAPC
analysis relies on data transformation using PCA as a prior step to a
DA, ensuring that variables submitted to the DA analysis are uncor-
related and that their number is less than that of analysed individuals
(Jombart, Devillard, & Balloux, 2010).
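The allometric adjustment described above can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes the regression coefficient is the ordinary least-squares slope of each character on SVL within each OTU, that the mean SVL is taken over all specimens, and that the subsequent log-transformation uses base 10. The record layout and function names are hypothetical.

```python
import math
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("SVL does not vary within this OTU")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def allometric_adjust(records):
    """Return X_adj = X - beta * (SVL - SVL_mean) for each record, in input order.

    records: list of dicts with keys 'otu', 'svl' and 'x' (hypothetical layout).
    beta is estimated separately for each OTU; SVL_mean is the overall mean.
    """
    svl_mean = mean(r["svl"] for r in records)
    by_otu = {}
    for r in records:
        by_otu.setdefault(r["otu"], []).append(r)
    slopes = {
        otu: ols_slope([r["svl"] for r in rows], [r["x"] for r in rows])
        for otu, rows in by_otu.items()
    }
    return [r["x"] - slopes[r["otu"]] * (r["svl"] - svl_mean) for r in records]


# Hypothetical measurements (mm) for two OTUs.
specimens = [
    {"otu": "A", "svl": 30.0, "x": 15.0},
    {"otu": "A", "svl": 40.0, "x": 20.0},
    {"otu": "A", "svl": 50.0, "x": 25.0},
    {"otu": "B", "svl": 40.0, "x": 12.0},
    {"otu": "B", "svl": 50.0, "x": 15.0},
    {"otu": "B", "svl": 60.0, "x": 18.0},
]
adjusted = allometric_adjust(specimens)
# Log-transform before PCA/DAPC; base 10 is an assumption of this sketch.
log_adjusted = [math.log10(v) for v in adjusted]
```

In this toy data set each character scales perfectly with SVL within its OTU, so the adjustment collapses every specimen of an OTU onto a single size-free value, which is the behaviour the correction is meant to achieve.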
Uncorrected pairwise p-distances were calculated from the mito-
chondrial sequence alignments using the program PAUP* (Swofford,
2002). We then carried out species delimitation based on mtDNA
using a method that has been shown to perform well with single-
locus data (Tang, Humphreys, Fontaneto, & Barraclough, 2014). The
multirate Poisson tree processes (mPTP) is a noncoalescent, ML,
sequence-based method that models speciation in terms of number
of substitutions (Zhang et al., 2013). This method identifies changes
in the tempo of branching events, where the number of substitutions
between species is assumed to be significantly higher than the num-
ber of substitutions within species. Additionally, the model incorpo-
rates different levels of intraspecific genetic diversity deriving from
differences in either evolutionary history or sampling of each species
(Zhang et al., 2013). During phylogenetic inference, identical
sequences are assigned very short nonzero branch lengths to retain
the binary shape of the tree. Because this program requires a binary
phylogeny, the 50% majority consensus tree estimated from the
Bayesian analysis was used as the input tree. Confidence of the
delimitation scheme was assessed using two independent MCMC
chains at 5,000,000 generations each. Support values indicate the
fraction of sampled delimitations in which a node was part of the
speciation process.
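For readers unfamiliar with the distance measure used at the start of this section, an uncorrected p-distance is simply the proportion of compared alignment sites at which two sequences differ. The sketch below is illustrative only: the function name is hypothetical, and skipping sites with a gap or ambiguity code in either sequence is an assumption that may not match how PAUP* treats missing data.

```python
def p_distance(seq_a, seq_b, skip_chars=("-", "?", "N")):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence carries a gap or ambiguity code are
    skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical 8-bp alignment with two mismatches.
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 2 differences over 8 sites
```

Because no substitution model is applied, multiple hits at the same site are not corrected for, so the measure underestimates divergence between distantly related sequences.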
We jointly analysed morphological and mtDNA data in a common coalescent Bayesian framework using the program iBPP. This method has been shown to improve the accuracy of species delimitation by integrating phenotypic and genetic data (Solís-Lemus et al.,
2015). The iBPP analysis was performed with and without mtDNA
data to maximize the signal derived from phenotypic variation and
to evaluate the influence of mtDNA sequence data on this inte-
grated analysis. Male and female data sets were analysed separately
to avoid biases from sexual dimorphism. We used three different
combinations of priors for ancestral population size ($\theta$) and root age ($\tau_0$) drawn from a gamma distribution specified as $G(\alpha, \beta)$, where $\alpha$ is the shape and $\beta$ is the rate parameter. All other divergence time parameters were assigned the uniform Dirichlet prior (Fujita, Leaché, Burbrink, McGuire, & Moritz, 2012; Pyron, Hsieh, Lemmon, Emily, & Hendry, 2016; Yang & Rannala, 2010). We chose a diffuse shape parameter ($\alpha$ = 1 or 2) and parameterized $\beta$ for large ancestral populations and deep divergences, $\theta \sim G(1, 10)$, $\tau_0 \sim G(1, 10)$; small ancestral populations and shallow divergences, $\theta \sim G(2, 2000)$, $\tau_0 \sim G(2, 2000)$; and large ancestral populations with shallow divergences, $\theta \sim G(1, 10)$, $\tau_0 \sim G(2, 2000)$. Both rjMCMC algorithms were implemented: Algorithm 0 with $\varepsilon$ = 5; Algorithm 1 with $\varepsilon$ = 2 and m = 1.
Two independent runs were performed for each algorithm and prior
combination with a chain length of 50,000 sampled every 50 genera-
tions, discarding 1,000 generations as burn-in. MCMC convergence
was assumed when results were the same between multiple runs
using the two algorithms (Yang, 2015).
Species delimitation analysis on genomic SNPs was performed using the Bayes factor delimitation method (BFD*; Leaché, Fujita, Minin, & Bouckaert, 2014). Different species delimitation models were constructed by lumping and splitting OTUs based on plausible biogeographic scenarios and phylogenetic topologies derived from prior phylogenetic analyses (Table 2). The marginal likelihood of each model was estimated via path sampling using 48 steps, an alpha of 0.3, and an MCMC chain length of 100,000 with a preburnin of 100,000 (Leaché et al., 2014). Natural log Bayes factors (BF) were used to compare the log marginal likelihoods (MLE) of competing models using the equation $\mathrm{BF} = 2 \times [\mathrm{MLE}(\text{model 1}) - \mathrm{MLE}(\text{model 2})]$, where model 1 was the model with the largest number of species. A positive BF value indicates support for model 1 and a negative value support for model 2.
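The model comparison described above reduces to simple arithmetic on log marginal likelihoods. The sketch below applies the stated equation to made-up values; the model names and numbers are hypothetical and are not taken from Table 2, and the function is an illustration rather than part of BFD*.

```python
def compare_models(log_marginal_likelihoods, reference):
    """Rank models by log marginal likelihood and compute Bayes factors.

    BF = 2 * (MLE of reference model - MLE of competing model), so a
    positive BF favours the reference model.
    Returns a list of (name, mle, bf) tuples, best-supported model first.
    """
    ref = log_marginal_likelihoods[reference]
    ranked = sorted(
        log_marginal_likelihoods.items(), key=lambda item: item[1], reverse=True
    )
    return [(name, mle, 2.0 * (ref - mle)) for name, mle in ranked]


# Hypothetical log marginal likelihoods for three delimitation models.
models = {"split_5": -100.0, "lump_4": -105.5, "lump_2": -120.0}
for name, mle, bf in compare_models(models, reference="split_5"):
    print(name, mle, bf)
```

Log marginal likelihoods are negative, so the best-supported model is the one whose value is closest to zero.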
Validation of putative species boundaries
using genomewide SNPs
Population structure and differentiation
Population structure was characterized by estimating individual
ancestry coefficients that represent the proportions of an individual
genome that originate from multiple ancestral gene pools. Calcula-
tions were implemented in the program sNMF based on sparse non-
negative matrix factorization and least-squares optimization (Frichot,
Mathieu, Trouillon, Bouchard, & François, 2014; Kim & Park, 2007). Ancestry coefficients estimated using the sNMF method have been shown to produce results that are comparable to other widely used programs such as ADMIXTURE and STRUCTURE, but have the advantage of estimating homozygote and heterozygote frequencies and avoiding Hardy–Weinberg equilibrium assumptions (Frichot et al., 2014). We calculated ancestry coefficients for 1–16 ancestral populations (K) using 100 replicates for each K. The preferred value of K was chosen using a cross-entropy criterion based on the prediction of masked genotypes to evaluate the error of ancestry estimation. The sNMF method was implemented in the R package LEA (Frichot & François, 2015).
To determine whether genetic structure was spatially autocorre-
lated, we conducted a Mantel test by examining the correlation
between genetic distance and Euclidean geographic distance. Corre-
lation values were compared against a distribution of permuted val-
ues based on 1,000 replicates simulated under the absence of spatial
structure. The Mantel test was performed using the R package ADEGENET 2.0.1 (Jombart, 2008).
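The logic of the Mantel test described above can be illustrated with a small permutation routine. This is a sketch under assumptions, not the R implementation used in the study: it uses Pearson's correlation on the upper triangles of the two distance matrices, a one-sided test, and joint row/column permutation of the genetic matrix; the function names and toy data are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order):
    """Upper-triangle entries of a square matrix after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=1000, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel individuals, breaking any spatial signal
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the observed value counts once.
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical transect: genetic distance increases linearly with distance.
positions = [0, 1, 3, 6, 10, 15]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.02 * abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(gen, geo, permutations=999, seed=1)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would give an invalid null distribution.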
Genetic distances between population pairs were estimated using $F_{ST}$ and Jost's $D$ (Jost, 2008; Meirmans & Hedrick, 2011; Whitlock, 2011; Wright, 1951). Population differentiation was tested with analysis of molecular variance, AMOVA (Excoffier, Smouse, & Quattro, 1992) using the number of different alleles ($F_{ST}$) based on
the infinite allele model (Weir & Cockerham, 1984), nesting individu-
als within populations and populations within the eastern (East) vs.
central +western (West) mountain ranges. Significance was assessed
using 1,000 permutations. These calculations were performed using
the program GENODIVE v2.0b27 (Meirmans & Van Tienderen, 2004).
Hybridization and demographic analyses
Hybridization at the contact zone was investigated by calculating the
hybrid index (Buerkle, 2005). East 1 and Larutensis were selected as
parental populations while West 1 was designated as the putative
hybrid population.
TABLE 2 Results of the BFD* analysis based on nine species delimitation models ranging from 2 to 5 species within the western clade

# Species   Model                       MLE              BF          Rank
5           (L) (W1) (W2) (W3) (W4)     −99,906.30527                1
4           (L) (W1) (W2) (W3 + W4)     −103,884.8744    3,978.57    2
4           (L + W1) (W2) (W3) (W4)     −107,029.7769    7,123.47    3
3           (L) (W1) (W2 + W3 + W4)     −107,736.139     7,829.83    4
3           (L) (W1 + W2) (W3 + W4)     −108,921.1603    9,014.86    5
3           (L + W1) (W2) (W3 + W4)     −111,397.7147    11,491.41   6
2           (L) (W1 + W2 + W3 + W4)     −113,053.1408    13,146.84   7
2           (L + W1) (W2 + W3 + W4)     −115,914.2886    16,007.98   8
2           (L + W1 + W2) (W3 + W4)     −116,802.3151    16,896.01   9

Models were split or lumped according to plausible biogeographic scenarios and phylogenetic topologies. Competing models were compared and ranked using log marginal likelihood estimates (MLE) and Bayes factors (BF) following the equation: $\mathrm{BF} = 2 \times [\mathrm{MLE}(\text{model 1}) - \mathrm{MLE}(\text{model 2})]$, where model 1 was the model with five species. A positive BF value indicates support for model 1 over model 2 and vice versa. Model abbreviations are L = Larutensis, W = West.

Population connectivity was assessed by estimating the effective number of migrants exchanged between populations per generation (Nm) using FASTSIMCOAL2 v.52.21 (Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013). Due to computational constraints, we only analysed populations from the western clade. A folded site frequency spectrum (SFS) was obtained with custom R-code (Alexander, 2017) and ∂a∂i v1.7.0 (Gutenkunst, Hernandez, Williamson, & Bustamante, 2009), projecting down population sizes to maximize the number of segregating sites using custom R-code (Alexander, 2017).
Four scenarios were examined: contemporary and historical migra-
tion, contemporary migration only, historical migration only, no
migration (Fig. S5; Table S5). Population divergence times followed
Chan and Brown (2017), with the exception of the timing of the
divergence of West 2 from West 3/West 4, which was set as half-
way between the coalescence of all populations and the divergence
of West 3/West 4 (as the SNP topology differs from the mtDNA for
this lineage). For each scenario, 50 replicate FASTSIMCOAL2 runs were
carried out with the following settings: -n 100,000 -N 100,000 -m --multiSFS -q -M 0.001 -l 10 -L 40. Initial prior distributions followed a uniform distribution based on the population-specific theta
(distribution range: one order of magnitude lower and higher than
theta estimate) estimated using the program GENEPOP (Rousset, 2008),
and the data were modelled as FREQ, with the number of indepen-
dent chromosomes equal to the number of nonmonomorphic SNPs
in the SFS (4018). We used a mutation rate of 1.91 × 10^[…] following Crawford (2003) and assumed vicariant splits between populations (i.e., the number of simulated individuals remained constant
through time). The range in parameter estimates for the initial 50
runs were used as the prior distributions for the next run, and this
process continued until no further increase in likelihood was
detected. Using the parameter values from the run with the highest
likelihood, an additional run with -n/-N = 1,000,000 was carried out to more accurately estimate the likelihood. The best-fitting scenario was then assessed by Akaike's information criterion (AIC)
score. The parameter estimates for the best-fitting scenario were
used to simulate 100 parametric bootstraps of the SFS. The data
type was changed to DNA, with a mutation rate of 1.91 × 10^[…], and the number of chromosomal segments equalling the total number of sites in the SFS (including monomorphic sites: 5695). The length of the chromosomal segments was set at 100 bp, and the mutation rate (to three significant figures) was adjusted by trial and error until the closest match to the number of nonmonomorphic sites in
the observed SFS was obtained. After the bootstrap replicates were
generated, the *.tpl and *.est files that led to the run with the high-
est likelihood in the initial screening runs of the best scenario were
then used with the bootstrap replicates to obtain confidence inter-
vals for the parameter estimates, discarding the 2.5% lowest and
highest estimates for each parameter.
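The model-selection and bootstrap steps described above can be sketched in a few lines. This is a minimal illustration and not the scripts used in this study: it assumes that likelihoods are supplied in log10 units (a common convention when post-processing FASTSIMCOAL2 output), and the scenario names, likelihood values and parameter counts are hypothetical placeholders.

```python
import math


def aic_from_log10_likelihood(log10_lik, n_params):
    """AIC = 2k - 2 ln(L), with the likelihood supplied in log10 units."""
    ln_lik = log10_lik * math.log(10)  # convert log10(L) to ln(L)
    return 2 * n_params - 2 * ln_lik


def percentile_interval(estimates, trim=0.025):
    """Discard the lowest and highest `trim` fraction of bootstrap estimates
    and return the (lower, upper) bounds of what remains."""
    ordered = sorted(estimates)
    k = int(len(ordered) * trim)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return kept[0], kept[-1]


# Hypothetical scenarios: (log10 likelihood, number of free parameters).
scenarios = {
    "no_migration": (-5210.4, 9),
    "full_migration": (-5150.2, 25),
}
aic = {name: aic_from_log10_likelihood(lik, k) for name, (lik, k) in scenarios.items()}
best = min(aic, key=aic.get)  # the lowest AIC is the best-fitting scenario

# Hypothetical bootstrap estimates of one parameter (100 replicates).
boot = list(range(1, 101))
lower, upper = percentile_interval(boot)
```

With 100 bootstrap replicates, trimming 2.5% from each tail removes the two lowest and two highest estimates, so the interval spans the 96 central values.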
To differentiate between introgression and incomplete lineage
sorting, we used Patterson's D-statistic (ABBA-BABA test), based on
the frequencies of discordant SNP genealogies in a pectinate
four-taxon tree [(((P1,P2),P3),O)]. This test assumes that two SNP
patterns, ABBA and BABA, should be equally frequent under a scenario of
incomplete lineage sorting without gene flow, where A denotes the
ancestral allele and B the derived allele. An excess of ABBA or
BABA patterns would therefore be indicative of introgression (Durand,
Patterson, Reich, & Slatkin, 2011; Patterson et al., 2012). We
calculated D over combinations of four taxa that fitted the four-taxon
tree configuration across all plausible topologies inferred from our
phylogenetic analyses. Population pairs that did not conform to any
plausible relationships were not included in the test. Four samples
from each population were randomly chosen to form taxon sets.
Ingroup taxa (P1–P3) were then iterated over all possible combinations
of the chosen individuals, while samples were pooled into
groups for the outgroup population (O). This approach allows the use
of any locus shared by the three sampled ingroup taxa and at least
one outgroup, effectively down-weighting D if the ancestral allele
was not fixed across multiple outgroup samples, thus making it a
more conservative test. The standard deviation of D was calculated
from 200 bootstrap replicates, and the observed D was converted to
a Z-score measuring the number of standard deviations it deviated
from 0. Significance was assessed using a p-value at α = .01 after the
Holm–Bonferroni correction for multiple testing (number of possible
combinations fitting the given species tree hypothesis; Eaton & Ree,
2013; Eaton, Hipp, González-Rodríguez, & Cavender-Bares, 2015).
The D-statistic test was implemented in PYRAD.
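The logic of the test can be illustrated with a short sketch. This is not the PYRAD implementation: it assumes per-site ABBA and BABA pattern weights are already available, resamples sites (rather than loci) for the bootstrap, and uses a two-sided normal approximation for the p-value. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def d_statistic(abba, baba):
    """Patterson's D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    total = sum(abba) + sum(baba)
    return (sum(abba) - sum(baba)) / total if total else 0.0


def bootstrap_z(abba, baba, n_boot=200, seed=1):
    """Resample sites with replacement; return D and |D| / SD of the replicates."""
    rng = random.Random(seed)
    n = len(abba)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(d_statistic([abba[i] for i in idx], [baba[i] for i in idx]))
    d = d_statistic(abba, baba)
    sd = statistics.stdev(reps)
    return d, (abs(d) / sd if sd > 0 else float("inf"))


def two_sided_p(z):
    """Two-sided p-value for a Z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm_bonferroni(p_values, alpha=0.01):
    """Holm's step-down procedure: return a reject/accept flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Toy data: 60 sites showing ABBA and 40 showing BABA.
abba = [1] * 60 + [0] * 40
baba = [0] * 60 + [1] * 40
d, z = bootstrap_z(abba, baba)
p = two_sided_p(z)
```

Each taxon combination yields one p-value; passing the full list to `holm_bonferroni` gives the number of significant combinations of the kind reported per population pair.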
Phylogenetic relationships
Mitochondrial DNA
Both Bayesian and ML phylogenetic analyses on mtDNA produced
congruent topologies at most major nodes, inferred Peninsular
Malaysian Amolops as a monophyletic clade sister to A. cremnobatus
from Indochina, and had identical topologies within the Peninsular
Malaysian subclade (Fig. S1). Within the Peninsular Malaysian sub-
clade, all nodes were highly supported in the ML tree (bootstrap
>90%, Figure 1a), whereas in the Bayesian tree, one node received
relatively low support (posterior probability 0.4; Fig. S1). Two highly
divergent (14–16% p-distance, 16S; Fig. S2), reciprocally monophyletic
Peninsular Malaysian Amolops clades were recovered with
high support by both methods. These corresponded to populations
from the eastern vs. western + central mountain ranges (hereafter
referred to simply as eastern and western clades; Figure 1). In the
eastern clade, we defined two genetically distinct and reciprocally
monophyletic OTUs (separated by 7–8% p-distance, 16S) that corresponded
to populations from the northeastern mountain range (East
1) and southeastern mountain range (East 2). In the western clade,
five subclades were recovered (1–5% p-distance, 16S). We designated
these OTUs as Larutensis (type locality of A. larutensis), West
1, West 2, West 3 and West 4 (Figure 1).
Genome-wide SNPs
After quality control filtering of the initial 153 million reads obtained
across all samples, a total of c. 130 million reads were retained. The
total number of unlinked SNPs in the final data sets that were used
for downstream analyses ranged from 4,744 to 17,123 (Table 1).
Maximum-likelihood analyses on concatenated SNP data sets led
to four different phylogenies depending on how loci were filtered
(Fig. S3). At a minimum depth of five and 50% missing data, all major
splits were highly supported; however, a topology differing from the
mtDNA tree was produced (Figure 1b). The SNP data set at a mini-
mum depth of five and less missing data (30%) inferred a similar phy-
logeny, albeit with low support for the relationships among
populations within the western clade (Fig. S3). Phylogenies con-
structed from the SNP data sets with a minimum depth of 10 failed
to recover the West 2, West 3 and West 4 populations as monophyletic
groups. Furthermore, support for deeper nodes was significantly
lower. The topological placement of the East 1 and East 2
populations was congruent and highly supported throughout all
phylogenetic analyses and data sets, including mtDNA. One sample from
the contact zone (denoted by an asterisk in Figure 1) was embedded
within West 1 in the mtDNA phylogeny but recovered as a distinct
lineage within the eastern clade across all SNP phylogenies, indicating
a putative hybrid. The SNAPP analysis failed to converge when includ-
ing populations from both eastern and western clades. As the rela-
tionships of populations in the eastern clade were highly supported
in all other analyses, we performed a separate analysis on a data set
that only included populations from the western clade. This analysis
converged and produced a maximum clade credibility tree topology
similar to the concatenated SNP ML phylogeny (min. depth =5, 50%
missing data) with 1.0 posterior probability at each node (not shown).
Morphological variation and putative species
In both the male and female morphological data sets, the first three
principal components (PCs) had eigenvalues above 1.0 and were
retained for subsequent analyses. These PCs captured 69% (males)
and 78% (females) of the total variance (Table S3). In males, East 2
showed some separation along the first and third (but not second)
axes, whereas in females, East 2 formed a distinct, nonoverlapping
cluster along the first axis but was undifferentiated along the
second and third axes (Figure 2). When variances between OTUs
were maximized, the DAPC analysis also isolated East 2 as a distinct
cluster in both males and females but showed no separation
for the other populations. No clear separation was detected in either
sex across the other populations in the PCA or DAPC analysis (Fig-
ure 2 and Fig. S4).
FIGURE 1 Ultrametric maximum-likelihood phylogenies inferred from (a) 1,466 bp of the 16S rRNA-encoding mitochondrial gene. All major
nodes were highly supported with >90% bootstrap; (b) 17,123 unlinked SNP loci filtered at a minimum depth of 5 and allowing for 50%
missing data. All major nodes were highly supported with >90% bootstrap. The asterisk (*) denotes the putative hybrid sample that was placed
in the western clade in the mtDNA phylogeny and the eastern clade in the SNP phylogeny. The distribution map (right) shows sampling
localities and examples of phenotypic differences between populations from the western (circles) and eastern (triangles) clades. The star
represents the type locality of Amolops larutensis, and the red box indicates the location of the putative contact zone between the eastern and
western clades. Inset: location of Peninsular Malaysia within South-East Asia
A total of five species were delimited using the mPTP species
delimitation method. East 1, East 2, West 3 and West 4 were delim-
ited as separate species with maximum average support values of
1.0, whereas Larutensis, West 1 and West 2 received low support
(0.003), suggesting that these OTUs should be lumped as a single
species (Figure 2).
All independent iBPP runs under both rjMCMC algorithms and
all combinations of priors produced the same results, indicating con-
vergence (Yang, 2015). Species delimitation results were similar
regardless of whether sequence data were included or excluded in
the analyses. In the male data set (with sequences included), all
OTUs were highly supported as distinct species (pp =1.0). For the
female data set, all OTUs were supported as distinct species (poste-
rior probability =1.0) with the exception of the split between West
1 and Larutensis, which was moderately supported (pp =0.7,
Figure 2).
Due to the previous lack of convergence in SNAPP analyses includ-
ing both western and eastern clade individuals, and because relation-
ships were unambiguous for the eastern clade, we restricted the
BFD* analysis to populations from the western clade only. Marginal
likelihood estimates improved as the number of species increased
and favoured the model that defined each population as a distinct
species. The second-ranked model favoured four species by lumping
West 3 and West 4 as a single species. However, when compared
to the five-species model, the BF value was high (+3,978), indicating
strong support for the five-species model (Table 2).
Validation of putative species boundaries
Population structure and differentiation
The population structure analysis (sNMF) on both SNP data sets at a
minimum depth of five inferred similar patterns of population struc-
ture and admixture, where K=2 split individuals into eastern and
western clusters. For the data set with 50% missing data, K=7 had
the lowest cross-entropy value (Figure 3), whereas the data set with
30% missing data inferred K=6 as the preferred number of genetic
clusters. Signatures of admixture were detected among populations
from the western clade. At the contact zone, the putative hybrid
sample appeared admixed between East 1 and West 1 genotypes.
Apart from the putative hybrid (further investigated below), no fur-
ther admixture was detected between the eastern and western
clades. This analysis also inferred an additional population at the
central region of the eastern mountain range (the southernmost East
1 population in Figure 3). We refer to this subpopulation as East 1.2
in subsequent analyses, to distinguish it from the northern East 1.1
subpopulation.
FIGURE 2 Left: Principal components scores of morphological variables visualized as three-dimensional hypervolumes constructed using
multidimensional kernel density estimation. Geometry of hypervolumes is based on minimum convex polytopes, and axes show the first three
principal components and their proportion of variance. Right: Results of the mPTP and iBPP species delimitation analyses depicted on an
mtDNA cladogram. Values at internal nodes denote the average support value for the mPTP analysis followed by posterior probabilities from
the iBPP analysis for males and females, respectively
Within the western clade, Fst and Jost's D values based on SNP
data were low among populations, ranging from 0.03–0.09
(mean = 0.053) and 0.001–0.003 (mean = 0.002), respectively. These
values were higher among populations within the eastern clade, ranging
from 0.57–0.93 (mean = 0.7) and 0.02–0.14 (mean = 0.08) for Fst
and Jost's D, respectively. Similarly, Fst and Jost's D values were also
high when western and eastern populations were compared with
each other (Table 3). Results of the AMOVAs on populations
from the western clade showed that most of the variation (74%)
occurred within individuals, whereas in the eastern clade, most of the
variation (65%) occurred among populations. When populations were
nested within the western and eastern clades, most of the variation
(53%) was attributed to the eastern vs. western groupings (Table 4).
Hybridization and demographic analyses
The hybrid index analysis showed that one of seven samples (sample
ID 21011, previously identified as a putative hybrid above) within
the West 1 population was a hybrid between Larutensis and the
combined East 1 parental populations (h=0.549). The other six sam-
ples had h-values close to zero, indicating a strong affinity with
Larutensis and that it was very unlikely they were of hybrid origin
(Table S4).
For the FASTSIMCOAL2 analysis, the full migration model was the
best fit according to AIC. We examined a version of this model
where migration rates between populations were constrained to be
symmetrical, but it had a poorer fit to the data than the full migra-
tion model that allowed for asymmetrical migration rates (Table S5).
We therefore restrict our discussion to the full asymmetrical migration
model only. Contemporary migration rates between Larutensis
and all other populations were low (Nm = 0.2–0.6), suggesting
reproductive isolation. Gene flow was highest between West 1 and West 2
(Nm = 5.5) and West 3 and West 4 (Nm = 5.8), whereas relatively
low levels of gene flow were detected between West 2 and West 3
(Nm = 1.0; Figure 4, Table 3). However, it should be pointed out that
the confidence intervals of all point estimates associated with this
model were wide (Table S5), suggesting denser sampling of the gen-
ome would be needed to accurately estimate parameters of this
parameter-rich model. Among the historical migration rates, an out-
lier was the very high rate of migration from West 2 into the ances-
tor of the Larutensis/West 1 populations (Table S5). This high
inferred gene flow could explain the discrepancy between the SNP
and mtDNA phylogenies, with the sister relationship of West 2 and
Larutensis/West 1 in the latter due to this introgression event.
The D-statistic was used to differentiate between introgression
and incomplete lineage sorting among adjacent populations within
the western clade, eastern clade, and between both western and
eastern clades (Figure 4, Table 3). Within the western clade, high
levels of introgression (significant for all 63/63 combinations) were
detected between West 1 and West 2, while low levels of introgression
were detected between West 2 and West 3 (significant for 28/63
combinations). No introgression was detected between Larutensis
and West 1 or Larutensis and West 2. These results are congruent
with estimates from the FASTSIMCOAL2 analysis (Table 3). Introgression
between the sister lineages Larutensis–West 1 and West 3–West 4
was not assessed because the D-statistic is unable to test for gene
flow between sister lineages P1 and P2 in a pectinate four-taxon
tree [(((P1,P2),P3),O)].
FIGURE 3 Estimated population structure as inferred by the sNMF analysis. Each individual is partitioned into K coloured segments that
represent the proportions of an individual's genome that originate from one or multiple inferred genetic clusters coloured consistently with the
other figures (note the light yellow cluster was not detected using morphological PCA). Asterisk (*) indicates the putative hybrid sample at the
inferred contact zone between eastern and western clades (red box). The inset graph plots cross-entropy values (y-axis) vs. number of
ancestral populations (x-axis). K = 2 splits individuals into eastern and western genotypes. Population structure for K = 7 (the number of
clusters with the lowest cross-entropy value) is plotted on the map using the average ancestry coefficient values for each estimated population
Within the eastern clade, low levels of introgression were
detected between East 1.1 and East 1.2 (significant in 8/23 combi-
nations), whereas introgression was not detected between East 1.2
and East 2. Introgression was also absent among adjacent
populations from the western and eastern clades, even between syn-
topic populations at the contact zone, excluding the hybrid sample
(Table 3; Figure 4).
Spatial autocorrelation was not detected when the Mantel test
was performed on the entire SNP data set (p=.242), but was signif-
icant when the test was performed separately on the eastern
(p=.014) and western (p=.009) clades (Fig. S6). Although spatial
autocorrelation can result in a correlation of genetic and geographic
distances, distant and divergent populations can also result in such a
pattern. To distinguish between these two scenarios, we used a non-
parametric approach by plotting both genetic and geographic dis-
tances and using two-dimensional kernel density estimation (KDE) to
measure local densities. Continuous genetic clines such as those
caused by spatial autocorrelation would result in a single cloud of
points without discontinuities, whereas distant and divergent popula-
tions would be represented by separate high density patches. The
KDE plots show that the western clade consists mostly of a single
cloud, with a few outliers (samples from the Larutensis populations
located on a different mountain range). The eastern clade was repre-
sented by two distinct patches (Fig. S6), indicating that the East 1
and East 2 populations are both distant and divergent and are not
spatially autocorrelated.
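The permutation logic behind a Mantel test can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the implementation used in this study: the test is one-sided, the eight sampling sites and their distances are hypothetical, and the kernel density step is not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper(matrix, order):
    """Upper-triangle entries of `matrix` after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(gen, geo, n_perm=999, seed=0):
    """One-sided Mantel test: permute the genetic matrix, keep geography fixed."""
    n = len(gen)
    ident = list(range(n))
    geo_vec = upper(geo, ident)
    r_obs = pearson(upper(gen, ident), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = ident[:]
        rng.shuffle(perm)
        if pearson(upper(gen, perm), geo_vec) >= r_obs:
            hits += 1
    # Add one to numerator and denominator so the observed value counts as a permutation.
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical example: eight sites along a transect with isolation by distance.
noise = random.Random(42)
coords = list(range(8))
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[0.0] * 8 for _ in range(8)]
for i in range(8):
    for j in range(i + 1, 8):
        gen[i][j] = gen[j][i] = 0.01 * geo[i][j] + noise.uniform(0, 0.005)

r_obs, p_value = mantel(gen, geo)
```

A significant result from such a test indicates only that genetic and geographic distances are correlated; as noted above, it does not by itself separate a continuous cline from distant, divergent populations.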
Our results show that commonly used species delimitation methods
were effective at assessing lineage separation in highly divergent lin-
eages where gene flow was absent (East 1 and East 2) but overesti-
mated the number of species in younger lineages where gene flow
was prevalent but populations were markedly structured genetically.
Splitting of lineages within a metapopulation occurred even when
genomic data were used. We attribute this to the violation of the
underlying assumptions of the models implemented by these pro-
grams: the guide tree is assumed to be correct (Zhang et al., 2013);
speciation is modelled as an instantaneous event (Nee, 2006; Suku-
maran & Knowles, 2017); and divergence is assumed to occur without
gene flow (Yang & Rannala, 2010). Using these methods on a system
that violated these assumptions led to model misspecification and
inaccurate estimation of species boundaries (Camargo, Morando,
Avila, & Sites, 2012; Carstens et al., 2013; Ence & Carstens, 2011;
Jackson, Carstens, Morales, & O'Meara, 2016; Sukumaran & Knowles,
2017). On the other hand, we showed that a population genomics
approach can be an effective tool for delimiting species boundaries
both when gene flow is absent and when it is present at varying levels.
By considering lineage independence as the only necessary property
of a species, we can shift our focus away from traditional criteria (phe-
notypic distinctness, monophyly, genetic divergence, etc.) and recast
the species delimitation framework as one that strictly focuses on
assessing lineage cohesion/separation. Using this approach, we
demonstrate that Peninsular Malaysian Amolops comprise at
least three species: the true A. larutensis (i.e., the western clade) and
two unnamed lineages from the eastern clade (East 1 and East 2).
TABLE 3 Pairwise comparisons of demographic parameters within and between eastern and western populations
                          Genetic distances      Migration
Pop 1        Pop 2        Fst       Jost's D     rates (Nm)   Z-range (nSig.)
Within West
Larutensis   West 1       0.0860    0.0030       0.5814       NT
Larutensis   West 2       0.0670    0.0020       0.3331       0.0–3.0 (0/63)
West 1       West 2       0.0630    0.0020       5.4939       2.8–7.3 (63/63)
West 2       West 3       0.0290    0.0010       1.0291       0.9–5.6 (28/63)
West 3       West 4       0.0340    0.0010       5.7762       NT
Mean                      0.0530    0.0017
Within East
East 1       East 1.2     0.5700    0.0150       NT           0.8–5.3 (8/23)
East 1.2     East 2       0.8880    0.1380       NT           0.4–3.4 (0/47)
Mean                      0.7290    0.0765
Between West/East
East 1       West         0.8970    0.1260       NT           0.0–4.0 (0/191)
East 1       West         0.9140    0.1350       NT           0.0–1.9 (0/63)
East 1.2     West         0.8820    0.1380       NT           0.0–2.7 (0/71)
East 1.2     West         0.8890    0.1380       NT           0.0–2.7 (0/71)
East 3       West         0.9320    0.2210       NT           0.0–1.6 (0/63)
Mean                      0.9028    0.1516
NT, not tested.
Examined parameters include genetic distance (Fst and Jost's D), migration
rates (Nm) and D-statistic scores represented by the range of Z-scores
followed by the number of significant location comparisons assessed using
a p-value at α = .01 after the Holm–Bonferroni correction.
Support for lineage separation
All analyses unanimously supported at least three separately evolving
lineages. The western and eastern lineages were separated by very
large uncorrected mitochondrial distances and Fst values. The differentiation
between these two lineages was also supported by the
majority of AMOVA variance being explained by these lineages as
opposed to populations within these lineages. Furthermore, the D-
statistic test showed no evidence of introgression between the west-
ern and eastern lineages (with the exception of a single hybrid sample
discussed below). Similar results were obtained when comparing the
populations East 1 and East 2 within the eastern lineage, thereby
supporting the divergence and isolation of these two species.
Despite the presence of a single hybrid sample, our analyses indi-
cated that all other samples from the eastern/western contact zone
(n=21) consisted of either eastern or western genotypes with no
genetic intermediates. We view this as evidence of strong reproduc-
tive isolation and hypothesize that hybridization events between
these separately evolving lineages are rare and produce hybrids of
low fitness that do not subsequently reproduce successfully. How-
ever, denser sampling will be required to better understand the
extent and viability of hybrids at this contact zone.
FIGURE 4 Left: A phylogram depicting contemporary and historical migration rates (Nm) for populations from the western clade estimated
using FASTSIMCOAL2 under a full asymmetrical migration model. Right: Contemporary gene flow scenarios based on plausible phylogenetic
relationships, population structure and geography. High gene flow: Nm > 1 and all sample location comparisons significant for D-statistic; low
gene flow: 0.1 < Nm < 1 and some comparisons significant for D-statistic; no gene flow: Nm < 0.1 or no significant comparisons for D-statistic
TABLE 4 AMOVA results showing the proportion of variation and F-statistic
analogues calculated for different hierarchical levels of population
structure under the infinite allele model
Source of variation   Nested in    %var    F-stat   F-value   SD      p-Value   F'-value
Within West
Within individual     –            0.744   Fit      0.256     0.011   –         –
Among individual      Population   0.12    Fis      0.139     0.008   .001      –
Among population      –            0.136   Fst      0.136     0.009   .001      0.138
Within East
Within individual     –            0.203   Fit      0.797     0.007   –         –
Among individual      Population   0.15    Fis      0.425     0.012   .001      –
Among population      –            0.647   Fst      0.647     0.01    .001      0.655
Between West/East
Within individual     –            0.262   Fit      0.738     0.005   –         –
Among individual      Population   0.114   Fis      0.303     0.004   .001      –
Among population      East/West    0.091   Fsc      0.195     0.005   .001      0.2
p-Values were assessed using 1,000 permutations.
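The F-values in Table 4 follow arithmetically from the variance proportions, which the sketch below reproduces. It assumes the standard hierarchical definitions of the fixation indices. The among-group proportion for the East/West analysis (0.533) is not listed in the table as extracted; it is taken here as the remainder after the three listed components, which agrees with the 53% reported in the text. The function name is hypothetical.

```python
def hierarchical_f(within_ind, among_ind, among_pop, among_group=0.0):
    """Fixation indices from AMOVA variance proportions.

    Without a group level (among_group = 0), F_sc is the same quantity as F_st.
    """
    below_group = within_ind + among_ind + among_pop
    total = below_group + among_group
    return {
        "F_it": 1 - within_ind / total,
        "F_is": among_ind / (within_ind + among_ind),
        "F_sc": among_pop / below_group,
        "F_ct": among_group / total,
    }


west = hierarchical_f(0.744, 0.12, 0.136)
east = hierarchical_f(0.203, 0.15, 0.647)
# 0.533 is an assumed remainder (1 - 0.262 - 0.114 - 0.091), not a tabulated value.
both = hierarchical_f(0.262, 0.114, 0.091, among_group=0.533)

rounded_west = {name: round(value, 3) for name, value in west.items()}
rounded_both = {name: round(value, 3) for name, value in both.items()}
```

Rounded to three decimals, the western values match the tabulated 0.256, 0.139 and 0.136, and the East/West values match 0.738, 0.303 and 0.195.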
These multiple lines of congruent evidence from different sources
of data provide strong support for the recognition of at least three
distinct species of Amolops in Peninsular Malaysia: the true A. laruten-
sis, consisting of populations from the western lineage; and two unde-
scribed species represented by the lineages East 1 and East 2.
Support for population cohesion
Within the highly structured western clade, the relatively high mito-
chondrial distances were consistent with interspecific distances
among other amphibian species (Fouquet et al., 2007; Vences, Tho-
mas, Bonett, & Vieites, 2005; Vences, Thomas, van der Meijden,
Chiari, & Vieites, 2005) and these populations were identified as separate species
based on traditional species delimitation methods. However, we
reject this hypothesis based on results from population genomic
analyses (sNMF, Fst, AMOVA, H-index, FASTSIMCOAL2, D-statistic, Mantel
test) that detected different levels of gene flow among these
populations (discussed in further detail below). Disturbingly, the pop-
ulations that had the highest mitochondrial divergences (West 1,
West 2, West 3 and West 4) were also the populations that were
most undifferentiated and showed the highest levels of gene flow
based on genomic data, potentially as a result of sex-biased gene
flow. Genetic variation within the western clade is more reflective of
intraspecific population structure than divergence associated with
speciation events. As such, we consider the entire western lineage
to be a single, cohesive metapopulation lineage represented by the
taxon name A. larutensis.
Within the eastern clade, the D-statistic showed low levels of
gene flow between the subpopulations East 1.1 and East 1.2 but not
between East 1.2 and East 2. We attribute the low levels of gene
flow and migration between the subpopulations East 1.1 and East
1.2 to the lack of samples from the region spanning those popula-
tions. We hypothesize that as samples from that area become avail-
able, populations from the entire northeastern mountain range will
form a cohesive metapopulation lineage (East 1), separate from the
southeastern population East 2 due to the lack of contiguous habitat
between East 1 and East 2.
Biogeography and the speciation continuum
The different levels of genetic differentiation within Peninsular
Malaysian Amolops illustrate the complex nature of speciation, rang-
ing from the presence of continuous variation within a group with-
out reproductive isolation, to complete and irreversible reproductive
isolation between groups (Hendry, Bolnick, Berner, & Peichel, 2009).
Deep divergence coupled with strong reproductive isolation could be
caused by divergent selection (McKinnon et al., 2004; Rundle &
Nosil, 2005; Schluter, 2009) or allopatric speciation (Coyne & Orr,
2004; Wiley, 1978). In this study, ecological conservatism in Amolops
and the discontinuous genetic variation between western and east-
ern lineages are more indicative of the latter as opposed to the for-
mer. At the other end of the divergence spectrum, populations from
the western lineage were highly structured and showed varying
levels of historical and contemporary migration consistent with a
complex history involving gene flow between recently diverging lin-
eages. Because speciation with gene flow can occur in nature (Nie-
miller, Fitzpatrick, & Miller, 2008; Nosil, 2008; Zarza et al., 2016),
we applied a migration threshold for genetic isolation of one individ-
ual per 10 generations as a cut-off to determine the level of gene
flow below which we consider populations to be separately evolving
lineages (Rannala, 2015; Zhang, Zhang, Zhu, & Yang, 2011). Using
this threshold, gene flow among populations of the western clade
has not been sufficiently reduced for these populations to be considered
genetically isolated enough to represent distinct species. However, it is
worth noting that gene flow between Larutensis on the northwestern
mountain range and the geographically proximate West 1 and West 2
populations on the central range was the most reduced (Nm = 0.3). We
interpret this as an indication of incipient speciation triggered by
recent and rapid human development and the disruption of habitat
corridors along the Bintang–Kledang range, a small mountain range
situated between the northwestern and central mountain ranges
(Jamaluddin, Pau, & Siti-Azizah, 2011; Khoo & Lubis, 2005). Given
sufficient time, the lower long-term migration rates of Larutensis could
lead this population to qualify as a species separate from the other
western populations. However, at this point, the data do not support
this split and we therefore consider these populations as belonging
to a single species.
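The threshold comparison described above can be expressed compactly. This sketch assumes that one migrant per 10 generations corresponds to Nm = 0.1 and uses the contemporary migration rates from Table 3; the population pair labels are inferred from the text and table, and the names in the code are hypothetical.

```python
ISOLATION_THRESHOLD_NM = 0.1  # assumed equivalent of one migrant per 10 generations

# Contemporary migration rates (Nm) within the western clade, from Table 3.
nm_estimates = {
    ("Larutensis", "West 1"): 0.5814,
    ("Larutensis", "West 2"): 0.3331,
    ("West 1", "West 2"): 5.4939,
    ("West 2", "West 3"): 1.0291,
    ("West 3", "West 4"): 5.7762,
}


def is_isolated(nm, threshold=ISOLATION_THRESHOLD_NM):
    """A pair counts as genetically isolated only if Nm falls below the threshold."""
    return nm < threshold


isolated_pairs = [pair for pair, nm in nm_estimates.items() if is_isolated(nm)]
lowest_pair = min(nm_estimates, key=nm_estimates.get)
```

No pair falls below the threshold, and the lowest estimates involve Larutensis, in line with the interpretation given above.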
Effects of genomic filtering parameters
Filtering parameters for SNP assembly can have a significant impact
on downstream analyses and inferences. The correlation between
including sites with more missing taxa and better bipartition support
is consistent with previous simulation (Huang & Knowles, 2016) and
empirical studies (Eaton & Ree, 2013; Wagner et al., 2013). Con-
versely, allowing large amounts of missing data can also result in
high bootstrap support for incorrect clades (Leaché, Banbury, Felsenstein,
de Oca, & Stamatakis, 2015). Since a consensus has yet to be
reached on the best ways to process large data sets, our preference
for the data set with lower minimum read depth and higher allow-
ance for missing data should be interpreted with caution, especially
as our preferred phylogeny might not actually represent the true
species tree. However, our use of population genomic methods to
delimit the number of Peninsular Malaysian Amolops species means
our conclusions are relatively robust to errors in reconstructing the
true topology of lineages included in this study.
In a separate study on the trade-off between coverage depth and
the number of individuals in a sample, Buerkle and Gompert (2013)
showed that low coverage sequencing (as low as 1× coverage) is not
only sufficient, but also could be optimal to accurately estimate popu-
lation parameters, as this allows the inclusion of greater numbers of
individuals or sites in the genome. Lower coverage was also optimal
for phylogenetic estimation in our data sets, as our higher minimum
depth data sets had low branch support, and failed to recover some
monophyletic groups. However, these recommendations are dependent
on the overall level of sequencing depth in our project: our study
supports previous research in suggesting that general rules of thumb
for SNP filtering are unlikely to hold. Suitable settings instead depend
on the properties of the data set and species biology, and should be
evaluated on a case-by-case basis (Huang & Knowles, 2016; Leaché et al., 2015).
This study does not discount the utility of traditional species delimi-
tation methods but instead highlights the importance of choosing
the right tool for the right task. Using methods that do not account
for gene flow to delimit cryptic species boundaries where gene flow
occurs will inherently yield erroneous results. We therefore caution
against using these methods to delimit recent and rapidly diverging
populations where gene flow may be prevalent. For such cases, we
demonstrate that a population genomics approach can be used to
objectively assess lineage separation in line with the general lineage
concept of species.
Our findings are especially significant for systematic research in
regions where new species are being described at a high rate. Malay-
sia stands as a particularly relevant test case (i.e., a potential future
study system for evaluating the performance of species delimitation
procedures) in that numerous newly described species, codistributed
throughout the range of localities studied here, have been split into
multiple, formally named species using traditional species delimitation
methods. This study does not invalidate those descriptions but pro-
vides evidence that gene flow is present among co-occurring popula-
tions in one taxonomic group (Amolops). Our findings suggest that
other codistributed and taxonomically diverse taxa could provide
compelling examples for future genomic species delimitation studies.
Field and molecular work for this study was supported by the
National Geographic Explorers Grant (9722-15) to LLG and KOC. We
thank the Advanced Computing Facility staff at the University of Kan-
sas for computational resources; and the KU Genome Sequencing
Core Laboratory (supported by the National Institute of General Medi-
cal Sciences of the National Institutes of Health: P20GM103638). We
thank J. Kelly for assisting with genomic data collection and R. Glor, L.
Welton, S. Travers, K. Olson, C. Hutter, R. Abraham, P. De Mello, K.
Allen, K. Chovanec, J. Wienell and W. Tapondjou for providing intel-
lectual input. We thank K. Lim at Lee Kong Chian Museum of Natural
History, Singapore, for specimen loans. For assistance in the field, we
thank T. Daugherty, M. Muin and S. Anuar.
1. Sampling localities, morphological data and mtDNA sequences
(GenBank accessions) uploaded as Tables S1 and S2.
2. Genomic data and associated scripts and files for analyses are
available from the Dryad Digital Repository:
K.O.C. designed the study, collected samples, performed laboratory
work and analyses and wrote the manuscript. A.M.A. helped with
genomic analyses and provided valuable comments and edits
throughout numerous rounds of revisions. L.L.G. provided funding
for fieldwork, collected samples and contributed to many of the
ideas in the project design. Y.-C.S. provided valuable insights and
ideas during many sessions of discussion. J.L.G. helped with initial
project design and sampling. E.S.H.Q. helped with sampling. R.M.B.
funded laboratory work and sequencing and provided intellectual
input throughout the entire study.
Kin Onn Chan
Alexander, A. (2017). Creating_dadi_SNP_input_from_structure. Retrieved
Andolfatto, P., Davison, D., Erezyilmaz, D., Hu, T. T., Mast, J., Sunayama-
morita, T., & Stern, D. L. (2011). Multiplexed shotgun genotyping for
rapid and efficient genetic mapping. Genome Research,21(4), 610
Barley, A. J., White, J., Diesmos, A. C., & Brown, R. M. (2013). The challenge
of species delimitation at the extremes: Diversification without mor-
phological change in Philippine Sun Skinks. Evolution,67, 35563572.
Benestan, L., Gosselin, T., Perrier, C., Sainte-Marie, B., Rochette, R., &
Bernatchez, L. (2015). RAD genotyping reveals fine-scale genetic
structuring and provides powerful population assignment in a widely
distributed marine species, the American lobster (Homarus ameri-
canus). Molecular Ecology,24(13), 32993315.
Brown, R. M., & Stuart, B. L. (2012). Patterns of biodiversity discovery
through time: An historical analysis of amphibian species discoveries
in the Southeast Asian mainland and adjacent island archipelagos. In
D. J. Gower, K. Johnson, J. Richardson, B. Rosen, L. Ruber, & S. Wil-
liams (Eds.), Biotic evolution and environmental change in Southeast
Asia (pp. 348389). Cambridge: Cambridge University Press.
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & Roychoud-
hury, A. (2012). Inferring species trees directly from biallelic genetic
markers: Bypassing gene trees in a full coalescent analysis. Molecular
Biology and Evolution,29, 19171932.
Buerkle, C. A. (2005). Maximum-likelihood estimation of a hybrid index
based on molecular markers. Molecular Ecology Notes,5, 684687.
Buerkle, A. C., & Gompert, Z. (2013). Population genomics based on low
coverage sequencing: How low should we go? Molecular Ecology,22,
Burbrink, F. T., & Guiher, T. J. (2015). Considering gene flow when using
coalescent methods to delimit lineages of North American pitvipers
of the genus Agkistrodon.Zoological Journal of the Linnean Society,
173, 505526.
Camargo, A., Morando, M., Avila, L. J., & Sites, J. W. (2012). Coalescent-
based methods with ABC and other coalescent-based methods : A
test of accuracy with simulations and an empirical example with
lizards of the Liolaemus darwinii complex (Squamata : Liolaemidae).
Evolution,66, 28342849.
Candy, J. R., Campbell, N. R., Grinnell, M. H., Beacham, T. D., Larson, W.
A., & Narum, S. R. (2015). Population differentiation determined from
putative neutral and divergent adaptive genetic markers in Eulachon
(Thaleichthys pacificus, Osmeridae), an anadromous Pacific smelt.
Molecular Ecology Resources,15(6), 14211434.
Carstens, B. C., Pelletier, T. A., Reid, N. M., & Satler, J. D. (2013). How to
fail at species delimitation. Molecular Ecology,22, 43694383.
Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A., & Cresko, W. A.
(2013). Stacks: An analysis tool set for population genomics. Molecu-
lar Ecology,22, 31243140.
Chan, K. O., & Brown, R. M. (2017). Did true frogs dispersify?Biology
Letters,13, 20170299.
Chan, K. O., Brown, R. M., Lim, K. K. P., & Grismer, L. L. (2014). A new
species of frog (Amphibia: Anura: Ranidae) of the Hylarana signata
Complex from Peninsular Malaysia. Herpetologica,70, 228240.
Chan, K. O., & Grismer, L. L. (2010). Re-assessment of the Reinwardts
Gliding Frog, Rhacophorus reinwardtii (Schlegel 1840) (Anura: Rha-
cophoridae) in Southern Thailand and Peninsular Malaysia and its re-
description as a new species. Zootaxa,2505,4050.
Chan, K. O., Grismer, L. L., & Brown, R. M. (2014). Reappraisal of the
Javanese Bullfrog complex, Kaloula baleata (M
uller, 1836) (Amphibia:
Anura: Microhylidae), reveals a new species from Peninsular Malaysia.
Zootaxa,3900, 569580.
Chan, K. O., Grismer, L. L., Zachariah, A., Brown, R. M., & Abraham, R. K.
(2016). Polyphyly of Asian Tree Toads, genus Pedostibes G
1876 (Anura: Bufonidae), and the description of a new genus from
Southeast Asia. PLoS ONE,11, e0145903.
Chan, K. O., Grismer, L. L., & Grismer, J. L. (2011). A new insular, endemic
frog of the genus Kalophrynus Tschudi, 1838 (Anura: Microhylidae)
from Tioman Island, Pahang, Peninsular Malaysia. Zootaxa,68,6068.
Coyne, J. A., & Orr, H. A. (2004). Speciation. Sunderland, MA: Sinauer
Crawford, A. J. (2003). Relative rates of nucleotide substitution in frogs.
Journal of Molecular Evolution,57(6), 636641.
Drummond, A. J., & Bouckaert, R. R. (2015). Bayesian evolutionary analysis
with BEAST (p. 260). Cambridge: Cambridge University Press.
Durand, E. Y., Patterson, N., Reich, D., & Slatkin, M. (2011). Testing for
ancient admixture between closely related populations. Molecular
Biology and Evolution,28, 22392252.
Eaton, D. A. R. (2014). PyRAD: Assembly of de novo RADseq loci for
phylogenetic analyses. Bioinformatics,30, 18441849.
Eaton, D. A. R., Hipp, A. L., Gonz
ıguez, A., & Cavender-Bares, J.
(2015). Historical introgression among the American live oaks and
the comparative nature of tests for introgression. Evolution,69,
Eaton, D. A. R., & Ree, R. H. (2013). Inferring phylogeny and introgres-
sion using RADseq data: An example from glowering plants (Pedicu-
laris: Orobanchaceae). Systematic Biology,62, 689706.
Ence, D. D., & Carstens, B. C. (2011). SpedeSTEM: A rapid and accurate
method for species delimitation. Molecular Ecology Resources,11,
Evans, B. J., Brown, R. M., McGuire, J. A., Supriatna, J., Andayani, N.,
Diesmos, A., ... Cannatella, D. C. (2003b). Phylogenetics of fanged
frogs: Testing biogeographical hypotheses at the interface of the
asian and Australian faunal zones. Systematic Biology,52(6), 794
Excoffier, L., Dupanloup, I., Huerta-S
anchez, E., Sousa, V. C., & Foll, M.
(2013). Robust Demographic Inference from Genomic and SNP data.
PLoS Genetics,9(10).
Excoffier, L., Smouse, P. E., & Quattro, J. M. (1992). Analysis of molecular
variance inferred from metric distances among DNA haplotypes:
Application to human mitochondrial DNA restriction data. Genetics,
131, 479491.
Fouquet, A., Gilles, A., Vences, M., Marty, C., Blanc, M., & Gemmell, N. J.
(2007). Underestimation of species richness in neotropical frogs
revealed by mtDNA analyses. PLoS ONE,2(10),
Frichot, E., & Francßois, O. (2015). LEA: An R package for landscape and
ecological association studies. Methods in Ecology and Evolution,6,
Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G., & Francßois, O. (2014).
Fast and efficient estimation of individual ancestry coefficients.
Genetics,196, 973983.
Frost, D. R. (2015). Amphibian species of the world: An online reference.
Version 6.0. New York, NY: American Museum of Natural History.
Electronic Database Retrieved from
Frost, D. R., & Hillis, D. M. (1990). Species in concept and practice: Her-
petological applications. Herpetologica,46,86104.
Fujita, M. K., Leache, A. D., Burbrink, F. T., McGuire, J. A., & Moritz, C.
(2012). Coalescent-based species delimitation in an integrative taxon-
omy. Trends in Ecology and Evolution,27, 480488.
Giam, X., Scheffers, B. R., Sodhi, N. S., Wilcove, D. S., Ceballos, G., & Ehr-
lich, P. R. (2012). Reservoirs of richness: Least disturbed tropical for-
ests are centres of undescribed species diversity. Proceedings of the
Royal Society B: Biological Sciences,279(1726), 6776.
Grismer, L. L. (2011). Lizards of Peninsular Malaysia, Singapore and their
adjacent archipelagos. Frankfurt, Germany: Edition Chimaira.
Grismer, L. L., Anuar, S., Quah, E. S. H., Muin, M. A., Chan, K. O., Grismer,
J. L., & Ahmad, N. (2010). A new spiny, prehensile-tailed species of
Cyrtodactylus (Squamata: Gekkonidae) from Peninsular Malaysia with
a preliminary hypothesis of relationships based on morphology. Zoo-
taxa,52(2625), 4052.
Grismer, L. L., Wood, P. L., Anuar, S., Muin, M. A., Quah, E. S. H.,
McGuire, J. A., ... Pham, H. T. (2013). Integrative taxonomy uncovers
high levels of cryptic species diversity in Hemiphyllodactylus Bleeker,
1860 (Squamata: Gekkonidae) and the description of a new species
from Peninsular Malaysia. Zoological Journal of the Linnean Society,
169(4), 849880.
Grismer, L. L., Wood, P. L. J., Anuar, S., Riyanto, A., Ahmad, N., Muin, M.
A., ... Pauwels, O. S. G. (2014). Systematics and natural history of
Southeast Asian Rock Geckos (genus Cnemaspis Strauch, 1887) with
descriptions of eight new species from Malaysia, Thailand and
Indonesia. Zootaxa,3880(1), 1147.
Grismer, L. L., Wood, P. L. J., Quah, E. S. H., Anuar, S., Muin, M. A.,
Sumontha, M., ... Pauwels, O. S. G. (2012). A phylogeny and taxon-
omy of the Thai-Malay Peninsula Bent-toed Geckos of the Cyrto-
dactylus pulchellus complex (Squamata: Gekkonidae): Combined
morphological and molecular analyses with descriptions of seven new
species. Zootaxa,3520,155.
Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H., & Bustamante, C.
D. (2009). Inferring the joint demographic history of multiple popula-
tions from multidimensional SNP frequency data. PLoS Genetics,5
Harvey, M. G., Duffie-Judy, C., Seeholzer, G. F., Maley, J. M., Graves, G.
R., & Brumfield, R. T. (2015). Similarity threshholds used in short read
assembly reduce the comparability of population histories across spe-
cies. PeerJ,3, e895.
Hasan, M., Islam, M. M., Khan, M. M. R., Igawa, T., Alam, M. S., Djong, H.
T., ... Sumida, M. (2014). Genetic divergences of South and South-
east Asian frogs: A case study of several taxa based on 16S riboso-
mal RNA gene data with notes on the generic name Fejervarya.
Turkish Journal of Zoology,38(4), 389411.
Hendry, A. P., Bolnick, D. I., Berner, D., & Peichel, C. L. (2009). Along the
speciation continuum in sticklebacks. Journal of Fish Biology,75,
Huang, H., & Knowles, L. L. (2016). Unforeseen consequences of exclud-
ing missing data from next-generation sequences: Simulation study of
RAD sequences. Systematic Biology,65(3), 357365.
Ilut, D. C., Nydam, M. L., & Hare, M. P. (2014). Defining loci in restric-
tion-based reduced representation genomic data from nonmodel spe-
cies: Sources of bias and diagnostics for optimal clustering. BioMed
Research International,2014(675158).
Jackson, N. D., Carstens, B. C., Morales, A. E., & OMeara, B. C. (2016).
Species delimitation with gene flow. Systematic Biology, syw117.
Jamaluddin, J. A. F., Pau, T. M., & Siti-Azizah, M. N. (2011). Genetic
structure of the Snakehead Murrel, Channa striata (Channidae) based
on the Cytochrome-c Oxidase Subunit I gene: Influence of historical
and geomorphological factors. Genetics and Molecular Biology,34,
Jombart, T. (2008). Adegenet: A R package for the multivariate analysis
of genetic markers. Bioinformatics,24, 14031405.
Jombart, T., Devillard, S., & Balloux, F. (2010). Discriminant analysis of
principal components: A new method for the analysis of genetically
structured populations. BMC Genetics,11, 94.
Jost, L. (2008). GST and its relatives do not measure differentiation.
Molecular Ecology,17, 40154026.
Kaiser, H. F. (1960). The application of electronic computers to factor
analysis. Educational and Psychological Measurement,20, 141151.
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock,
S., ... Drummond, A. (2012). Geneious basic: An integrated and
extendable desktop software platform for the organization and analy-
sis of sequence data. Bioinformatics,28(12), 16471649.
Khoo, S. N., & Lubis, A. R. (2005). Kinta Valley: Pioneering Malaysias mod-
ern development. Penang: Areca Books.
Kim, H., & Park, H. (2007). Sparse non-negative matrix factorizations via
alternating non-negativity-constrained least squares for microarray
data analysis. Bioinformatics,23, 14951502.
Knowles, L. L., & Carstens, B. C. (2007). Estimating a geographically expli-
cit model of population divergence. Evolution,61, 477493.
Larson, W. A., Seeb, L. W., Everett, M. V., Waples, R. K., Templin, W. D.,
& Seeb, J. E. (2014). Genotyping by sequencing resolves shallow pop-
ulation structure to inform conservation of Chinook salmon (Oncor-
hynchus tshawytscha). Evolutionary Applications,7(3), 355369.
e, A. D., Banbury, B. L., Felsenstein, J., de Oca, A. N., & Stamatakis,
A. (2015). Short tree, long tree, right tree, wrong tree: New acquisi-
tion bias corrections for inferring SNP phylogenies. Systematic Biol-
ogy,64, 10321047.
e, A. D., Fujita, M. K., Minin, V. N., & Bouckaert, R. R. (2014). Spe-
cies delimitation using genome-wide SNP Data. Systematic Biology,
63, 534542.
Leavitt, S. D., Moreau, C. S., & Lumbsch, H. T. (2015). The dynamic disci-
pline of species delimitation: Progress toward effectively recognizing
species boundaries in natural populations. In D. K. Upreti, P. K. Diva-
kar, V. Shukla, & R. Bajpai (Eds.), Recent advances in lichenology: Mod-
ern methods and approaches in lichen systematics and culture
techniques, Vol. 2(pp. 1144). New Delhi, India: Springer.
Leliaert, F., Verbruggen, H., Vanormelingen, P., Steen, F., L
J. M., Zuccarello, G. C., & De Clerck, O. (2014). DNA-based species
delimitation in algae. European Journal of Phycology,49(2), 179196.
Leslie, S., Winney, B., Hellenthal, G., Davison, D., Boumertit, A., Day, T.,
... Bodmer, W. (2015). The fine-scale genetic structure of the British
population. Nature,519(7543), 309314.
Lewis, P. O. (2001). A likelihood approach to estimating phylogeny from
discrete morphological character data. Systematic Biology,50, 913925.
Lleonart, J., Salat, J., & Torres, G. J. (2000). Removing allometric effects
of body size in morphological analysis. Journal of Theoretical Biology,
Matsui, M., Shimada, T., Liu, W. Z., Maryati, M., Khonsue, W., & Orlov,
N. (2006). Phylogenetic relationships of Oriental torrent frogs in the
genus Amolops and its allies (Amphibia, Anura, Ranidae). Molecular
Phylogenetics and Evolution,38(3), 659666.
Mayr, E. (1968). The role of systematics in biology. Science,159, 595
McKinnon, J. S., Mori, S., Blackman, B. K., David, L., Kingsley, D. M.,
Jamieson, L., ... Schluter, D. (2004). Evidence for ecologys role in
speciation. Nature,429, 294298.
McLeod, D. S. (2010). Of least concern? Systematics of a cryptic species
complex: Limnonectes kuhlii (Amphibia: Anura: Dicroglossidae). Molec-
ular Phylogenetics and Evolution,56, 9911000.
Medrano, M., L
opez-Perea, E., & Herrera, C. M. (2014). Population genet-
ics methods applied to a species delimitation problem: Endemic trum-
pet daffodils (Narcissus Section Pseudonarcissi) from the Southern
Iberian Peninsula. International Journal of Plant Sciences,175, 501517.
Meirmans, P. G. (2012). The trouble with isolation by distance. Molecular
Ecology,21(12), 28392846.
Meirmans, P. G., & Hedrick, P. W. (2011). Assessing population structure:
FST and related measures. Molecular Ecology Resources,11,518.
Meirmans, P. G., & Van Tienderen, P. H. (2004). GENOTYPE and GEN-
ODIVE: Two programs for the analysis of genetic diversity of asexual
organisms. Molecular Ecology Notes,4, 792794.
Miettinen, J., Shi, C., & Liew, S. C. (2011). Deforestation rates in insular
Southeast Asia between 2000 and 2010. Global Change Biology,17,
Minh, B. Q., Nguyen, M. A. T., & von Haeseler, A. (2013). Ultrafast
approximation for phylogenetic bootstrap. Molecular Biology and Evo-
lution,30, 11881195.
Mittermeier, R. A., Myers, N., Thomsen, J. B., da Fonseca, G. A. B., & Oli-
vieri, S. (1998). Biodiversity hotspots and major tropical Wilderness
areas: Approaches to setting conservation priorities. Conservation
Biology,12, 516520.
Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B., &
Kent, J. (2000). Biodiversity hotspots for conservation priorities. Nat-
ure,403, 853858.
Nee, S. (2006). Birth-death models in macroevolution. Annual Review of
Ecology, Evolution, and Systematics,37,117.
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2014). IQ-
TREE: A fast and effective stochastic algorithm for estimating maxi-
mum likelihood phylogenies. Molecular Biology and Evolution,32, 268
Niemiller, M. L., Fitzpatrick, B. M., & Miller, B. T. (2008). Recent diver-
gence with gene flow in Tennessee cave salamanders (Plethodonti-
dae: Gyrinophilus) inferred from gene genealogies. Molecular Ecology,
17, 22582275.
Nosil, P. (2008). Speciation with gene flow could be common. Molecular
Ecology,17, 20062008.
Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y., ...
Reich, D. (2012). Ancient admixture in human history. Genetics,192
(3), 10651093.
Petit, R. J., & Excoffier, L. (2009). Gene flow and species delimitation.
Trends in Ecology and Evolution,24, 386393.
Pyron, R. A., Hsieh, F. W., Lemmon, A. R., Emily, M., & Hendry, C. R.
(2016). Integrating phylogenomic and morphological data to assess
candidate species-delimitation models in brown and red-bellied snakes
(Storeria). Zoological Journal of the Linnean Society,177,937949.
de Queiroz, K. (2005). Ernst Mayr and the modern concept of species.
Proceedings of the National Academy of Sciences of the United States
of America,102, 66006607.
de Queiroz, K. (2007). Species concepts and species delimitation. System-
atic Biology,56, 879886.
Rambaut, A., Suchard, M. A., Xie, D., & Drummond, A. J. (2014). Tracer
v1.6. Retrieved from
Rannala, B. (2015). The art and science of species delimitation. Current
Zoology,61, 846853.
Reeves, P. A., & Richards, C. M. (2007). Distinguishing terminal
monophyletic groups from reticulate taxa: Performance of phenetic,
tree-based, and network procedures. Systematic Biology,56,
Reeves, P. A., & Richards, C. M. (2011). Species delimitation under the gen-
eral lineage concept: An empirical example using wild North American
hops (Cannabaceae: Humulus lupulus). Systematic Biology,60,4559.
Richardson, J. L., Brady, S. P., Wang, I. J., & Spear, S. F. (2016). Navigat-
ing the pitfalls and promise of landscape genetics. Molecular Ecology,
25, 849863.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A.,
Hohna, S., ... Huelsenbeck, J. P. (2012). MrBayes 3.2: Efficient Baye-
sian phylogenetic inference and model choice across a large model
space. Systematic Biology,61(3), 539542.
Rosenberg, N. A. (2007). Statistical tests for taxonomic distinctiveness
from observations of monophyly. Evolution,61, 317323.
Rousset, F. (2008). GENEPOP007: A complete re-implementation of the
GENEPOP software for Windows and Linux. Molecular Ecology
Resources,8, 103106.
Rubin, B. E. R., Ree, R. H., & Moreau, C. S. (2012). Inferring phylogenies
from RAD sequence data. PLoS ONE,7,112.
Rundle, H. D., & Nosil, P. (2005). Ecological speciation. Ecology Letters,8,
Schluter, D. (2009). Evidence for ecological speciation and its alternative.
Science,323, 737741.
Sexton, J. P., Hangartner, S. B., & Hoffmann, A. A. (2014). Genetic isola-
tion by environment or distance: Which pattern of gene flow is most
common? Evolution,68,115.
Simpson, G. G. (1961). Principles of animal taxonomy. New York: Columbia
University Press.
Sodhi, N. S., Koh, L. P., Brook, B. W., & Ng, P. K. L. (2004). Southeast
Asian biodiversity: An impending disaster. Trends in Ecology and Evo-
lution,19, 654660.
ıs-Lemus, C., Knowles, L. L., & An
e, C. (2015). Bayesian species delimi-
tation combining multiple genes and traits in a unified framework.
Evolution,69, 492507.
Sousa, V., & Hey, J. (2013). Understanding the origin of species with gen-
ome-scale data: Modelling gene flow. Nature Reviews. Genetics,14,
Streicher, J. W., Devitt, T. J., Goldberg, C. S., Malone, J. H., Blackmon, H.,
& Fujita, M. K. (2014). Diversification and asymmetrical gene flow
across time and space: Lineage sorting and hybridization in polytypic
barking frogs. Molecular Ecology,23(13), 32733291.
Stuart, B. L. (2008). The phylogenetic problem of Huia (Amphibia: Rani-
dae). Molecular Phylogenetics and Evolution,46,4960.
Sukumaran, J., & Knowles, L. L. (2017). Multispecies coalescent delimits
structure, not species. Proceedings of the National Academy of
Sciences,114, 16071612.
Swofford, D. L. (2002). PAUP*. Phylogenetic Analysis Using Parsimony
(*and Other Methods). Sunderland, MA: Sinauer Associates.
Tang, C. Q., Humphreys, A. M., Fontaneto, D., & Barraclough, T. G.
(2014). Effects of phylogenetic reconstruction method on the robust-
ness of species delimitation using single-locus data. Methods in Ecol-
ogy and Evolution,5, 10861094.
Thorpe, R. S. (1975). Quantitative handling of characters useful in snake sys-
tematics with particular reference to intraspecific variation in the Ringed
Snake Natrix natrix.Biological Journal of the Linnean Society,7,2743.
Thorpe, R. S. (1983). A review of the numerical methods for recognizing
and analyzing racial differentiation. In: J. Felsenstein (Ed.), Numerical
taxonomy: Proceedings of a NATO advanced studies institute NATO ASI
series (pp. 404423). Berlin, Heidelberg: Springer Verlag.
Tobias, J. A., Seddon, N., Spottiswoode, C. N., Pilgrim, J. D., Fishpool, L.
D. C., & Collar, N. J. (2010). Quantitative criteria for species delimita-
tion. Ibis,152(4), 724746.
Turan, C. (1999). A note on the examination of morphometric differentia-
tion among fish populations: The Truss System. Turkish Journal of
Zoology,23, 259263.
Veach, V., Di Minin, E., Pouzols, F. M., & Moilanen, A. (2017). Species
richness as criterion for global conservation area placement leads to
large losses in coverage of biodiversity. Diversity and Distributions,23,
Vences, M., Thomas, M., Bonett, R. M., & Vieites, D. R. (2005). Decipher-
ing amphibian diversity through DNA barcoding: Chances and chal-
lenges. Philosophical Transactions of the Royal Society of London, Series
B, Biological Sciences,360, 18591868.
Vences, M., Thomas, M., van der Meijden, A., Chiari, Y., & Vieites, D. R.
(2005). Comparative performance of the 16S rRNA gene in DNA bar-
coding of amphibians. Frontiers in Zoology,2,5.
Wagner, C. E., Keller, I., Wittwer, S., Selz, O. M., Mwaiko, S., Greuter, L.,
... Seehausen, O. (2013). Genome-wide RAD sequence data provide
unprecedented resolution of species boundaries and relationships in
the Lake Victoria Cichlid adaptive radiation. Molecular Ecology,22(3),
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the
analysis of population structure. Evolution,38, 13581370.
Whitlock, M. C. (2011). GST and D do not replace FST. Molecular Ecol-
ogy,20, 10831091.
Wiens, J. J., & Penkrot, T. A. (2002). Delimiting species using DNA and
morphological variation and discordant species limits in spiny lizards
(Sceloporus). Systematic Biology,51,6991.
Wilcove, D. S., Giam, X., Edwards, D. P., Fisher, B., & Koh, L. P. (2013).
Navjots nightmare revisited: Logging, agriculture, and biodiversity in
Southeast Asia. Trends in Ecology and Evolution,28, 531540.
Wiley, E. O. (1978). The evolutionary species concept reconsidered. Sys-
tematic Zoology,27,1726.
Wright, S. (1951). The genetical structure of populations. Annals of
Eugenics,15, 322354.
Yang, Z. (2015). A tutorial of BPP for species tree estimation and species
delimitation. Current Zoology,61, 854865.
Yang, Z., & Rannala, B. (2010). Bayesian species delimitation using multi-
locus sequence data. Proceedings of the National Academy of Sciences
of the United States of America,107, 92649269.
Zarza, E., Faircloth, B. C., Tsai, W. L. E., Bryson, R. W., Klicka, J., &
McCormack, J. E. (2016). Hidden histories of gene flow in highland
birds revealed with genomic markers. Molecular Ecology,25, 5144
Zhang, J., Kapli, P., Pavlidis, P., & Stamatakis, A. (2013). A general species
delimitation method with applications to phylogenetic placements.
Bioinformatics,29(22), 28692876.
Zhang, C., Zhang, D., Zhu, T., & Yang-, Z. (2011). Evaluation of a bayesian
coalescent method of species delimitation. Systematic Biology,60,
Additional Supporting Information may be found online in the
supporting information tab for this article.
How to cite this article: Chan KO, Alexander AM, Grismer
LL, et al. Species delimitation with gene flow: A
methodological comparison and population genomics
approach to elucidate cryptic species boundaries in Malaysian
Torrent Frogs. Mol Ecol. 2017;00:1–16.
... gerutu Chan, Abraham, Grismer &Grismer, 2018, all of which occur south of the Isthmus of Kra in extreme southern Thailand and Peninsular Malaysia (Chan-ard 2003;Chan et al. 2018;Niyomwan et al. 2019). Amolops larutensis was recently partitioned into A. larutensis, A. australis, and A. gerutu based on corroborated lines of evidence in mitochondrial DNA, genomic DNA, and morphology (Chan et al. 2017(Chan et al. , 2018. Multiple molecular phylogenetic analyses have demonstrated the sister relationship between A. cremnobatus in Indochina and the geographically disparate A. larutensis (as one or three species) in the Malay Peninsula (Matsui et al. 2006;Cai et al. 2007;Wiens et al. 2009;Kurabayashi et al. 2010;Pyron and Wiens 2011;Goutte et al. 2016;Chan et al. 2017;Zeng et al. 2020;Mahony et al. 2022). ...
... Amolops larutensis was recently partitioned into A. larutensis, A. australis, and A. gerutu based on corroborated lines of evidence in mitochondrial DNA, genomic DNA, and morphology (Chan et al. 2017(Chan et al. , 2018. Multiple molecular phylogenetic analyses have demonstrated the sister relationship between A. cremnobatus in Indochina and the geographically disparate A. larutensis (as one or three species) in the Malay Peninsula (Matsui et al. 2006;Cai et al. 2007;Wiens et al. 2009;Kurabayashi et al. 2010;Pyron and Wiens 2011;Goutte et al. 2016;Chan et al. 2017;Zeng et al. 2020;Mahony et al. 2022). Recent analysis of mitochondrial DNA of A. cremnobatus from three localities, one each in Laos, Vietnam, and Thailand, revealed surprisingly high levels of genetic divergences , and a species delimitation method using Multi-rate Poisson Tree Processor (mPTP, Kapli et al. 2017) interpreted those mitochondrial sequences to represent two distinct species . ...
... Amolops larutensis, the sister taxon of A. cremnobatus (as one or five species), was recently partitioned into three species (A. larutensis, A. australis, and A. gerutu) in the Malay Peninsula, also based on corroborating lines of evidence in morphological, mt, and nu data (Chan et al. 2017;Chan et al. 2018), with notably greater representation of nu data than was used here consisting of genome-wide single-nucleotide polymorphisms (Chan et al. 2017). Chan et al. (2018) found considerable overlap in morphological variation among their three species, and therefore advocated for primarily using the tuberculation and pattern on the rear of the thighs to diagnose them morphologically. ...
Full-text available
The Lao torrent frog Amolops cremnobatus Inger & Kottelat, 1998 was recently hypothesized, based on mitochondrial DNA, to consist of more than a single species across its range in Laos and flanking regions of Vietnam and Thailand. We tested this hypothesis using mitochondrial DNA, nuclear DNA, and quantitative and qualitative morphological data from adults and larvae. We found corroborating lines of evidence for five distinct evolutionary lineages that we hypothesize to be species. Amolops cremnobatus sensu stricto is restricted to the southeastern portion of its previous range, and remaining populations are described as four new species. Some of the new species are easier to diagnose with morphology as larvae than as adults. Further sampling in northern Thailand may reveal an additional species of this torrent frog complex.
... A c c e p t e d M a n u s c r i p t 4 Delimiting species in the presence of gene flow remains challenging despite the increasing availability of genome-scale data and the development of sophisticated models (Chan et al. 2017(Chan et al. , 2020(Chan et al. , 2022aJackson et al. 2017;Sukumaran and Knowles 2017;Dincă et al. 2019;Leaché et al. 2019;Jiao and Yang 2020;Burbrink and Ruane 2021). This is partly because the confounding effects of gene flow are multifarious and can affect different facets of the species delimitation process. ...
... The optimal substitution model was obtained via MODELFINDER (Kalyaanamoorthy et al. 2017) and branch support was determined using ultrafast bootstrapping (Hoang et al. 2017) and a Shimodaira-Hasegawa-like approximate likelihood ratio test (Shimodaira and Hasegawa 1999;Guindon et al. 2010). The IQ-TREE analysis was performed on datasets that retained the most loci (50% missing taxa) to maximize phylogenetic content (Chan et al. 2017;Eaton et al. 2017;Crotti et al. 2019). We performed two separate analyses on datasets derived from clustering thresholds of 0.85 and 0.95 (both at 50% missing taxa) to determine if different amounts of loci resulting from different clustering thresholds affected phylogenetic inference. ...
... The confounding effects of gene flow on phylogenetic inference and species delimitation have been demonstrated in simulation and empirical studies (Chan et al. 2017(Chan et al. , 2020(Chan et al. , 2022aJackson et al. 2017;Sukumaran and Knowles 2017;Leaché et al. 2019;Pyron et al. 2022). Studies have shown that gene flow can result in spurious, misleading, yet highly supported phylogenetic arrangements that can cause a myriad of cascading issues. ...
Full-text available
Mangrove pit vipers of the Trimeresurus purpureomaculatus-erythrurus complex are the only species of viper known to inhabit mangroves. Despite serving integral ecological functions in mangrove ecosystems, the evolutionary history, distribution, and species boundaries of mangrove pit vipers remain poorly understood, partly due to overlapping distributions, confusing phenotypic variations, and the lack of focused studies. Here, we present the first genomic study on mangrove pit vipers and introduce a robust hypothesis-driven species delimitation framework that considers gene flow and phylogenetic uncertainty in conjunction with a novel application of a new class of speciation-based delimitation model implemented through the program Delineate.Our results showed that gene flow produced phylogenetic conflict in our focal species and substantiates the artefactual branch effect where highly admixed populations appear as divergent non-monophyletic lineages arranged in a stepwise manner at the basal position of clades. Despite the confounding effects of gene flow, we were able to obtain unequivocal support for the recognition of a new species based on the intersection and congruence of multiple lines of evidence. This study demonstrates that an integrative hypothesis-driven approach predicated on the consideration of multiple plausible evolutionary histories, population structure/differentiation, gene flow, and the implementation of a speciation-based delimitation model can effectively delimit species in the presence of gene flow and phylogenetic conflict.
... However, many recent investigations use only a limited number of available procedures (Esselstyn et al., 2012;Carstens et al., 2013), lack a unified general analytical framework, and have fallen short of fundamental hypothesis-testing analytical procedures. The limitations of many incomplete investigations have been uncovered by more comprehensive reinvestigations in recent times (Leaché et al., 2009;Vieites et al., 2009;Chan et al., 2017Chan et al., , 2020Sukumaran and Knowles, 2017). ...
... For any standard, two-step (A) proposition of hypothesized species boundaries (''discovery'' stage), and subsequent (B) testing of hypothesized species boundaries (''validation'' stage) procedure, it is the data-driven, statistically defensible species delimitation (Sites and Marshall, 2003;Welton et al., 2013;Freudenstein et al., 2016;Chan et al., 2017) and analytically reproduceable framework (Fujita et al., 2012;Leaché et al., 2014) that has the potential to withstand healthy scientific skepticism. To aid the standardization of integrated taxonomy, three categories of variably characterized candidate species have been adopted: unconfirmed candidate species, confirmed candidate species, and deep conspecific lineages (Vieites et al., 2009). ...
... Also, failure rates of using exclusively morphological data or single-marker barcoding confrm that neither should be used as a single information source (Hillis, 1987;Smith and Carstens, 2019). This awareness raises the need for a cultural change in the practice of revisionary taxonomy, which places an objective burden of proof on authors, necessitating statistical analyses of multiple data streams (Fujita et al., 2012;Chan et al., 2017Chan et al., , 2018Chan et al., , 2020Jackson et al., 2017;Oliver et al., 2018). In recent years, integrative taxonomic approaches that combine multiple, independent, data or character sets (such as external morphological, internal anatomical, ecological, acoustic, and larval traits; apart from geographic considerations, sympatry versus allopatry, and inference of biogeographical range evolution), and rigorous statistical procedures, are becoming industry standard (Dayrat, 2005;Padial et al., 2010;Schlick-Steiner et al., 2010). ...
Taxonomic studies over the past decade of the endemic Night Frog genus Nyctibatrachus (originally described in 1882) from Peninsular India have more than tripled, from 11 at the turn of this century to 36 by 2017. Despite these revisionary contributions, it is still challenging for field biologists to identify night frog species reliably, due to a near-complete absence of diagnostic, discrete character states or trait values. Worse, many questionably diagnosed night frog species' status has ostensibly been ''supported'' by phylogenies derived from sparsely sampled gene-trees that are based on a single locus or a handful of markers, with topology and arbitrary genetic distance thresholds of 3-6% used to support new species descriptions. We sought to re-evaluate and validate the species boundaries of six currently nominated species of Nyctibatrachus of the aliciae group (N. aliciae, N. periyar, N. deveni, N. pillaii), N. vasanthi, and N. poocha clade using a comprehensive integrative taxonomic approach that integrates classical taxonomy, molecular species delimitation analysis, statistical analysis of morphological characters of adults and larvae, analyses of bioacoustics, and natural history information. Our results indicate that recent descriptions of Nyctibatrachus deveni, N. periyar, and N. pillaii represent cases of taxonomic inflation (over-splitting), because the evidence cited in support of their recognition is irreproducible, subjective, and devoid of strong statistical support. We demonstrate the need for multidimensional species delimitation approaches in the celebrated Western Ghats biodiversity hotspot paleo-endemic genus Nyctibatrachus and suspect that this concerning trend of over-splitting amphibian species based on limited data and untenable support may be applicable to other amphibian groups.
... Recent studies have shown that integrating multiple types of data and methods can enhance the accuracy of species delimitation (Chan et al., 2017; Masonick & Weirauch, 2019; Kirchner et al., 2021). Hence, we used five distinct methods (two distance-based and three tree-based species delimitation methods) and two datasets (215COI and PCG123) to assess species boundaries within ...
... Comprehensive sampling of various species is critically important for discovering cryptic species and robustly evaluating levels of biodiversity and genetic diversity (Liu et al., 2020; Kirchner et al., 2021; Jiang et al., 2021). Resolving the taxonomic boundaries between related species has long been a major focus of species delimitation studies. The use of multiple data sources and population genetic analysis has greatly aided species delimitation studies (Chan et al., 2017; Leaché et al., 2018; Meleshko et al., 2021). Phylogenetic analyses revealed that C. zhengi, C. quaternaria, C. albida sp. ...
Economically significant bean pests of the genus Chauliops are species‐rich in the areas surrounding the Qinghai–Tibet Plateau and provide an excellent system for speciation studies. Here, an integrative taxonomic approach employing morphological analyses, population genetic methods and multiple molecular species delimitation methods was used to clarify the taxonomy of Chauliops in East and Southeast Asia. Four new species (Chauliops parahorizontalis Li & Bu, sp. nov., Chauliops albida Li & Bu, sp. nov., Chauliops bicoloripes Li & Bu, sp. nov., and Chauliops paraconica Li & Bu, sp. nov.) were described, which increases the number of Chauliops species in this area from six to ten; a key for Chauliops species is also provided. Phylogenetic analysis and divergence time estimation revealed that Chauliops was divided into four clades: Clade A (C. bisontula + (C. horizontalis + C. parahorizontalis sp. nov.)), Clade B (C. albida sp. nov. and C. bicoloripes sp. nov.), Clade C (C. quaternaria and C. zhengi), and Clade D (C. fallax + (C. conica + C. paraconica sp. nov.)). Two species diversification events of Chauliops estimated to have occurred 7–1 million years ago (Ma) and 25–13 Ma were detected. These speciation events were consistent with the two historical uplift events of the Qinghai–Tibet Plateau, suggesting that orogeny might have provided opportunities for the diversification of Chauliops species on the southeastern margin of the Qinghai–Tibet Plateau. Our findings show that population genetic analyses can be used to delimit related species and that orogeny is a key driver of species diversification on the southeastern margin of the Qinghai–Tibet Plateau.
... Consequently, deep intraspecific divergence of genetic lineages (Avise, 2000; Coyne & Orr, 2004) can be generated by a variety of geographic and ecological processes that are either (i) part of a generalized trajectory of evolutionary divergence towards speciation or (ii) population structure that is nonetheless unified by high rates of gene flow. Therefore, demographic-model selection and tests of genetic isolation by distance and migration (Chan et al., 2017; Jackson et al., 2017) can be instrumental in differentiating structure versus speciation (Sukumaran & Knowles, 2017). When the former is produced by landscape-scale processes that are held in check by ongoing migration (Seeholzer & Brumfield, 2018), this is detectable by a variety of methods as described below and elsewhere (see reviews in Carstens et al., 2022; Sukumaran et al., 2021). ...
Numerous mechanisms drive ecological speciation, including isolation by adaptation, barrier, distance, environment, hierarchy, and resistance. These promote genetic and phenotypic differentiation of local populations, formation of phylogeographic lineages, and ultimately, completed speciation via reinforcement. In contrast, it is possible that similar mechanisms might lead to lineage cohesion through stabilizing rather than diversifying ecomorphological selection and the long-term persistence of population structure within species. Processes that drive the formation and maintenance of geographic genetic diversity while facilitating high rates of migration and limiting phenotypic divergence may thereby result in population structure that is not accompanied by divergence towards reproductive isolation. We suggest that this framework can be applied more broadly to address the classic dilemma of “structure versus speciation” when evaluating phylogeographic diversity, unifying population genetics, species delimitation, and the underlying study of speciation. We demonstrate one such instance in the Seepage Salamander (Desmognathus aeneus) from the southeastern United States. Recent studies estimated up to 6.3% mitochondrial divergence and 4 phylogenomic lineages with broad admixture across geographic hybrid zones, which could potentially represent distinct species. However, while limited dispersal promotes substantial isolation by distance, extreme microhabitat specificity appears to yield stabilizing selection on ecologically mediated phenotypes. As a result, climatic cycles promote recurrent contact between lineages that are not adaptively differentiated and therefore experience repeated bouts of high migration and introgression through time. This leads to a unified, single species with deeply divergent phylogeographic lineages that nonetheless do not appear to represent incipient species.
... In contrast to most studies, which assess population structure only under optimal K or under a few near-optimal K values (e.g., Chan et al. 2017; Weiss et al. 2018; Quattrini et al. 2019), we considered gene pool sharing across a wide range of K because optimal K can be unstable and difficult to reproduce (Gilbert et al. 2012; Novembre 2016), especially where alternative K values have similar cross-entropy scores (e.g., Supplementary Fig. S5a). Consistent with the recommendation that population structure be assessed across a range of K values (Pritchard et al. 2000), we therefore devised a metric, multi-K gene pool sharing (MKS), which quantifies the number of K levels across which a pair of populations share at least one gene pool. ...
Species delimitation in the genomic era has focused predominantly on the application of multiple analytical methodologies to a single massive parallel sequencing (MPS) data set, rather than leveraging the unique but complementary insights provided by different classes of MPS data. In this study we demonstrate how the use of two independent MPS data sets, a sequence capture data set and a single nucleotide polymorphism (SNP) data set generated via genotyping-by-sequencing, enables the resolution of species in three complexes belonging to the grass genus Ehrharta, whose strong population structure and subtle morphological variation limit the effectiveness of traditional species delimitation approaches. Sequence capture data are used to construct a comprehensive phylogenetic tree of Ehrharta and to resolve population relationships within the focal clades, while SNP data are used to detect patterns of gene pool sharing across populations, using a novel approach that visualizes multiple values of K. Given that the two genomic data sets are independent, the strong congruence in the clusters they resolve provides powerful ratification of species boundaries in all three complexes studied. Our approach is also able to resolve a number of single-population species and a probable hybrid species, both of which would be difficult to detect and characterize using a single MPS data set. Overall, the data reveal the existence of 11 and five species in the E. setacea and E. rehmannii complexes, with the E. ramosa complex requiring further sampling before species limits are finalized. Despite phenotypic differentiation being generally subtle, true crypsis is limited to just a few species pairs and triplets. We conclude that, in the absence of strong morphological differentiation, the use of multiple, independent genomic data sets is necessary in order to provide the cross-data set corroboration that is foundational to an integrative taxonomic approach.
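The multi-K gene pool sharing (MKS) metric described in the excerpt above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each K has a matrix of mean ancestry proportions per population, and that two populations "share a gene pool" when both exceed a cutoff in the same cluster. The 0.15 cutoff, the function names, and the toy matrices are hypothetical.

```python
import numpy as np


def shares_gene_pool(q_a, q_b, min_ancestry=0.15):
    """Return True if both populations reach min_ancestry in a common cluster.

    q_a and q_b are 1-D arrays of mean ancestry proportions for one K.
    The cutoff is an illustrative assumption, not a published value.
    """
    q_a = np.asarray(q_a)
    q_b = np.asarray(q_b)
    return bool(np.any((q_a >= min_ancestry) & (q_b >= min_ancestry)))


def multi_k_sharing(q_by_k, pop_a, pop_b, min_ancestry=0.15):
    """Count the K levels at which pop_a and pop_b share at least one gene pool.

    q_by_k maps each K to an array of shape (n_populations, K).
    """
    return sum(
        shares_gene_pool(q[pop_a], q[pop_b], min_ancestry)
        for q in q_by_k.values()
    )


# Hypothetical ancestry matrices for three populations at K = 2, 3, 4.
q_by_k = {
    2: np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]),
    3: np.array([[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.0, 0.0, 1.0]]),
    4: np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]]),
}

mks_01 = multi_k_sharing(q_by_k, 0, 1)
mks_02 = multi_k_sharing(q_by_k, 0, 2)
mks_12 = multi_k_sharing(q_by_k, 1, 2)
print(mks_01, mks_02, mks_12)
```

In this toy case, populations 0 and 1 keep sharing a cluster until K = 4, whereas populations 0 and 2 separate at every K. A count of this kind is meant to distinguish those two situations without committing to a single optimal K.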
... We used the python script easySFS to convert the VCF files to SFS, which also reduced the data set to one SNP per locus, as recommended (Overcast, 2020). SNP mutation rate in amphibians is not well known, so we used 1.9e-8, as has been used in similar studies of anurans, and is close to the general vertebrate mutation rate (Chan et al., 2017; Crawford, 2003). We used the 1-year generation rate across all populations to compare; however, we note there may be some variation in generation time across years and populations (Lever, 2001). ...
Widespread introduced species can be leveraged to investigate the genetic, ecological and adaptive processes underlying rapid evolution and range expansion, particularly the contributions of genetic diversity to adaptation. Rhinella marina, the cane toad, has been a focus of invasion biology for decades in Australia. However, their introduction history in North America is less clear. Here, we investigate the roles of introduction history and genetic diversity in establishment success of cane toads across their introduced range. We used reduced representation sequencing (ddRAD) to obtain 34,000 SNPs from 247 toads in native (French Guiana, Guyana, Ecuador, Panama, Texas) and introduced (Bermuda, southern Florida, northern Florida, Hawaiʻi, Puerto Rico) populations. Unlike all other cane toad introductions, we found that Florida populations are more closely related to native Central American lineages (R. horribilis), than to native Southern American lineages (R. marina). Furthermore, we find high levels of diversity and population structure in the native range, corroborating suggestions that R. marina is a species complex. We also find that introduced populations exhibit only slightly lower genetic diversity than native populations. Together with demographic analyses, this indicates founding populations of toads in Florida were larger than previously reported. Lastly, within R. marina, only one of 245 putatively adaptive SNPs showed fixed differences between native and introduced ranges, suggesting that putative selection in these introduced populations is based upon existing genetic variation. Our findings highlight the importance of genetic sequencing in understanding biological introductions and hint at the role of standing genetic variation in range expansion.
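The VCF-to-SFS conversion mentioned in the excerpt above can be illustrated with a small sketch of a folded site frequency spectrum. It assumes a single population, diploid genotypes already coded as alternate-allele counts (0, 1, or 2), and no missing data. It does not reproduce easySFS, which also handles down-projection and multiple populations. The function name and the toy genotype matrix are hypothetical.

```python
import numpy as np


def folded_sfs(genotypes):
    """Compute a folded SFS from a (n_individuals, n_snps) genotype matrix.

    Each entry is the alternate-allele count (0, 1, or 2) of one diploid
    individual at one SNP. Entry i of the result is the number of SNPs
    whose minor allele is observed i times in the sample.
    """
    g = np.asarray(genotypes, dtype=int)
    n_chrom = 2 * g.shape[0]                  # sampled chromosomes
    alt_counts = g.sum(axis=0)                # alternate-allele count per SNP
    minor_counts = np.minimum(alt_counts, n_chrom - alt_counts)
    return np.bincount(minor_counts, minlength=n_chrom // 2 + 1)


# Toy data: 3 individuals (6 chromosomes) genotyped at 4 SNPs.
genotypes = [
    [0, 1, 2, 0],
    [0, 0, 2, 1],
    [1, 0, 2, 1],
]
sfs = folded_sfs(genotypes)
print(sfs.tolist())
```

Folding by the minor allele avoids having to know which allele is ancestral. That is a common choice when no outgroup is available, although whether the cited study folded its spectra is not stated in the excerpt.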
... Moreover, coalescent analyses are computationally intensive and are typically run with small datasets that may not encompass the wide scope of variation within and among closely related species (Giarla & Esselstyn, 2015). As a result, phylogenetic and multi-species coalescent models often do not perform well when ongoing gene flow, reticulate evolution and incomplete lineage sorting are present (Shaffer & Thomson, 2007; Yang & Rannala, 2010; Giarla & Esselstyn, 2015; Naciri & Linder, 2015; Chan et al., 2017; Sukumaran & Knowles, 2017), such as in rapid radiations. Population genetic approaches, although not traditionally used for taxonomic purposes, can overcome these limitations. ...
Species delimitation is challenging in rapid radiations because the typical markers of speciation are often obscured. Here, we use comprehensive sampling and genome-wide single nucleotide polymorphisms to assess species boundaries in a radiation of nine morphologically similar Leptospermum taxa that failed to be discriminated in previous phylogenomic analyses. Our data recovered clear separation of L. maxwellii, L. sericeum and L. inelegans as currently circumscribed. A phrase-named taxon, Leptospermum sp. Peak Charles/Norseman, was not distinct from L. incanum, and we recommend their synonymization. Another pair, L. nitens and L. roei, were also indistinct and differ by a single morphological character that also varies in L. inelegans without taxonomic recognition. We recommend synonymization of L. nitens and L. roei and consistent treatment of this character as a non-diagnostic, variable trait. Difficulty arose in discriminating L. erubescens and L. oligandrum; we make three suggestions and recommend further morphological investigation to determine the most appropriate taxonomic outcome. As expected, hybridization was common across the complex, but, unexpectedly, many individual plants were genetically identical within, and sometimes between, populations of most species. We hypothesize that this is due to apomixis. Overall, this study demonstrates the value of population genomics in the integrative taxonomy toolbox for disentangling species in rapid radiations, while also offering insight into the evolution of this poorly known group of Australian Leptospermum.
Numerous mechanisms can drive speciation, including isolation by adaptation, distance, and environment. These forces can promote genetic and phenotypic differentiation of local populations, the formation of phylogeographic lineages, and ultimately, completed speciation. However, conceptually similar mechanisms may also result in stabilizing rather than diversifying selection, leading to lineage integration and the long‐term persistence of population structure within genetically cohesive species. Processes that drive the formation and maintenance of geographic genetic diversity while facilitating high rates of migration and limiting phenotypic differentiation may thereby result in population genetic structure that is not accompanied by reproductive isolation. We suggest that this framework can be applied more broadly to address the classic dilemma of “structure” versus “species” when evaluating phylogeographic diversity, unifying population genetics, species delimitation, and the underlying study of speciation. We demonstrate one such instance in the Seepage Salamander ( Desmognathus aeneus ) from the southeastern United States. Recent studies estimated up to 6.3% mitochondrial divergence and four phylogenomic lineages with broad admixture across geographic hybrid zones, which could potentially represent distinct species supported by our species‐delimitation analyses. However, while limited dispersal promotes substantial isolation by distance, microhabitat specificity appears to yield stabilizing selection on a single, uniform, ecologically mediated phenotype. As a result, climatic cycles promote recurrent contact between lineages and repeated instances of high migration through time. Subsequent hybridization is apparently not counteracted by adaptive differentiation limiting introgression, leaving a single unified species with deeply divergent phylogeographic lineages that nonetheless do not appear to represent incipient species.
Premise: Species delimitation is an integral part of evolution and ecology and is vital in conservation science. However, in some groups species delimitation is difficult, especially where ancestral relationships inferred from morphological or genetic characters are discordant, possibly due to a complicated demographic history (e.g., recent divergences between lineages). Modern genetic techniques can take account of complex histories to distinguish species at a reasonable cost and are increasingly used in numerous applications. We focus on the scribbly gums, a group of up to five closely related and morphologically similar ‘species’ within the eucalypts. Methods: Multiple populations of each recognized scribbly gum species were sampled over a wide region across climates and genome‐wide scans used to resolve species boundaries. Key results: We found that none of the taxa were completely diverged, and there were two genetically distinct entities: the inland distributed Eucalyptus rossii and a coastal conglomerate consisting of four species forming three discernible, but highly admixed groups. Divergence among taxa was likely driven by temporal vicariant processes resulting in partial separation across biogeographic barriers. High interspecific gene flow indicated separated taxa reconnected at different points in time blurring species boundaries. Conclusions: Our results highlight the need for genetic screening when dealing with closely related taxonomic entities, particularly those with modest morphological differences. We show that the use of high throughput sequencing can be effective at identifying species groupings and processes driving divergence, even in the most taxonomically complex groups, and support it as a standard practice for disentangling species complexes.
The interplay between range expansion and concomitant diversification is of fundamental interest to evolutionary biologists, particularly when linked to intercontinental dispersal and/or large scale extinctions. The evolutionary history of true frogs has been characterized by circumglobal range expansion. As a lineage that survived the Eocene–Oligocene extinction event (EOEE), the group provides an ideal system to test the prediction that range expansion triggers increased net diversification. We constructed the most densely sampled, time-calibrated phylogeny to date in order to: (i) characterize tempo and patterns of diversification; (ii) assess the impact of the EOEE; and (iii) test the hypothesis that range expansion was followed by increased net diversification. We show that late Eocene colonization of novel biogeographic regions was not affected by the EOEE and surprisingly, global expansion was not followed by increased net diversification. On the contrary, the diversification rate declined or did not shift following geographical expansion. Thus, the diversification history of true frogs contradicts the prevailing expectation that amphibian net diversification accelerated towards the present or increased following range expansion. Rather, our results demonstrate that despite their dynamic biogeographic history, true frogs diversified at a relatively constant rate, even as they colonized the major land masses of Earth.
Proc. R. Soc. B 279, 67–76. (7 January 2012; Published online 18 May 2011) (doi:10.1098/rspb.2011.0433) We would like to correct two errors in text. On page 70, §3a (Results: Species discovery trends), 9th line, the phrase ‘first-author taxonomists’ should be changed to ‘first- and second-author taxonomists’. This is an isolated text error that does not alter our analytical results in any way. On page 72, table 1, incorrect values were erroneously entered into the Ndesc column. Putting in the correct values does not change any of the analytical results presented in the Ntot (IQR), Nundesc (IQR), and % columns of table 1. The corrected table appears in the published correction notice.
Aim: To quantify and compare species coverage in priority areas for conservation identified using species richness as opposed to approaches that use individual species range maps. Location: Global. Methods: We compare the coverage of species when global priority areas for conservation are identified based on (1) twelve species richness maps of all and small‐range amphibians, birds and mammals and all and small‐range threatened (i.e., vulnerable, endangered and critically endangered) species; (2) weighted range size rarity, a richness measure corrected for range size; and (3) a complementarity‐based analysis including species range maps for 21,075 terrestrial vertebrate species listed by the International Union for the Conservation of Nature. We also assessed whether any combination of small‐range and/or threatened species richness could be a suitable surrogate for a complementarity‐based analysis by assessing species coverage in priority areas located using (1) richness of small‐range species only; (2) richness of all threatened species only; and (3) richness of small‐range and threatened species. Results: Our results show clear differences in the spatial pattern of priority areas for conservation among the prioritizations based on species richness, weighted range size rarity and species range maps, with the species richness‐based priority areas being highly aggregated in the tropics and the species range map priority areas being more evenly spread among the global terrestrial area. We also find that identifying priority areas for conservation using species richness produces a lower coverage of species than priority areas based on complementarity methods and identified using species range maps, where just one species was left without any protection.
Main Conclusions: As methods and software currently exist for processing large numbers of individual species distribution maps in spatial prioritization, the use of species richness appears to be an unnecessary simplification of biodiversity pattern.
Motivation: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet, two weaknesses of PTP impact its accuracy and practicality when applied to large datasets; it does not account for divergent intraspecific variation and is slow for a large number of sequences. Results: We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It incorporates different levels of intraspecific genetic diversity deriving from differences in either the evolutionary history or sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and popular distance-based methods as it consistently yields more accurate delimitations with respect to the taxonomy (i.e., identifies more taxonomic species, infers species numbers closer to the taxonomy). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing it to delimit species in large (meta-) barcoding data. In addition, Markov Chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation in just a few seconds for millions of steps, independently of tree size. Availability and implementation: mPTP is implemented in C and is available under the GNU Affero 3 license. A web-service is also available.
Supplementary information: Supplementary data are available at Bioinformatics online.
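The threshold sensitivity of distance-based delimitation noted in the abstract above can be made concrete with a toy sketch. It clusters aligned sequences by single linkage on uncorrected p-distance and reports how the number of putative species changes with the cutoff. It is neither PTP nor mPTP, nor any published barcoding tool. The sequences, thresholds, and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_clusters(seqs, threshold):
    """Count single-linkage clusters: sequences are joined when p-distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Four hypothetical aligned sequences of 20 sites each.
seqs = [
    "A" * 20,
    "A" * 19 + "C",
    "A" * 17 + "CCC",
    "G" * 10 + "A" * 10,
]

counts = {t: count_clusters(seqs, t) for t in (0.03, 0.06, 0.12)}
print(counts)
```

The same four sequences give four, three, or two clusters depending only on the cutoff. This is the kind of dependence that threshold-free, tree-based approaches are intended to avoid.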
The flora and fauna of Southeast Asia are exceptionally diverse. The region includes several terrestrial biodiversity hotspots and is the principal global hotspot for marine diversity, but it also faces the most intense challenges of the current global biodiversity crisis. Providing reviews, syntheses and results of the latest research into Southeast Asian earth and organismal history, this book investigates the history, present and future of the fauna and flora of this bio- and geodiverse region. Leading authorities in the field explore key topics including palaeogeography, palaeoclimatology, biogeography, population genetics and conservation biology, illustrating research approaches and themes with spatially, taxonomically and methodologically focused case studies. The volume also presents methodological advances in population genetics and historical biogeography. Exploring the fascinating environmental and biotic histories of Southeast Asia, this is an ideal resource for graduate students and researchers as well as environmental NGOs.
The package adegenet for the R software is dedicated to the multivariate analysis of genetic markers. It extends the ade4 package of multivariate methods by implementing formal classes and functions to manipulate and analyse genetic markers. Data can be imported from common population genetics software and exported to other software and R packages. adegenet also implements standard population genetics tools along with more original approaches for spatial genetics and hybridization. Availability: A stable version is available from CRAN and a development version is available from the adegenet website. Both versions can be installed directly from R. adegenet is distributed under the GNU General Public Licence (v.2). Supplementary information: Supplementary data are available at Bioinformatics online.
Significance: Despite its widespread application to the species delimitation problem, our study demonstrates that what the multispecies coalescent actually delimits is structure. The current implementations of species delimitation under the multispecies coalescent do not provide any way for distinguishing between structure due to population-level processes and that due to species boundaries. The overinflation of species due to the misidentification of general genetic structure for species boundaries has profound implications for our understanding of the generation and dynamics of biodiversity, because any ecological or evolutionary studies that rely on species as their fundamental units will be impacted, as well as the very existence of this biodiversity, because conservation planning is undermined due to isolated populations incorrectly being treated as distinct species.