American Journal of Botany 105(3): 1–9, 2018; http://www.wileyonlinelibrary.com/journal/AJB © 2018 Botanical Society of America • 1
INVITED SPECIAL ARTICLE
For the Special Issue: Using and Navigating the Plant Tree of Life
A roadmap for global synthesis of the plant tree of life
Wolf L. Eiserhardt1,2,20, Alexandre Antonelli3,4,5, Dominic J. Bennett3,4,5, Laura R. Botigué1, J. Gordon Burleigh6, Steven Dodsworth1,
Brian J. Enquist7,8, Félix Forest1, Jan T. Kim1, Alexey M. Kozlov9, Ilia J. Leitch1, Brian S. Maitner7, Siavash Mirarab10, William H. Piel11,
Oscar A. Pérez-Escobar1, Lisa Pokorny1, Carsten Rahbek12,13, Brody Sandel14, Stephen A. Smith15, Alexandros Stamatakis9,16, Rutger A. Vos17,18,
Tandy Warnow19, and William J. Baker1
Manuscript received 13 October 2017; revision accepted 8
1 Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
2 Department of Bioscience, Aarhus University, Ny Munkegade
116, 8000, Aarhus C, Denmark
3 Gothenburg Global Biodiversity Centre, Box 461, 405 30,
4 Department of Biological and Environmental Sciences,
University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
5 Gothenburg Botanical Garden, Carl Skottsbergs Gata 22B, SE-413 19,
Gothenburg, Sweden
6 Department of Biology, University of Florida, Florida 32611,
7 Department of Ecology and Evolutionary Biology, University of
Arizona, Tucson, AZ 85721, USA
8 The Santa Fe Institute, Santa Fe, NM 87501, USA
9 Scientific Computing Group, Heidelberg Institute for Theoretical
Studies, 69118, Heidelberg, Germany
10 Department of Electrical and Computer Engineering, University
of California, San Diego, San Diego, CA 92093, USA
11 Yale-NUS College, 16 College Avenue West, Singapore, 138527,
Republic of Singapore
Providing science and society with an integrated, up-to-date, high quality, open, reproducible
and sustainable plant tree of life would be a huge service that is now coming within reach.
However, synthesizing the growing body of DNA sequence data in the public domain and
disseminating the trees to a diverse audience are often not straightforward due to numerous
informatics barriers. While big synthetic plant phylogenies are being built, they remain static
and become quickly outdated as new data are published and tree-building methods improve.
Moreover, the body of existing phylogenetic evidence is hard to navigate and access for
non-experts. We propose that our community of botanists, tree builders, and informaticians
should converge on a modular framework for data integration and phylogenetic analysis,
allowing easy collaboration, updating, data sourcing and flexible analyses. With support from
major institutions, this pipeline should be re-run at regular intervals, storing trees and their
metadata long-term. Providing the trees to a diverse global audience through user-friendly
front ends and application development interfaces should also be a priority. Interactive
interfaces could be used to solicit user feedback and thus improve data quality and to
coordinate the generation of new data. We conclude by outlining a number of steps that we
suggest the scientific community should take to achieve global phylogenetic synthesis.
KEY WORDS angiosperms; bryophytes; GenBank; cyberinfrastructure; land plant phylogeny;
megaphylogenies; phylogenomics; phyloinformatics; pteridophytes; sampling.
12 Center for Macroecology, Evolution and Climate, University of Copenhagen, Universitetsparken 15, DK-2100, Copenhagen O, Denmark
13 Imperial College London, Silwood Park, Buckhurst Road, Ascot, Berkshire SL5 7PY, UK
14 Department of Biology, Santa Clara University, Santa Clara, CA 95053, USA
15 Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
16 Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128, Karlsruhe, Germany
17 Naturalis Biodiversity Center, P.O. Box 9517, 2300RA, Leiden, The Netherlands
18 Institute of Biology Leiden, P.O. Box 9505, 2300RA, Leiden, The Netherlands
19 Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
20 Author for correspondence (e-mail: email@example.com)
Citation: Eiserhardt, W. L., A. Antonelli, D. J. Bennett, L. R. Botigué, J. G. Burleigh, S. Dodsworth, B. J. Enquist, et al. 2018. A roadmap for global synthesis of the plant tree of life.
American Journal of Botany 105(3): 1–9.
The tree of life is a crucial reference system for the life sciences.
It is a fundamental infrastructure of scientific knowledge that is as
central to biology as the periodic table is to chemistry. Nevertheless,
the tree of life remains incompletely known and insufficiently accessible
to potential users. That phylogenies are fundamental to evolution
and, thus, the life sciences has been recognized for decades
(Hennig, 1950; Felsenstein, 1985; McTavish et al., 2017), and the
demand for phylogenetic trees is higher than ever as the availability
of data that can be analyzed in a phylogenetic framework soars. For
example, trait and distribution data are now publicly available for
tens to hundreds of thousands of species (e.g., Kattge et al., 2011;
Enquist et al., 2016), facilitating very large comparative studies in
evolutionary biology, biogeography, ecology, conservation, and
other fields (e.g., Zanne et al., 2014). However, big data efforts in
biodiversity science and the global change biology community are
largely progressing without phylogenetic information (Jetz et al.,
2016; Joppa et al., 2016; Proença et al., 2017). While the scientific
community is finding ever more creative ways to utilize phylogenetic
evidence (e.g., Strauss et al., 2006; Liu et al., 2012), access to
the tree of life is still insufficient even after several decades of big
tree building and despite the huge contributions made by data synthesis
projects like TimeTree (Kumar et al., 2017) and the Open Tree of
Life (Hinchliff et al., 2015). Thus, our ability to address research
questions that can only be answered using very large phylogenetic
trees remains limited (Folk et al., 2018, in this issue).
The plant phylogenetic community has been highly collaborative
and productive over the last three decades. The major branches
of the land plant tree of life are now generally well established,
although some problematic nodes remain (Ruhfel et al., 2014;
Wickett et al., 2014; PPG I, 2016; Angiosperm Phylogeny Group,
2016; Gitzendanner et al., 2018, in this issue). Public databases such
as NCBI GenBank contain at least some DNA data from 27% of
known vascular plant species and 75% of genera (Hinchliff and
Smith, 2014; RBG Kew, 2016). However, the extent to which these
data can resolve well-supported phylogenetic relationships has
been questioned (Hinchliff and Smith, 2014). Moreover, the most
commonly sequenced loci represent a minuscule fraction of the
total information in plant genomes, with land plant nuclear genomes
ranging in size from ca. 61 million to 149 billion base pairs
(Dodsworth et al., 2015). As of January 2017, only 225 vascular
plant genomes had been published, equivalent to <0.1% of land
plant diversity (RBG Kew, 2017). The gap between actually and
potentially available DNA sequence data for plants is thus immense.
More insidiously, public sequence data are plagued by serious
data quality concerns (e.g., Nilsson et al., 2006). For example,
species names are often incorrectly spelled or, worse, taxonomically
incorrect. The problem is exacerbated as listed species names often
are not linked to vouchers (Gratton et al., 2017). In addition,
species nomenclature does not keep pace with taxonomic updates.
Together, these issues point to the fact that data quality control is a
central challenge in the provision of an accurate plant tree of life.
Several new projects are now rising to the challenge of filling
the data gaps through high-throughput genomic sequencing across
the plants. For example, the Plant and Fungal Trees of Life Project
(PAFTOL) and Genealogy of Flagellate Plant Project (GoFlag)
together aim to analyze hundreds of nuclear genes and plastid
genomes from all genera and many species of land plants using a gene
capture approach (Weitemier et al., 2014). Large whole-genome
projects such as the Open Green Genomes Project and the 10,000
Plants Project (10KP: Normile, 2017) are also underway, which
build on the recent success of the 1,000 Plants Project (Wickett
et al., 2014). In different ways, these initiatives promise to deliver
extraordinary new resources for plant comparative biology. However,
together, they will tackle less than 10% of the known species
diversity of land plants, presenting a fundamental limitation to the
usefulness of the phylogenies resulting from them. While complete
genome sequencing of all species of life on Earth is a stated ambition
of the scientific community (Pennisi, 2017), the results may
not be realized for many years to come. It is essential, therefore, that
all available data, whether from public databases or new genomic
initiatives, are integrated to deliver the best possible estimate of the
plant tree of life at any given time.
The idea to generate synthetic phylogenies that combine all available
phylogenetic evidence is not new. For example, the Open Tree of
Life and related AVATOL projects were herculean efforts to synthesize
and facilitate the analysis of the entire tree of life (Hinchliff et al.,
2015). These projects resulted in several resources that continue to
be useful and will continue to be updated (e.g., data store, taxonomy,
synthetic tree, online tree viewer). For plants, important synthetic
trees of life have been built through mining and compiling public
DNA sequence data (e.g., Hinchliff and Smith, 2014; Zanne et al.,
2014; Maitner et al., 2018), published phylogenies (Hinchliff et al.,
2015), or a combination of both (Smith and Brown, 2018, in this
issue). While these trees have facilitated many analyses, each is limited
in some respect. For example, despite the ever-increasing rate at
which DNA sequence data are generated, these synthetic trees are
not routinely updated and thus become quickly outdated. Moreover,
these phylogenies often fail to capture the uncertainty and conflict
underlying the data that has now been exposed by large genomic
analyses (Wickett et al., 2014; Shen et al., 2017). Thus, the users of the
plant tree of life are obliged either to choose an existing tree, regardless
of its deficiencies, or to build their own tree by mining public
repositories and reconstructing phylogenetic relationships themselves.
Despite the creation of new pipelines (e.g., Antonelli et al., 2017;
Smith and Brown, 2018, in this issue), the latter option remains beyond
the skills and desires of many potential users.
We believe that the plant phylogenetic community must find
new ways to provide an integrated, up-to-date, high quality, open,
reproducible and sustainable tree (Table 1) to a diverse user community.
Here we propose a roadmap that outlines how our community
could produce such a tree, focusing on the synthesis of all
publicly available DNA sequence data. We argue that we need a
modular tree of life pipeline that allows distributed development of
tools across research groups. We find it useful to break down this
pipeline into four main parts (Fig. 1): gathering the data, phylogenetic
reconstruction, data storage, and disseminating the tree of life.
Below, we outline the major challenges and opportunities associated
with each part and conclude with a call to action, proposing
nine steps that we think would materially advance our quest for
global phylogenetic synthesis in plants. We note that the case study
here focuses on plants, but the principles could apply to any group
of organisms or even all of life.
GATHERING THE DATA
Constructing accurate and comprehensive phylogenies for extant
plants requires comprehensive molecular sampling. Despite herculean
efforts by thousands of scientists over the last decades to collect
molecular data across the tree of life, there are still major data
gaps (Fig. 2). Not only do we lack molecular data for approximately
285,000 of the 391,000 known species of vascular plants (RBG Kew,
2016), but also there is poor genomic coverage for most species for
which we do have data. Nevertheless, available molecular resources
are immense and continue to grow rapidly in size and complexity:
the NCBI database currently contains almost 38 million nucleotide
sequences for land plants, yet the challenge lies in the computational
demand of handling these data volumes. For example, all-versus-all
BLAST searching and clustering, a critical step in homology and
orthology assessment, becomes computationally prohibitive as
data increase. Moreover, data integration becomes more complex
as the number of databases increases, bringing different schemas
and interfaces. More importantly, we must now also adapt to
diversifying data types, such as single loci, transcriptomes, genomes,
and restriction-site-associated DNA sequencing (RADSeq) data.
Despite these challenges, there have been significant advances in
data set assembly that have addressed some of the complexity
associated with genomic and transcriptomic data (Dunn et al., 2013;
Yang and Smith, 2014; Walker et al., 2018, in this issue). Researchers
can leverage these recent developments along with advances in large
data set construction (Freyman, 2015; Antonelli et al., 2017; Smith
and Brown, 2018, in this issue) to overcome the challenges faced by
diverse and large data sources.
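
The quadratic growth behind the all-versus-all bottleneck mentioned above can be illustrated with a short calculation. This is a back-of-the-envelope sketch only: the function name is hypothetical, and practical BLAST-based pipelines avoid many comparisons through indexing and heuristics, so the figures are upper bounds on the number of sequence pairs rather than run-time estimates.

```python
def all_vs_all_pairs(n_sequences: int) -> int:
    """Number of unordered sequence pairs in a naive all-versus-all comparison."""
    return n_sequences * (n_sequences - 1) // 2


# Illustrative data set sizes; 38 million is the approximate number of
# land plant nucleotide sequences quoted in the text.
for n in (1_000_000, 10_000_000, 38_000_000):
    print(f"{n:>12,} sequences -> {all_vs_all_pairs(n):.2e} pairs")
```

A tenfold increase in sequences thus implies roughly a hundredfold increase in pairs, which is why clustering strategies that avoid exhaustive comparison become necessary as the data grow.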
In addition to the computational and biological complexities
that accompany diverse data, significant concerns surround data
quality in public databases, such as contamination, lack of sequence
validation, and a dearth of links to specimens. The identification of
mislabeled or contaminant sequences is an important yet difficult
cleaning step that can now be facilitated by semi-automated methods
(e.g., Kozlov et al., 2016; Rulik et al., 2017). In addition, a public
record of questionable sequences in GenBank is starting to emerge
(e.g., https://github.com/FePhyFoFum/seq_filters). Ideally, this
information would be stored together with the sequence data, but
such storage is not currently possible given the limitations of public
databases. Community-curated reference sequence databases have
been successfully implemented by other communities, e.g., for fungal
ITS (Kõljalg et al., 2005), protist 18S rDNA (Berney et al., 2017),
and bacterial genomes (Chen et al., 2017), and a similar resource
would be invaluable for plants.
Taxonomic reconciliation is yet another significant challenge that
emerges when integrating species data from multiple sources. For
example, whereas molecular databases such as GenBank use the NCBI
taxonomy, trait databases (e.g., BIEN) and geographical archives
(e.g., GBIF) may use other taxonomies. Each of these recognizes
its own set of synonyms, alternative spellings, and taxon concepts.
Taxonomic reconciliation is the process of navigating this
heterogeneity for purposes of data integration. Several web services (e.g.,
iPlant TNRS, GlobalNames, TaxoSaurus) and “meta-taxonomies”
(e.g., the Open Tree of Life taxonomy) exist to support this process
(Rees and Cranston, 2017). Nevertheless, a modular infrastructure
for periodically rebuilding the plant tree of life, as proposed here,
would benefit from a pre-computed taxonomic mapping of input
data sources, which would be both a more efficient approach than
accessing web resources each time, and a community-based product
that can itself be released, critiqued, corrected, and annotated.
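
A pre-computed taxonomic mapping of the kind proposed above could, in its simplest form, be a lookup table from name variants to accepted names. The following minimal sketch assumes such a table is available locally; the table contents, the function names, and the normalization rule are illustrative assumptions, not part of any existing service.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical pre-computed mapping: normalized name variant -> accepted name.
# The entries are illustrative only and do not represent an authoritative taxonomy.
NAME_MAP: Dict[str, str] = {
    "solanum lycopersicum": "Solanum lycopersicum",
    "lycopersicon esculentum": "Solanum lycopersicum",
    "lycopersicon lycopersicum": "Solanum lycopersicum",
}


def normalize(name: str) -> str:
    """Collapse whitespace and case so that trivial variants share one key."""
    return " ".join(name.split()).lower()


def reconcile(name: str, mapping: Dict[str, str] = NAME_MAP) -> Optional[str]:
    """Return the accepted name, or None if the name is absent from the mapping."""
    return mapping.get(normalize(name))


def reconcile_many(
    names: Iterable[str], mapping: Dict[str, str] = NAME_MAP
) -> Tuple[Dict[str, str], List[str]]:
    """Split input names into matched (input -> accepted) and unmatched names."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in names:
        accepted = reconcile(name, mapping)
        if accepted is None:
            unmatched.append(name)  # candidates for community review
        else:
            matched[name] = accepted
    return matched, unmatched
```

Returning the unmatched names explicitly, rather than silently dropping them, is what would allow such a mapping to be critiqued and corrected by the community over successive releases.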
Looking forward, the plant phylogenetics community can partly
preempt data integration problems by converging on common sets
of molecular loci, thus maximizing overlap among data sets. Such
convergence has happened in the past, when a small set of loci (e.g.,
rbcL, matK, ITS) was widely sequenced and used for phylogenetic
reconstruction and barcoding (CBOL Plant Working Group, 2009).
These loci facilitated large phylogenetic analyses that spanned all
plants, but we now know that, for several reasons, additional data
sets are needed. For example, genomic analyses have exposed the
underlying complexity of phylogenetic conflict, concordance, and
gene and genome duplication (Jarvis et al., 2014; Wickett et al.,
2014; Shen et al., 2017). Our data collection strategies need to
reflect the reality of these patterns and processes. Common loci have
yet to emerge for the genomic age: for example, recently developed
marker sets for Asteraceae, Arecaceae and Detarioideae (Mandel
et al., 2014; Heyduk et al., 2016; M. de la Estrella, Royal Botanic
Gardens, Kew, unpublished data), each containing hundreds of loci,
only have five loci in common. However, initiatives like PAFTOL
and GoFlag are now developing toolkits that will isolate a defined
TABLE 1. Major desiderata, challenges, and opportunities for global plant phylogenetic synthesis.
The tree of life should be: Integrated
Challenge: Synthetic trees are currently produced in an uncoordinated way, using diverse methods with different limitations and sampling. Additionally, trees are often generated in isolation from related research communities, e.g., palaeontology.
Opportunities: Implementation of modular pipelines, common data standards and application programming interfaces (APIs) would allow multiple research groups to contribute to a central and flexible tree-building platform to serve different tree use applications and better facilitate cross-community coordination.
The tree of life should be: Up to date
Challenge: Trees are usually static products that are out of date as soon as they are published since new genetic data are constantly produced. They have no specified routine for updates.
Opportunities: Phylogeny reconstruction can be scripted with minimal or no user intervention, allowing scripts to be rerun automatically at regular
The tree of life should be: High quality
Challenge: Quality controls on data in public repositories are weak, which reduces confidence in synthetic phylogenies that use
Opportunities: New data should be generated to rigorous quality standards, supported by the major repositories. Existing data can be cleaned with automated algorithms, and problematic data should be clearly marked. User feedback can improve data quality.
The tree of life should be: Open
Challenge: Not all methods and pipelines are open source, preventing the community from fully using them, limiting
Opportunities: Well-established platforms such as GitHub, Dryad, FigShare, and others allow sharing and customization of code, data, and
The tree of life should be: Reproducible
Challenge: Phylogeny reconstruction often involves manual editing, and not all steps are fully documented. Thus, analyses cannot readily be verified or re-run with updated input data.
Opportunities: Phylogeny reconstruction can be scripted to run without any user intervention. Scripts and intermediate data (e.g., alignments) can be archived and provided together with trees.
The tree of life should be: Sustainable
Challenge: Tree of Life research is often hampered by short project lifetimes and funding cycles. No individual or organisation has responsibility for maintaining a dynamic tree of life.
Opportunities: Institutions and data repositories could collaborate, pooling complementary resources to create a sustainable service to the
set of several hundred orthologous loci across land plants. Data
generated in this way could play a similar role in the future that rbcL
and other popular loci have done in the past, but one that reflects
the lessons we have gained from analyzing genomes and
transcriptomes over the last decade.
PHYLOGENETIC RECONSTRUCTION
Any phylogenetic analysis at the scale of the plant tree of life will
challenge standard approaches for multiple sequence alignment
and phylogenetic inference. As the number of species and/or genes
increases, the accuracy of likelihood-based phylogenetic methods
can decrease, in particular when more taxa but not more genes are
added. Meanwhile, running times will always increase with increasing
data. As a concrete example, concatenation analyses using maximum
likelihood (ML) are the most common approach for species
tree estimation, and existing parallel implementations (e.g., Kozlov
et al., 2015; Nguyen et al., 2015) can analyze data sets comprising
dozens to hundreds of whole genomes or transcriptomes (Jarvis
et al., 2014; Peters et al., 2017). However, no current ML method
scales in reasonable time to enable analyses of data sets with tens
of thousands of species and loci. For example, inferring a tree on
1600 insect transcriptomes (including bootstraps) would still take
an estimated 70 million CPU hours. The development of ever more
efficient and accurate methods for multiple sequence alignment
and phylogeny estimation is driven by the “arms race” between the
rapidly growing sequencing capacity on the one side and computational
capacity and phylogenetic algorithms on the other side.
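
To put the estimate of 70 million CPU hours into perspective, the following sketch converts CPU hours into wall-clock time for a given number of cores. It assumes perfect parallel scaling, which real ML implementations do not achieve, so the results are optimistic lower bounds; the function name and core counts are illustrative assumptions.

```python
HOURS_PER_DAY = 24


def wall_clock_days(cpu_hours: float, n_cores: int) -> float:
    """Wall-clock days needed, assuming (optimistically) perfect parallel scaling."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return cpu_hours / n_cores / HOURS_PER_DAY


# 70 million CPU hours is the estimate quoted in the text;
# the core counts are hypothetical cluster sizes.
for cores in (1_000, 10_000, 100_000):
    print(f"{cores:>7,} cores -> {wall_clock_days(70e6, cores):,.1f} days")
```

Even with 10,000 cores dedicated to a single analysis, such a run would occupy the machine for most of a year, which illustrates why regular re-runs of a global pipeline depend on algorithmic advances as much as on hardware.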
The biological realism of phylogenetic models (e.g., models of
sequence evolution) is another important challenge to accurate
phylogenetic reconstruction. Perhaps most importantly, recent
genomic and transcriptomic studies (e.g., Wickett et al., 2014; Sun
et al., 2015; Shen et al., 2017) have exposed considerable amounts
of gene tree discordance that need to be modeled appropriately.
Discordance had typically been considered to be the result of noise
and error, but these new data suggest that widespread discordance
is likely due, at least in part, to biological processes (e.g., incomplete
lineage sorting, hybridization, gene duplication and loss). This
challenge is being addressed by species tree methods, which is an area of
rapid methodological development (e.g., Ané et al., 2007; Liu et al.,
2007; Heled and Drummond, 2010; Boussau et al., 2013; Chifman
and Kubatko, 2014; Mirarab et al., 2014). In spite of these promising
advances, several problems remain. Most species tree methods only
address a single source of discordance, and some sources remain
difficult to address, such as hybridization and allopolyploid speciation
(but see Yu et al., 2014; Yu and Nakhleh, 2015; Solís-Lemus and
Ané, 2016), which are particularly frequent in plants (Wood et al.,
2009; Van de Peer et al., 2017). In addition, it is not known how
accurate species tree approaches are for large numbers of species,
although some methods now scale to 10,000 species (Zhang et al.,
2017). Also, while it may be difficult to reconstruct reliable gene
trees due to lack of phylogenetic signal, techniques such as weighted
statistical binning can be helpful (Bayzid et al., 2014; Mirarab et al.,
2014), though additional developments that address this problem
may be necessary. In addition to discordance, heterogeneity in the
process of molecular evolution (e.g., lineage-specific rate shifts,
compositional evolution) may also complicate phylogenetic
reconstruction (Li et al., 2014; De La Torre et al., 2017). Researchers
continue to address this complexity and comprehensive phylogenetic
reconstruction of plants should incorporate these developments
where possible (Foster et al., 2009; Cox et al., 2014).
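
Gene tree discordance of the kind described above can be quantified in a very simple way by asking in what fraction of gene trees a given clade is recovered. The sketch below is not one of the species tree methods cited; it is a minimal illustration that assumes rooted gene trees encoded as nested tuples of tip labels, with all names and example trees being hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A rooted tree is either a tip label or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the tip sets subtended by every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def tips(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def clade_support(gene_trees: List[Tree], clade: Set[str]) -> float:
    """Fraction of gene trees in which `clade` is recovered as monophyletic."""
    target = frozenset(clade)
    hits = sum(target in clades(tree) for tree in gene_trees)
    return hits / len(gene_trees)


# Three hypothetical gene trees for taxa A-D; the third disagrees about (A, B).
gene_trees: List[Tree] = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
print(clade_support(gene_trees, {"A", "B"}))
print(clade_support(gene_trees, {"A", "B", "C"}))
```

A summary of this kind only describes how much the gene trees disagree; deciding whether the disagreement reflects error, incomplete lineage sorting, or hybridization requires the model-based approaches discussed in the text.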
Missing data are a notorious feature of phylogenetic analyses that
synthesize partly overlapping data from multiple sources, i.e., not all
loci are sampled for all taxa. Such analyses may be susceptible to
errors or analytical issues associated with missing data (e.g., Sanderson
et al., 2015). Projects such as PAFTOL and GoFlag that are expanding
the number of orthologous regions sequenced, in addition to
continuing genomic and transcriptomic efforts, will, at least in part,
address this problem. However, methodological developments that
tackle phylogenetic reconstruction with a “divide and conquer”
approach may also overcome these issues by reducing the phylogenetic
problem to data matrices that have less missing data (e.g., Smith and
FIGURE 1. Schematic representation of a pipeline for building and
disseminating an integrated, up-to-date, high quality, open, reproducible,
and sustainable tree of life for plants. Colors refer to the sections in the
text: blue, gathering the data; yellow, phylogenetic reconstruction;
purple, storing the data; green, disseminating the tree of life.
Brown, 2018, in this issue). These methods can then be combined
with other developments in supertree construction to graft these
subtrees into a comprehensive tree (Akanni et al., 2015; Lafond et al.,
2017; Redelings and Holder, 2017; Vachaspati and Warnow, 2017).
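
The grafting step of a divide-and-conquer workflow can be sketched with plain Newick strings: subtrees inferred separately are inserted in place of placeholder tips on a backbone tree. This is a deliberately naive illustration, not any of the supertree methods cited above; the function name, the placeholder convention, and the example topologies are assumptions made for the sketch.

```python
import re
from typing import Dict


def graft(backbone: str, subtrees: Dict[str, str]) -> str:
    """Replace placeholder tips in a backbone Newick string with subtrees.

    Each key of `subtrees` must occur exactly once as a tip label in `backbone`.
    """
    result = backbone
    for placeholder, subtree in subtrees.items():
        # Drop the trailing semicolon so the subtree nests inside the backbone.
        clade = subtree.strip().rstrip(";")
        # Match the placeholder only as a complete tip label.
        pattern = r"(?<=[(,])" + re.escape(placeholder) + r"(?=[,):])"
        result, n = re.subn(pattern, lambda _match: clade, result)
        if n != 1:
            raise ValueError(f"placeholder {placeholder!r} matched {n} times")
    return result


# Hypothetical backbone of three families and two separately inferred subtrees.
backbone = "((Rosaceae,Fabaceae),Poaceae);"
subtrees = {
    "Rosaceae": "(Rosa,(Malus,Pyrus));",
    "Poaceae": "(Oryza,Zea);",
}
print(graft(backbone, subtrees))
```

The sketch assumes that the backbone and the subtrees do not conflict and share no tips; resolving overlap and conflict among source trees is precisely what the cited supertree methods are designed to do.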
Many of the phylogenetic challenges that face the reconstruction
of a comprehensive plant tree will require new developments in
phylogenetic methods, but are common to the reconstruction of other
parts of the tree of life. The alignments and data sets compiled as
part of an effort to construct a comprehensive plant phylogeny would
serve the phylogenetics community in driving the development of
new methods. These new methods could then be used to reconstruct
a more accurate and useful comprehensive plant phylogeny.
STORING THE DATA
Assembling the tree of life is fundamentally a big data problem: not
only does it produce large quantities of results in an iterative process,
FIGURE 2. A phylogeny of seed plants from Smith and Brown (2018, in this issue), in which the color of each branch corresponds to the proportion of species
from that clade that are represented in public sequence databases. Red branches are missing all or nearly all species, blue branches have a high
proportion of species sampled, and yellow and green branches have from one to three thirds of species sampled.
but each data object produced is large and complex. Consider that if
the tree of all plant species were oriented horizontally and the species
labels printed in 9-point font, the tree would extend twice the height
of the tallest human-made structure in the world, the Burj Khalifa in
Dubai (i.e., 830 m). Thus, not only is it a challenge to manage each
iteration of the pipeline, but also the trees themselves are too big for
any kind of meaningful visual inspection as a whole. Furthermore,
multiple sequence alignments are even larger than the trees. Also,
given the wide-ranging set of techniques and data sets available for
phylogenetic reconstruction, there will likely be multiple alternative
resolutions for many parts of the plant tree of life. To help users of
phylogenetic trees to make sense of such discordances requires effective
ways of storing, comparing, and summarizing alternative resolutions.
For efficient management, quality control, and data output, we
require a scalable database, designed and optimized for the purpose.
Fundamentally, the database module of a tree of life pipeline is
responsible for tracking the provenance of input data, alignments,
metadata about the analysis, and phylogenetic results, and is also
essential for ensuring transparency and reproducibility (Leebens-Mack
et al., 2006). A key challenge is to establish the appropriate
balance between allowing flexibility, and thereby future-proofing
the assembly pipeline, and fully normalizing the data model to
provide data integrity and query efficiency for
core components (McTavish et al., 2015). The Open Tree of Life uses
a git-based system for tree storage, called Phylesystem (McTavish
et al., 2015). This system allows for versioning and metadata to be
attached. Furthermore, it allows for easy replication by other
researchers. This provides a potential model for future decentralized
Importantly, a database for storing phylogenetic trees must not
be developed in isolation. The demand to combine phylogenetic
information with additional biological and abiotic data is increasing,
and any tree of life database should thus be compatible with global
common data standards (Panahiazar et al., 2013), allowing links to
initiatives that deliver, for example, plant distribution or trait data
(e.g., Kattge et al., 2011; Enquist et al., 2016; Maitner et al., 2018).
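
The provenance tracking and versioning discussed in this section can be made concrete with a small record type that ties a tree to the method, pipeline version, and input data that produced it. The sketch below is an assumption-laden illustration and does not reproduce the Phylesystem data model; the class, its field names, and the placeholder accession strings are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical provenance record for one tree produced by a pipeline run."""

    newick: str
    method: str            # free-text description of the inference method
    pipeline_version: str   # version of the (hypothetical) pipeline code
    input_accessions: List[str] = field(default_factory=list)

    def content_id(self) -> str:
        """SHA-256 digest of the record, usable as a stable version identifier."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Placeholder values only; "ACC0001" is not a real sequence accession.
run_1 = TreeRecord("((A,B),C);", "maximum likelihood", "0.1.0", ["ACC0001", "ACC0002"])
run_2 = TreeRecord("((A,B),C);", "maximum likelihood", "0.2.0", ["ACC0001", "ACC0002"])
print(run_1.content_id() == run_2.content_id())
```

Because the identifier is derived from the full record, any change to the inputs, the method, or the pipeline version yields a new identifier, while re-running an identical analysis reproduces the same one.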
DISSEMINATING THE TREE OF LIFE
The use of phylogenetic information is crucial for solving pure and
applied problems in biology (Brooks and McLennan, 1991; Faith,
1992; Magurran, 2013) and has enormous potential for outreach
and education (Jenkins, 2009; MacDonald and Wiley, 2012). Thus, a
central challenge for developing a phylogenetic workflow and serving
big trees is to anticipate correctly a plethora of use cases (see
Box 1) and to develop a general cyberinfrastructure accordingly
(Goff et al., 2011; Stoltzfus et al., 2013). As outlined above, this
flexibility relies on an appropriate database structure, but the actual user
interface is equally important.
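
One of the use cases listed in Box 1, the comparison of phylogenetic diversity among areas, shows what a programmatic interface would need to support. The sketch below computes phylogenetic diversity in the sense of Faith (1992) as the summed length of all branches connecting a set of species to the root. It is a toy illustration: the tree encoding, species names, and branch lengths are assumptions, and including the path to the root is one of several conventions in use.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch leading to the node).
# The root has parent None. Names and branch lengths are illustrative only.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "sp_A": ("n1", 2.0),
    "sp_B": ("n1", 2.0),
    "sp_C": ("root", 3.0),
}


def phylogenetic_diversity(
    tree: Dict[str, Tuple[Optional[str], float]], tips: Iterable[str]
) -> float:
    """Sum the lengths of all branches on the paths from the given tips to the root."""
    visited = set()
    total = 0.0
    for tip in tips:
        node: Optional[str] = tip
        # Walk towards the root, counting each branch only once.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Two hypothetical forest fragments with different species composition.
print(phylogenetic_diversity(TREE, ["sp_A", "sp_B"]))
print(phylogenetic_diversity(TREE, ["sp_A", "sp_C"]))
```

In practice, a conservationist would rely on established packages such as those named in Box 1; the point of the sketch is that the calculation needs only a tree with branch lengths and a species list, both of which a tree of life service could supply automatically.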
Publicly depositing phylogenetic trees in an editable electronic
format is largely standard practice nowadays (but see Stoltzfus et al.,
2012; Drew et al., 2013), allowing researchers to access a wealth of
phylogenetic information online (e.g., https://treebase.org/,
https://tree.opentreeoflife.org/). Online storage would be particularly
important for frequently updated trees that might not be associated
with a traditional, static publication. In this instance, proper
versioning is essential, and care must be taken that each version of the
tree is citable (e.g., using a digital object identifier). If alternative
phylogenetic methods were employed, the user should be enabled
to make an informed choice about the different resulting trees.
Special care must also be taken to communicate uncertainty (e.g.,
support values) in an understandable way. It should be noted that
trees stored in databases such as TreeBASE (Piel et al., 2009) are not
necessarily readily navigated by non-expert audiences, and more
accessible interfaces can greatly increase the impact (e.g., OneZoom:
Rosindell and Harmon, 2012; and the Open Tree of Life).
In addition to an easily accessible means for interacting with
the tree or set of trees, any associated metadata need to be available.
For example, sequence metadata (e.g., voucher, reference),
including both data stored in the repositories that the sequences
were obtained from, and data that cannot be stored in such
repositories (e.g., digital images of voucher specimens) should be linked
and made available where possible. This information contributes to
BOX 1. An outline of general uses of global phylogenetic trees.
The following use cases together help define and guide short and
long-term goals for a phylogenetic cyberinfrastructure.
(1) Applied user: A plant breeder may ask, does a given species
have the potential to be selected for certain traits (e.g.,
drought tolerance)? To answer this question, they will want
to input a taxon name and see a list of close relatives, ideally
annotated with the trait of interest.
(2) Educator: A botanic garden educator may want to make a
panel showing the phylogenetic relationships among some
species growing in the garden. They will want to input a
short list of species (usually fewer than 100) or identify a
clade of interest (e.g., Rosaceae) and download a phylogeny
of those species in a format that can be easily turned into a
visually appealing figure.
(3) Conservationist: A conservation biologist may want to
compare the phylogenetic diversity of a set of areas (e.g.,
forest fragments) to prioritize conservation efforts. They
will want to calculate phylogenetic diversity using statistical
packages such as PICANTE (Kembel et al., 2010) or
Biodiverse (Laffan et al., 2010), ideally without having to
choose and handle a phylogenetic tree.
(4) Comparative biologist: A comparative biologist may want
to test the relationship between climate and leaf traits across
a set of species. They will want to run a phylogenetic
regression model that uses the most up-to-date phylogenetic
relationships, ideally without having to choose and handle
a phylogenetic tree (although they may have an opinion
on phylogenetic methods and appreciate getting to choose
among several alternative trees).
(5) Phylogeneticist: An experienced phylogeneticist may want
to build a tree using a specific combination of methods,
and potentially even modify/customize some of them. They
would fork the phylogenetic pipeline, modify it, and
potentially run it on their own computational infrastructure.
(6) Senior biodiversity scientist: A principal investigator writing
a grant application may wonder where phylogenetic
knowledge gaps are, where most sequencing effort is
currently focused, and where additional effort would yield the
highest returns. They would want to see a tree annotated
with data gaps (Fig. 2), and ideally also with planned and
ongoing sequencing projects run by other groups.
2018, Volume 105 • Eiserhardt etal.—Roadmap for global synthesis of the plant tree of life • 7
future- proong the tree, as for example, taxonomic changes can be
applied retrospectively, and errors can be rectied. More generally,
users conducting phylogenetic analyses oen discover issues with
particular sequences, such as probable misidentications, unlikely
divergent sequences within species, and overly short, long, or gappy
sequences. ere should be a mechanism allowing users to high-
light issues with the database in terms of sequences, alignments, or
tree errors. e Open Tree of Life interface allows for the curation
and comment of input trees and data sources as well as the synthetic
tree (Hinchli etal., 2015). is functionality could be expanded to
include more specic information about alignments and sequences.
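Use case (3) in Box 1 typically comes down to Faith's phylogenetic diversity (Faith, 1992), computed here as the summed length of all branches on the paths connecting a set of taxa to the root. The following is a minimal sketch of that calculation, not the implementation used by PICANTE or Biodiverse; the toy tree, the taxon names, the branch lengths, and the fragment species lists are all invented for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch to that parent).
Tree = Dict[str, Tuple[Optional[str], float]]

TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "sp_A": ("n1", 2.0),
    "sp_B": ("n1", 2.0),
    "sp_C": ("root", 3.0),
}


def faith_pd(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum the lengths of all branches on the paths from the taxa to the root."""
    counted = set()  # nodes whose subtending branch has already been added
    total = 0.0
    for taxon in taxa:
        node: Optional[str] = taxon
        # Walk rootwards until the root or an already counted branch is reached,
        # so that shared branches contribute only once.
        while node is not None and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Invented species lists for two forest fragments.
FRAGMENTS = {"fragment_1": ["sp_A", "sp_B"], "fragment_2": ["sp_A", "sp_C"]}
pd_by_area = {area: faith_pd(TREE, taxa) for area, taxa in FRAGMENTS.items()}
print(pd_by_area)
```

In this toy case the fragment holding the two more distantly related species scores higher, which is the kind of ranking a conservationist would want without having to select and handle the tree themselves.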
If presented in an appropriate way, a synthetic plant tree of life
has the potential to make the generation of new data more efficient
by highlighting clades and regions that should be prioritized to
increase total phylogenetic sampling. For example, the Open Tree of
Life synthetic tree browser allows users to explore which primary
phylogenetic studies any edge is derived from. While currently only
implemented in a supertree framework, this approach could be extended
to sequence data. We envision a dynamic interface where users can
easily identify clades and regions that are poorly sampled
taxonomically and/or genetically. Such an interface should show where
species are missing, as well as reflect the amount of data
underpinning the inferred relationships (Hinchliff et al., 2015). The
interface could also allow users to annotate planned sequencing
efforts, i.e., which taxa and loci they plan to sequence, when, where,
and contact information for the project. This way, unnecessary
duplication of work could be reduced, scientific collaboration
increased, and logistics associated with fieldwork and permit
applications facilitated.
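In its simplest form, identifying poorly sampled clades amounts to cross-matching a list of accepted species against the species for which sequence data exist. The sketch below illustrates this at genus level; the sampling status assigned to each species, the 50% threshold, and the function names are assumptions made for the example and do not describe any existing service.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def sampling_by_genus(
    accepted: Iterable[str], sequenced: Set[str]
) -> Dict[str, Tuple[int, int]]:
    """Return genus -> (number of sequenced species, number of accepted species)."""
    totals: Dict[str, int] = defaultdict(int)
    sampled: Dict[str, int] = defaultdict(int)
    for name in accepted:
        genus = name.split()[0]  # first word of the binomial
        totals[genus] += 1
        if name in sequenced:
            sampled[genus] += 1
    return {genus: (sampled[genus], totals[genus]) for genus in totals}


def poorly_sampled(
    coverage: Dict[str, Tuple[int, int]], threshold: float = 0.5
) -> List[str]:
    """List genera whose sequenced fraction falls below the threshold."""
    return sorted(g for g, (n, total) in coverage.items() if n / total < threshold)


# Illustrative input only: which species count as "sequenced" is invented here.
ACCEPTED = ["Sabal minor", "Sabal palmetto", "Sabal etonia", "Rosa canina", "Rosa rugosa"]
SEQUENCED = {"Sabal minor", "Rosa canina", "Rosa rugosa"}

coverage = sampling_by_genus(ACCEPTED, SEQUENCED)
gaps = poorly_sampled(coverage)
print(coverage, gaps)
```

A real interface would attach such counts to the edges of the tree and would also track sampling per locus, but the underlying comparison is the same.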
Besides viewing and downloading the entire tree, perhaps the most
central need is to provide tools to extract custom subtrees from the
plant tree of life, based on a list of taxa of relevance to a specific
research context. Methods such as Phylomatic (Webb and Donoghue, 2005)
and Phylotastic (Stoltzfus et al., 2013) have already demonstrated the
broad interest in such an application. Easy access to custom subtrees
would require tools and algorithms to generate partial views of
user-defined regions of larger trees. Importantly, such tools would
need to include a service for name reconciliation (e.g., Boyle et al.,
2013), allowing for taxonomic differences between the user input and
the tree.
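The two steps described here, reconciling user-supplied names and then extracting the induced subtree, can be sketched as follows. This is a simplified illustration rather than the approach of Phylomatic, Phylotastic, or the Taxonomic Name Resolution Service: the tree is a nested tuple without branch lengths, the synonym table is a plain dictionary, and all taxon names are invented placeholders.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple, Union

# A tip is a name; an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]

# Invented placeholder taxa and synonymy, for illustration only.
TREE: Tree = ((("Aus bus", "Aus cus"), "Bus dus"), ("Cus eus", "Cus fus"))
SYNONYMS: Dict[str, str] = {"Aus oldus": "Aus bus"}


def tips(tree: Tree) -> Set[str]:
    """Collect all tip names of a tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def reconcile(
    names: Iterable[str], synonyms: Dict[str, str], known: Set[str]
) -> Tuple[List[str], List[str]]:
    """Normalise spacing and case, map synonyms, and split matched from unmatched."""
    matched: List[str] = []
    unmatched: List[str] = []
    for raw in names:
        name = " ".join(raw.split()).capitalize()
        name = synonyms.get(name, name)
        if name in known:
            matched.append(name)
        else:
            unmatched.append(raw)
    return matched, unmatched


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return the subtree induced by the tips in `keep`, collapsing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def to_newick(tree: Tree) -> str:
    """Serialise a topology-only tree as Newick without the closing semicolon."""
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


USER_NAMES = ["aus  OLDUS", "Bus dus", "Cus fus", "Zus zus"]
matched, unmatched = reconcile(USER_NAMES, SYNONYMS, tips(TREE))
subtree = prune(TREE, set(matched))
newick = to_newick(subtree) + ";" if subtree is not None else ""
print(newick, unmatched)
```

Reporting the unmatched names back to the user, rather than dropping them silently, matters in practice because taxonomic differences between the input and the tree are the rule rather than the exception. A tool working with branch lengths would additionally need to sum the lengths of collapsed edges.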
Although some generic uses are readily anticipated, perhaps the most
important way of serving the plant tree of life is through flexible
software interfaces. For example, integration with the R
(https://www.r-project.org/) or Biopython (http://biopython.org/)
software environments would allow the plant tree of life to be used in
a wide range of biostatistics and bioinformatics applications. More
generally, the development of application programming interfaces
(APIs) is essential for ensuring a wide use of the tree, which could
range from websites and educational apps to stand-alone software. APIs
allow external users to formally query and download data, opening the
door to an almost unlimited number of uses.
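To make the idea of formally querying a tree concrete, the sketch below shows one possible shape for such an interface: a function that takes a JSON request and returns a JSON response, which a web framework or an R or Biopython client could then wrap. The operation name, the request fields, the stored clade, and its placeholder taxa are hypothetical and do not correspond to any existing API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical store: clade name -> Newick string (invented placeholder taxa).
TREES: Dict[str, str] = {"Examplaceae": "((Aus_bus,Aus_cus),Bus_dus);"}


def _error(message: str) -> Dict[str, Any]:
    """Build a uniform error payload."""
    return {"status": "error", "message": message}


def get_subtree(params: Dict[str, Any]) -> Dict[str, Any]:
    """Look up the stored tree for the requested clade."""
    clade = params.get("clade")
    if not isinstance(clade, str) or clade not in TREES:
        return _error("unknown clade")
    return {"status": "ok", "clade": clade, "newick": TREES[clade]}


# Registry of supported operations; new services are added by extending it.
OPERATIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "subtree": get_subtree,
}


def handle_request(body: str) -> str:
    """Dispatch a JSON request to the matching operation and return JSON."""
    try:
        request = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps(_error("invalid JSON"))
    if not isinstance(request, dict):
        return json.dumps(_error("request must be a JSON object"))
    name = request.get("operation")
    operation = OPERATIONS.get(name) if isinstance(name, str) else None
    if operation is None:
        return json.dumps(_error("unknown operation"))
    params = request.get("params", {})
    if not isinstance(params, dict):
        return json.dumps(_error("params must be a JSON object"))
    return json.dumps(operation(params))


response = handle_request('{"operation": "subtree", "params": {"clade": "Examplaceae"}}')
print(response)
```

Keeping the request and response formats explicit and machine-readable is what allows the same service to sit behind a website, an educational app, or a statistical package.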
CONCLUSIONS AND CALL TO ACTION
Providing science and society with an integrated, up-to-date, high
quality, open, reproducible and sustainable plant tree of life would
be a huge service that is coming within reach. Technological and
methodological advances have paved the way for this synthesis, but
putting it into practice requires a concerted effort by the scientific
community. Here, we call on the community to embrace the following
actions, which would materially advance our quest for global
phylogenetic synthesis in plants:
1. Unite behind the collective goal of an integrated, up-to-date,
high quality, open, reproducible and sustainable tree of life for
plants.
2. Agree on an open framework for a tree of life pipeline with
discrete, interchangeable modules, drawing on the wealth of existing
tools (Fig. 1).
3. Encourage computer scientists and software developers to address
priority analytical problems requiring innovative solutions.
4. Commit to computing trees at regular intervals (e.g., yearly,
monthly), ensuring that an up-to-date plant tree of life is always
5. Establish a sustainable infrastructure for long-term storage and
distribution of the resulting trees and associated metadata.
6. Create web tools that allow trees to be easily explored, queried,
and downloaded by diverse audiences, ranging from experts to
7. Create application programming interfaces (APIs) that allow
trees to be integrated in external software.
8. Engineer a mechanism for community feedback on data quality,
which also feeds back to the original public source (e.g., NCBI
9. Provide a mechanism for identifying and prioritizing knowledge
gaps through dynamic cross-matching trees with public data
In this call to action, we emphasize the importance of community
coordination and institutional responsibility. Building and
maintaining pipelines that perform optimally at all steps discussed in
this paper is beyond the skills and resources of most individual
research labs. Similarly, within the constraints of standard research
grants, a firm commitment to regular tree updates, indefinite storage
of trees and metadata, and actively maintained interfaces is nearly
impossible. Thus, we need to build a collaborative, community-driven
platform that allows many individuals, groups, and institutions to
contribute according to their scientific strengths and resources. The
recently founded PhyloSynth network (https://phylosynth.github.io/)
aims to facilitate the development of such a platform, paving the way
toward an integrated, up-to-date, high quality, open, reproducible and
sustainable tree of life for plants. By embracing this call to action,
our community would extend its impact beyond the ivory tower of pure
comparative plant biology research, broadening its societal reach and
bringing tree of life research to bear on the global challenges facing
humanity today.
ACKNOWLEDGMENTS
The authors thank Douglas E. Soltis and two anonymous reviewers for
helpful feedback on the manuscript and Olivier Maurin, Tuula Niskanen,
Beata Klejevskaja, and William Pearse for thoughtful discussion. This
work was partly supported by grants from the Calleva Foundation, the
Garfield Weston Foundation and the Sackler Trust to the Royal Botanic
Gardens, Kew. Part of this work was funded by the Klaus Tschira
Foundation to A.S.; U.S. National Science Foundation grant ABI-1458652
to T.W. and ABI-1458466 and AVATOL-1207915 to S.A.S.; Yale-NUS grants
IG15-SI101 and R-607-265-200-121 to W.H.P.
LITERATURE CITED
Akanni, W. A., M. Wilkinson, C. J. Creevey, P. G. Foster, and D. Pisani. 2015.
Implementing and testing Bayesian and maximum-likelihood supertree
methods in phylogenetics. Royal Society Open Science 2: 140436.
Ané, C., B. Larget, D. A. Baum, S. D. Smith, and A. Rokas. 2007. Bayesian esti-
mation of concordance among gene trees. Molecular Biology and Evolution
Angiosperm Phylogeny Group. 2016. An update of the Angiosperm Phylogeny
Group classification for the orders and families of flowering plants: APG IV.
Botanical Journal of the Linnean Society 181: 1–20.
Antonelli, A., H. Hettling, F. L. Condamine, K. Vos, R. H. Nilsson, M. J.
Sanderson, H. Sauquet, et al. 2017. Toward a self-updating platform for
estimating rates of speciation and migration, ages, and relationships of taxa.
Systematic Biology 66: 152–166.
Bayzid, M. S., T. Hunt, and T. Warnow. 2014. Disk covering methods improve
phylogenomic analyses. BMC Genomics 15(supplement 6): S7.
Berney, C., A. Ciuprina, S. Bender, J. Brodie, V. Edgcomb, E. Kim, J. Rajan, et al.
2017. UniEuk: time to speak a common language in protistology! Journal of
Eukaryotic Microbiology 64: 407–411.
Boussau, B., G. J. Szöllosi, L. Duret, M. Gouy, E. Tannier, and V. Daubin. 2013.
Genome-scale coestimation of species and gene trees. Genome Research 23:
Boyle, B., N. Hopkins, Z. Lu, J. A. Raygoza Garay, D. Mozzherin, T. Rees, N.
Matasci, et al. 2013. The taxonomic name resolution service: an online tool
for automated standardization of plant names. BMC Bioinformatics 14: 16.
Brooks, D. R., and D. A. Mclennan. 1991. Phylogeny, ecology, and behavior:
a research program in comparative biology. University of Chicago Press,
Chicago, IL, USA.
CBOL Plant Working Group. 2009. A DNA barcode for land plants. Proceedings
of the National Academy of Sciences, USA 106: 12794–12797.
Chen, I. M. A., V. M. Markowitz, K. Chu, K. Palaniappan, E. Szeto, M. Pillay, A.
Ratner, et al. 2017. IMG/M: integrated genome and metagenome compara-
tive data analysis system. Nucleic Acids Research 45: D507–D516.
Chifman, J., and L. Kubatko. 2014. Quartet inference from SNP data under the
coalescent model. Bioinformatics 30: 3317–3324.
Cox, C. J., B. Li, P. G. Foster, T. M. Embley, and P. Civán. 2014. Conflicting
phylogenies for early land plants are caused by composition biases among
synonymous substitutions. Systematic Biology 63: 272–279.
De La Torre, A. R., Z. Li, Y. Van De Peer, and P. K. Ingvarsson. 2017. Contrasting
rates of molecular evolution and patterns of selection among gymnosperms
and flowering plants. Molecular Biology and Evolution 34: 1363–1377.
Dodsworth, S., A. R. Leitch, and I. J. Leitch. 2015. Genome size diversity in
angiosperms and its influence on gene space. Current Opinion in Genetics and
Development 35: 73–78.
Drew, B. T., R. Gazis, P. Cabezas, K. S. Swithers, J. Deng, R. Rodriguez, L. A. Katz, et
al. 2013. Lost branches on the tree of life. PLoS Biology 11: e1001636.
Dunn, C. W., M. Howison, and F. Zapata. 2013. Agalma: an automated
phylogenomics workflow. BMC Bioinformatics 14: 330.
Enquist, B. J., R. Condit, R. K. Peet, M. Schildhauer, and B. M. Thiers. 2016.
Cyberinfrastructure for an integrated botanical information network to
investigate the ecological impacts of global climate change on plant
biodiversity. PeerJ Preprints e2615v2.
Faith, D. P. 1992. Conservation evaluation and phylogenetic diversity. Biological
Conservation 61: 1–10.
Felsenstein, J. 1985. Phylogenies and the comparative method. American
Naturalist 125: 1–15.
Folk, R. A., M. Sun, P. S. Soltis, S. A. Smith, D. E. Soltis, and R. P. Guralnick. 2018.
Wrestling with Rosids: Challenges of comprehensive taxon sampling in com-
parative biology. American Journal of Botany 105 (in press).
Foster, P. G., C. J. Cox, and T. M. Embley. 2009. The primary divisions of life:
a phylogenomic approach employing composition-heterogeneous methods.
Philosophical Transactions of the Royal Society of London, B, Biological
Sciences 364: 2197–2207.
Freyman, W. A. 2015. SUMAC: Constructing phylogenetic supermatrices and as-
sessing partially decisive taxon coverage. Evolutionary Bioinformatics Online
Gitzendanner, M. A., P. S. Soltis, G. K.-S. Wong, B. R. Ruhfel, and D. E. Soltis.
2018. Plastid phylogenomic analysis of green plants: a billion years of evolu-
tionary history. American Journal of Botany 105.
Goff, S. A., M. Vaughn, S. Mckay, E. Lyons, A. E. Stapleton, D. Gessler, N.
Matasci, et al. 2011. The iPlant Collaborative: cyberinfrastructure for plant
biology. Frontiers in Plant Science 2: 34.
Gratton, P., S. Marta, G. Bocksberger, M. Winter, E. Trucchi, and H. Kühl. 2017.
A world of sequences: Can we use georeferenced nucleotide databases for
a robust automated phylogeography? Journal of Biogeography 44: 475–486.
Heled, J., and A. J. Drummond. 2010. Bayesian inference of species trees from
multilocus data. Molecular Biology and Evolution 27: 570–580.
Hennig, W. 1950. Grundzüge einer Theorie der phylogenetischen Systematik.
Deutscher Zentralverlag, Berlin, Germany.
Heyduk, K., D. W. Trapnell, C. F. Barrett, and J. Leebens-Mack. 2016.
Phylogenomic analyses of species relationships in the genus Sabal
(Arecaceae) using targeted sequence capture. Biological Journal of the
Linnean Society of London 117: 106–120.
Hinchliff, C. E., and S. A. Smith. 2014. Some limitations of public sequence data
for phylogenetic inference (in plants). PLoS One 9: e98986.
Hinchliff, C. E., S. A. Smith, J. F. Allman, J. G. Burleigh, R. Chaudhary, L. M.
Coghill, K. A. Crandall, et al. 2015. Synthesis of phylogeny and taxonomy
into a comprehensive tree of life. Proceedings of the National Academy of
Sciences, USA 112: 12764–12769.
Jarvis, E. D., S. Mirarab, A. J. Aberer, B. Li, P. Houde, C. Li, S. Y. W. Ho, et al. 2014.
Whole-genome analyses resolve early branches in the tree of life of modern
birds. Science 346: 1320–1331.
Jenkins, K. P. 2009. Evolution in biology education: sparking imaginations and
supporting learning. Evolution: Education and Outreach 2: 347–348.
Jetz, W., J. Cavender-Bares, R. Pavlick, D. Schimel, F. W. Davis, G. P. Asner, R.
Guralnick, et al. 2016. Monitoring plant functional diversity from space.
Nature Plants 2: 16024.
Joppa, L. N., B. O’Connor, P. Visconti, C. Smith, J. Geldmann, M. Hoffmann, J. E.
M. Watson, et al. 2016. Big data and biodiversity. Filling in biodiversity threat
gaps. Science 352: 416–418.
Kattge, J., S. Díaz, S. Lavorel, I. C. Prentice, P. Leadley, G. Bönisch, E. Garnier, et al. 2011.
TRY – a global database of plant traits. Global Change Biology 17: 2905–2935.
Kembel, S. W., P. D. Cowan, M. R. Helmus, W. K. Cornwell, H. Morlon, D. D.
Ackerly, S. P. Blomberg, and C. O. Webb. 2010. Picante: R tools for integrat-
ing phylogenies and ecology. Bioinformatics 26: 1463–1464.
Kõljalg, U., K.-H. Larsson, K. Abarenkov, R. H. Nilsson, I. J. Alexander, U.
Eberhardt, S. Erland, et al. 2005. UNITE: a database providing web-based
methods for the molecular identification of ectomycorrhizal fungi. New
Phytologist 166: 1063–1068.
Kozlov, A. M., A. J. Aberer, and A. Stamatakis. 2015. ExaML version 3: a tool for
phylogenomic analyses on supercomputers. Bioinformatics 31: 2577–2579.
Kozlov, A. M., J. Zhang, P. Yilmaz, F. O. Glöckner, and A. Stamatakis. 2016.
Phylogeny-aware identification and correction of taxonomically mislabeled
sequences. Nucleic Acids Research 44: 5022–5033.
Kumar, S., G. Stecher, M. Suleski, and S. B. Hedges. 2017. TimeTree: A re-
source for timelines, timetrees, and divergence times. Molecular Biology and
Evolution 34: 1812–1819.
Laffan, S. W., E. Lubarsky, and D. F. Rosauer. 2010. Biodiverse, a tool for the
spatial analysis of biological and related diversity. Ecography 33: 643–647.
Lafond, M., C. Chauve, N. El-Mabrouk, and A. Ouangraoua. 2017. Gene tree
construction and correction using supertree and reconciliation. IEEE/ACM
Transactions on Computational Biology and Bioinformatics, early online.
Leebens-Mack, J., T. Vision, E. Brenner, J. E. Bowers, S. Cannon, M. J. Clement,
C. W. Cunningham, et al. 2006. Taking the first steps towards a standard
for reporting on phylogenies: Minimum Information About a Phylogenetic
Analysis (MIAPA). OMICS 10: 231–237.
Li, B., J. S. Lopes, P. G. Foster, T. M. Embley, and C. J. Cox. 2014. Compositional
biases among synonymous substitutions cause conflict between gene and
protein trees for plastid origins. Molecular Biology and Evolution 31: 1697–1709.
Liu, L., D. K. Pearl, and T. Buckley. 2007. Species trees from gene trees: recon-
structing Bayesian posterior distributions of a species phylogeny using esti-
mated gene tree distributions. Systematic Biology 56: 504–514.
Liu, X., M. Liang, R. S. Etienne, Y. Wang, C. Staehelin, and S. Yu. 2012.
Experimental evidence for a phylogenetic Janzen–Connell effect in a
subtropical forest. Ecology Letters 15: 111–118.
MacDonald, T., and E. O. Wiley. 2012. Communicating phylogeny: evolutionary
tree diagrams in museums. Evolution: Education and Outreach 5: 14–28.
Magurran, A. E. 2013. Measuring biological diversity. John Wiley, Chichester, UK.
Maitner, B. S., B. Boyle, N. Casler, R. Condit, J. Donoghue, S. M. Durán, D.
Guaderrama, et al. 2018. The bien r package: A tool to access the Botanical
Information and Ecology Network (BIEN) database. Methods in Ecology and
Evolution 9: 373–379.
Mandel, J. R., R. B. Dikow, V. A. Funk, R. R. Masalia, S. E. Staton, A. Kozik, R. W.
Michelmore, et al. 2014. A target enrichment method for gathering phyloge-
netic information from hundreds of loci: an example from the Compositae.
Applications in Plant Sciences 2: 1300085.
McTavish, E. J., B. T. Drew, B. Redelings, and K. A. Cranston. 2017. How and why
to build a unified tree of life. BioEssays 39: 1700114.
McTavish, E. J., C. E. Hinchliff, J. F. Allman, J. W. Brown, K. A. Cranston, M.
T. Holder, J. A. Rees, and S. A. Smith. 2015. Phylesystem: a git-based data
store for community-curated phylogenetic estimates. Bioinformatics 31:
Mirarab, S., R. Reaz, M. S. Bayzid, T. Zimmermann, M. S. Swenson, and T.
Warnow. 2014. ASTRAL: genome-scale coalescent-based species tree
estimation. Bioinformatics 30: i541–548.
Nguyen, L.-T., H. A. Schmidt, A. Von Haeseler, and B. Q. Minh. 2015. IQ-TREE:
a fast and effective stochastic algorithm for estimating maximum-likelihood
phylogenies. Molecular Biology and Evolution 32: 268–274.
Nilsson, R. H., M. Ryberg, E. Kristiansson, K. Abarenkov, K.-H. Larsson, and U.
Kõljalg. 2006. Taxonomic reliability of DNA sequences in public sequence
databases: a fungal perspective. PLoS One 1: e59.
Normile, D. 2017. Plant scientists plan massive effort to sequence 10,000 genomes.
Panahiazar, M., A. P. Sheth, A. Ranabahu, R. A. Vos, and J. Leebens-Mack.
2013. Advancing data reuse in phyloinformatics using an ontology-driven
Semantic Web approach. BMC Medical Genomics 6: S5.
Pennisi, E. 2017. Biologists propose to sequence the DNA of all life on Earth.
Peters, R. S., L. Krogmann, C. Mayer, A. Donath, S. Gunkel, K. Meusemann, A.
Kozlov, et al. 2017. Evolutionary history of the hymenoptera. Current Biology
Piel, W., L. Chan, M. Dominus, J. Ruan, R. Vos, and V. Tannen. 2009. TreeBASE
v. 2: a database of phylogenetic knowledge.
PPG, I. 2016. A community-derived classification for extant lycophytes and
ferns. Journal of Systematics and Evolution 54: 563–603.
Proença, V., L. J. Martin, H. M. Pereira, M. Fernandez, L. McRae, J. Belnap, M.
Böhm, et al. 2017. Global biodiversity monitoring: from data sources to es-
sential biodiversity variables. Biological Conservation 213: 256–263.
RBG Kew. 2016. The state of the world’s plants report 2016. Royal Botanic
Gardens, Kew, Richmond, Surrey, UK. Available at https://stateoftheworld-
RBG Kew. 2017. The state of the world’s plants report 2017. Royal Botanic
Gardens, Kew, Richmond, Surrey, UK. Available at https://stateoftheworld-
Redelings, B. D., and M. T. Holder. 2017. A supertree pipeline for summarizing
phylogenetic and taxonomic information for millions of species. PeerJ 5: e3058.
Rees, J. A., and K. Cranston. 2017. Automated assembly of a reference taxonomy
for phylogenetic data synthesis. Biodiversity Data Journal 5: e12581.
Rosindell, J., and L. J. Harmon. 2012. OneZoom: a fractal explorer for the tree of
life. PLoS Biology 10: e1001406.
Ruhfel, B. R., M. A. Gitzendanner, P. S. Soltis, D. E. Soltis, and J. G. Burleigh.
2014. From algae to angiosperms- inferring the phylogeny of green plants
(Viridiplantae) from 360 plastid genomes. BMC Evolutionary Biology 14: 23.
Rulik, B., J. Eberle, L. Von Der Mark, J. Thormann, M. Jung, F. Köhler, W.
Apfel, et al. 2017. Using taxonomic consistency with semi-automated data
pre-processing for high quality DNA barcodes. Methods in Ecology and
Evolution 8: 1878–1887.
Sanderson, M. J., M. M. McMahon, A. Stamatakis, D. J. Zwickl, and M. Steel.
2015. Impacts of terraces on phylogenetic inference. Systematic Biology 64:
Shen, X.-X., C. T. Hittinger, and A. Rokas. 2017. Contentious relationships in phy-
logenomic studies can be driven by a handful of genes. Nature Ecology and
Evolution 1: 126.
Smith, S. A., and J. W. Brown. 2018. Constructing a comprehensive seed plant
phylogeny. American Journal of Botany 105 (in press).
Solís-Lemus, C., and C. Ané. 2016. Inferring phylogenetic networks with maxi-
mum pseudolikelihood under incomplete lineage sorting. PLoS Genetics 12:
Stoltzfus, A., B. O’Meara, J. Whitacre, R. Mounce, E. L. Gillespie, S. Kumar, D. F.
Rosauer, and R. A. Vos. 2012. Sharing and re-use of phylogenetic trees (and
associated data) to facilitate synthesis. BMC Research Notes 5: 574.
Stoltzfus, A., H. Lapp, N. Matasci, H. Deus, B. Sidlauskas, C. M. Zmasek, G. Vaidya,
et al. 2013. Phylotastic! Making tree-of-life knowledge accessible, reusable
and convenient. BMC Bioinformatics 14: 158.
Strauss, S. Y., C. O. Webb, and N. Salamin. 2006. Exotic taxa less related to native
species are more invasive. Proceedings of the National Academy of Sciences,
USA 103: 5841–5845.
Sun, M., D. E. Soltis, P. S. Soltis, X. Zhu, J. G. Burleigh, and Z. Chen. 2015. Deep
phylogenetic incongruence in the angiosperm clade Rosidae. Molecular
Phylogenetics and Evolution 83: 156–166.
Vachaspati, P., and T. Warnow. 2017. FastRFS: fast and accurate Robinson–Foulds
Supertrees using constrained exact optimization. Bioinformatics 33: 631–639.
Van De Peer, Y., E. Mizrachi, and K. Marchal. 2017. The evolutionary
significance of polyploidy. Nature Reviews Genetics 18: 411–424.
Walker, J. F., Y. Yang, T. Feng, A. Timoneda, J. Mikenas, V. Hutchinson, C.
Edwards, et al. 2018. From cacti to carnivores: improved phylotranscrip-
tomic sampling and hierarchical homology inference provide further insight
to the evolution of Caryophyllales. American Journal of Botany 105.
Webb, C. O., and M. J. Donoghue. 2005. Phylomatic: tree assembly for applied
phylogenetics. Molecular Ecology Notes 5: 181–183.
Weitemier, K., S. C. K. Straub, R. C. Cronn, M. Fishbein, R. Schmickl, A.
McDonnell, and A. Liston. 2014. Hyb-Seq: Combining target enrichment
and genome skimming for plant phylogenomics. Applications in Plant
Sciences 2: 1400042.
Wickett, N. J., S. Mirarab, N. Nguyen, T. Warnow, E. Carpenter, N. Matasci, S.
Ayyampalayam, et al. 2014. Phylotranscriptomic analysis of the origin and
early diversification of land plants. Proceedings of the National Academy of
Sciences, USA 111: E4859–E4868.
Wood, T. E., N. Takebayashi, M. S. Barker, I. Mayrose, P. B. Greenspoon, and L.
H. Rieseberg. 2009. The frequency of polyploid speciation in vascular plants.
Proceedings of the National Academy of Sciences, USA 106: 13875–13879.
Yang, Y., and S. A. Smith. 2014. Orthology inference in nonmodel organisms using
transcriptomes and low- coverage genomes: improving accuracy and matrix
occupancy for phylogenomics. Molecular Biology and Evolution 31: 3081–3092.
Yu, Y., J. Dong, K. J. Liu, and L. Nakhleh. 2014. Maximum likelihood inference
of reticulate evolutionary histories. Proceedings of the National Academy of
Sciences, USA 111: 16448–16453.
Yu, Y., and L. Nakhleh. 2015. A maximum pseudo-likelihood approach for
phylogenetic networks. BMC Genomics 16(supplement 10): S10.
Zanne, A. E., D. C. Tank, W. K. Cornwell, J. M. Eastman, S. A. Smith, R. G.
Fitzjohn, D. J. McGlinn, et al. 2014. Three keys to the radiation of
angiosperms into freezing environments. Nature 506: 89–92.
Zhang, C., E. Sayyari, and S. Mirarab. 2017. ASTRAL-III: increased scalability
and impacts of contracting low support branches. In J. Meidanis, and L.
Nakleh [eds.], Comparative genomics, RECOMB-CG 2017. Lecture Notes
in Computer Science, vol. 10562, 53–75. Springer, Cham, Switzerland.