Available via license: CC BY 4.0
Interrogating genomic-scale data to resolve recalcitrant nodes in the
Spider Tree of Life
Siddharth Kulkarni,1,2,* Robert J. Kallal,2 Hannah Wood,2 Dimitar Dimitrov,3 Gonzalo Giribet,4
and Gustavo Hormiga1
1Department of Biological Sciences, The George Washington University, Washington, D.C.
2Department of Entomology, National Museum of Natural History, Washington, D.C.
3Department of Natural History, University Museum of Bergen, University of Bergen, Bergen,
Norway
4Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology,
Harvard University, Cambridge, MA
*Corresponding author: E-mail: sskspider@gwmail.gwu.edu
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the
original work is properly cited.
Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa251/5912541 by guest on 08 November 2020
Abstract
Genome-scale data sets are converging on robust, stable phylogenetic hypotheses for many
lineages; however, some nodes have shown disagreement across classes of data. We use
spiders (Araneae) as a system to identify the causes of incongruence in phylogenetic signal
between three classes of data: exons (as in phylotranscriptomics), non-coding regions (included
in ultraconserved elements [UCE] analyses), and a combination of both (as in UCE analyses).
Gene orthologs, coded as amino acids and nucleotides (with and without third codon positions),
were generated by querying published transcriptomes for UCEs, recovering 1,931 UCE loci
(codingUCEs). We expected that congeners represented in the codingUCE and UCEs data
would form clades in the presence of phylogenetic signal. Non-coding regions derived from UCE
sequences were recovered to test the stability of relationships. Phylogenetic relationships
resulting from all analyses were largely congruent. All nucleotide data sets from transcriptomes,
UCEs, or a combination of both recovered similar topologies in contrast with results from
transcriptomes analyzed as amino acids. Most relationships inferred from low occupancy data
sets, containing several hundreds of loci, were congruent across Araneae, as opposed to high
occupancy data matrices with fewer loci, which showed more variation. Furthermore, we found
that low occupancy data sets analyzed as nucleotides (as is typical of UCE data sets) can result
in more congruent relationships than high occupancy data sets analyzed as amino acids (as in
phylotranscriptomics). Thus, omitting data, through amino acid translation or via retention of
only high occupancy loci, may have a deleterious effect in phylogenetic reconstruction.
Key words: Araneae, non-coding regions, phylogeny, target-capture, transcriptomics
Introduction
Massive parallel sequencing and the exponential increase in the size of data sets have enabled
researchers to use a variety of genomic data types (whole genomes, transcribed gene regions,
introns, fast/slow evolving loci, etc.) to address specific evolutionary questions. These data sets
have rapidly dwarfed Sanger sequencing-based studies in terms of amounts of data (Mardis
2011); however, they have proven to be challenging to analyze. Although genome-scale data
were once celebrated as the gold standard for inferring evolutionary histories (Rokas et al.
2003; Gee 2003), it is now clear that
sheer quantity of data will not unequivocally resolve all problematic nodes in a phylogeny.
Conflicting but highly supported phylogenetic relationships have emerged in many data sets,
even when containing hundreds or thousands of loci.
Furthermore, the objective quantification of branch support is obfuscated by widespread
reliance on the bootstrap support metric (in a maximum likelihood framework), among a few
others like posterior probability in a Bayesian framework. Bootstrap values are often inflated
when comparable numbers of sites indicate conflicting relationships for a given branch
(Felsenstein 1985). Such conflicts are common among large-scale data sets and therefore
bootstrap values are generally high. This conundrum has impacted phylogenetic studies of
many groups of organisms, including birds (Jarvis et al. 2014; Prum et al. 2015; Walker et al.
2018; Cloutier et al. 2019), placental mammals (Morgan et al. 2013; Romiguier et al. 2013),
extant angiosperms (Zanis et al. 2002; Wickett et al. 2014; Xi et al. 2014) and arachnids (e.g.,
Sharma et al. 2014; Ballesteros and Sharma 2019; Lozano-Fernández et al. 2019). In the
present study, we focus on the nature of the systematic conflict (with high bootstrap support for
alternative hypotheses) across genomic data sets addressing a yet to be satisfactorily resolved
problem in spider phylogenetics.
In recent studies on the spider tree of life, phylogenies resulting from the analysis of
either transcriptomes or ultraconserved elements (UCEs) have largely converged on similar
topologies (e.g., Garrison et al. 2016; Fernández et al. 2018; Kulkarni et al. 2020; Dimitrov &
Hormiga 2021; Kallal et al. in press). However, incongruence persists in some recalcitrant
nodes, receiving high support for contradicting hypotheses. Some of these incongruences, in
the context of spider systematics, include: (a) the placement of the RTA Clade (a group of
spiders characterized by the presence of a retrolateral tibial apophysis in the male palp–the
appendage that male spiders use for copulation) with respect to the “UDOH grade” (an
assemblage containing the spider families Uloboridae, Deinopidae, Oecobiidae and Hersiliidae);
(b) the placement of Nicodamoidea with respect to Araneoidea (the ecribellate orb weavers);
and, (c) the interfamilial relationships of the miniature orb weaving families–a group informally
known as “symphytognathoids.” The “symphytognathoids” (Griswold et al. 1998) include the
families Anapidae, Mysmenidae, Theridiosomatidae, and Symphytognathidae (which includes
the smallest adult spider in the world, Patu digua; Forster and Platnick 1977). Few studies have
found support for the monophyly of “symphytognathoids”, and a particular study suggests that
Synaphridae also belongs to this group (Lopardo et al. 2011). Here, we focus on the
relationships of the “symphytognathoid” families as a major area of conflict in the spider tree of
life by comparing a diversity of approaches and data classes and their effects on this particular
topology.
The monophyly of “symphytognathoid” families has been supported, although not
formalized as a taxon, by morphological and behavioral characters (Griswold et al. 1998; Schütt
2003; Lopardo and Hormiga 2008; Lopardo et al. 2011; Hormiga and Griswold 2014), but these
families have appeared as either paraphyletic or polyphyletic in molecular phylogenies based on
standard Sanger markers (Dimitrov et al. 2017; Wheeler et al. 2017) or transcriptomes
(Fernández et al. 2018; Kallal et al. in press). Lopardo et al.’s (2011) extensive Sanger-based
data set supported “symphytognathoid” monophyly only when the nucleotide data were
analyzed in combination with phenotypic data. Recently, an analysis using target enrichment
methods to capture ultraconserved elements (UCEs) provided the first molecular support for the
monophyly of “symphytognathoids” (ultrafast bootstrap >95), although only with the analyzed
low occupancy data sets (Kulkarni et al. 2020). This result was surprising, given the lack of
support for symphytognathoid monophyly in all prior molecular analyses, including
phylogenomic data sets analyzed as amino acid data in a maximum likelihood framework (Kallal
et al. in press). In that study, the parsimony analysis of the amino acid data set recovered
Theridiosomatidae as the sister group of Araneidae, with the remaining “symphytognathoids”
forming a monophyletic group (Kallal et al. in press).
The paradox of highly supported but incongruent relationships requires a critical
assessment of the nature of the data being analyzed; in our case, this concerns the high
bootstrap support for both the monophyly and the polyphyly of “symphytognathoids” in different
analyses. The phylogenetic relationships of the miniature orb-weavers offer an excellent system
to explore the nature of conflict between these two types of genomic data sets. One possible
approach, albeit unexplored up to this point, is to identify the phylogenetic signal common to
transcriptomic and UCE data sets. Transcriptomes, which are sequenced from mRNA, are often
analyzed as amino acids, and include only exonic regions. UCEs on the other hand are
sequenced from the genome and are typically analyzed as nucleotides, and include both exons
and non-coding regions. The possibility of combining the vast data sets of UCEs and
transcriptomes would enable not only an expanded taxon sampling, but also allow reconciliation
of the existing UCE and transcriptome data sets (e.g., Bossert et al. 2019). Furthermore,
because a recent study has shown that currently sequenced UCEs in Arachnida are mostly
exonic (Hedin et al. 2019) it should be possible to combine UCEs and transcriptomes in a
meaningful manner (Bossert et al. 2019; Hedin et al. 2019).
The present study aims to identify the causes of incongruence amongst transcriptome-
based and UCE-based sequences in phylogenetic analyses of spiders by leveraging data from
recent studies (e.g., Garrison et al. 2016; Fernández et al. 2018; Kulkarni et al. 2020; Kallal et
al. in press). Our approach was to reconstruct phylogenies using sequences from
transcriptomes, UCEs, and a combination of data sources, at both the amino acid and
nucleotide level. We then analyzed these data sets using different phylogenetic methods at
different occupancy levels, while also exploring the phylogenetic signal of non-coding regions,
something rarely attempted in this kind of phylogenetic analysis.
First, we hypothesize that transcriptomes contain ultraconserved regions. On targeting
these coding ultraconserved regions using the Spider2Kv1 probe set (Kulkarni et al. 2020), we
reconstruct a phylogeny to resolve a number of selected recalcitrant nodes. The efficacy of the
transcriptome-derived UCEs for resolving phylogenetic relationships is tested by adding multiple
congeneric or confamilial taxa that represent coding UCEs, UCEs from previous studies and
UCEs obtained from genomes. We hypothesize that analyzing data as amino acids versus
nucleotides can influence the inferred phylogenetic relationships. To test this, we reconstruct
and compare phylogenies using nucleotide and amino acid data sets from sequences derived
from both transcriptomes and ultraconserved regions of the genome. We found that nucleotide
data sets converge on a similar topology – including the recovery of the symphytognathoid
representatives as a clade – while amino acid data sets did not. This outcome suggests that
reducing the number of characters included in nucleotide data sets via translation to amino
acids is detrimental to the topological stability of phylogenetic inference.
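The loss of characters through translation can be illustrated with a toy example. The sketch below is not part of the analysis pipeline used in this study; the two sequences are hypothetical and were chosen so that they differ only at third codon positions. It translates them with the standard genetic code and counts variable characters before and after translation.

```python
# Toy illustration: translating codons to amino acids discards synonymous variation.
# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate an in-frame, gap-free nucleotide string into amino acids."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def count_differences(a, b):
    """Number of aligned positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical sequences that differ only at third codon positions.
taxon_a = "CTTGCA"
taxon_b = "CTGGCC"

nuc_diffs = count_differences(taxon_a, taxon_b)
aa_diffs = count_differences(translate(taxon_a), translate(taxon_b))
print(len(taxon_a), nuc_diffs)            # nucleotide characters, variable sites
print(len(translate(taxon_a)), aa_diffs)  # amino acid characters, variable sites
```

In this toy case the nucleotide alignment has six characters, two of them variable, whereas the translated alignment has two characters and no variation at all.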
Results and Discussion
Statistics for all analyzed data sets are listed in Supplementary Table 1. A few clarifications are
provided here.
CodingUCEs
With the current taxon sample, 2,019 loci were obtained (before occupancy filtering), out of
which 1,931 UCEs were recovered from the transcriptomes analyzed in Fernández et al. (2018).
This means that the transcriptomic analysis of Fernández et al. (2018) contained at least 1,931
coding UCE regions, out of the 2,021 possible UCEs targeted by the spider probe set of
Kulkarni et al. (2020) (95.5%), making both data sets nearly identical in gene composition, and
thus straightforward to combine. The number of UCEs recovered from individual transcriptomes
(i.e., per taxon) ranged from 62 to 897 (mean = 436.18) (Supplementary Table 2). Two taxa out of
a total of six non-spider outgroup taxa, Phrynus marginemaculatus and Limulus polyphemus,
yielded too few UCE loci, so they were omitted from the final data set.
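The recovery percentage quoted above follows directly from the two counts given in the text. As a quick arithmetic check (the variable names are ours):

```python
# Counts taken from the paragraph above.
recovered = 1931  # coding UCE loci recovered from the transcriptomes
targeted = 2021   # UCE loci targeted by the spider probe set

recovery_pct = 100 * recovered / targeted
print(f"{recovery_pct:.1f}%")
```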
AllUCEs
This data set included a combination of the taxon sample of UCEs recovered from the
transcriptomes (Fernández et al. 2018) and UCEs (Kulkarni et al. 2020). Three ingroup species
(Amaurobius ferox, Deinopis longipes and Nesticus cooperi) were removed from the AllUCEs50
data set because they did not have any locus represented in the final alignment. This data set
(AllUCEs50), with only 21 loci, resulted in a phylogeny in which many families were polyphyletic
and thus we have excluded this tree topology (see Supplementary trees) from our further
analyses and discussion.
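The effect of the occupancy threshold on the number of retained loci can be sketched as follows. This is a minimal illustration, assuming that occupancy is the fraction of sampled taxa in which a locus is present; the locus names, taxon names and the function `filter_by_occupancy` are hypothetical and do not reproduce the software used in this study.

```python
def filter_by_occupancy(locus_taxa, n_taxa, min_occupancy):
    """Keep loci present in at least min_occupancy (0 to 1) of the n_taxa sampled."""
    return {
        locus: taxa
        for locus, taxa in locus_taxa.items()
        if len(taxa) / n_taxa >= min_occupancy
    }

# Hypothetical presence data: locus name -> set of taxa with a sequence.
locus_taxa = {
    "uce-1": {"t1", "t2", "t3", "t4"},
    "uce-2": {"t1", "t2"},
    "uce-3": {"t1"},
    "uce-4": {"t2", "t3", "t4"},
}

# Raising the threshold lowers the amount of missing data but also the number of loci.
for threshold in (0.25, 0.50, 0.75):
    kept = filter_by_occupancy(locus_taxa, n_taxa=4, min_occupancy=threshold)
    print(threshold, sorted(kept))
```

With real data the same trade-off can become extreme, as in the AllUCEs50 data set above, where only 21 loci passed the filter.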
Non-coding
Six terminals (Bothriurus keyserlingi, Centruroides sculpturatus, Sofanapis antillanca, Euryopis
sp., Nesticus gertschi and Chediminae sp.) were likewise removed from the phylogenetic
analyses because they were represented by very few (less than 30) non-coding regions.
Efficiency of the spider probes in capturing codingUCEs
Out of 248 taxa in the AllUCEs data set, 40 genera had multiple representatives obtained from
transcriptomes or UCEs. Although the UCE sequences were mapped to the spider probe set,
their library preparations were enriched with either the same (Kulkarni et al. 2020) or the
Arachnida probe set of Starrett et al. (2017) and Wood et al. (2018). All such genera were
monophyletic, except Segestria (Segestriidae) and Novanapis (Anapidae), which were
paraphyletic.
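The congener test described above amounts to asking whether all representatives of a genus form a clade. A minimal sketch of such a check, assuming a rooted tree written as nested tuples and using hypothetical tip labels (this is not the procedure or software used in this study):

```python
def leaves(tree):
    """Return the set of tip labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the taxa in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Hypothetical rooted tree: one genus sampled twice, plus three other tips.
toy_tree = ((("GenusA_sp1", "GenusA_sp2"), "GenusB_sp1"), ("GenusC_sp1", "GenusD_sp1"))

print(is_monophyletic(toy_tree, {"GenusA_sp1", "GenusA_sp2"}))  # congeners form a clade
print(is_monophyletic(toy_tree, {"GenusA_sp1", "GenusB_sp1"}))  # not a clade
```

Because the check depends on where the tree is rooted, it is only meaningful once the tree has been rooted with the outgroups.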
Phylogenetic relationships
The AllUCEs data sets had the highest taxon representation of all data sets, including 88 out of
120 known spider families (World Spider Catalog 2020). Topology tests were conducted
between different occupancies of the AllUCEs set. AllUCEs25 was significantly rejected
(Supplementary Table 3) and thus we base our discussion mainly on the AllUCEs10 data set
(Figure 1, Supplementary Figure 2) and highlight relevant aspects of other topologies briefly
below, except for non-coding regions, which are discussed in a separate section. The nodal
support values (SH-aLRT and ultrafast bootstrap [UFBoot], in that order) are given
in parentheses for each relationship. For gene and site concordance factors, refer to
Figure 1 and Supplementary figure 2.
All data sets (except non-coding) yielded strong UFBoot support (>95%)
for the major Araneae lineages such as Mesothelae, Opisthothelae, Mygalomorphae and
Araneomorphae (Figures 1, 2, Supplementary Table 2). Within Araneomorphae, conflicting
relationships were recovered within the family Leptonetidae, among the UDOH families, and
between the UDOH families and both Araneoidea and the RTA Clade (Figures 1, 2, Supplementary Table 2;
see Supplementary trees). To briefly describe these conflicts, the UDOH families formed a clade with
AllAAUCEs, but constituted a grade in the analyses of all other data sets. Araneoidea was
recovered as the sister group to Nicodamoidea plus Eresidae in the analyses of all the data sets
except AllUCEs10 and its amino acid data sets (Figure 2). The placement of the long
Senoculidae branch varied across analyses from nesting within the RTA Clade to a sister group
to the Araneae branch. This recalcitrance may be indicative of poor sequence quality.
Phylogenomic data as amino acids vs. nucleotides
Phylogenies resulting from the transcriptome data analyzed as amino acids (Fernández et al.
2018; Figure 3A of this study) and as nucleotide sequences (nucT67 data set, Figure 3B) at an
occupancy of 67% were congruent at many nodes. Notable differences were found among the
UDOH families and in the internal arrangement of Araneoidea. Although Deinopidae was sister
group to the RTA Clade in both trees, Hersiliidae was either the sister group of Oecobiidae
(amino acid data) or the sister group to Oecobiidae plus Uloboridae (nucleotide data; Figure 3).
Within Araneoidea, Theridiidae plus Anapidae formed a clade sister to all remaining
araneoid families with amino acid data; with nucleotides, however, Theridiidae was the sister
group of a clade that included all the remaining araneoid families. This latter placement is
consistently recovered with all other data sets (see supplementary files).
In recently published phylogenomic analyses using amino acid data (Fernández et al.
2018; Michalik et al. 2019) and with all of our amino acid data sets (AAUCE and AllAAUCE),
Leptonetidae was recovered as monophyletic, but the family was
paraphyletic with the nucleotide data sets (Figures 2, 3 and Supplementary figure 3). This is
notable given that Archoleptoneta species are cribellate while all other leptonetids, including
other archoleptonetines (namely, Darkoneta), are ecribellate (Ledford and Griswold 2010). A
recent UCE study (analyzed as nucleotides) using a dense sample of leptonetids also recovered
diphyly with Archoleptonetinae separate from Leptonetinae (Ramírez et al. 2020).
The linyphioids (Linyphiidae and Pimoidae) were monophyletic with nucT data sets
(>95% UFBoot), codingUCEs (>95% UFBoot) and AAUCEs10 (<95% UFBoot); however, other
data sets recovered linyphioids as paraphyletic, although the pertinent nodes were poorly supported. The
monophyly of linyphioids has been supported with morphology (Hormiga 1994, 2008; Hormiga
and Tu 2008), six standard Sanger markers (Arnedo et al. 2009; Wheeler et al. 2016; Dimitrov
et al. 2017) and transcriptomes (Fernández et al. 2018).
Gnaphosidae was paraphyletic in both Fernández et al. (2018) (Supplementary figure
3A) and the current study (Figures 1, 3B, Supplementary figure 3B). In the current study,
Lamponidae nested within Gnaphosidae whereas in Fernández et al. (2018), Trachelidae,
Liocranidae and Lamponidae nested within Gnaphosidae. Optimized taxon sampling in this part
of the tree would be required to stabilize these relationships.
Removal of third codon positions
Including third codon positions in phylogenetic analyses may influence inferred relationships
because of saturation of synonymous nucleotide substitutions and rate heterogeneity, which could
explain differences between analyzing data as amino acids and as nucleotides; some
authors therefore recommend excluding saturated third codon positions (e.g., Breinholt and Kawahara
2013; O’Connor et al. 2014). In our study, the trees resulting from the analyses with
(codingUCEs and nucT data sets) and without (3RcodingUCEs and 3RnucT data sets) third
codon positions were congruent at most nodes. The differences were as follows: the
3RcodingUCEs10 data set yielded Eresidae as the sister group of Uloboridae, whereas in all the
other data sets with the third codon positions removed, Eresidae was sister group to
Nicodamoidea; and the 3RcodingUCEs50 data set yielded a paraphyletic Palpimanoidea.
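For readers unfamiliar with this treatment, removing third codon positions simply deletes every third column of an in-frame alignment. The sketch below illustrates the idea under the assumption that each aligned sequence starts at the first codon position; the sequences and the function name `remove_third_positions` are hypothetical and do not represent the scripts used to build the 3RcodingUCEs and 3RnucT data sets.

```python
def remove_third_positions(seq):
    """Drop every third character from an in-frame aligned nucleotide string."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)

# Hypothetical in-frame alignment; "-" marks missing data.
alignment = {
    "taxon_a": "CTTGCA---",
    "taxon_b": "CTGGCCAAT",
}

reduced = {name: remove_third_positions(seq) for name, seq in alignment.items()}
for name, seq in reduced.items():
    print(name, seq)
```

Each sequence loses one third of its columns, so the two taxa, which differed at two third positions in the first six columns, become identical over that stretch.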
Non-coding regions
All spider families were monophyletic with good support (>95% UFBoot); however, most
interfamilial relationships and deeper nodes received poor support (see Supplementary trees).
Many groups that were corroborated with all other data sets were recovered differently when
non-coding regions were analyzed alone. For example, mygalomorphs were the sister group of
a paraphyletic Synspermiata that included Hypochilidae, and the austrochiloids were nested
within Palpimanoidea and polyphyletic UDOH families (Figure 2). These unusual relationships
could be an artifact due to the overall small amount of data included in this data set; a similar
pattern was also observed when analyzing high occupancy (>70%) coding region data sets
(Supplementary file). The high variability in sizes of non-coding regions between distantly
related taxa also requires an evaluation of the potential effect of alignment schemes on resulting
relationships. Analyzing them together with exons, as in AllUCEs, could be a useful strategy
since the conserved coding regions may alleviate the effects of alignment procedures. The use
of appended exonic regions to align non-coding regions needs further exploration. HybPiper
recovers non-exonic regions which may also include intergenic regions in addition to non-coding
regions, which are difficult to parse.
Monophyly of the miniature orb weavers
The “symphytognathoids” were monophyletic in the trees resulting from the analyses of the
codingUCEs, AAUCEs, AllUCEs, AllAAUCEs and nucT, except AAUCEs50 and nucT67 which
recovered Theridiosomatidae as sister group to Araneidae while the remaining
“symphytognathoids” formed a clade. In the AllUCEs tree, this clade included the families
Anapidae, Mysmenidae, Symphytognathidae, Synaphridae and Theridiosomatidae (100/100
UFBoot/SH-aLRT for the whole clade), while the codingUCEs included all these families except
Symphytognathidae (not sampled). The family Synaphridae was sister group to Mysmenidae in
AllUCEs (100/100%), whereas it was sister group to Anapidae in codingUCEs phylogenies.
Only 2.29% of loci (~24 loci) and 29.5% of sites (~68,655 sites) support the monophyly of
“symphytognathoids” in the AllUCEs10 data set (Figure 1), meaning that the remaining sites and
loci support alternative relationships in lower fractions. In the trees resulting from the analyses
of the other data sets, AllUCEs, AllAAUCEs, codingUCEs and nucT, Theridiosomatidae was the
sister group of the remaining “symphytognathoids” with two exceptions of high occupancies, as
mentioned above (AAUCEs50 and nucT67). The AllAAUCEs recovered Theridiosomatidae as
sister group to Synaphridae plus Mysmenidae and this clade was sister group to
Symphytognathidae plus Anapidae (see supplementary files). The removal of third codon
positions from the transcriptomes analysed as nucleotides (3RnucT data sets) supported
“symphytognathoid” monophyly at occupancies of 10, 25 and 50%, whereas at 67% occupancy,
Theridiosomatidae was the sister group of Araneidae and the other “symphytognathoid” families
formed a clade. The removal of third codon positions from UCEs derived from transcriptomes
(3RcodingUCEs data sets) rendered the “symphytognathoid” families polyphyletic (Table 2;
Supplementary trees).
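The gene concordance values cited above express the share of loci whose gene tree contains a given branch. The following toy sketch conveys the idea only. It assumes, as a simplification, that every gene tree samples all taxa and can be summarized as a set of clades; published implementations also account for gene trees that lack the relevant taxa. The clade sets, taxon labels and the function `gene_concordance` are hypothetical.

```python
def gene_concordance(gene_tree_clades, clade):
    """Percentage of gene trees whose set of clades includes the given clade."""
    clade = frozenset(clade)
    supporting = sum(clade in clades for clades in gene_tree_clades)
    return 100 * supporting / len(gene_tree_clades)

# Hypothetical clade of interest and four hypothetical gene trees,
# each summarized by the clades it contains.
focal = {"Ana", "Mys", "The"}
gene_tree_clades = [
    {frozenset({"Ana", "Mys"}), frozenset({"Ana", "Mys", "The"})},  # supports focal clade
    {frozenset({"Ana", "Mys"}), frozenset({"The", "Ara"})},
    {frozenset({"Mys", "The"}), frozenset({"Ana", "Ara"})},
    {frozenset({"Ana", "Ara"}), frozenset({"Ana", "Ara", "Mys"})},
]

print(gene_concordance(gene_tree_clades, focal))
```

A low value of this kind does not by itself contradict a clade; it indicates that few individual loci recover it, which is why it is informative to report alongside bootstrap support.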
The inclusion of Synaphridae within “symphytognathoids” had been suggested before
(Lopardo and Hormiga 2008; Lopardo et al. 2011), although these studies were cautious about
such placement due to the absence of Cyatholipidae representatives in their analyses.
Fernández et al. (2018) found Synaphridae to be the sister group of the linyphioid clade.
Because Kulkarni et al. (2020) did not include any synaphrid, its position using strictly UCE data
could not be tested. We included a synaphrid exemplar, Cepheia longiseta (from Fernández et
al. 2018), and our results corroborate the placement of Synaphridae within the
“symphytognathoid” clade.
The monophyly of “symphytognathoids” is supported by several morphological
synapomorphies (Lopardo et al. 2011). While morphology and UCEs support the monophyly of
“symphytognathoids”, six-gene Sanger-based data and sequences from transcriptomes
analyzed as amino acids do not support “symphytognathoid” monophyly (Lopardo et al. 2011;
Dimitrov et al. 2012, 2017; Wheeler et al. 2017; Fernández et al. 2018; Kallal et al. in press).
Unstable and conflicting “symphytognathoid” familial relationships hinder addressing questions
about the evolution of their unique diversity of web architectures, transformations in female
pedipalps (reduction and loss) and transformations of their respiratory systems. For example,
although referred to as miniature “orb weavers”, anapid web architecture is quite variable as
they are known to build typical orb webs and their modifications, sheet webs, or theridiid-like
cobwebs. Most mysmenids build spherical or planar orbs, symphytognathids build a two-
dimensional horizontal orb web, at least some synaphrids build sheet or irregular webs, and
theridiosomatids build orb webs, some of them highly modified (e.g., sticky lines connected to
water surface) (Coddington and Valerio 1980; Eberhard 1987; Rix and Harvey 2010; Lopardo et
al. 2011). In each of these “symphytognathoid” families (except Synaphridae), there is at least
one genus with a kleptoparasitic lifestyle accompanied by loss of the foraging web in all its
constituent species. Adult anapid females have either reduced segments in the pedipalp, a
knob-like protuberance, or have lost the palp entirely, like their putative sister family
Symphytognathidae. Female pedipalps in the remaining “symphytognathoid” families bear all
the segments, like all other spiders.
Our results and those from Kulkarni et al. (2020) indicate that “symphytognathoids” are
monophyletic when analyzed as nucleotide data and when about a hundred or more loci are
available. There is also a clear tradeoff between occupancy and phylogenetic signal. Low
occupancy data matrices contain more missing data than high occupancy data sets, and
missing data can influence the outcome of phylogenetic analyses, both topologically and in
branch lengths (Lemmon et al. 2009). In the case of “symphytognathoids”, a high occupancy
data set of 70% with 433 loci (“500Spid_70” data set of Kulkarni et al. 2020) also supported
“symphytognathoid” monophyly, suggesting that miniature orb weaving spiders are indeed a
lineage.
Unstable nodes in the Spider Tree of Life
The phylogenetic relationships of the UDOH group of families relative to the RTA Clade and the
interfamilial relationships of Araneoidea vary across analytical conditions, depending on the type
(coding or coding plus non-coding) and amounts of data. For example, in the case of
Araneoidea, coding data (codingUCE, AAUCE, nucT) exclusively recover this clade as sister
group to Nicodamoidea plus Eresidae. However, when combined with non-exonic data,
Araneoidea is sister group to a clade consisting of Nicodamoidea plus Eresidae, the RTA Clade
and the UDOH families–with the exception of the AllUCEs25 data set. The UDOH grade
consists of Uloboridae, Deinopidae, Oecobiidae and Hersiliidae, of which the first two families
are the only cribellate orb weaving groups, while all remaining orb weaving spider families are
ecribellate and placed within Araneoidea. On the other hand, exploration of molecular data
across a variety of analytical treatments has shown that many nodes in the spider tree of life are
stable across different occupancies. For example, the sister group relationship of Nicodamoidea
and Eresidae, the Hypochilidae plus Filistatidae clade, the monophyly of Synspermiata, and the
“symphytognathoid” clade are all robust hypotheses.
Nodal support values
Overall, we found that the gene concordance and site concordance factor values were
correlated (Supplementary figure 1a, c). The UFBoot was 100% for most nodes and the SH-aLRT
was mostly above 85% (Figure 1, Supplementary figure 2). Gene and site concordance values
ranged between 1 and 95%. Both concordance factors were generally above 50% for congeneric
taxa (Figure 1), meaning that more than 50% of the sites and loci support the monophyly of
those genera, and were lower among families and at deeper nodes (Figure 1). Several
alternative placements, including that of
leptonetids, nicodamoids with respect to Araneoidea and the UDOH families, had high UFBoot
within our trees (see Suppl. files) and also compared to the trees of Fernández et al. (2018).
Occupancy and missing data
Our results show that high occupancy data sets may yield unstable relationships due to the
small number of genes often represented in such data sets (Figure 2, Supplementary Table 2,
Supplementary trees). A similar phenomenon of unusual relationships at high occupancies was
observed in phylogenetic analyses of spider transcriptomes (Kallal et al. in press). Low
occupancy data sets contain larger amounts of data but also contain larger amounts of missing
data. An increase in the proportion of missing data is known to increase the risk of systematic
error (Roure et al. 2013). However, recent empirical studies with genome scale data have
shown that excluding genes with high amounts of missing data may weaken the resolution and
consistency of the resulting tree (Prasanna et al. 2020). Chan et al. (2020) found that different
data classes such as UCEs, exons and introns contain different phylogenetic signal; however,
an unfiltered combination (low occupancy) of such data converged on a similar topology. One
study suggests that if by allowing more missing data, taxon and gene sampling can be
improved, the lower occupancy matrices should be preferred (Streicher et al. 2016). In addition,
allowing missing data may make it possible to detect gene gains/losses specific to certain lineages. Such
information may be lost in high occupancy data sets, which exclude genes present only in
some clades along with genes missing because of sequencing failures. CAT+Γ models may alleviate systematic error (Roure et
al. 2013) but this was not tested in the present study. Evaluation of model adequacy (Ripplinger
and Sullivan 2010; Duchêne et al. 2018) may be a potential next step to further improve the
phylogenetic inference of the evolutionary history of spiders, but our goal here was to evaluate
for the first time the use of amino acids versus DNA.
Conclusions
We have used spiders (Araneae) as a study system to address incongruence among different
classes of genomic data in phylogenetic analyses. We scrutinized sequence data from different
sources (i.e., mRNA and DNA) and analyzed the protein coding regions either as amino acids or
as nucleotides, with and without third codon positions; we also analyzed non-coding regions. All
data sets, except the non-coding data, converged upon a similar pattern of phylogenetic
relationships, which was also similar to the trees derived from low occupancy matrices resulting
from the analysis of UCEs from genomic data (Kulkarni et al. 2020). It is clear that lower
amounts of data either due to amino acid translation, increasing matrix occupancy or both, can
cause topological conflicts at some nodes in the spider tree of life and with the sequencing
strategies employed here. Although a threshold cannot be established as to how much data are
optimal to resolve such topological conflicts, at least 500 loci seem necessary, based on our
results. Our results suggest that using nucleotide data and/or low occupancies to analyze
thousands of loci may prove to be a better strategy for studying higher level phylogenetic
relationships than using amino acids and high occupancies which would yield a much smaller
data set.
Conflicting results are more difficult to interpret when mutually exclusive alternative
relationships are highly supported, particularly when using bootstrapping as a measure of
support on large data sets. Hence, alternative branch support measures that are
computationally tractable for genome-scale data sets, like concordance factors, need to be
further explored.
In the interest of spider systematics, we demonstrate that phylogenetic incongruences
can be reduced by analyzing genome-scale nucleotide data sets, especially at low occupancies.
Some of the contentious hypotheses, such as the phylogeny of “symphytognathoids”, were
impacted by the data class, composition and taxon sampling used. We recovered congruent
support for their monophyly across a range of low occupancy data sets. This robustly supported
hypothesis on the phylogenetic relationships of the miniature orb weaving families will provide
an opportunity to unravel the evolutionary history of foraging webs.
Materials and Methods
Taxon sampling
The ultraconserved element (UCE) sequences for this study were obtained from a series of studies
focusing on arachnids, including Starrett et al. (2017), Wood et al. (2018) and Kulkarni et al.
(2020). Transcriptomes were obtained from Bond et al. (2014), Fernández et al. (2014, 2018),
Garrison et al. (2016), Sharma et al. (2014), and Zhao et al. (2014). Ultraconserved loci were
also retrieved from publicly available spider genomes of Latrodectus hesperus (Theridiidae; i5k
Consortium 2013), Loxosceles reclusa (Sicariidae; i5k Consortium 2013), Trichonephila clavipes
(Araneidae; Babb et al. 2017), Parasteatoda tepidariorum (Theridiidae; Schwager et al. 2017)
and Stegodyphus mimosarum (Eresidae; Sanggaard et al. 2014). Outgroups include the
horseshoe crabs Limulus polyphemus and Tachypleus tridentatus (Xiphosura); the scorpions
Bothriurus keyserlingi, Centruroides sculpturatus, Chaerilus celebensis and Pandinus imperator
(Scorpiones); the whip-spiders Damon variegatus, Damon sp. and Phrynus marginemaculatus
(Amblypygi); the vinegaroon Mastigoproctus giganteus (Uropygi) and the short-tailed
whip-scorpion Stenochrus portoricensis (Schizomida). The analysis was rooted using Xiphosura
since it is the only sampled lineage outside Arachnopulmonata, irrespective of whether we follow the
traditional hypothesis of Xiphosura being an outgroup to Arachnida (e.g., Lozano-Fernández et
al. 2019), or the alternative hypothesis placing them within Arachnida (see Ballesteros and
Sharma 2019; Ballesteros et al. 2019).
Transcriptome Assembly
Raw sequences were corrected for read errors using Rcorrector (Song and Florea 2015). Low
quality reads and adapters were trimmed with Trim Galore! 0.2.6
(www.bioinformatics.babraham.ac.uk) by setting the quality parameter to 30 and a phred cut-off
to 33; reads shorter than 25 bp were discarded. Ribosomal RNA was filtered using the default
settings in Bowtie 2.9.9 (Langmead and Salzberg 2012). De novo strand-specific assemblies
were generated using Trinity 2.0.6 (Grabherr et al. 2011; Haas et al. 2013) with the path
reinforcement distance set to 75. Redundancy reduction was done using CD-HIT-EST (Fu et al. 2012)
with 95% global similarity. Assemblies were completed using the Colonial One High
Performance Computing Cluster at The George Washington University and the Smithsonian
Institution High Performance Cluster at the Smithsonian Institution. Unlike in previous
phylotranscriptomic analyses of spiders (Bond et al. 2014; Fernández et al. 2014, 2018;
Garrison et al. 2016; Sharma et al. 2014; Zhao et al. 2014), the final DNA sequences were not
translated to amino acids.
Recovering UCEs from Transcriptomes
The FASTA files of transcriptomes resulting from CD-HIT-EST were converted to 2bit format
using faToTwoBit (Kent 2002). Then, in the PHYLUCE environment (publicly available at
https://phyluce.readthedocs.io/en/latest/tutorial-three.html), we created a temporary relational
database summarizing probe-to-assembly matches by running the
phyluce_probe_run_multiple_lastzs_sqlite function on the 2bit files.
The ultraconserved loci were recovered with the
phyluce_probe_slice_sequence_from_genomes command. The resulting FASTA files were
treated as contigs and matched to the Spider2Kv1 probes.
Analyzing UCEs as amino acids
The nucleotide sequences from UCE and transcriptome contigs were assembled, aligned, trimmed
and processed to obtain selected loci with taxon occupancies of 10%, 25% and 50%
using PHYLUCE. All locus files in NEXUS format were converted to FASTA format and translated to
amino acids using seqmagick (https://seqmagick.readthedocs.io/en/latest/). These translated
UCE loci were concatenated using HybPiper (Johnson et al. 2016).
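As an illustration of the translation step, the following is a minimal Python sketch and not the seqmagick implementation. It assumes that each sequence is already in frame starting at its first position and that the standard genetic code applies; the function name `translate` and the handling of gaps and ambiguous codons are choices made here for illustration only.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate an in-frame nucleotide sequence (assumed frame: position 1).

    A fully gapped codon becomes '-', any other unrecognized codon
    (ambiguity codes, partial gaps) becomes 'X', and trailing bases that
    do not fill a codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)
```

In practice the reading frame of a UCE locus is not known in advance, so a real workflow would need to establish the frame before a step like this.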
Analyzing transcriptomes as nucleotides
The FASTA files of transcriptomes resulting from CD-HIT-EST were translated to amino acids
using TransDecoder (Haas et al. 2013). Orthologs were recovered from the peptide sequences using
BUSCO (Simão et al. 2015). Nucleotide data with ortholog indices and gene files were obtained
using NOrthGen (https://github.com/sskspider/NOrthGen; Supplementary figure 4). Gene files
were aligned using MAFFT v7 (Katoh and Standley 2013) and trimmed using trimAl v1.2
(Capella-Gutiérrez et al. 2009). All orthologs were concatenated using HybPiper (Johnson et
al. 2016). Third codon positions were removed using rmThirdCodon
(https://github.com/iamciera/rmThirdCodon).
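The removal of third codon positions can be sketched as follows. This is an illustrative Python example, not the rmThirdCodon code; it assumes that every aligned sequence is in frame from its first column, and the function names are hypothetical.

```python
def remove_third_positions(aligned_seq):
    """Drop every third site (codon position 3) from an in-frame sequence."""
    return "".join(
        base for index, base in enumerate(aligned_seq) if index % 3 != 2
    )


def strip_third_from_alignment(alignment):
    """Apply the removal to each taxon in a {taxon: sequence} alignment."""
    return {
        taxon: remove_third_positions(seq) for taxon, seq in alignment.items()
    }
```

Because whole columns are removed in the same way for every taxon, the sequences remain aligned after the operation.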
Obtaining non-coding regions
Non-coding regions were extracted from the raw UCE sequence files obtained from Starrett et
al. (2017), Wood et al. (2018) and Kulkarni et al. (2020). A target file database of exons was
compiled using UCEs extracted from the transcriptomes of Damon variegatus, Loxosceles
deserta, Nicodamidae sp., Trichonephila clavipes, Hebestatis theveneti, Palpimanus gibbulus,
Kukulcania hibernalis, Stegodyphus mimosarum, Liphistius malayanus, Anahita punctulata and
Megahexura fulva from Fernández et al. (2018) and the genome of Parasteatoda tepidariorum
(Schwager et al. 2017). These taxa were chosen to represent Araneae-wide samples and their
closest relatives used as outgroups. HybPiper (Johnson et al. 2016) was run on the raw UCE
sequence files and matched against the target file. After exon matching was completed, we
used the retriever pipeline to extract the non-coding sequences from the raw UCE sequences.
Sequences shorter than 50 bp (an arbitrary threshold) were removed and the remaining
non-coding sequences were aligned using MAFFT v7 (Katoh and Standley 2013) and
concatenated using HybPiper (Johnson et al. 2016).
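The length filter applied to the non-coding sequences can be illustrated with a short Python sketch. It assumes plain FASTA input held in memory; the helper names `parse_fasta` and `drop_short` are hypothetical and are not part of HybPiper or MAFFT.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def drop_short(records, min_length=50):
    """Keep only sequences of at least min_length bases (50 bp in the text)."""
    return {
        header: seq for header, seq in records.items() if len(seq) >= min_length
    }
```

The threshold is exposed as a parameter because the text describes 50 bp as an arbitrary choice.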
Phylogenomic analyses
The ultraconserved loci recovered from the transcriptomes are referred to as codingUCEs in the
following text. We built eight data sets (Supplementary Table 2), as follows. All data sets (Figure
5) were analyzed at different occupancies, for a total of 15 different analyses (Supplementary
Table 2):
1. codingUCEs data set: The ultraconserved elements recovered from transcriptomes and
analyzed as nucleotide sequences with all codon positions at occupancies of 10%, 25% and
50%. This data set contains only exons that are ultraconserved.
2. AAUCEs data set: Sequences from codingUCEs, above, were translated to amino acids and
analyzed at occupancies of 10%, 25% and 50%.
3. AllUCEs data set: The codingUCEs data set was combined with the UCEs from taxa included
in Kulkarni et al. (2020) and analyzed at occupancies of 10%, 25% and 50%. This data set of
ultraconserved elements contains both exons and non-coding regions.
4. AllAAUCEs data set: Amino acid sequences for a taxon sampling similar to that of the AllUCEs
data set, analyzed at occupancies of 10%, 25% and 50%. This data set contains only
exons that are ultraconserved.
5. nucT data set: Transcriptomes analyzed as nucleotides with all codon positions at
occupancies of 10%, 25%, 50% and 67%. This data set contains only exons that may
or may not be ultraconserved.
6. non-coding regions data set: Non-coding regions obtained from the UCE data set of Kulkarni
et al. (2020).
7. 3RcodingUCEs data set: Third codon positions removed from the codingUCEs data set.
8. 3RnucT data set: Third codon positions removed from the nucT data set.
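The occupancy thresholds used for these data sets can be illustrated with a small Python sketch. It assumes that occupancy is the percentage of all sampled taxa with at least one non-missing character at a locus; this is a simplification for illustration and not the PHYLUCE implementation, and the function names are hypothetical.

```python
MISSING = set("-?Nn")


def taxon_occupancy(locus_alignment, all_taxa):
    """Percentage of sampled taxa that have data for one locus.

    A taxon counts as present only if its sequence contains at least one
    character other than a gap, '?' or 'N'.
    """
    present = sum(
        1 for taxon in all_taxa
        if set(locus_alignment.get(taxon, "")) - MISSING
    )
    return 100.0 * present / len(all_taxa)


def filter_loci(loci, all_taxa, min_occupancy):
    """Keep loci whose occupancy is at least min_occupancy percent."""
    return {
        name: alignment for name, alignment in loci.items()
        if taxon_occupancy(alignment, all_taxa) >= min_occupancy
    }
```

Under this definition a lower threshold retains more loci, which is why the low occupancy matrices are the larger ones.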
Contigs from all DNA sequences were matched to the Spider2Kv1 probe set (Kulkarni et al.
2020) at minimum coverage and minimum identity of 65 each. Phylogenetic analyses were
performed on the unpartitioned concatenation of loci using IQ-TREE v.1.6.9 (Nguyen et al.
2015). Model selection was performed for each data set using the TEST function of ModelFinder in
IQ-TREE (Kalyaanamoorthy et al. 2017; Hoang et al. 2018).
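For readers who wish to reproduce a comparable run, the settings described in this section (TEST model selection, 1000 UFBoot replicates, SH-aLRT and the -bnni option) could be assembled into an IQ-TREE 1.6 command roughly as sketched below in Python. The alignment file name, the executable name and the number of SH-aLRT replicates are assumptions made for illustration; the exact command line used in the study is not given in the text.

```python
def build_iqtree_command(alignment, ufboot_reps=1000, alrt_reps=1000,
                         executable="iqtree"):
    """Return an IQ-TREE 1.6 argument list for an unpartitioned analysis.

    alrt_reps is an assumed value; the text does not state the number of
    SH-aLRT replicates.
    """
    return [
        executable,
        "-s", alignment,           # concatenated, unpartitioned alignment
        "-m", "TEST",              # model selection followed by tree search
        "-bb", str(ufboot_reps),   # ultrafast bootstrap replicates
        "-alrt", str(alrt_reps),   # SH-aLRT replicates
        "-bnni",                   # NNI optimization of each bootstrap tree
    ]


# Hypothetical file name, shown only to illustrate usage.
command = build_iqtree_command("AllUCEs10_concatenated.phy")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect the command before submitting it to a cluster.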
Nodal support was estimated via 1000 UFBoot replicates (Hoang et al. 2018) and the
Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-aLRT) (Guindon et al. 2010).
To reduce the risk of overestimating branch support with UFBoot due to model violations, we
added the -bnni option. With this option, UFBoot optimizes each bootstrap tree
using a hill-climbing nearest neighbor interchange (NNI) search based on the corresponding
bootstrap alignment (Hoang et al. 2018). We also used concordance factors, a metric of
whether the best tree represents the signal well, as implemented in IQ-TREE v1.7-betaX (Minh
et al. 2020). The gene concordance factor (gCF) indicates the percentage of gene trees containing
a given branch of the maximum likelihood tree, and the site concordance factor (sCF) indicates the
percentage of decisive alignment sites supporting a branch (Minh et al. 2020). These factors provide
insight into incomplete lineage sorting, which may cause discordance between the sites
and the resulting trees (Zhang et al. 2019). We plotted gCF against sCF in relation to
UFBoot and SH-aLRT values using R version 3.6.0 (R Core Team 2019).
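The idea behind the gene concordance factor can be illustrated with a simplified Python sketch. It assumes that every gene tree contains all taxa, so that every gene tree is decisive for the branch, and it represents each tree as a list of bipartitions; it is not the IQ-TREE implementation, and the function names are hypothetical.

```python
def normalize_split(side, all_taxa):
    """Represent a bipartition by the side lacking a fixed reference taxon."""
    side = frozenset(side)
    reference = min(all_taxa)
    if reference in side:
        return frozenset(all_taxa) - side
    return side


def gene_concordance_factor(branch_side, gene_tree_splits, all_taxa):
    """Percentage of gene trees that contain the given branch.

    branch_side: taxa on one side of the branch in the reference tree.
    gene_tree_splits: one list of bipartitions (sets of taxa) per gene tree.
    Assumes complete taxon sampling in every gene tree.
    """
    target = normalize_split(branch_side, all_taxa)
    containing = sum(
        1 for splits in gene_tree_splits
        if target in {normalize_split(split, all_taxa) for split in splits}
    )
    return 100.0 * containing / len(gene_tree_splits)
```

With incomplete gene trees, only the trees that are decisive for a branch would enter the denominator, which this sketch does not attempt to handle.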
We chose our preferred tree to guide the discussion of the results by conducting
topology tests, namely, approximately unbiased (AU), bootstrap proportion (BP), SH-aLRT,
Kishino-Hasegawa (KH), and expected likelihood weight (ELW), using 10,000 resampling of
estimated log-likelihoods (RELL) replicates in IQ-TREE on the AllUCEs data set.
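Of these tests, the RELL bootstrap proportion is the simplest to illustrate. The Python sketch below assumes that per-site log-likelihoods are already available for each candidate tree; it resamples sites with replacement and records how often each tree has the highest summed log-likelihood. It is a minimal illustration rather than the IQ-TREE implementation, the function name and seed are arbitrary, and the AU, KH and ELW tests are not shown.

```python
import random


def rell_bootstrap_proportions(site_lnl, replicates=10000, seed=1):
    """Estimate bootstrap proportions by RELL resampling.

    site_lnl: {tree_name: [per-site log-likelihoods]}, equal lengths.
    Returns {tree_name: fraction of replicates in which the tree is best}.
    Ties are resolved in favor of the first tree in the dictionary.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(replicates):
        # Resample site indices with replacement, without re-estimating
        # any parameters (the defining shortcut of RELL).
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {
            name: sum(site_lnl[name][i] for i in sample) for name in names
        }
        best = max(names, key=lambda name: totals[name])
        wins[best] += 1
    return {name: wins[name] / replicates for name in names}
```

Because no likelihood is re-optimized for the resampled data, many replicates such as the 10,000 used here remain computationally cheap.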
Supplementary Material
Supplementary data are available in the online version of this study.
Data availability
Sequences from the data sets of Fernández et al. (2018) and Kulkarni et al. (2020) were
analyzed in this study. No new data were generated in support of this research. The scripts for
NOrthGen are available at https://github.com/sskspider/NOrthGen.
Acknowledgements
All analyses were conducted on the Colonial One High Performance Computing Facility at The
George Washington University. SK was supported by a Weintraub Fellowship and by the Harlan
Research Fund. This study was supported by the US National Science Foundation grants (DEB
1457300, 1457539) to GH and GG and through multiple Putnam Expedition Grants from the
Museum of Comparative Zoology to GG. Additional support was provided by US National
Science Foundation grants DEB 1754289, 1754278, and DEB 1754262 to GH, GG and Sarah
Boyer. The authors are grateful to Silas Bossert, Amey Uchgaonkar, Nicolas Hazzi, Ligia
Benavides and Rosa Fernández for discussions. The authors also thank Martín J. Ramírez
and two anonymous reviewers, as well as the Editors, for their time and effort in reviewing this
manuscript.
Author Contributions
All authors contributed to designing the study and writing the manuscript. S.K. and R.J.K.
conducted the analyses.
References
Arnedo MA, Hormiga G, Scharff N. 2009. Higher-level phylogenetics of linyphiid spiders
(Araneae, Linyphiidae) based on morphological and molecular evidence. Cladistics 25:231–262.
Babb PL, Lahens NF, Correa-Garhwal SM, Nicholson DN, Kim EJ, Hogenesch JB, et al. 2017.
The Nephila clavipes genome highlights the diversity of spider silk genes and their complex
expression. Nat Genet. 49:895–903.
Babraham Bioinformatics. Trim Galore!
http://www.bioinformatics.babraham.ac.uk/projects/trim_galore. Accessed in January 2020.
Ballesteros JA, Sharma PP. 2019. A critical appraisal of the placement of Xiphosura
(Chelicerata) with account of known sources of phylogenetic error. Syst Biol. 68:896–917.
https://doi.org/10.1093/sysbio/syz011.
Bond JE, Garrison NL, Hamilton CA, Godwin RL, Hedin M, Agnarsson I. 2014. Phylogenomics
resolves a spider backbone phylogeny and rejects a prevailing paradigm for orb web evolution.
Curr Biol. 24:1765–1771. https://doi.org/10.1016/j.cub.2014.06.034
Bossert S, Danforth BN. 2018. On the universality of target-enrichment baits for phylogenomic
research. Methods Ecol Evol. 9:1453–1460. https://doi.org/10.1111/2041-210X.12988
Bossert S, Murray EA, Almeida EAB, Brady SG, Blaimer BB, Danforth BN. 2019. Combining
transcriptomes and ultraconserved elements to illuminate the phylogeny of Apidae. Mol
Phylogenet Evol. 130:121–131. https://doi.org/10.1016/j.ympev.2018.10.012
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL,
Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg
N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. 2019. Embracing
heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 7:e6399.
https://doi.org/10.7717/peerj.6399
Breinholt JW, Kawahara AY. 2013. Phylotranscriptomics: saturated third codon positions
radically influence the estimation of trees based on next-gen data. Genome Biol Evol. 5:2082–
2092. https://doi.org/10.1093/gbe/evt157
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
https://doi.org/10.1093/bioinformatics/btp348
Chan KO, Hutter CR, Wood PL Jr, Grismer LL, Brown RM. 2019. Larger, unfiltered datasets are
more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities
in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol. 151:106899.
https://doi.org/10.1016/j.ympev.2020.106899
Cloutier A, Sackton TB, Grayson P, Clamp M, Baker AJ, Edwards SV. 2019. Whole-genome
analyses resolve the phylogeny of flightless birds (Palaeognathae) in the presence of an
empirical anomaly zone. Syst Biol. 68:937–955. https://doi.org/10.1093/sysbio/syz019
Coddington J, Valerio C. 1980. Observations on the web and behavior of Wendilgarda spiders
(Araneae: Theridiosomatidae). Psyche 87:93–105. https://doi.org/10.1155/1980/69153
Dimitrov D, Hormiga G. 2021. Spider Diversification Through Space and Time. Annu. Rev.
Entomol. 66:1 https://doi.org/10.1146/annurev-ento-061520-083414
Dimitrov D, Benavides LR, Arnedo MA, Giribet G, Griswold CE, Scharff N, Hormiga G. 2017.
Rounding up the usual suspects: a standard target-gene approach for resolving the interfamilial
phylogenetic relationships of ecribellate orb-weaving spiders with a new family-rank
classification (Araneae, Araneoidea). Cladistics 33:221–250. https://doi.org/10.1111/cla.12165
Duchêne DA, Duchêne S, Ho SYW. 2018. Differences in performance among test statistics for
assessing phylogenomic model adequacy. Genome Biol Evol. 10:1375–1388.
https://doi.org/10.1093/gbe/evy094
Dunn CW, Giribet G, Edgecombe GD, Hejnol A. 2014. Animal phylogeny and its evolutionary
implications. Annu Rev Ecol Evol Syst. 45:371–395. https://doi.org/10.1146/annurev-ecolsys-
120213-091627
Eberhard WG. 1987. Web-building behavior of anapid, symphytognathid, and mysmenid
spiders. J Arachnol. 14:339–358.
Felsenstein J. 1985. Confidence limits on phylogenies: An approach using the bootstrap.
Evolution 39:783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x
Fernández R, Hormiga G, Giribet G. 2014. Phylogenomic analysis of spiders reveals
nonmonophyly of orb weavers. Curr Biol. 24:1772–1777.
https://doi.org/10.1016/j.cub.2014.06.03
Fernández R, Kallal RJ, Dimitrov D, Ballesteros JA, Arnedo M, Giribet G, Hormiga G. 2018.
Phylogenomics, diversification dynamics, and comparative transcriptomics across the spider
tree of life. Curr Biol. 28:1489–1497.e5 http://doi.org/10.1016/j.cub.2018.03.064
Forster RR, Platnick NI. 1977. A review of the spider family Symphytognathidae (Arachnida,
Araneae). Am Mus Novit. 2619:1–29.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation
sequencing data. Bioinformatics 28:3150–3152
Garb JE, Haney R, Schwager E, Gregorič M, Kuntner M, Agnarsson I, Blackledge T. 2019. The
transcriptome of Darwin’s bark spider silk glands predicts proteins contributing to dragline silk
toughness. Commun Biol. 2:275. https://doi.org/10.1038/s42003-019-0496-1
Garrison NL, Rodriguez J, Agnarsson I, Coddington JA, Griswold CE, Hamilton CA, Hedin M,
Kocot KM, Ledford JM, Bond JE. 2016. Spider phylogenomics: untangling the Spider Tree of
Life. PeerJ 4:e1719. https://doi.org/10.7717/peerj.1719
Gee H. 2003. Ending incongruence. Nature 425:782. https://doi.org/10.1038/425782a
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L,
Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA-Seq data
without a reference genome. Nat Biotechnol. 29:644–652. https://doi.org/10.1038/nbt.1883
Griswold CE, Coddington JA, Hormiga G, Scharff N. 1998. Phylogeny of the orb-web building
spiders (Araneae, Orbiculariae: Deinopoidea, Araneoidea). Zool J Linnean Soc. 123:1–99.
https://doi.org/10.1111/j.1096-3642.1998.tb01290.x
Guindon S, Dufayard J, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms
and methods to estimate maximum-likelihood phylogenies: Assessing the performance of
PhyML 3.0. Syst Biol. 59:307–321. https://doi.org/10.1093/sysbio/syq010
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. 2013. De novo
transcript sequence reconstruction from RNA-seq using the Trinity platform for reference
generation and analysis. Nat Protoc. 8:1494–1512. https://doi.org/10.1038/nprot.2013.084
Hedin M, Derkarabetian S, Alfaro A, Ramírez MJ, Bond JE. 2019. Phylogenomic analysis and
revised classification of atypoid mygalomorph spiders (Araneae, Mygalomorphae), with notes on
arachnid ultraconserved element loci. PeerJ 7:e6864. http://doi.org/10.7717/peerj.6864
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018. UFBoot2: Improving the
ultrafast bootstrap approximation. Mol Biol Evol. 35:518–522.
https://doi.org/10.1093/molbev/msx281
Hormiga G. 1994. Cladistics and the comparative morphology of linyphiid spiders and their
relatives (Araneae, Araneoidea, Linyphiidae). Zool J Linnean Soc. 111:1–71.
Hormiga G. 2008. On the spider genus Weintrauboa (Araneae, Pimoidae), with a description of
a new species from China and comments on its phylogenetic relationships. Zootaxa 1814:1–20.
Hormiga G, Tu L. 2008. On Putaoa, a new genus of the spider family Pimoidae (Araneae) from
southern China, with a cladistic test of its monophyly and phylogenetic placement. Zootaxa
1792:1–21.
Hormiga G, Griswold CE. 2014. Systematics, phylogeny and evolution of orb-weaving spiders.
Annu Rev Entomol. 59:487–512.
i5K Consortium. 2013. The i5K Initiative: Advancing arthropod genomics for knowledge, human
health, agriculture, and the environment. J Hered. 104:595–600.
https://doi.org/10.1093/jhered/est050
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B,
Howard JT et al. 2014. Whole-genome analyses resolve early branches in the tree of life of
modern birds. Science 346:1320–1331.
Johnson MG, Gardner EM, Liu Y, Medina R, Goffinet B, Shaw AJ, Zerega NJC, Wickett NJ.
2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high‐throughput
sequencing reads using target enrichment. Appl Plant Sci. 4:1600016.
https://doi.org/10.3732/apps.1600016
Kallal R, Kulkarni S, Dimitrov D, Benavides LR, Arnedo M, Giribet G, Hormiga G. (in press)
Converging on the orb: denser taxon sampling elucidates spider phylogeny and new analytical
methods support repeated evolution of the orb web. Cladistics
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast
model selection for accurate phylogenetic estimates. Nat Meth. 14:587–589.
https://doi.org/10.1038/nmeth.4285
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7:
Improvements in performance and usability. Mol Biol Evol. 30:772–780.
https://doi.org/10.1093/molbev/mst010
Kent WJ. 2002. BLAT—the BLAST‐like alignment tool. Genome Res. 12:656–664.
https://doi.org/10.1101/gr.229202
Kulkarni S, Wood H, Lloyd M, Hormiga G. 2020. Spider‐specific probe set for ultraconserved
elements offers new perspectives on the evolutionary history of spiders (Arachnida, Araneae).
Mol Ecol Resour. 20:185–203. https://doi.org/10.1111/1755-0998.13099
Langmead B, Salzberg S. 2012. Fast gapped-read alignment with Bowtie 2. Nat Meth. 9:357–
359. https://doi.org/10.1038/nmeth.1923
Lozano-Fernández J, Tanner AR, Giacomelli M, Carton R, Vinther J, Edgecombe GD, Pisani D.
2019. Increasing species sampling in chelicerate genomic-scale data sets provides support for
monophyly of Acari and Arachnida. Nat Commun. 10:2295. https://doi.org/10.1038/s41467-019-
10244-7
Mardis ER. 2011. A decade’s perspective on DNA sequencing technology. Nature 470:198–
203.
Michalik P, Kallal R, Dederichs TM, Labarque FM, Hormiga G, Giribet G, Ramírez MJ. 2019.
Phylogenomics and genital morphology of cave raptor spiders (Araneae, Trogloraptoridae)
reveal an independent origin of a flow‐through female genital system. J Zool Syst Evol Res.
57:737–747. https://doi.org/10.1111/jzs.12315
Laumer CE, Fernández R, Lemer S, Combosch D, Kocot KM, Riesgo A, Andrade SCS, Sterrer W,
Sørensen MV, Giribet G. 2019. Revisiting metazoan phylogeny with genomic sampling of all
phyla. Proc R Soc B 286:20190831.
https://doi.org/10.1098/rspb.2019.0831
Ledford JM, Griswold CE. 2010. A study of the subfamily Archoleptonetinae (Araneae,
Leptonetidae) with a review of the morphology and relationships for the Leptonetidae. Zootaxa
2391:1–32.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-
throughput phylogenomics. Syst Biol. 61:727–744. https://doi.org/10.1093/sysbio/sys049
Lemmon AR, Brown JB, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on
phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol.
58:130–145.
Lopardo L, Hormiga G. 2008. Phylogenetic placement of the Tasmanian spider Acrobleps
hygrophilus (Araneae, Anapidae) with comments on the evolution of the capture web in
Araneoidea. Cladistics 24:1–33.
Lopardo L, Giribet G, Hormiga G. 2011. Morphology to the rescue: molecular data and the
signal of morphological characters in combined phylogenetic analyses—a case study from
mysmenid spiders (Araneae, Mysmenidae), with comments on the evolution of web
architecture. Cladistics 27:278–330.
Minh BQ, Hahn M, Lanfear R. 2020. New methods to calculate concordance factors for
phylogenomic datasets. Mol Biol Evol. msaa106 https://doi.org/10.1093/molbev/msaa106
Morgan CC, Foster PG, Webb AE, Pisani D, McInerney JO, O’Connell MJ. 2013.
Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol.
30:2145–2156.
O'Connor DL, Runions A, Sluis A, Bragg J, Vogel JP, Prusinkiewicz P, et al. 2014. A division in
PIN-mediated auxin patterning during organ initiation in grasses. PLoS Comput Biol. 10:
e1003447. https://doi.org/10.1371/journal.pcbi.1003447
Prasanna A, Gerber D, Kijpornyongpan T, Aime MC, Doyle V, Nagy LG. 2020. Model
choice, missing data, and taxon sampling impact phylogenomic inference of deep
Basidiomycota relationships. Syst Biol. 69:17–37. https://doi.org/10.1093/sysbio/syz029
Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, Lemmon AR. 2015. A
comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing.
Nature 526:569–573.
R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Ramírez MJ, Magalhaes ILF, Derkarabetian S, Ledford J, Griswold CE, Wood HW, Hedin M.
2020. Sequence-capture phylogenomics of true spiders reveals convergent evolution of
respiratory systems. Syst Biol. https://doi.org/10.1093/sysbio/syaa043
Ripplinger J, Sullivan J. 2010. Assessment of substitution model adequacy using frequentist and
Bayesian methods. Mol Biol Evol. 27:2790–2803.
Rix M, Harvey M. 2010. The spider family Micropholcommatidae (Arachnida, Araneae,
Araneoidea): a relimitation and revision at the generic level. ZooKeys 36:1–321.
Rokas A, Williams B, King N, Carroll SB. 2003. Genome-scale approaches to resolving
incongruence in molecular phylogenies. Nature 425:798–804.
https://doi.org/10.1038/nature02053
Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. 2013. Less is more in mammalian
phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental
mammals. Mol Biol Evol. 30:2134–2144.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from
empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
https://doi.org/10.1093/molbev/mss208
Sanggaard KW, Bechsgaard JS, Fang X, Duan J, Dyrlund TF, Gupta V, Jiang X, Cheng L, Fan
D, Feng Y. 2014. Spider genomes provide insight into composition and evolution of venom and
silk. Nat Comm. 5:3765. http://doi.org/10.1038/ncomms4765
Schütt K. 2003. Phylogeny of Symphytognathidae s.l. (Araneae, Araneoidea). Zool Scr. 32:129–
151.
Schwager EE, Sharma PP, Clarke T, Leite DJ, Wierschin T, Pechmann M, Akiyama-Oda Y,
Esposito L, Bechsgaard J, Bilde T et al. 2017. The house spider genome reveals an ancient
whole-genome duplication during arachnid evolution. BMC Biol. 15:62.
https://doi.org/10.1186/s12915-017-0399-x
Sharma PP, Kaluziak S, Pérez-Porro AR, González VL, Hormiga G, Wheeler WC, Giribet G.
2014. Phylogenomic interrogation of Arachnida reveals systemic conflicts in phylogenetic signal.
Mol Biol Evol. 31:2963–2984. https://doi.org/10.1093/molbev/msu235.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO:
assessing genome assembly and annotation completeness with single-copy orthologs.
Bioinformatics 31:3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Song L, Florea L. 2015. Rcorrector: efficient and accurate error correction for Illumina RNA-seq
reads. GigaScience 4:s13742–015–0089–y. https://doi.org/10.1186/s13742-015-0089-y
Starrett J, Derkarabetian S, Hedin M, Bryson RW, McCormack JE, Faircloth BC. 2017. High
phylogenetic utility of an ultraconserved element probe set designed for Arachnida. Mol Ecol
Resour. 17:812–823. https://doi.org/10.1111/1755-0998.12621
Streicher JW, Schulte JA, Wiens JJ. 2016. How should genes and taxa be sampled for
phylogenomic analyses with missing data? An empirical study in iguanian lizards. Syst Biol.
65:128–145. https://doi.org/10.1093/sysbio/syv058
Walker JF, Brown JW, Smith SA. 2018. Analyzing contentious relationships and outlier genes in
phylogenomics. Syst Biol. 67:916–924. https://doi.org/10.1093/sysbio/syy043
Wheeler WC, Coddington JA, Crowley LM, Dimitrov D, Goloboff PA, Griswold CE, Hormiga G,
Prendini L, Ramírez MJ, Sierwald P, et al. 2017. The spider tree of life: phylogeny of Araneae
based on target‐gene analyses from an extensive taxon sampling. Cladistics 33:574–616.
https://doi.org/10.1111/cla.12182
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S,
Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, et al. 2014.
Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc Natl Acad
Sci USA 111:E4859–E4868.
Wood HM, González V, Lloyd M, Coddington J, Scharff N. 2018. Next-generation museum
genomics: Phylogenetic relationships among palpimanoid spiders using sequence capture
techniques (Araneae: Palpimanoidea). Mol Phylogenet Evol. 127:907–918.
https://doi.org/10.1016/j.ympev.2018.06.038
World Spider Catalog (2020). World Spider Catalog. Version 20.5. Natural History Museum
Bern, online at http://wsc.nmbe.ch, accessed on 16 January, 2020. https://doi.org/10.24436/2
Xi Z, Liu L, Rest JS, Davis CC. 2014. Coalescent versus concatenation methods and the
placement of Amborella as sister to water lilies. Syst Biol. 63:919–932.
Zanis MJ, Soltis DE, Soltis PS, Mathews S, Donoghue MJ. 2002. The root of the angiosperms
revisited. Proc Natl Acad Sci USA 99:6848–6853.
Zhao YJ, Zeng Y, Chen L, Dong Y, Wang W. 2014. Analysis of transcriptomes of three orb-web
spider species reveals gene profiles involved in silk and toxin. Insect Sci. 21:687–698.
https://doi.org/10.1111/1744-7917.12068.
Zhang MY, Williams JL, Lucky A. 2019. Understanding UCEs: a comprehensive primer on using
ultraconserved elements for arthropod phylogenomics. Insect Syst Div. 3:1–12.
Figures
Fig. 1. Maximum likelihood phylogeny of spiders resulting from the AllUCEs10 data set
(occupancy 10, 1,060 loci) collapsed to family level. Paraphyly is indicated by violet bars. A. All
major lineages of spiders at family level except the RTA Clade and Araneoidea; B. RTA Clade;
C. All 17 families of superfamily Araneoidea. The rhombi at the nodes indicate four support
values: Shimodaira-Hasegawa-like approximate likelihood ratio test (left top), ultrafast bootstrap
(right top), gene concordance factor (gCF) (left bottom) and site concordance factor (sCF) (right
bottom). The numbers at the nodes indicate clades as described. Branch lengths are not to
scale. For the original sampled tree, see supplementary figure 2.
Downloaded from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa251/5912541 by guest on 08 November 2020
[Fig. 1 image. Legend: node support for SH-aLRT, UFBoot, gCF, and sCF shown in color bins (1-30, 31-60, 61-94, 95-100 and 1-25, 26-50, 50-75, 75-100). Panels A, B, and C show family-level tips from the outgroups and Liphistiidae through Araneoidea, with clades numbered 1 to 12 and the UDOH grade marked.]
Fig. 2. Maximum likelihood phylogenies of spiders resulting from different data sets at various
occupancies. Each colored box indicates a data set corresponding to Supplementary Table 2.
The first and second rows represent phylogenies resulting from data analyzed as nucleotides
and amino acids respectively, of codingUCEs (outlined red) and AllUCEs (outlined blue).
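To illustrate what an occupancy threshold does to a data set, here is a minimal sketch. It assumes that occupancy is the percentage of sampled taxa in which a locus was recovered, which is the usual sense in UCE pipelines but is not defined in this section. The function name `filter_by_occupancy` and the toy loci are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def filter_by_occupancy(
    loci: Dict[str, Set[str]], n_taxa: int, min_percent: float
) -> Dict[str, Set[str]]:
    """Keep loci recovered in at least min_percent of the n_taxa sampled taxa."""
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    kept = {}
    for locus, taxa in loci.items():
        # Occupancy (assumed definition): share of taxa with this locus, in percent.
        occupancy = 100.0 * len(taxa) / n_taxa
        if occupancy >= min_percent:
            kept[locus] = taxa
    return kept


# Hypothetical toy data: locus name -> taxa in which the locus was recovered.
toy_loci = {
    "uce-1": {"ABC", "DEF", "GHI", "JKL"},
    "uce-2": {"ABC"},
    "uce-3": {"ABC", "DEF"},
}

kept_at_50 = filter_by_occupancy(toy_loci, n_taxa=4, min_percent=50)
print(sorted(kept_at_50))
```

Under this assumed definition, a lower threshold retains more loci at the cost of more missing data per locus, which is the trade-off the occupancy series in this figure explores.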
Fig. 3. Comparison of phylogenetic relationships between A. transcriptomic phylogeny as
published by Fernández et al. (2018) using amino acids, and B. nucT (Fernández et al. 2018,
transcriptome data set analyzed as nucleotides). Both phylogenies were constructed using
occupancy of 67%. The highlighted blue box indicates Araneoidea families.
Fig. 4. Comparison of interfamilial relationships of Araneoidea. A. AllAAUCEs tree, B. AllUCEs
tree. Occupancy of both phylogenies was 10%. Colored branches indicate family relationships
that are congruent in both trees.
Fig. 5. Schematic representation of data classes analyzed in this study in a maximum likelihood
framework. Squares indicate original data sets from Fernández et al. (2018) and Kulkarni et al.
(2020), and circles indicate matrices analyzed in our study. Circles with red outline indicate
amino acid data set, black outline indicates non-coding region data set and the circles with
outline indicate nucleotide data sets. Abbreviation: UCE–Ultraconserved elements.
Supplementary files
Supplementary Fig. 1. Site concordance factors mapped against gene concordance factors
with respect to ultrafast bootstrap (A,C) and Shimodaira-Hasegawa-like approximate likelihood
ratio test (B,D). Figures A and B indicate plots for the AllUCEs10 data set, and C and D indicate
plots for the AllUCEs25 data set.
Supplementary Fig. 2. Maximum likelihood phylogeny of spiders resulting from the AllUCEs10
data set (occupancy 10, 1,060 loci). A. A phylogeny with 248 taxa, with taxon names in black
indicating taxa from UCE studies, blue indicating UCEs recovered from Fernández et al. (2018),
and red indicating UCEs recovered from genomes. B. A summary of tree A. The rhombi at the
nodes indicate four support values: Shimodaira-Hasegawa-like approximate likelihood ratio test
(left top), ultrafast bootstrap (right top), gene concordance factor (gCF) (left bottom) and site
concordance factor (sCF) (right bottom). The numbers at nodes indicate clades as described.
Supplementary Fig. 3. Comparison of phylogenetic relationships between A. transcriptomes as
published by Fernández et al. (2018), and B. codingUCEs (UCEs retrieved from the Fernández
et al. (2018) transcriptome data set, with occupancy of 10) analyzed as DNA. The highlighted
blue box indicates Araneoidea families. Note that the miniature orb-weaving spider families
(“symphytognathoids”: Anapidae, Mysmenidae, Synaphridae, and Theridiosomatidae;
Symphytognathidae not sampled) are polyphyletic in the left tree but monophyletic in the right
tree.
Supplementary Fig. 4. Schematic representation of the workflow of the NOrthGen (Nucleotide
Ortholog Generator) modules (https://github.com/sskspider/NOrthGen) used to map
ortholog identifiers from amino acid to nucleotide data. ABC and DEF are taxon name
exemplars. Dotted arrows indicate processes and red arrows indicate mandatory input files for
using NOrthGen.
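The idea of carrying ortholog identifiers from amino acid data over to nucleotide data can be sketched as follows. This is not NOrthGen's code. It is a minimal illustration that assumes each ortholog group lists (taxon, sequence identifier) pairs and that the same identifiers appear in per-taxon nucleotide FASTA files. The function names, the group labels OG1 and OG2, and the toy sequences are hypothetical. ABC and DEF are the exemplar taxon names from the caption.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited word of each header.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def map_orthologs_to_nucleotides(
    orthogroups: Dict[str, List[Tuple[str, str]]],
    nucleotides: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Return {group: {taxon: nucleotide sequence}} for members found in both inputs."""
    mapped: Dict[str, Dict[str, str]] = {}
    for group_id, members in orthogroups.items():
        mapped[group_id] = {}
        for taxon, seq_id in members:
            seq = nucleotides.get(taxon, {}).get(seq_id)
            if seq is not None:  # skip members with no matching nucleotide record
                mapped[group_id][taxon] = seq
    return mapped


# Hypothetical ortholog groups defined on amino acid data.
orthogroups = {
    "OG1": [("ABC", "t1"), ("DEF", "t7")],
    "OG2": [("ABC", "t2"), ("DEF", "t9")],
}
# Hypothetical nucleotide records sharing the same sequence identifiers.
nucleotides = {
    "ABC": parse_fasta(">t1 example\nATGGCC\nTAA\n>t2\nATGAAATAG\n"),
    "DEF": parse_fasta(">t7\nATGGCATAA\n"),
}

nucleotide_groups = map_orthologs_to_nucleotides(orthogroups, nucleotides)
print(nucleotide_groups)
```

In this sketch, members without a matching nucleotide record are dropped silently. A real workflow might prefer to report them.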
Supplementary Table 1. List of taxa with their source of data and the number of UCE loci
recovered using the spider probe set. Abbreviations in the datatype column: G, genome;
T, transcriptome; UCE, ultraconserved elements.
Supplementary Table 2. Settings for the different data sets used in phylogenetic analyses. A
minimum identity of 65 and a minimum coverage of 65 were used to match the Spider 2Kv1 probe
set (Kulkarni et al. 2020) to contigs. Appended columns list relationships written in Newick
format: black cells indicate relationships recovered from the respective data set, and white
cells indicate an alternative relationship.
Supplementary Table 3. Results of the topology tests conducted on the AllUCEs data set.
Supplementary Trees. Phylogenetic relationships obtained for all data sets.