Vol. 30 no. 9 2014, pages 1312–1313
BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btu033
Phylogenetics Advance Access publication January 21, 2014
RAxML version 8: a tool for phylogenetic analysis and
post-analysis of large phylogenies
Alexandros Stamatakis
Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg and Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
Associate Editor: Jonathan Wren
Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community.
Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.
Availability and implementation: The code is available under GNU GPL at
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on December 22, 2013; revised and accepted on January 14, 2014
RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood. Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores. Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. In the following, I will present some of the most notable new features and extensions of RAxML.
2.1 Bootstrapping and support values
RAxML offers four different ways to obtain bootstrap support. It implements the standard non-parametric bootstrap and also the so-called rapid bootstrap (Stamatakis et al., 2008), which is a standard bootstrap search that relies on algorithmic shortcuts and approximations to speed up the search process.
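The standard non-parametric bootstrap resamples alignment columns with replacement and repeats the tree search on each resampled alignment. The following Python sketch illustrates only the resampling step. It is not RAxML code: the function name `bootstrap_replicate` and the toy alignment are assumptions made for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps taxon names to equal-length sequences (hypothetical
    layout). Columns are drawn with replacement, and the same column
    indices are applied to every taxon so that sites stay intact.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


# Toy alignment, invented for this example.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACCTAC", "D": "GCGTAC"}
replicate = bootstrap_replicate(alignment, random.Random(42))
```

Each replicate has the same dimensions as the input, so any tree search that accepts the original alignment can be run on it unchanged.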
It also offers an option to calculate the so-called SH-like support values (Guindon et al., 2010). I recently implemented a method that allows for computing RELL (Resampling Estimated Log Likelihoods) bootstrap support as described by Minh et al. (2013).
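As a rough illustration of the RELL idea, the sketch below resamples precomputed per-site log likelihoods instead of re-running a tree search for every replicate, and records how often each candidate tree attains the highest resampled log likelihood. It is a simplified sketch, not the RAxML implementation: the function name, the input layout and the toy values are assumptions, and the step that derives branch support from the winning trees is omitted.

```python
import random


def rell_support(site_lnl, n_replicates, rng):
    """Fraction of RELL replicates in which each candidate tree wins.

    `site_lnl[t][i]` is the log likelihood of site i on candidate tree t
    (hypothetical layout). Every replicate draws sites with replacement
    and sums the stored per-site values; no likelihood is recomputed.
    """
    n_sites = len(site_lnl[0])
    wins = [0] * len(site_lnl)
    for _ in range(n_replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sites) for tree in site_lnl]
        wins[totals.index(max(totals))] += 1  # ties go to the first tree
    return [w / n_replicates for w in wins]


# Toy per-site log likelihoods for three candidate trees (invented values).
site_lnl = [
    [-2.1, -3.0, -1.5, -2.2],
    [-2.0, -3.4, -1.6, -2.1],
    [-2.5, -3.1, -1.9, -2.6],
]
support = rell_support(site_lnl, 1000, random.Random(7))
```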
Apart from this, RAxML also offers a so-called bootstopping option (Pattengale et al., 2010). When this option is used, RAxML will automatically determine how many bootstrap replicates are required to obtain stable support values.
2.2 Models and data types
Apart from DNA and protein data, RAxML now also supports binary, multi-state morphological and RNA secondary structure data. It can correct for ascertainment bias (Lewis, 2001) for all of the above data types. This might be useful not only for morphological data matrices that only contain variable sites but also for alignments of SNPs.
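One common form of the ascertainment bias correction (Lewis, 2001) conditions each site likelihood on the site being variable, that is, it divides by one minus the total likelihood of all invariant site patterns. The Python sketch below shows only this arithmetic. The function name and the input values are hypothetical, and the sketch makes no claim about how RAxML computes or exposes these quantities.

```python
import math


def ascertainment_corrected_lnl(site_likelihoods, invariant_likelihoods):
    """Log likelihood conditioned on every observed site being variable.

    `site_likelihoods` holds the likelihood of each observed (variable)
    site; `invariant_likelihoods` holds the likelihood of each possible
    invariant pattern, one per character state (both hypothetical inputs).
    """
    p_variable = 1.0 - sum(invariant_likelihoods)
    return sum(math.log(l / p_variable) for l in site_likelihoods)


# Invented numbers: two observed sites, two invariant patterns (binary data).
uncorrected = math.log(0.1) + math.log(0.2)
corrected = ascertainment_corrected_lnl([0.1, 0.2], [0.25, 0.25])
```

Because the probability of a variable site is below one, the corrected log likelihood is always higher than the uncorrected one for the same parameters.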
The number of available protein substitution models has been significantly extended and comprises a general time reversible (GTR) model, as well as the computationally more complex LG4M and LG4X models (Le et al., 2012). RAxML can also automatically determine the best-scoring protein substitution model.
Finally, a new option for conducting a maximum likelihood estimate of the base frequencies has become available.
2.3 Parallel versions
RAxML offers a fine-grain parallelization of the likelihood function for multi-core systems via the PThreads-based version and a coarse-grain parallelization of independent tree searches via MPI (Message Passing Interface). It also supports coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer and Stamatakis, 2010).
Note that, for extremely large analyses on supercomputers, using the dedicated sister program ExaML [Exascale Maximum Likelihood (Stamatakis and Aberer, 2013)] is recommended.
© The Author 2014. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
2.4 Post-analysis of trees
RAxML offers a plethora of post-analysis functions for sets of trees. Apart from standard statistical significance tests, it offers efficient (and partially parallelized) operations for computing Robinson–Foulds distances, as well as extended majority rule, majority rule and strict consensus trees (Aberer et al., 2010).
Beyond this, it implements a method for identifying the so-called rogue taxa (Pattengale et al., 2011), and I recently implemented options for calculating the TC (Tree Certainty) and IC (Internode Certainty) measures as introduced by Salichos and Rokas (2013).
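To make these post-analysis quantities concrete, the following Python sketch computes a Robinson–Foulds distance, the bipartitions of a majority rule consensus and an internode certainty value. It is a minimal sketch under stated assumptions, not RAxML's implementation: each tree is represented directly as a set of non-trivial bipartitions, each bipartition is stored as the side that does not contain a fixed anchor taxon ("A" here), and all function names are hypothetical. Extended majority rule consensus is not covered.

```python
import math
from collections import Counter


def rf_distance(splits_a, splits_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def majority_rule_splits(trees, threshold=0.5):
    """Bipartitions present in more than `threshold` of the trees."""
    counts = Counter(split for tree in trees for split in tree)
    return {s for s, c in counts.items() if c / len(trees) > threshold}


def internode_certainty(freq_main, freq_conflict):
    """IC from the frequencies of a bipartition and its most prevalent
    conflicting bipartition: log2(2) plus the sum of p * log2(p)."""
    total = freq_main + freq_conflict
    ic = 1.0  # log2(2)
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:
            ic += p * math.log2(p)
    return ic


# Toy trees on taxa A-E, written as anchor-free sides of their bipartitions.
t1 = {frozenset("CDE"), frozenset("DE")}  # ((A,B),(C,(D,E)))
t2 = {frozenset("BDE"), frozenset("DE")}  # ((A,C),(B,(D,E)))
t3 = set(t1)

distance = rf_distance(t1, t2)
consensus = majority_rule_splits([t1, t2, t3])
ic_cde = internode_certainty(2, 1)  # CDE occurs twice, conflicting BDE once
```

An IC of 1 means the bipartition has no conflict in the tree set, and an IC of 0 means the two competing bipartitions are equally frequent. TC is the sum of the IC values over all internodes of a tree.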
Finally, there is the new plausibility checker option (Dao et al., 2013) that allows computing the RF distances between a huge phylogeny with tens of thousands of taxa and several smaller more accurate reference phylogenies that contain a strict subset of the taxa in the huge tree. This option can be used to automatically assess the quality of huge trees that cannot be inspected by eye.
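The comparison described above can be sketched naively as follows: project every bipartition of the large tree onto the taxon set of a reference tree, discard projections that become trivial, and compute the RF distance on what remains. This is an illustrative Python sketch only and not the algorithm of Dao et al. (2013). It assumes that bipartitions are stored as the side not containing an anchor taxon ("A") and that the anchor is present in every reference tree. All names are hypothetical.

```python
def restrict_splits(splits, subset):
    """Project bipartitions onto `subset` and drop trivial projections."""
    subset = frozenset(subset)
    restricted = set()
    for side in splits:
        inside = side & subset
        outside = subset - inside
        # A projection is informative only if both sides keep >= 2 taxa.
        if len(inside) >= 2 and len(outside) >= 2:
            restricted.add(inside)
    return restricted


def rf_to_reference(big_splits, reference_splits, reference_taxa):
    """RF distance between the induced subtree and a reference tree."""
    induced = restrict_splits(big_splits, reference_taxa)
    return len(induced ^ reference_splits)


# Large toy tree (((A,B),C),(D,(E,F))) as anchor-free sides.
big_splits = {frozenset("CDEF"), frozenset("DEF"), frozenset("EF")}
reference_taxa = frozenset("ABDE")
agreeing_reference = {frozenset("DE")}     # ((A,B),(D,E))
conflicting_reference = {frozenset("BE")}  # ((A,D),(B,E))

rf_agree = rf_to_reference(big_splits, agreeing_reference, reference_taxa)
rf_conflict = rf_to_reference(big_splits, conflicting_reference, reference_taxa)
```

Several bipartitions of the large tree can project onto the same bipartition of the subset, which is why the result is collected in a set.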
2.5 Analyzing next-generation sequencing data
RAxML offers two algorithms for preparing and analyzing next-generation sequencing data. A sliding-window approach (unpublished) is available to assess which regions of a gene (e.g. 16S) exhibit strong and stable phylogenetic signal to support decisions about which regions to amplify. Apart from that, RAxML also implements parsimony and maximum likelihood flavors of the evolutionary placement algorithm [EPA (Berger et al., 2011)] that places short reads into a given reference phylogeny obtained from full-length sequences to determine the evolutionary origin of the reads. It also offers placement support statistics for those reads by calculating likelihood weights. This option can also be used to place fossils into a given phylogeny (Berger and Stamatakis, 2010) or to insert different outgroups into the tree a posteriori, that is, after the inference of the ingroup phylogeny.
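A likelihood weight can be read as the share of the total placement likelihood that falls on one candidate branch. The Python sketch below normalizes a list of per-branch log likelihoods for a single read. It is an illustrative sketch, with a hypothetical function name and invented values, and it does not reproduce the EPA implementation.

```python
import math


def likelihood_weights(log_likelihoods):
    """Normalize placement log likelihoods into weights that sum to 1.

    The maximum is subtracted before exponentiating so that very negative
    log likelihoods do not underflow to zero.
    """
    best = max(log_likelihoods)
    scaled = [math.exp(lnl - best) for lnl in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


# Invented log likelihoods of one read placed on three candidate branches.
weights = likelihood_weights([-10234.1, -10236.8, -10240.0])
```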
2.6 Vector intrinsics
RAxML uses manually inserted and optimized x86 vector intrinsics to accelerate the parsimony and likelihood calculations. It supports SSE3, AVX and AVX2 (using fused multiply-add instructions) intrinsics. For a small single-gene DNA alignment using the Γ model of rate heterogeneity, the unvectorized version of RAxML requires 111.5 s, the SSE3 version 84.4 s and the AVX version 66.22 s to complete a simple tree search on an Intel i7-2620M core running at 2.70 GHz under Ubuntu Linux. The differences between AVX and AVX2 are less pronounced and are typically below 5% run time improvement.
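The run times quoted above translate into speedups over the unvectorized version as follows. The short Python snippet only restates the arithmetic on the three reported numbers; the variable names are arbitrary.

```python
# Run times in seconds, as reported in the text for one small DNA alignment.
runtimes_s = {"unvectorized": 111.5, "SSE3": 84.4, "AVX": 66.22}

baseline = runtimes_s["unvectorized"]
speedups = {name: baseline / t for name, t in runtimes_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

On this example, SSE3 is therefore about 1.32 times and AVX about 1.68 times faster than the unvectorized code.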
2.7 Saving memory
Because memory shortage is becoming an issue due to the growing dataset sizes, RAxML implements an option for reducing memory footprints and potentially run times on large phylogenomic datasets with missing data. The memory savings are proportional to the amount of missing data in the alignment (Izquierdo-Carrasco et al., 2011).
2.8 Miscellaneous new options
RAxML offers options to conduct fast and more superficial tree searches on datasets with tens of thousands of taxa. It can also compute marginal ancestral states and offers an algorithm for rooting trees. Furthermore, it implements a sequential, PThreads-parallelized and MPI-parallelized algorithm for computing all quartets or a subset of quartets for a given alignment.
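The number of quartets grows with the fourth power of the number of taxa, which is why evaluating only a subset can be necessary. The Python sketch below enumerates all quartets of a small taxon set and draws a random subset. It covers only the enumeration of taxon sets, not the likelihood evaluation of the three possible topologies per quartet, and the function names and taxon labels are assumptions.

```python
import itertools
import math
import random


def all_quartets(taxa):
    """Lazily yield every 4-taxon subset, in sorted order."""
    return itertools.combinations(sorted(taxa), 4)


def random_quartets(taxa, n, rng):
    """Draw n quartets at random; the same quartet may be drawn twice."""
    taxa = sorted(taxa)
    return [tuple(sorted(rng.sample(taxa, 4))) for _ in range(n)]


taxa = ["t1", "t2", "t3", "t4", "t5", "t6"]
n_all = math.comb(len(taxa), 4)  # number of distinct quartets
subset = random_quartets(taxa, 5, random.Random(1))
```

For 1,000 taxa, `math.comb(1000, 4)` already exceeds 41 billion quartets, so a lazy generator or a random subset is preferable to materializing the full list.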
User support is provided via the RAxML Google group at: =en#!forum/raxml. The RAxML source code contains a comprehensive manual and there is a step-by-step tutorial with some basic commands available at on.html. Further resources are available via the RAxML software page at
Future work includes the continued maintenance of RAxML, the adaptation to novel computer architectures and the implementation of novel models and datatypes, in particular codon models.
The author thanks several colleagues for contributing code to RAxML: Andre J. Aberer, Simon Berger, Alexey Kozlov, Nick Pattengale, Wayne Pfeiffer, Akifumi S. Tanabe, David Dao and Charlie Taylor.
Funding: This work was funded by institutional funding provided
by the Heidelberg Institute for Theoretical Studies.
Conflict of Interest: none declared.
Aberer,A.J. et al. (2010) Parallelized phylogenetic post-analysis on multi-core architectures. J. Comput. Sci., 1, 107–114.
Berger,S.A. and Stamatakis,A. (2010) Accuracy of morphology-based phylogenetic fossil placement under maximum likelihood. In: International Conference on Computer Systems and Applications (AICCSA), 2010 IEEE/ACS. IEEE, New York, USA, pp. 1–9.
Berger,S.A. et al. (2011) Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol., 60,
Dao,D. et al. (2013) Automated plausibility analysis of large phylogenies. Technical report. Karlsruhe Institute of Technology.
Guindon,S. et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol., 59,
Izquierdo-Carrasco,F. et al. (2011) Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees. BMC Bioinformatics, 12, 470.
Le,S.Q. et al. (2012) Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol., 29, 2921–2936.
Lewis,P.O. (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol., 50, 913–925.
Minh,B.Q. et al. (2013) Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol., 30, 1188–1195.
Pattengale,N.D. et al. (2010) How many bootstrap replicates are necessary? J. Comput. Biol., 17, 337–354.
Pattengale,N.D. et al. (2011) Uncovering hidden phylogenetic consensus in large data sets. IEEE/ACM Trans. Comput. Biol. Bioinforma., 8, 902–911.
Pfeiffer,W. and Stamatakis,A. (2010) Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code. In: International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE. IEEE, New York, USA, pp. 1–8.
Salichos,L. and Rokas,A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature, 497, 327–331.
Stamatakis,A. (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22, 2688–2690.
Stamatakis,A. and Aberer,A. (2013) Novel parallelization schemes for large-scale likelihood-based phylogenetic inference. In: IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), 2013. pp. 1195–1204.
Stamatakis,A. et al. (2008) A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol., 57, 758–771.
RAxML version 8

Supplementary resource (1)

... For the two matrices, we used three distinct phylogenetic reconstructions. First, we ran two alternative maximum likelihood searches with identical parameters to find the maximum likelihood trees using RAxML v8 with the GTRGAMMA model (Stamatakis, 2014), and used the autoMRE function available in RAxML v8 (Stamatakis, 2014) to access non-parametric bootstrap support values. Second, we used the Bayesian framework with ExaBayes (Aberer et al., 2014) by running two independent runs (one cold and one heated chain each) for 10 6 generations (burn-in: 25%; thinning: 500) with default parameters. ...
... For the two matrices, we used three distinct phylogenetic reconstructions. First, we ran two alternative maximum likelihood searches with identical parameters to find the maximum likelihood trees using RAxML v8 with the GTRGAMMA model (Stamatakis, 2014), and used the autoMRE function available in RAxML v8 (Stamatakis, 2014) to access non-parametric bootstrap support values. Second, we used the Bayesian framework with ExaBayes (Aberer et al., 2014) by running two independent runs (one cold and one heated chain each) for 10 6 generations (burn-in: 25%; thinning: 500) with default parameters. ...
Full-text available
Tarumania walkerae is a rare fossorial freshwater fish species from the lower Rio Negro, Central Amazonia, composing the monotypic and recently described family Tarumaniidae. The family has been proposed as the sister group of Erythrinidae by both morphological and molecular studies despite distinct arrangements of the superfamily Erythrinoidea within Characiformes. Recent phylogenomic studies and time-calibrated analyses of characoid fishes have not included specimens of Tarumania in their analyses. We obtained genomic data for T. walkerae and constructed a phylogeny based on 1795 nuclear loci with 488,434 characters of ultraconserved elements (UCEs) for 108 terminals including specimens of all 22 characiform families. The phylogeny confirms the placement of Tarumaniidae as sister to Erythrinidae but differs from the morphological hypothesis in the placement of the two latter families as sister to the clade with Hemiodontidae, Cynodontidae, Serrasalmidae, Parodontidae, Anostomidae, Prochilodontidae, Chilodontidae, and Curimatidae. The phylogeny calibrated with five characoid fossils indicates that Erythrinoidea diverged from their relatives during the Late Cretaceous circa 90 Ma (108–72 Ma), and that Tarumania diverged from the most recent common ancestor of Erythrinidae during the Paleogene circa 48 Ma (66–32 Ma). The occurrence of the erythrinoid-like †Tiupampichthys in the Late Cretaceous–Paleogene formations of the El Molino Basin of Bolivia supports our hypothesis for the emergence of the modern Erythrinidae and Tarumaniidae during the Paleogene.
... A phylogenetic tree was constructed using multithreaded RAxML (ver. 8.2.12) (Stamatakis 2014), the PROTGAMMAWAG model, and 100 bootstrap replicates, and visualized by iTOL (Letunic and Bork 2019). The graphical representation of ClcA was depicted by IBS (ver. ...
Full-text available
The filamentous fungus Aspergillus fumigatus is the most important pathogenic fungus among Aspergillus species associated with aspergillosis. A. fumigatus is exposed to diverse environmental stresses in the hosts during infection such as an excess of essential metal copper. To gain further insights into copper homeostasis, we generated an A. fumigatus laboratory evolved strain with increased fitness in copper stress, and identified the mutation in a Zn2-Cys6 type transcription factor clcA. We examined the role of clcA using the evolved and ∆clcA strains. The ∆clcA strain exhibited defective growth on minimal medium, PDA and copper-repleted medium, and defective conidiogenesis and conidial pigmentation. We found that clcA was required for the expressions of genes involved in conidiogenesis, conidial pigmentation, and transporters cdr1B and mfsB related to azole resistance. clcA was dispensable for the virulence in silkworm infection model. We report here that clcA plays an important role in hyphal growth, conidiogenesis, and copper adaptation.
... The core genome alignment was performed with HarvestTools (42), taking as reference the Chinese strain ICDCCJ07001, which is the oldest strain related to GBS reported in the NCBI, and the SNPs were exported for downstream analysis. The phylogeny was inferred by the maximum likelihood method using RaxML v8.0 (43), applying the GTR1G model and 1,000 bootstrap. Removal of recombinant regions was performed using ClonalFrameML v1.11-3-g4f13f23 (44). ...
Full-text available
Campylobacter jejuni infection is considered the most frequent factor associated with Guillain-Barré syndrome (GBS). In 2019, a large outbreak of GBS was detected in Peru, being associated with C. jejuni detected in stool samples from these patients. The aim of this study was to determine the molecular epidemiology of C. jejuni strains (ST-2993) associated with a large GBS outbreak in Peru. In this study, 26 C. jejuni strains belonging to the ST-2293, obtained from 2019 to 2020, were sequenced using Illumina technology. Five low-quality sequences were removed using bioinformatics, and 21 genomes (17 clinical strains and 4 chicken strains) were considered in the phylogenetic analysis and comparative genomics. Phylogenetic reconstruction, including genomes from international databases, showed a connection between Peruvian and Chinese GBS strains, both of them having lipooligosaccharides (LOS) locus genes related to molecular mimicry with gangliosides in peripheral nerves. Also, ST-2993 was detected in Amazon strains recovered many years before the 2019 outbreak, but with no epidemiological connection with GBS. Besides, a close relationship between human and chicken C. jejuni strains indicated chicken as one of the probable reservoirs. Finally, comparative genomics revealed differences between Chinese and Peruvian strains, including the presence of a prophage inserted into the genome. In conclusion, C. jejuni ST-2993 strains recovered from the GBS outbreak are closely related to Peruvian Amazon strains. Moreover, ST-2993 has been circulated in Peru since 2003 in the Peruvian Amazonia, showing the necessity to reinforce the epidemiological surveillance of C. jejuni to improve the prevention and control of future GBS outbreaks.
... ML and BI were run through the CIPRES Science Gateway portal (, accessed on 25 June 2022) using RaxML-HPC2 on XSEDE 8.2.12 (Heidelberg, Germany) [13,14] and MrBayes on XSEDE 3.2.7a (Stockholm, Sweden), respectively [15][16][17]. ...
Full-text available
During our ongoing survey of dematiaceous hyphomycetes associated with dead branches in tropical forests, eight Acrodictys isolates were collected from Hainan, China. Morphology from the cultures and phylogeny based on partial small subunit (SSU), entire internal transcribed spacer regions with intervening 5.8S (ITS), partial large subunit (LSU) of rRNA gene, partial beta-tubulin (tub2), and partial RNA polymerase II second largest subunit (rpb2) genes were employed to identify these isolates. As a result, four new species, namely Acrodictys bawanglingensis sp. nov., A. diaoluoshanensis sp. nov., A. ellisii sp. nov., and A. pigmentosa sp. nov., are introduced. Illustrations and descriptions of the four taxa are provided, along with comparisons with closely related taxa in the genus. For facilitating relative studies, an updated key to all accepted species of this genus is also compiled.
... A VCF format file containing all SNP sites was used as the input file of DNAsp v5.10 (Librado and Rozas 2009) to calculate the nucleotide diversity (Pi). A phylogenetic tree was constructed using the maximum likelihood method in RAxML and multipoint data (Stamatakis 2014). A GTRGAMMA model was set for the ML analysis with 1000 bootstrap replicates. ...
The tapertail anchovy (Coilia nasus) is an economically important species, mainly distributed along the coast of the northwestern Pacific and associated freshwater bodies, including the Qiantang River, the Yangtze River, the Liaohe River, and the Yalu River of China, and further eastward to Korean Peninsula and Ariake Sound of Japan. There have been many studies on population genetics of C. nasus, but those were either focused on a few populations or used limited number of loci. The results are still controversial and the range-wide population structure of C. nasus is not resolved. This study is aimed to estimate genetic differences among populations from Japan, Korea and the major drainages of China using thousands of loci collected by exon-capture method. The reconstructed maximum likelihood tree, Network, and STRUCTURE analysis confirmed that the population from Lake Dongting should be considered as a separate species, C. brachygnathus, whereas the other populations were mixed together except that fish collected from the Shuangtaizi River of Liaohe drainage were grouped with the fish from Dongting. The AMOVA revealed that genetic variation was 6.36% among populations and 93.61% among individuals within population, when C. brachygnathus was excluded from the analysis. Pairwise F ST showed that genetic difference between C. brachygnathus and C. nasus was high (0.0889-0.7350) and difference among the other populations were low (0.0086-0.3127) but significant, suggesting imperfect natal homing of migratory C. nasus. According to our results, an integrated management strategy should be taken jointly by the countries of this region to protect the valuable fisheries resource of C. nasus.
Caves are home to unique and fragile biotas with high levels of endemism. However, little is known about how the biotic colonization of caves has developed over time, especially in caves from middle and low latitudes. Subtropical East Asia holds the world's largest karst landform with numerous ancient caves, which harbor a high diversity of cave-dwelling organisms and are regarded as a biodiversity hotspot. Here, we assess the temporal dynamics of biotic colonization of subtropical East Asian caves through a multi-taxon analysis with representatives of green plants, animals, and fungi. We then investigate the consequences of paleonviromental changes on the colonization dynamics of these caves in combination with reconstructions of vegetation, temperature, and precipitation. We discover that 88% of cave colonization events occurred after the Oligocene-Miocene boundary, and organisms from the surrounding forest were a major source for subtropical East Asian cave biodiversity. Biotic colonization of subtropical East Asian caves during the Neogene was subject to periods of acceleration and decrease, in conjunction with large-scale, seasonal climatic changes and evolution of local forests. This study highlights the long-term evolutionary interaction between surface and cave biotas; our climate-vegetation-relict model proposed for the subtropical East Asian cave biota may help explain the evolutionary origins of other mid-latitude subterranean biotas.
Full-text available
Microbial predators such as choanoflagellates are key players in ocean food webs. Choanoflagellates, which are the closest unicellular relatives of animals, consume bacteria and also exhibit marked biological transitions triggered by bacterial compounds, yet their native microbiomes remain uncharacterized. Here we report the discovery of a ubiquitous, uncultured bacterial lineage we name Candidatus Comchoanobacterales ord. nov., related to the human pathogen Coxiella and physically associated with the uncultured marine choanoflagellate Bicosta minor. We analyse complete ‘Comchoano’ genomes acquired after sorting single Bicosta cells, finding signatures of obligate host-dependence, including reduction of pathways encoding glycolysis, membrane components, amino acids and B-vitamins. Comchoano encode the necessary apparatus to import energy and other compounds from the host, proteins for host-cell associations and a type IV secretion system closest to Coxiella’s that is expressed in Pacific Ocean metatranscriptomes. Interactions between choanoflagellates and their microbiota could reshape the direction of energy and resource flow attributed to microbial predators, adding complexity and nuance to marine food webs.
Three species of Gentiana (Gentiana manshurica kitag., Gentiana scabra bunge., and Gentiana triflora pall.) were the main source for an important traditional Chinese medicine, "Longdan", which was first mentioned in " Shennong materia medica Sutra " 2000 years ago. Until recently, there were very few reports on taxonomic classification of these three traditional medicinal Gentiana species. In the current study, chloroplast genomes of the three Gentiana species were sequenced and the phylogenetic analyses were performed in combination with 31 NCBI downloaded Gentiana species sequences and two species of Swertia as outgroup. Based on the phylogenetic results, a new taxonomic classification for Gentiana was proposed, including 4 independent clades with 6 subdivisions (Group 1–Group 6). All the general features, SSR characteristics and gene composition of Gentiana chloroplast genomes strongly supported such a new classification system for Gentiana, which could lay a theoretical foundation for Gentiana in the molecular evolutionary research. Finally, phylogenetic analyisis also demonstrated that the three examined species from Gentiana could cluster together into one group (Group 6), which was far away from the evolutionary position of the medicinal species, Gentiana rigescens Franch, which was consistent with the traditional classification in traditional medicinal uses and taxonomy.
Full-text available
This paper provides an updated classification of the Kingdom Fungi (including fossil fungi) and fungus-like taxa. Five-hundred and twenty-three (535) notes are provided for newly introduced taxa and for changes that have been made since the previous outline. In the discussion, the latest taxonomic changes in Basidiomycota are provided and the classification of Mycosphaerellales are broadly discussed. Genera listed in Mycosphaerellaceae have been confirmed by DNA sequence analyses, while doubtful genera (DNA sequences being unavailable but traditionally accommodated in Mycosphaerellaceae) are listed in the discussion. Problematic genera in Glomeromycota are also discussed based on phylogenetic results.
Due to the peculiar combination of dental features characteristic for different squaliform families, the position of the Late Cretaceous genera Protoxynotus and Paraphorosoides within squaliform families has long been controversial. In this study, we revise these genera based on previously known fossil teeth and new dental material. The phylogenetic placement of Protoxynotus and Paraphorosoides among other extant and extinct squaliforms is discussed based on morphological characters combined with DNA sequence data for extant species. Our results suggest that Protoxynotus and Paraphorosoides should be included in the Somniosidae and that Paraphorosoides is a junior synonym of Protoxynotus. New dental material from the Campanian of Germany and the Maastrichtian of Austria enabled the description of a new species Protoxynotus mayrmelnhofi sp. nov. In addition, the evolution and origin of the characteristic squaliform tooth morphology are discussed, indicating that the elongated lower jaw teeth with erected cusp and distinct dignathic heterodonty of Protoxynotus represents a novel functional adaptation in its cutting-clutching type dentition among early squaliform sharks. Furthermore, the depositional environment of the tooth bearing horizons allows for an interpretation of the preferred habitat of this extinct dogfish shark, which exclusively occupied shelf environments of the Boreal- and northern Tethyan realms during the Late Cretaceous.
Full-text available
The increasing complexity of software systems coupled with the time-to-market constraints and condensed development budgets all have imposed a real challenge on the future of software development. It becomes a matter of survivability for a business to be able to deliver a highquality and cost-effective software product in a timely manner. This goal can be greatly precluded by the rapid advances in technology as well as the increasing pace of changes in market needs and customer requirements. Such changes while cannot be avoided; their impact on the system development should be alleviated. A system that requires a major redesign effort in order to adapt to new requirements and emerging technologies is considered to be unstable. A stable system, on the other hand, can handle changes to the system with minimal cost by avoiding unnecessarily changes when redesigning the system. However, developing systems that can evolve gracefully to accommodate necessarily changes without inducing unnecessarily cost is still a challenge in software community. The motivation of this workshop is to investigate both theoretical and practical aspects of accomplishing stability in the different levels of software development. In this workshop, we have 11 original contributions that highlight the state-of-the-art and practice in developing stable
Full-text available
To tackle incongruence, the topological conflict between different gene trees, phylogenomic studies couple concatenation with practices such as rogue taxon removal or the use of slowly evolving genes. Phylogenomic analysis of 1,070 orthologues from 23 yeast genomes identified 1,070 distinct gene trees, which were all incongruent with the phylogeny inferred from concatenation. Incongruence severity increased for shorter internodes located deeper in the phylogeny. Notably, whereas most practices had little or negative impact on the yeast phylogeny, the use of genes or internodes with high average internode support significantly improved the robustness of inference. We obtained similar results in analyses of vertebrate and metazoan phylogenomic data sets. These results question the exclusive reliance on concatenation and associated practices, and argue that selecting genes with strong phylogenetic signals and demonstrating the absence of significant incongruence are essential for accurately reconstructing ancient divergences.
Full-text available
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira–Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66–33.3) to 10.2 (range: 1.32–41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at to perform the UFBoot analysis with ML tree inference.
The capability to conduct Maximum Likelihood based phylogenetic (evolutionary) analyses on datasets that contain both morphological and molecular data partitions with programs such as RAxML gives rise to new methodological questions. As we demonstrate on 5 real-world datasets that comprise morphological as well as DNA data, the trees inferred by separately using the morphological or molecular data partitions are highly incongruent. Since typical current-day phylogenomic alignments contain significantly more molecular than morphological data, the final tree shape in a concatenated analysis is dominated by molecular data, and the question arises how morphological data can be used within this context. One important application lies in the phylogenetic placement of fossil taxa (for which only morphological data is available) into a fixed, given molecular or otherwise well-established reference tree. By using real and simulated datasets, we conduct the first assessment of placement accuracy for fossil taxa under the Maximum Likelihood criterion. We demonstrate that, despite conflicting phylogenetic signals from the morphological and molecular partitions, the Maximum Likelihood criterion is powerful enough to yield accurate fossil placements. Moreover, we develop and make available a new morphological site weight calibration algorithm that yields an average improvement of fossil placement accuracy of 20% on more than 2,500 simulated datasets and of 25% on the 5 real-world datasets, all of which contain highly conflicting phylogenetic signal.
A hybrid MPI/Pthreads parallelization was implemented in the RAxML phylogenetics code. New MPI code was added to the existing Pthreads production code to exploit parallelism at two algorithmic levels simultaneously: coarse-grained with MPI and fine-grained with Pthreads. This hybrid, multi-grained approach is well suited for current high-performance computers, which typically are clusters of multicore, shared-memory nodes. The hybrid version of RAxML is especially useful for a comprehensive phylogenetic analysis, i.e., execution of many rapid bootstraps followed by a full maximum likelihood search. Multiple multi-core nodes can be used in a single run to speed up the computation and, hence, reduce the turnaround time. The hybrid code also allows more efficient utilization of a given number of processor cores. Moreover, it often returns a better solution than the stand-alone Pthreads code, because additional maximum likelihood searches are conducted in parallel using MPI. The comprehensive analysis algorithm involves four stages, in which coarse-grained parallelism continually decreases from stage to stage. The first three stages speed up well with MPI, while the last stage speeds up only with Pthreads. This leads to a tradeoff in effectiveness between MPI and Pthreads parallelization. The useful number of MPI processes increases with the number of bootstraps performed, but typically is limited to 10 or 20 by the parameters of the algorithm. The optimal number of Pthreads increases with the number of distinct patterns in the columns of the multiple sequence alignment, but is limited to the number of cores per node of the computer being used. For a benchmark problem with 218 taxa, 1,846 patterns, and 100 bootstraps run on the Dash computer at SDSC, the speedup of the hybrid code on 10 nodes (80 cores) was 6.5 compared to the Pthreads-only code on one node (8 cores) and 35 compared to the serial code. This run used 10 MPI processes with 8 Pthreads each. For another problem with 125 taxa, 19,436 patterns, and 100 bootstraps, the speedup on the Triton PDAF computer at SDSC was 38 on two nodes (64 cores) compared to the serial code. This run used 2 MPI processes with 32 Pthreads each.
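The speedups quoted in this abstract can be converted into parallel efficiencies (speedup divided by core count) with a few lines of Python. This is a back-of-the-envelope sketch, not part of the cited work: the helper name `parallel_efficiency` is hypothetical, and the derived Pthreads-only speedup assumes that both reported speedups for the first benchmark refer to the same hybrid run.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved on the given number of cores."""
    return speedup / cores


# Figures reported in the abstract above.
hybrid_vs_serial_dash = 35.0    # 10 nodes, 80 cores
hybrid_vs_pthreads_dash = 6.5   # relative to Pthreads-only on 8 cores
hybrid_vs_serial_triton = 38.0  # 2 nodes, 64 cores

# Derived value (assumption: both Dash speedups describe the same hybrid run).
pthreads_vs_serial_dash = hybrid_vs_serial_dash / hybrid_vs_pthreads_dash

if __name__ == "__main__":
    print("Dash hybrid, 80 cores:", parallel_efficiency(hybrid_vs_serial_dash, 80))
    print("Dash Pthreads-only, 8 cores:",
          round(parallel_efficiency(pthreads_vs_serial_dash, 8), 3))
    print("Triton hybrid, 64 cores:", parallel_efficiency(hybrid_vs_serial_triton, 64))
```

Under these assumptions the hybrid runs reach roughly 44% and 59% of linear speedup, which is consistent with the abstract's remark that the last stage of the analysis speeds up only with Pthreads.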
The molecular data avalanche generated by novel wet-lab sequencing technologies allows for reconstructing phylogenies (evolutionary trees) using hundreds of complete genomes as input data. Therefore, scalable codes are required to infer trees on these data under likelihood-based models of molecular evolution. We recently introduced a checkpointable and scalable MPI-based code for this purpose called RAxML-Light and are currently using it for several real-world data analysis projects. It turned out that the scalability of RAxML-Light is nonetheless still limited because of the fork-join parallelization approach that is deployed. To this end, we introduce a novel, generally applicable, approach to computing the phylogenetic likelihood in parallel on whole-genome datasets and implement it in ExaML (Exascale Maximum Likelihood). ExaML executes up to 3.2 times faster than RAxML-Light because of the more efficient parallelization and communication scheme, while implementing exactly the same tree search algorithm. Moreover, the new parallelization approach exhibits lower code complexity and a more appropriate structure for implementing fault tolerance with respect to hardware failures.
This workshop focuses on understanding the implications of accelerators on the architectures and programming environments of future systems. It seeks to ground accelerator research through studies of application kernels or whole applications on such systems, as well as tools and libraries that improve the performance or productivity of applications trying to use these systems. The goal of this workshop is to bring together researchers and practitioners who are involved in application studies for accelerators and other hybrid systems, to learn the opportunities and challenges in future design trends for HPC applications and systems.
This chapter introduces a new approach to assess the plausibility of large phylogenies by computing all pairwise topological Robinson-Foulds (RF) distances of a 55, taxon tree of plants, for instance, and a set containing a large number of substantially smaller reference trees. It first presents a naive and then an effective algorithm for inducing subtrees. The chapter then provides an experimental evaluation of both algorithms using simulated and real data from STBase. The chapter describes a novel method for speeding up the computation of induced subtrees from a given leaf set. The key idea is to root the large tree at an inner node and compute the lowest common ancestor (LCA) of each and every pair of leaves in the leaf set. Finally, the chapter presents a straightforward implementation of the plausibility-check algorithm. The authors have implemented the algorithm in C as part of RAxML.
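The plausibility check described here can be sketched in Python as follows. This is a minimal illustration, not the C implementation in RAxML: it uses the naive way of inducing a subtree (pruning leaves and suppressing unary nodes) and does not implement the LCA-based speed-up. Trees are assumed to be nested tuples with string leaf names, both trees are treated as unrooted for the RF computation, and all function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def induce(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [c for c in (induce(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # a node with one remaining child is collapsed
    return tuple(kept)


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, each stored as the
    side that does not contain a fixed reference leaf."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unnormalised Robinson-Foulds distance between trees on the same leaves."""
    return len(splits(tree_a) ^ splits(tree_b))


def plausibility(large_tree, reference_tree):
    """RF distance between a reference tree and the large tree induced on it."""
    induced = induce(large_tree, leaves(reference_tree))
    return rf_distance(induced, reference_tree)


if __name__ == "__main__":
    large = ((("A", "B"), ("C", "D")), (("E", "F"), "G"))
    print(plausibility(large, (("A", "C"), ("E", "G"))))  # 0
    print(plausibility(large, (("A", "E"), ("C", "G"))))  # 2
```

The sketch assumes every leaf of the reference tree also occurs in the large tree. Pruning the large tree once per reference tree is what makes the naive variant expensive when there are many reference trees, which is the cost the chapter's faster algorithm is designed to reduce.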