MEGA11: Molecular Evolutionary Genetics Analysis Version 11

Koichiro Tamura,1,2 Glen Stecher,3 and Sudhir Kumar*,3,4,5
1 Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
2 Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan
3 Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
4 Department of Biology, Temple University, Philadelphia, PA, USA
5 Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
*Corresponding author: E-mail: s.kumar@temple.edu.
Associate editor: Fabia Ursula Battistuzzi
Abstract
The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and
tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive
tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for
estimating divergence times and confidence intervals are implemented to use probability densities for calibration
constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options
for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and
an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary
probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the
autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood
analysis are reduced significantly through reprogramming, and the graphical user interface has been made more re-
sponsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results,
and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11
are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.
Key words: software, phylogenetics, timetrees, tip dating, neutrality.
Introduction
The Molecular Evolutionary Genetics Analysis (MEGA) soft-
ware has continuously grown to meet the need for sophisti-
cated evolutionary analysis to discover organismal and
genome evolutionary patterns and processes. It was first re-
leased in 1993 to offer the statistical methods of molecular
evolution through an interactive interface on the Microsoft
Disk Operating System (MS-DOS) (Kumar et al. 1993). For
more than 25 years, MEGA’s scope and usefulness have grown
through the addition of new methods, tools, and interfaces,
resulting in modern integrated software for comparative se-
quence analysis (Caspermeyer 2018). Initially, MEGA con-
tained distance-based and maximum parsimony methods
for molecular phylogenetic analysis (Kumar et al. 1994). The
data acquisition and integration of major approaches for
aligning sequences were introduced to expand MEGA’s scope
(Kumar et al. 2004). Afterward, the maximum likelihood (ML)
methods and Bayesian methods were added for molecular
evolutionary analyses (Tamura et al. 2011). MEGA now con-
tains methods for selecting the best-fit substitution model(s),
estimating evolutionary distances and divergence times,
reconstructing phylogenies, predicting ancestral sequences,
testing for selection, and diagnosing disease mutations
(Caspermeyer 2018).
With every new version, MEGA has evolved to harness
technological innovations and personal desktops’ computational
power. MEGA’s interface evolved from its initial MS-DOS
character-based format (Kumar et al. 1993) to a rich
graphical user interface (GUI) for the Microsoft Windows
operating system (Kumar et al. 2001). It was then redesigned to
become activity-driven (Tamura et al. 2011), followed by the
incorporation of web technologies to ensure a consistent
use-and-feel across Microsoft Windows and Linux operating
systems (Kumar et al. 2018) and macOS (Stecher et al. 2020).
MEGA GUI is now fully cross-platform running natively on
Windows, Linux, and macOS.
MEGA’s computational core (MEGA-CC) has undergone
extensive refactoring, hardening, and expansion over time. It
advanced from 16-bit to 32-bit (Kumar et al. 2001), became
multithreaded and incorporated multicore parallelization for
various calculations (Tamura et al. 2013), and stepped up to
64-bit architecture (Kumar et al. 2016, 2018). MEGA-CC was
released for use as a command-line program to address the
growing need for batch processing of many data sets and
integration into analysis workflows (Kumar et al. 2012;
Stecher et al. 2020). With both 32- and 64-bit versions of
MEGA currently available for use on the command-line and
GUI, MEGA is now a suite of applications that responds to the
variety of computing environments currently used by
researchers in molecular evolution and phylogenetics. Here,
we present key methodological additions and technical
improvements in MEGA that comprise version 11.
Brief Communication
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium,
provided the original work is properly cited. Open Access
Mol. Biol. Evol. 38(7):3022–3027 doi:10.1093/molbev/msab120 Advance Access publication April 23, 2021
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
Methodological Additions
Expansion of Relaxed-Clock Dating Facilities
Rapid relaxed-clock methods for estimating divergence times
are becoming popular because they are feasible and efficient
for large contemporary sequence alignments (Tao, Tamura,
Kumar, et al. 2020). MEGA6 first added methods and tools for
constructing evolutionary timetrees by implementing the
RelTime method, which does not assume a molecular clock
(Tamura et al. 2012, 2013). RelTime is known to perform well
and has been used to build timetrees in hundreds of research
articles (Tao, Tamura, Kumar, et al. 2020). MEGA11 expands
on RelTime dating options by advancing the current imple-
mentation and adding new facilities for node-dating and tip-
dating needed to build timetrees of pathogens, species, and
gene families.
Calibrating the Clock Using Probability Densities on Node-
Constraints
Bayesian relaxed-clock methods have long allowed the use of
statistical probability distributions that capture prior knowl-
edge (or belief) about the true divergence times in clock cal-
ibration constraints on one or more nodes in the phylogeny.
Judicious use of these probability densities can make diver-
gence times more accurate and precise (Tao, Tamura, Mello,
et al. 2020). Researchers can now use such probability densi-
ties for node calibrations in RelTime estimation of divergence
times and confidence intervals (CIs). MEGA implements the
Tao, Tamura, Mello, et al. (2020) approach that estimates CIs
by simultaneously accounting for variance introduced by the
heterogeneity of evolutionary rate among lineages, estimation
of sequence divergence using substitution models, and prob-
ability densities for node-calibration constraints. This method
produces CIs that contain correct times with a high proba-
bility, making them much more suitable for biological hy-
pothesis testing than other rapid methods (Tao, Tamura,
Kumar, et al. 2020; Tao, Tamura, Mello, et al. 2020).
For RelTime analyses in MEGA11, ML and distance-based
approaches can be used to build a timetree for a given
phylogeny and multiple sequence alignment. One may also use
only a phylogeny with branch lengths, which extends the
usefulness of relaxed-clock methods for phylogenies inferred
from nonmolecular data or statistical methodologies not
available in MEGA. When a phylogeny with branch lengths
is used, the CIs will be narrower because the variance associ-
ated with branch length estimation cannot be generated
without the original data set used to produce the phylogeny
and branch lengths. Nevertheless, these CIs will incorporate
variance introduced due to rate variation among lineages and
clock calibrations’ uncertainty.
A calibration density selector has been added to the Node
Calibration Editor that provides an option to select normal,
lognormal, uniform, or exponential density (fig. 1). The user
can also specify a minimum or a maximum time bound on a
node. The calibration text file format has been extended to
specify density information and use calibration densities in
MEGA-CC. The Node Calibration Editor also includes new
functionality to specify a fixed evolutionary rate or a known
node time to calibrate the molecular clock. Such assumptions
are often used by investigators when independent calibration
information is unknown (Hipsley and Müller 2014; Tao,
Tamura, Kumar, et al. 2020).
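As an illustration of the four calibration density types listed above, the following Python sketch evaluates each density at a candidate node time. It is a minimal sketch and not MEGA's implementation: the function names are hypothetical, the parameterizations follow the parameter lists in the figure 1 caption, and two points are assumptions made here, namely that the lognormal mean and standard deviation are on the log scale and that the exponential "decay parameter" is a rate.

```python
import math

# Hypothetical helper functions; they are not part of MEGA or MEGA-CC.

def normal_pdf(t, mean, sd):
    """Normal density with the given mean and standard deviation."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def lognormal_pdf(t, offset, mean, sd):
    """Lognormal density shifted by offset (mean and sd on the log scale: an assumption)."""
    x = t - offset
    if x <= 0:
        return 0.0
    z = (math.log(x) - mean) / sd
    return math.exp(-0.5 * z * z) / (x * sd * math.sqrt(2.0 * math.pi))

def exponential_pdf(t, offset, decay):
    """Exponential density shifted by offset (decay treated as a rate: an assumption)."""
    x = t - offset
    if x < 0:
        return 0.0
    return decay * math.exp(-decay * x)

def uniform_pdf(t, minimum, maximum):
    """Uniform density between a minimum and a maximum time bound."""
    if minimum <= t <= maximum:
        return 1.0 / (maximum - minimum)
    return 0.0

if __name__ == "__main__":
    # Toy values only: a node thought to be 50 to 100 time units old.
    for t in (40.0, 75.0, 110.0):
        print(t, uniform_pdf(t, 50.0, 100.0), normal_pdf(t, 75.0, 10.0))
```

A density of zero outside the offset or the bounds corresponds to a hard minimum or maximum constraint on the node, whereas the normal density places soft weight on all times.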
Tip-Dating for Sequences with Sampling Times
MEGA now implements a method to estimate timetrees using
sampling dates for molecular sequences. Such dates are often
used to infer the origin and diversification of pathogens that
generally evolve fast enough to track the evolutionary change
over months and years (Tao, Tamura, Kumar, et al. 2020). Tip-
dating methods are also useful for analyzing ancient molec-
ular sequences. MEGA implements a rapid tip-dating
method, RelTime with Dated Tips (RTDT), that produces
divergence times and CIs (Miura et al. 2018). One may use
ML or distance-based approaches for a given phylogeny and
multiple sequence alignment for tip-dating, or a phylogeny
with branch lengths and tip dates can be given as the input.
An enhanced Timetree Wizard system (fig. 2) walks the user
through many steps needed to configure tip-dating analyses,
such as loading sequence and tree files, specifying the out-
groups, adding sequence sample times, and selecting the anal-
ysis options. Sequence sampling times can be specified in
multiple ways. MEGA will automatically extract them on-
demand when they are included in the sequence name.
Spatiotemporal information can also be presented in the input
alignment files as meta tags (see description below) or loaded
using specially formatted calibration text files. Once computed,
the timetree is displayed in the Tree Explorer that has been
extensively revamped and updated (fig. 3). It now has many
more formatting tools, including exporting the timetree, indi-
vidual divergence times, and CI estimates in a tabular format.
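To make the idea of sampling times embedded in sequence names concrete, the Python sketch below extracts a date from the end of a name and converts it to a decimal year. It is an illustration only: the name layout (an ISO date, YYYY-MM-DD, as the last field) and the function names are assumptions made here and do not describe the formats that MEGA's own parser accepts.

```python
import re
from datetime import date

# Assumed layout: the name ends with an ISO date, e.g. "strainA|USA|2020-03-15".
DATE_AT_END = re.compile(r"(\d{4})-(\d{2})-(\d{2})$")

def parse_tip_date(name):
    """Return the sampling date encoded at the end of a name, or None if absent."""
    match = DATE_AT_END.search(name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # raises ValueError for impossible dates

def decimal_year(d):
    """Convert a date to a decimal year, accounting for leap years."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days

if __name__ == "__main__":
    for name in ("strainA|USA|2020-03-15", "strainB|JPN|2019-11-02", "reference"):
        sampled = parse_tip_date(name)
        print(name, sampled, None if sampled is None else decimal_year(sampled))
```

Names without a recognizable date return None, so they can be flagged for manual entry rather than silently assigned a time.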
Detecting Autocorrelation of Evolutionary Rates
MEGA now contains a facility for detecting autocorrelation of
evolutionary rates among branches, which is important for
understanding molecular evolution patterns and useful as a
clock rate prior in Bayesian relaxed-clock analyses. MEGA
implements the CorrTest method developed using machine
learning, which is accurate and computationally efficient (Tao
et al. 2019). The CorrTest implementation in MEGA requires
a phylogeny with sequence alignment (or branch lengths)
and is accessed through an easy-to-use wizard. This test’s final
output is a CorrScore between 0 and 1 and a P-value, where a
high CorrScore and low P-value indicate that branch rates
among lineages are likely correlated.
Calculating Neutral Evolutionary Probabilities
According to the neutral theory of molecular evolution, most
differences in molecular sequences across species are expected
to have little to no impact on fitness (Kimura 1983). Therefore,
multispecies sequence alignments have been used to estimate
neutral evolutionary probabilities (EP) of observing alternative
alleles (amino acid residues or nucleotides) in a species, con-
tingent on the given species timetree (Liu et al. 2016). MEGA
implements an advanced option for this Bayesian approach in
which the species timetree containing relative times is com-
puted automatically by using RelTime (Patel and Kumar
2019). Alleles with EP less than 0.05 are nonneutral, whereas
evolutionarily permissible (neutral) alleles show much higher
EPs. Disease-associated amino acid variants in human popu-
lations have EP <0.05 and are rarely found in the population
(Liu et al. 2016). Many human adaptive variants in populations
also have low EPs, that is, nonneutral from an evolutionary
perspective, but they show high allele frequencies (Patel et al.
2018). Therefore, one may use EPs to diagnose disease muta-
tions and detect candidate adaptive variants. An EP wizard
system walks the user through the steps required to set up the
analysis. The first sequence in the alignment is used automat-
ically as the focal taxon of interest (one can rearrange sequen-
ces in the Sequence Data Explorer). EP values for all possible
bases (4 for nucleotides and 20 for amino acids) at each po-
sition in the input sequence alignment are reported in a
spreadsheet or text format.
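The EP < 0.05 rule described above can be applied to a table of per-position EP values along the following lines. This Python sketch is illustrative only: the function name and the toy EP values are hypothetical, and it does not compute EPs, which requires the Bayesian method cited above.

```python
EP_THRESHOLD = 0.05  # cutoff stated in the text

def classify_alleles(ep_by_allele, threshold=EP_THRESHOLD):
    """Label each allele at one position as 'nonneutral' (EP < threshold) or 'neutral'."""
    return {
        allele: ("nonneutral" if ep < threshold else "neutral")
        for allele, ep in ep_by_allele.items()
    }

if __name__ == "__main__":
    # Toy EP values for the four nucleotides at a single alignment position.
    position_eps = {"A": 0.90, "G": 0.07, "C": 0.02, "T": 0.01}
    for allele, label in classify_alleles(position_eps).items():
        print(allele, position_eps[allele], label)
```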
Technological Advances
Although some new user interface elements have already
been mentioned above (figs. 1–3), additional technical advan-
ces in MEGA11 are as follows.
Expanded Group Designations
MEGA has long supported a “group” tag for sequences and
other operational taxonomic units (OTUs). Using the sequence
“group” tags, MEGA offered a group-wise exploration of input
data, selection of data subsets, and computational analyses
(Kumar et al. 2001). Support for two new tags (“population” and
“species”) was added in MEGA7, with the species tags used to
mark duplicate genes in multigene family phylogenies (Kumar
et al. 2016). In MEGA11, sequences can now be tagged to
provide information on the continent, country, city, year,
month, day, and time. This spatiotemporal information can
be used in tip-dating analyses.
In MEGA11, we have made a MEGA-wide change to use
any meta tag to define groups. For example, if one selects the
“Year” meta tag for use as a group, they could estimate av-
erage diversity within and between sequences sampled in
different years (Distance menu). In the Sequence Data
Explorer, one can select/unselect sequences of certain years
for phylogenetic analyses. Also, the display of years would be
automatically enabled in the Tree Explorer, and sequence
clusters would be collapsed by year. Additionally,
sequences can be sorted based on years in all the input data
and result explorer displays. Therefore, a dynamic designation
of groups based on the desired meta tag will enable more
efficient data exploration and analysis.
FIG. 1. Calibration points for MEGA’s RelTime method are chosen in the Node Calibration Editor window (A), accessed via the Timetree Wizard
system (see fig. 2A). The Node Calibration Editor displays the phylogeny where individual node calibrations and probability densities can be chosen
by clicking the calibration button on the top toolbar for the selected node. A dropdown menu (B) with several calibration density types is
displayed. The Node Calibration Editor then prompts the user for required distribution parameters, depending on the distribution selected: normal
distribution (mean and standard deviation), lognormal (offset, mean and standard deviation), exponential (offset and decay parameter), uniform
(min and max) (C).
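The use of a meta tag such as "Year" to define groups can be sketched as follows. This Python example computes the mean pairwise p-distance within each group and between two groups for a toy alignment. It is a minimal sketch under stated assumptions: the function names, the toy sequences, and the choice of the p-distance with pairwise deletion of gaps are made here for illustration and do not reproduce MEGA's Distance menu calculations.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def group_members(group_of):
    """Invert a name -> tag mapping into tag -> list of names."""
    groups = {}
    for name, tag in group_of.items():
        groups.setdefault(tag, []).append(name)
    return groups

def mean_within(seqs, group_of):
    """Mean pairwise distance within each group (None for single-member groups)."""
    result = {}
    for tag, names in group_members(group_of).items():
        dists = [p_distance(seqs[a], seqs[b]) for a, b in combinations(names, 2)]
        result[tag] = sum(dists) / len(dists) if dists else None
    return result

def mean_between(seqs, group_of, tag1, tag2):
    """Mean pairwise distance between members of two different groups."""
    groups = group_members(group_of)
    dists = [p_distance(seqs[a], seqs[b]) for a, b in product(groups[tag1], groups[tag2])]
    return sum(dists) / len(dists)

if __name__ == "__main__":
    seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGA", "s4": "TCGT"}
    year = {"s1": 2019, "s2": 2019, "s3": 2020, "s4": 2020}  # the "Year" meta tag
    print(mean_within(seqs, year))
    print(mean_between(seqs, year, 2019, 2020))
```

Because the grouping is just a mapping from sequence name to tag value, switching from years to countries or any other tag changes only the dictionary passed in.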
Memory Efficient ML Analyses
ML methods are widely used for phylogenetic inference but
place high demands on computer memory, becoming in-
creasingly burdensome for bigger sequence alignments
analyzed these days. In MEGA11, we have now completed
a long-overdue refactoring of ML calculations by adding a
step to identify common site configurations, that is, sites
where all sequences have the same bases as at some other
sites, to utilize computer memory more efficiently. The memory
requirements of maximum likelihood and maximum
parsimony analyses are reduced (approximately) by the factor
of m/L when there are m distinct site configurations in a
sequence alignment containing L sites. The memory saving
can be substantial for multigene and genome-scale align-
ments. For example, the memory saving was 660 MB (209
vs. 870 MB) for a sequence alignment of 229 birds with 2,728
sites (Claramunt and Cracraft 2015) and 4.5 GB (2.3 vs.
6.8 GB) for an alignment of 162 mammals with 11,010
sites (Meredith et al. 2011). This memory saving does not
have any detrimental impact on phylogenetic estimates
and computational times because identical site
configurations have the same likelihood value. The total
log-likelihood is simply the sum of site-configuration log-like-
lihoods weighted by their frequencies. However, this upgrade
required refactoring many different parts of MEGA’s calcula-
tion engine, including functions for phylogeny construction
and model selection.
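The site-configuration idea above can be sketched in a few lines of Python: identical alignment columns are collapsed into distinct configurations with counts, and the total log-likelihood is the frequency-weighted sum over those configurations. This is a minimal sketch, not MEGA's code. The function names are hypothetical, and toy_site_log_lik is a placeholder that stands in for a real per-site phylogenetic likelihood, used only to show that the weighted sum equals the site-by-site sum.

```python
import math
from collections import Counter

def site_configurations(alignment):
    """Count distinct alignment columns (site configurations)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) yields one tuple of bases per site.
    return Counter(zip(*alignment))

def total_log_likelihood(configurations, site_log_lik):
    """Sum of per-configuration log-likelihoods weighted by their frequencies."""
    return sum(count * site_log_lik(column) for column, count in configurations.items())

def toy_site_log_lik(column):
    """Placeholder score: NOT a phylogenetic likelihood, only a deterministic function of the column."""
    return math.log(0.9) if len(set(column)) == 1 else math.log(0.1)

if __name__ == "__main__":
    alignment = ["AACGA", "AACTA", "AACGA"]  # 3 sequences, L = 5 sites
    configs = site_configurations(alignment)
    m, L = len(configs), len(alignment[0])
    print("distinct configurations m =", m, "of L =", L, "sites; m/L =", m / L)

    compressed = total_log_likelihood(configs, toy_site_log_lik)
    per_site = sum(toy_site_log_lik(col) for col in zip(*alignment))
    print(math.isclose(compressed, per_site))
```

Only m per-configuration results need to be stored instead of L per-site results, which is where the m/L factor in the text comes from.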
Enhanced GUI for Exploring Large Data Sets
Using a large multiple sequence alignment containing 68,000
genomes and 30,000 bases each, we assessed MEGA GUI’s
responsiveness during input data file reading, execution of
functions in the Sequence Data Explorer, estimation of pairwise
distances, and building of distance-based phylogenies.
We found the GUI to become intermittently unresponsive
for such large data sets, which are now common due
to resequencing and population sequencing efforts.
Consequently, we have moved all potentially long-running
operations out of the main GUI thread to background
threads in a major overhaul of the source code. Now, large
input data files are read rapidly, and calculations of pairwise
distance matrices, selection tests, and phylogeny construction
for distance-based methods are performed in a background
thread. The Sequence Data Explorer has been reprogrammed
to enable more efficient highlighting of variable sites, and
navigation of the sequence alignment has been improved.
Also added are options to automatically label sites based
on attributes: each site is annotated with a one-character
label, and sites with the desired labels can then be used to
subset the data for any molecular phylogenetic analysis.
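The general pattern of moving a long-running calculation off the main GUI thread can be sketched as follows in Python. This is an illustration of the technique only and not MEGA's source code: the function names are hypothetical, and a plain callback stands in for the GUI update that a real application would schedule on its main thread.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def pairwise_p_distances(seqs):
    """Compute the p-distance for every pair of equal-length sequences."""
    result = {}
    for a, b in combinations(sorted(seqs), 2):
        diffs = sum(x != y for x, y in zip(seqs[a], seqs[b]))
        result[(a, b)] = diffs / len(seqs[a])
    return result

def run_in_background(executor, seqs, on_done):
    """Submit the calculation to a worker thread and return immediately.

    on_done is called with the distance matrix once the work finishes; it runs
    in the worker thread (or in the caller's thread if the work is already done).
    """
    future = executor.submit(pairwise_p_distances, seqs)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future

if __name__ == "__main__":
    seqs = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = run_in_background(executor, seqs, lambda d: print("done:", d))
        # The main thread is free here, e.g. to keep handling user events.
        print("main thread still responsive")
        future.result()  # wait only when the result is actually needed
```

In CPython, threads of this kind keep an interface responsive but do not by themselves speed up CPU-bound work; the point of the sketch is the separation between the event-handling thread and the calculation.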
FIG. 2. The Tip Dating Wizard (A) guides the user through the steps required to set up the RTDT analysis. Once a sequence alignment and/or a tree
is provided, the user is prompted to specify the outgroup by selecting a node in the Tree Explorer or specifying outgroup taxa by name (not shown).
Next, sample times are specified using the Tip Dates Editor (B) with facilities for parsing tip dates (C) encoded in taxa names, importing tip dates
from a text file, and manually entering the dates. In the next step, the Analysis Preferences dialog (not shown) is displayed, allowing the user to set
analysis options to estimate branch lengths used by RTDT. The estimated timetree is displayed in the Tree Explorer (see fig. 3).
Conclusions
Version 11 of MEGA adds many methods and tools to keep
pace with researchers’ growing needs. The addition of evolutionary
dating methods in MEGA makes it easier to estimate
species and strain divergence times by using more informative
node calibrations and sampling times. The new CorrTest and
EP calculations will enable a more robust evaluation of
assumptions about biological characteristics of molecular
data. The reduction in memory needs of ML-based compu-
tations will allow users to analyze much larger data sets than
before. The refactoring of distance-based methods’ calcula-
tion to run in threads independent of the main graphical
interface and other GUI enhancements greatly improve
MEGA usability for very large data sets.
Acknowledgments
We thank our laboratory members and many beta testers for
providing invaluable feedback and bug reports. This study
was supported in part by research grants from the National
Institutes of Health (R35GM139504-01), National Science
Foundation (DEB-2034228, DBI-1661218), and Japan Society
for the Promotion of Science (JSPS) grants-in-aid for scientific
research (DB5) to K.T.
Data Availability
The software and its source code are available from
www.megasoftware.net.
References
Caspermeyer J. 2018. MEGA software celebrates silver anniversary. Mol
Biol Evol. 35(6):1558–1560.
Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history’s
imprint on the evolution of modern birds. Sci Adv. 1(11):e1501005.
Hipsley CA, Müller J. 2014. Beyond fossil calibrations: realities of molecular
clock practices in evolutionary biology. Front Genet. 5:138.
Kimura M. 1983. The neutral theory of molecular evolution. New York:
Cambridge University Press.
FIG. 3. MEGA’s Tree Explorer (A) is a feature-rich, versatile viewer of phylogenies that provides many interactive exploration and customization
facilities. In MEGA11, the new side toolbar of Tree Explorer makes formatting, rearrangement, and tree exploration tools more accessible and
intuitive. Instead of a thin toolbar with nameless buttons, we have opted for a wide toolbar with text labels identifying each tool. The toolbar can be
moved to either side of the window, and it can be toggled in and out of view. To organize related tools by groups and accommodate limited vertical
space, collapsible panels are used. With the new toolbar, formatting tools previously displayed in external dialogs are readily accessible, and formats
are applied instantly instead of after the user closes the external dialog. In addition to the updated toolbar, there are now options for
auto-collapsing of nodes containing clusters of taxa belonging to the same group, user-specified cluster size, or by the branch length difference. For very
large trees with many similar sequences, this feature can greatly facilitate the visualization of evolutionary events at a glance. An option has been
added to export pairwise patristic distances between taxa to a text file for phylogenies and timetrees. For maximum likelihood and maximum
parsimony trees where ancestral sequences are present, an option has been added to navigate through sites where the estimated
ancestral state differs between the parent and child on the currently selected branch. The tree information box (B) has been updated for timetrees
to show branch- and node-specific information, such as earliest and latest sample times in the currently selected subtree, days elapsed between the
divergence time for a selected node and the latest sample time, the nearest and furthest tip from a selected node, clade size and clade taxa, and
spatiotemporal information if available.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular
Evolutionary Genetics Analysis across computing platforms. Mol Biol
Evol. 35(6):1547–1549.
Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: com-
puting core of molecular evolutionary genetics analysis program
for automated and iterative data analysis. Bioinformatics
28(20):2685–2686.
Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular Evolutionary
Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol.
33(7):1870–1874.
Kumar S, Tamura K, Jakobsen I, Nei M. 2001. MEGA2: molecular
evolutionary genetics analysis software. Bioinformatics
17(12):1244–1245.
Kumar S, Tamura K, Nei M. 1993. MEGA: Molecular Evolutionary
Genetics Analysis version 1.01. University Park (PA): The
Pennsylvania State University.
Kumar S, Tamura K, Nei M. 1994. MEGA—molecular evolutionary ge-
netics analysis software for microcomputers. Comput Appl Biosci.
10(2):189–191.
Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for
Molecular Evolutionary Genetics Analysis and sequence alignment.
Brief Bioinform. 5(2):150–163.
Liu L, Tamura K, Sanderford MD, Gray V, Kumar S. 2016. A molecular
evolutionary reference for the human variome. Mol Biol Evol.
33(1):245–254.
Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC,
Goodbla A, Eizirik E, Simão TLL, Stadler T, et al. 2011. Impacts of the
cretaceous terrestrial revolution and KPg extinction on mammal
diversification. Science 334(6055):521–524.
Miura S, Tamura K, Tao Q, Huuki LA, Pond SLK, Priest J, Deng J, Kumar S.
2018. A new method for inferring timetrees from temporally sam-
pled molecular sequences. PLoS Comput Biol. 16:24.
Patel R, Kumar S. 2019. On estimating evolutionary probabilities of pop-
ulation variants. BMC Evol Biol. 19(1):133 (14 pp.).
Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A,
Glicksberg BS, Xu K, Dudley JT, Kumar S. 2018. Adaptive landscape of
protein variation in human exomes. Mol Biol Evol. 35(8):2015–2025.
Stecher G, Tamura K, Kumar S. 2020. Molecular Evolutionary Genetics
Analysis (MEGA) for macOS. Mol Biol Evol. 37(4):1237–1239.
Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S.
2012. Estimating divergence times in large molecular phylogenies.
Proc Natl Acad Sci USA. 109(47):19333–19338.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011.
MEGA5: molecular evolutionary genetic analysis using maximum
likelihood, evolutionary distance, and maximum parsimony meth-
ods. Mol Biol Evol. 28(10):2731–2739.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6:
molecular evolutionary genetics analysis version 6.0. Mol Biol Evol.
30(12):2725–2729.
Tao Q, Tamura K, Battistuzzi F, Kumar S. 2019. A machine learning
method for detecting autocorrelation of evolutionary rates in large
phylogenies. Mol Biol Evol. 36(4):811–824.
Tao Q, Tamura K, Kumar S. 2020. Efficient methods for dating evolu-
tionary divergences. In: Ho SYW, editor. The molecular evolutionary
clock. Switzerland: Springer Nature. p. 197–219.
Tao Q, Tamura K, Mello B, Kumar S. 2020. Reliable confidence intervals
for RelTime estimates of evolutionary divergence times. Mol Biol
Evol. 37(1):280–290.
MEGA11 .doi:10.1093/molbev/msab120 MBE
3027
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
... The degree of sequence similarity search was conducted using the BLAST program (NCBI). All sequence analyses were conducted in MEGA11 [28]. Multiple sequence alignment was performed using ClustalW [28]. ...
... All sequence analyses were conducted in MEGA11 [28]. Multiple sequence alignment was performed using ClustalW [28]. Phylogenetic relationships among the current Av. ...
Article
Full-text available
Background Avibacterium paragallinarum is a causative agent of infectious coryza (IC), a disease that affects the upper respiratory tracts and paranasal sinuses of chickens, resulting significant economic losses in the poultry industry. The objective of this study was to isolate and identify Av. paragallinarum using bacteriological and molecular methods between February 2022 and April 2024. A total of 74 swab samples were collected from chickens showing ocular and nasal discharges and swelling of the infraorbital sinuses. Method Clinical samples were collected from chickens showing symptoms of IC from six locations of Ethiopia for the isolation and identification of the causative agent. Swab samples from the nasal cavity and cheesy material from the infraorbital sinus were screened using conventional PCR and inoculated onto chocolate agar enriched with 5% sheep blood. Colonies suspected of being Av. paragallinarum were transferred to brain heart agar supplemented with horse serum. Gram staining was used to examine the morphology of bacteria in pure colonies grown on chocolate and brain heart infusion agar. Results The isolation of Av. paragallinarum on chocolate and brain heart infusion agar resulted in the observation of small, translucent, dewdrop-shaped colonies after 24 h of incubation at 37 °C in a 5% CO 2 incubator. A smear prepared from a single colony of revealed Gram-negative, short rod-shaped or coccobacilli Av. paragallinarum bacteria. Biochemical tests conducted on this isolate yielded negative results for catalase, oxidase, urease, indole, Isolation, molecular detection, and sequence analysis of Avibacterium paragallinarum from suspected cases of infectious coryza infected chickens from different areas of Ethiopia, 2022-2024
... The samples were ran on a 3500 genetic analyzer (Applied Biosystems) and the resulting sequences were analyzed in Molecular Evolutionary Genetic Analysis (MEGA) v. 11.0.13 (Tamura et al., 2021) and then compared with sequences from NCBI using BLAST (Clark et al., 2016;Sayers et al., 2022) to determine the species of origin. DNA sequencing alone cannot differentiate between Odocoileus spp. ...
Article
Full-text available
This study compares the decision-making processes and workflows of complex and simple wildlife forensic cases at the Wyoming Game and Fish Wildlife Forensic Laboratory. To highlight the varied processes involved in analyzing cases at the laboratory, a complex case, consisting of eighteen different animals and a simpler case consisting of only two animals will be discussed. Both cases highlight several decision making points throughout to determine the number of samples to collect, if the samples contain biological material, the extraction methods to be used, and how to proceed with downstream analyses. These decision points are notably more numerous in the complex case. Both cases cover the process of subsampling, extraction methods, test methods, and results. At the time of the complex case, sanger sequencing, used for species identification of the deer species did not allow for the differentiation between the closely related white-tailed deer ( Odocoileus virginianus ) and mule deer ( Odocoileus hemionus ) and a protein analysis was used to differentiate them. A new procedure, population assignment in conjunction with sequencing, validated after the complex case and prior to the simple case made the differentiation easier and more efficient. This change in species identification emphasizes the need for continual validation of new procedures. Results of wildlife forensic cases are not only dependent on the analyses performed, but also on the decisions made by the analyst throughout the process.
... BLAST (Altschul et al., 1990) searching highly similar sequences in core nucleotide database was used to determine the nearest match of the analyzed sequences with an emphasis on the reference strains. The ITS of the strains CCALA 975 and K16 were aligned by hand (Iteman et al., 2000; Boyer et al., 2001; Johansen et al., 2021) and their p-distance was inferred using MEGA 11 (Tamura et al., 2021). The 16S p-distance was calculated as well, but also for the strains KP06, KP11, KP31B, and K37. ...
Article
Algae and cyanobacteria have been studied in two so far phycologically unexplored show caves (Chýnov and Koněprusy) in the Czech Republic. Taxa were identified morphologically using cultivation and subsequent light microscopy, while problematic cyanobacterial species were verified by sequencing the genes for 16S rRNA and 16S–23S internal transcribed spacer. A total of 13 cyanobacterial taxa were found, two Bacillariophyceae, three Xanthophyceae, four Chlorophyceae, six Trebouxiophyceae, 11 Streptophyta, and three Euglenida; many of these species were found for the first time in the caves. The darker parts of the caves were dominated by cyanobacteria, and green algae were found predominantly right next to light sources. This community associated with artificial light sources is referred to as “lampenflora” and is often a detrimental factor to speleothems and cave paintings.
... (CodonCode Corporation, Centerville, MA, USA) to verify the absence of pseudogenes using standard detection methods (Haran et al. 2015). The genetic distances between species were calculated pairwise using the Kimura-2-Parameter (K2P) model (Kimura 1980) in MEGA v.11.0.13 (Tamura et al. 2021), with the "pairwise deletion of gaps" option. Phylogenetic trees were constructed to visualise the observed genetic divergence between species using the Neighbour-joining (NJ) method (Saitou and Nei 1987) with PhyML 3.0 (Guindon et al. 2010). ...
Article
Swollen Shoot is a viral disease affecting cocoa trees, transmitted by several species of mealybugs (Insecta, Hemiptera, Sternorrhyncha, Pseudococcidae). These insects maintain trophobiotic relationships with a complex and species-rich assemblage of ants protecting them and natural enemies controlling their populations. Here, we provide a curated DNA barcode database to characterise this insect community. Systematic observation of 7,500 cocoa trees was conducted, coupled with the collection of mealybug colonies and associated insect communities (parasitoids, predators and ants). Natural enemies were reared from mealybug colonies collected from 1,430 cocoa trees. Specimens were identified morphologically and sequenced for fragments of the standard DNA barcode region of the COI. We recovered 17 species of mealybugs from the family Pseudococcidae. Amongst these species, eight are new to the Ivorian cocoa orchard: Dysmicoccus neobrevipes Beardsley, Ferrisia dasylirii (Cockerell), Maconellicoccus ugandae (Laing), Paracoccus marginatus Williams & Granara de Willink, Phenacoccus solenopsis Tinsley, Planococcus minor (Maskell), Pseudococcus concavocerarii James and Pseudococcus occiduus De Lotto. Three of these species were identified for the first time in cocoa orchards in Africa: D. neobrevipes, Fe. dasylirii and Ph. solenopsis. A total of 54 ant species were identified and represented the first record of these species associated with mealybug colonies in cocoa in Côte d’Ivoire. Amongst the species associated with the mealybugs, 22 primary parasitoids, eight hyperparasitoids, 11 ladybird beetles (Coccinellidae), seven gall midges (Cecidomyiidae), one predatory lepidopteran species and four spider species were identified. Nine species of mealybug parasitoids are newly recorded in the African cocoa orchards: Acerophagus aff. dysmicocci, Aloencyrtus sp., Anagyrus kamali, Anagyrus aff. pseudococci, Aenasius advena, Clausenia aff. corrugata, Gyranusoidea aff. tebygi, Zaplatycerus aff. natalensis (Encyrtidae) and Coccophagus pulvinariae (Aphelinidae) and one hyperparasitoid, Pachyneuron muscarum (Pteromalidae). For Côte d’Ivoire in particular, besides the previously mentioned nine parasitoids and one hyperparasitoid, five additional species are recorded for the first time, including four primary parasitoids, Blepyrus insularis (Encyrtidae), Clausenia corrugata (Encyrtidae), Clausenia sp. (Encyrtidae), and Coccidoctonus pseudococci (Encyrtidae) and one hyperparasitoid, Cheiloneurus cyanonotus (Encyrtidae). These results significantly enhance the knowledge of the diversity of the entomofauna associated with Swollen Shoot disease and pave the way for developing control methods based on the natural regulation of its mealybug (Pseudococcidae) vectors.
... Evolutionary analysis was performed in MEGA 11 [22]. A phylogenetic tree including three samples, a reference genome, and genomes of different SARS-CoV-2 lineages was constructed using the Neighbor-Joining method [23]. ...
Article
Prompt determination of the etiological agent is important in an outbreak of pathogens with pandemic potential, particularly for dangerous infectious diseases. Molecular genetic methods allow for arriving at an accurate diagnosis, employing timely preventive measures, and controlling the spread of the disease-causing agent. In this study, whole-genome sequencing of three SARS-CoV-2 strains was performed using the Sanger method, which provides high accuracy in determining nucleotide sequences and avoids errors associated with multiple DNA amplification. Complete nucleotide sequences of samples KAZ/Britain/2021, KAZ/B1.1/2021, and KAZ/Delta020/2021 were obtained, with sizes of 29,751 bp, 29,815 bp, and 29,840 bp, respectively. According to the COVID-19 Genome Annotator, 127 mutations were detected in the studied samples compared to the reference strain. The strain KAZ/Britain/2021 contained 3 deletions, 7 synonymous mutations, and 27 non-synonymous mutations, the second strain KAZ/B1.1/2021 contained 1 deletion, 5 synonymous mutations, and 31 non-synonymous mutations, and the third strain KAZ/Delta020/2021 contained 1 deletion, 5 synonymous mutations, and 37 non-synonymous mutations. The variations C241T, F106F, P314L, and D614G found in the 5′ UTR, ORF1ab, and S regions were common to all three studied samples. According to PROVEAN data, the loss-of-function mutations identified in strains KAZ/Britain/2021, KAZ/B1.1/2021, and KAZ/Delta020/2021 include 5 mutations (P218L, T716I, W149L, R52I, and Y73C), 2 mutations (S813I and Q992H), and 8 mutations (P77L, L452R, I82T, P45L, V82A, F120L, F120L, and R203M), respectively. Phylogenetic analysis showed that the strains studied (KAZ/Britain/2021, KAZ/B1.1/2021, and KAZ/Delta020/2021) belong to different SARS-CoV-2 lineages, which are closely related to samples from Germany (OU141323.1 and OU365922.1), Mexico (OK432605.1), and again Germany (OV375251.1 and OU375174.1), respectively. The nucleotide sequences of the studied SARS-CoV-2 virus strains were registered in the GenBank database with the accession numbers: ON692539.1, OP684305, and OQ561548.1.
... The gaps in these two shorter sequences were filled by the fifth nucleotide "N". Corrected pairwise distance was calculated based on the Kimura 2-parameter model (K2P; Kimura 1980) by MEGA v.11 (Tamura et al. 2021). A maximum-likelihood (ML) tree was constructed using the IQ-TREE web server (Trifinopoulos et al. ...
Article
Penaeid shrimps belonging to the Parapenaeopsis cornuta (Kishinouye, 1900) species group hold significant commercial value in the Indo-West Pacific, but their taxonomy has been problematic. A taxonomic revision of this group, supported by molecular genetic analysis using the barcoding gene COI, confirmed the validity of all four species within the group. Their distinguishing characteristics are redefined and illustrated, and a key for identifying the four species in the “P. cornuta” group is provided.
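The p-distance and Kimura 2-parameter (K2P) corrected distance mentioned in the citing excerpts above can be illustrated with a minimal Python sketch. This is not MEGA's implementation: the function names are hypothetical, and skipping any site where either sequence has a character other than A, C, G, or T (gaps, "N") is an assumption standing in for the "pairwise deletion" option.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_counts(seq1, seq2):
    """Count compared sites, transitions, and transversions for one pair.

    Assumption: sites with a gap or ambiguous base in either sequence
    are dropped for this pair only (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of this site
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, transitions, transversions


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ."""
    n, ts, tv = site_counts(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """K2P distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the transition and transversion proportions."""
    n, ts, tv = site_counts(seq1, seq2)
    p, q = ts / n, tv / n
    arg = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if arg <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg)
```

For a pair that differs by a single transition over ten comparable sites, `p_distance` gives 0.1 and `k2p_distance` gives a slightly larger value, reflecting the correction for multiple substitutions at the same site.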
Article
Arabinogalactan proteins (AGPs) are complex proteoglycans present in plant cell walls across the kingdom. They play crucial roles in biological functions throughout the plant life cycle. In this study, we identified 43 gene members of the AG peptide (an AGP subfamily) within the rice genome, detailing their structure, protein-conserved domains, and motif compositions for the first time. We also examined the expression patterns of these genes across 18 tissues and organs, especially the different parts of the flower (anthers, pollen, pistil, sperm cells, and egg cells). Interestingly, the expression of some AG peptides is mainly present in the pollen grain. Transcription data and GUS staining confirmed that OsAGP6P—a member of the AG peptide gene family—is expressed in the stamen during pollen development stages 11–14, which are critical for maturation as microspores form after meiosis of pollen mother cells. It became noticeable from stage 11, when exine formation occurred—specifically at stage 12, when the intine began to develop. The overexpression of this gene in rice decreased the seed-setting rate (from 91.5% to 30.5%) and plant height (by 21.9%) but increased the tillering number (by 34.1%). These results indicate that AGP6P contributes to the development and fertility of pollen, making it a valuable gene target for future genetic manipulation of plant sterility through gene overexpression or editing.
Article
The P-type pentatricopeptide repeat (PPR) proteins are crucial for RNA editing and post-transcriptional regulation in plant organelles, particularly mitochondria. This study investigates the role of OsPPR674 in rice, focusing on its function in mitochondrial RNA editing. Using CRISPR/Cas9 technology, we generated a ppr674 mutant and examined its phenotypic and molecular characteristics. The results indicate that ppr674 exhibits reduced plant height, decreased seed-setting rate, and poor drought tolerance. Further analysis revealed that in the ppr674 mutant, RNA editing at the 299th nucleotide position of the mitochondrial ccmC gene (C-to-U conversion) was abolished. REMSAs showed that GST-PPR674 specifically binds to RNA probes targeting this ccmC-299 site, confirming its role in this editing process. In summary, these results suggest that OsPPR674 plays a pivotal role in mitochondrial RNA editing, emphasizing the significance of PPR proteins in organelle function and plant development.
Chapter
Reliable estimates of divergence times are crucial for biological studies to decipher temporal patterns of macro- and microevolution of genes and organisms. Molecular sequences have become the primary source of data for estimating divergence times. The sizes of molecular data sets have grown quickly due to the development of inexpensive sequencing technology. To deal with the increasing volumes of molecular data, many efficient dating methods are being developed. These methods not only relax the molecular clock and offer flexibility to use multiple clock calibrations, but also complete calculations much more quickly than Bayesian approaches. Here, we discuss the theoretical and practical aspects of these non-Bayesian approaches and present a guide to using these methods effectively. We suggest that the computational speed and reliability of non-Bayesian relaxed-clock methods offer opportunities for enhancing scientific rigour and reproducibility in biological research for large and small data sets.
Article
We present the latest version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, MEGA has been optimized for use on 64-bit computing systems for analyzing bigger datasets. Researchers can now explore and analyze tens of thousands of sequences in MEGA. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit MEGA is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OSX. The command line MEGA is available as native applications for Windows, Linux, and Mac OSX. They are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.
Article
Pathogen timetrees are phylogenies scaled to time. They reveal the temporal history of a pathogen spread through the populations as captured in the evolutionary history of strains. These timetrees are inferred by using molecular sequences of pathogenic strains sampled at different times. That is, temporally sampled sequences enable the inference of sequence divergence times. Here, we present a new approach (RelTime with Dated Tips [RTDT]) to estimating pathogen timetrees based on a relative rate framework underlying the RelTime approach that is algebraic in nature and distinct from all other current methods. RTDT does not require many of the priors demanded by Bayesian approaches, and it has light computing requirements. In analyses of an extensive collection of computer-simulated datasets, we found the accuracy of RTDT time estimates and the coverage probabilities of their confidence intervals (CIs) to be excellent. In analyses of empirical datasets, RTDT produced dates that were similar to those reported in the literature. In comparative benchmarking with Bayesian and non-Bayesian methods (LSD, TreeTime, and treedater), we found that no method performed the best in every scenario. So, we provide a brief guideline for users to select the most appropriate method in empirical data analysis. RTDT is implemented for use via a graphical user interface and in high-throughput settings in the newest release of cross-platform MEGA X software, freely available from http://www.megasoftware.net.
Article
The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on Apple computers. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge.
Article
Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not incorporate rate variation among lineages and interactions among clock calibrations properly. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in these analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density (HPD) intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, i.e., the actual time is contained within the CIs with a 95% probability. These developments will encourage broader use of computationally-efficient RelTime approaches in molecular dating analyses and biological hypothesis testing.
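The coverage probability referred to in the abstract above is the fraction of simulated datasets in which the confidence interval contains the true divergence time. A minimal Python sketch of that bookkeeping follows; the function name and the input layout (a list of true times and a parallel list of `(lower, upper)` bounds) are assumptions for illustration, not part of RelTime or MEGA.

```python
def coverage_probability(true_times, intervals):
    """Fraction of intervals (lower, upper) that contain the true time.

    Hypothetical helper: one true time and one CI per simulated node
    or dataset, given in the same order.
    """
    if len(true_times) != len(intervals):
        raise ValueError("need exactly one interval per true time")
    if not true_times:
        raise ValueError("no estimates to evaluate")
    hits = sum(
        1 for t, (lower, upper) in zip(true_times, intervals)
        if lower <= t <= upper
    )
    return hits / len(true_times)
```

With well-calibrated 95% CIs, this value would be expected to approach 0.95 as the number of simulated datasets grows.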
Article
Background: The evolutionary probability (EP) of an allele in a DNA or protein sequence predicts evolutionarily permissible (ePerm; EP ≥ 0.05) and forbidden (eForb; EP < 0.05) variants. EP of an allele represents an independent evolutionary expectation of observing an allele in a population based solely on the long-term substitution patterns captured in a multiple sequence alignment. In the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating with high frequencies. The original method to compute EP requires the evolutionary relationships and divergence times of species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes a general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate if the modified EP approach produces results that are similar to those from the original method. Results: We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for HGMD disease missense variant and 1000 Genomes Project missense variant datasets. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times. 
We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups. Conclusion: We conclude that the modified EP approach will be generally applicable for alignments and enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.
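The EP thresholds stated in the abstract above (ePerm for EP ≥ 0.05, eForb for EP < 0.05) can be sketched as a small Python classifier. The function names are hypothetical, and the population-frequency cutoff used to flag a candidate adaptive allele is an illustrative assumption: the abstract only says such alleles are eForbs segregating at "high frequencies" and gives no number.

```python
EP_THRESHOLD = 0.05       # from the abstract: ePerm if EP >= 0.05, else eForb
ASSUMED_HIGH_FREQ = 0.05  # hypothetical cutoff, not taken from the source


def classify_allele(ep):
    """Label an allele as evolutionarily permissible or forbidden."""
    if not 0.0 <= ep <= 1.0:
        raise ValueError("EP must be a probability in [0, 1]")
    return "ePerm" if ep >= EP_THRESHOLD else "eForb"


def is_candidate_adaptive(ep, population_frequency, min_frequency=ASSUMED_HIGH_FREQ):
    """Flag an eForb allele that nonetheless segregates at high frequency."""
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population frequency must be in [0, 1]")
    return classify_allele(ep) == "eForb" and population_frequency >= min_frequency
```

The frequency cutoff is kept as a parameter because the appropriate value depends on the population data being analyzed.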
Article
New species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack the power to detect autocorrelated rates. Here, we present a machine learning method, CorrTest, to detect the presence of rate autocorrelation in large phylogenies. CorrTest is computationally efficient and performs better than the available state-of-the-art method. Application of CorrTest reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, parasitic protozoans, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and nonmolecular evolutionary patterns, and they will foster unbiased and precise dating of the tree of life. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Article
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored non-adaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many non-adaptive hypotheses were tested, but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and that hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of non-neutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
Article
The molecular evolutionary genetics analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.