Content uploaded by Sudhir Kumar
Author content
All content in this area was uploaded by Sudhir Kumar on Oct 16, 2021
Content may be subject to copyright.
Available via license: CC BY 4.0
Content may be subject to copyright.
MEGA11: Molecular Evolutionary Genetics Analysis Version 11
Koichiro Tamura,
1,2
Glen Stecher,
3
and Sudhir Kumar *
,3,4,5
1
Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
2
Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan
3
Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
4
Department of Biology, Temple University, Philadelphia, PA, USA
5
Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
*Corresponding author: E-mail: s.kumar@temple.edu.
Associate editor: Fabia Ursula Battistuzzi
Abstract
The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and
tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive
tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for
estimating divergence times and confidence intervals are implemented to use probability densities for calibration
constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options
for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor,and
an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary
probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the
autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood
analysis are reduced significantly through reprogramming, and the graphical user interface has been made more re-
sponsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results,
and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11
are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.
Key words: software, phylogenetics, timetrees, tip dating, neutrality.
Introduction
The Molecular Evolutionary Genetics Analysis (MEGA) soft-
ware has continuously grown to meet the need for sophisti-
cated evolutionary analysis to discover organismal and
genome evolutionary patterns and processes. It was first re-
leased in 1993 to offer the statistical methods of molecular
evolution through an interactive interface on the Microsoft
Disk Operating System (MS-DOS) (Kumar et al. 1993). For
more than 25 years, MEGA’s scope and usefulness have grown
through the addition of new methods, tools, and interfaces,
resulting in modern integrated software for comparative se-
quence analysis (Caspermeyer 2018). Initially, MEGA con-
tained distance-based and maximum parsimony methods
for molecular phylogenetic analysis (Kumar et al. 1994). The
data acquisition and integration of major approaches for
aligning sequences were introduced to expand MEGA’s scope
(Kumar et al. 2004). Afterward, the maximum likelihood (ML)
methods and Bayesian methods were added for molecular
evolutionary analyses (Tamura et al. 2011). MEGA now con-
tains methods for selecting the best-fit substitution model(s),
estimating evolutionary distances and divergence times,
reconstructing phylogenies, predicting ancestral sequences,
testing for selection, and diagnosing disease mutations
(Caspermeyer 2018).
With every new version, MEGA has evolved to harness
technological innovations and personal desktops’ computa-
tional power. MEGA’s interface evolved from its initial MS-
DOS character-based format (Kumar et al. 1993)toarich
graphical user interface (GUI) for Microsoft Windows oper-
ating system (Kumar et al. 2001). It was then redesigned to
become activity-driven (Tamura et al. 2011), followed by the
incorporation of web technologies to ensure a consistent use-
and-feel across Microsoft Windows and Linux operating sys-
tems (Kumar et al. 2018)andmacOS(Stecher et al. 2020).
MEGA GUI is now fully cross-platform running natively on
Windows, Linux, and macOS.
MEGA’s computational core (MEGA-CC) has undergone
extensive refactoring, hardening, and expansion over time. It
advanced from 16-bit to 32-bit (Kumar et al. 2001), became
multithreaded and incorporated multicore parallelization for
various calculations (Tamura et al. 2013),andsteppedupto
64-bit architecture (Kumar et al. 2016,2018). MEGA-CC was
released for use as a command-line program to address the
growing need for batch processing of many data sets and
integration into analysis workflows (Kumar et al. 2012;
Brief Communication
ßThe Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.
org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is
properly cited. Open Access
3022 Mol. Biol. Evol. 38(7):3022–3027 doi:10.1093/molbev/msab120 Advance Access publication April 23, 2021
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
Stecher et al. 2020). With both 32- and 64-bit versions of
MEGA currently available for use on the command-line and
GUI, MEGA is now a suite of applications that responds to the
variety of computing environments currently used by
researchers in molecular evolution and phylogenetics. Here,
we present key methodological additions and technical
improvements in MEGA that comprise version 11.
Methodological Additions
Expansion of Relaxed-Clock Dating Facilities
Rapid relaxed-clock methods for estimating divergence times
are becoming popular because they are feasible and efficient
for large contemporary sequence alignments (Tao, Tamura,
Kumar, et al. 2020). MEGA6 first added methods and tools for
constructing evolutionary timetrees by implementing the
RelTime method, which does not assume a molecular clock
(Tamura et al. 2012,2013). RelTime is known to perform well
and has been used to build timetrees in hundreds of research
articles (Tao, Tamura, Kumar, et al. 2020). MEGA11 expands
on RelTime dating options by advancing the current imple-
mentation and adding new facilities for node-dating and tip-
dating needed to build timetrees of pathogens, species, and
gene families.
Calibrating the Clock Using Probability Densities on Node-
Constraints
Bayesian relaxed-clock methods have long allowed the use of
statistical probability distributions that capture prior knowl-
edge (or belief) about the true divergence times in clock cal-
ibration constraints on one or more nodes in the phylogeny.
Judicious use of these probability densities can make diver-
gence times more accurate and precise (Tao, Tamura, Mello,
et al. 2020). Researchers can now use such probability densi-
ties for node calibrations in RelTime estimation of divergence
times and confidence intervals (CIs). MEGA implements the
Tao, Tamura, Mello, et al. (2020) approach that estimates CIs
by simultaneously accounting for variance introduced by the
heterogeneity of evolutionary rate among lineages, estimation
of sequence divergence using substitution models, and prob-
ability densities for node-calibration constraints. This method
produces CIs that contain correct times with a high proba-
bility, making them much more suitable for biological hy-
pothesis testing than other rapid methods (Tao, Tamura,
Kumar, et al. 2020;Tao, Tamura, Mello, et al. 2020).
For RelTime analyses in MEGA11, ML and distance-based
approachescanbeusedtobuildatimetreeforagivenphy-
logeny and multiple sequence alignment. One may also use
only a phylogeny with branch lengths, which extends the
usefulness of relaxed-clock methods for phylogenies inferred
from nonmolecular data or statistical methodologies not
available in MEGA. When a phylogeny with branch lengths
is used, the CIs will be narrower because the variance associ-
ated with branch length estimation cannot be generated
without the original data set used to produce the phylogeny
and branch lengths. Nevertheless, these CIs will incorporate
variance introduced due to rate variation among lineages and
clock calibrations’ uncertainty.
A calibration density selector has been added to the Node
Calibration Editor that provides an option to select normal,
lognormal, uniform, or exponential density (fig. 1). The user
can also specify a minimum or a maximum time bound on a
node. The calibration text file format has been extended to
specify density information and use calibration densities in
MEGA-CC. The Node Calibration Editor also includes new
functionality to specify a fixed evolutionary rate or a known
node time to calibrate the molecular clock. Such assumptions
are often used by investigators when independent calibration
information is unknown (Hipsley and Mu¨ller 2014;Tao,
Tamura, Kumar, et al. 2020).
Tip-Dating for Sequences with Sampling Times
MEGA now implements a method to estimate timetrees us-
ing sampling dates for molecular sequences. They are often
used to infer the origin and diversification of pathogens that
generally evolve fast enough to track the evolutionary change
over months and years (Tao, Tamura, Kumar, et al. 2020). Tip-
dating methods are also useful for analyzing ancient molec-
ular sequences. MEGA implements a rapid tip-dating
method, RelTime with Dated Tips (RTDT), that produces
divergencetimesandCIs(Miura et al. 2018). One may use
ML or distance-based approaches for a given phylogeny and
multiple sequence alignment for tip-dating, or a phylogeny
with branch lengths and tip dates can be given as the input.
An enhanced Timetree Wizard system (fig. 2)walkstheuser
through many steps needed to configure tip-dating analyses,
such as loading sequence and tree files, specifying the out-
groups, adding sequence sample times, and selecting the anal-
ysis options. Sequence sampling times can be specified in
multiple ways. MEGA will automatically extract them on-
demand when they are included in the sequence name.
Spatiotemporal information can also be presented in the input
alignment files as meta tags (see description below) or loaded
using specially formatted calibration text files. Once computed,
thetimetreeisdisplayedintheTree Explorer that has been
extensively revamped and updated (fig. 3). It now has many
more formatting tools, including exporting the timetree, indi-
vidual divergence times, and CI estimates in a tabular format.
Detecting Autocorrelation of Evolutionary Rates
MEGA now contains a facility for detecting autocorrelation of
evolutionary rates among branches, which is important for
understanding molecular evolution patterns and useful as a
clock rate prior in Bayesian relaxed-clock analyses. MEGA
implements the CorrTest method developed using machine
learning, which is accurate and computationally efficient (Tao
et al. 2019). The CorrTest implementation in MEGA requires
a phylogeny with sequence alignment (or branch lengths)
and is accessed through an easy-to-use wizard. This test’s final
output is a CorrScore between 0 and 1 and a P-value, where a
high CorrScore and low P-value indicates that branch rates
among lineages are likely correlated.
Calculating Neutral Evolutionary Probabilities
According to the neutral theory of molecular evolution, most
differences in molecular sequences across species are expected
MEGA11 .doi:10.1093/molbev/msab120 MBE
3023
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
to have little to no impact on fitness (Kimura 1983). Therefore,
multispecies sequence alignments have been used to estimate
neutral evolutionary probabilities (EP) of observing alternative
alleles (amino acid residues or nucleotides) in a species, con-
tingent on the given species timetree (Liu et al. 2016). MEGA
implements an advanced option for this Bayesian approach in
which the species timetree containing relative times is com-
puted automatically by using RelTime (Patel and Kumar
2019). Alleles with EP less than 0.05 are nonneutral, whereas
evolutionary permissible (neutral) alleles show much higher
EPs. Disease-associated amino acid variants in human popu-
lations have EP <0.05 and are rarely found in the population
(Liu et al. 2016). Many human adaptive variants in populations
also have low EPs, that is, nonneutral from an evolutionary
perspective, but they show high allele frequencies (Patel et al.
2018). Therefore, one may use EPs to diagnose disease muta-
tions and detect candidate adaptive variants. An EP wizard
system walks the user through the steps required to set up the
analysis. The first sequence in the alignment is used automat-
ically as the focal taxon of interest (one can rearrange sequen-
ces in the Sequence Data Explorer). EP values for all possible
bases (4 for nucleotides and 20 for amino acids) at each po-
sition in the input sequence alignment are reported in a
spreadsheet or text format.
Technological Advances
Although some new user interface elements have already
been mentioned above (figs. 1–3), additional technical advan-
ces in MEGA11 are as follows.
Expanded Group Designations
MEGA has long supported a “group” tag for sequences and
other operational taxonomic units (OTUs). Using the sequence
“group” tags, MEGA offered a group-wise exploration of input
data, selection of data subsets, and computational analyses
(Kumar 2001). Support for two new tags (“population” and
“species”) was added in MEGA7, with the species tags used to
mark duplicate genes in multigene family phylogenies (Kumar
et al. 2016). In MEGA11, sequences can now be tagged to
provide information on the continent, country, city, year,
month, day, and time. This spatiotemporal information can
be used in tip-dating analyses.
In MEGA11, we have made a MEGA-wide change to use
any meta tag to define groups. For example, if one selects the
“Year” meta tag for use as a group, they could estimate av-
erage diversity within and between sequences sampled in
different years (Distance menu). In the Sequence Data
Explorer, one can select/unselect sequences of certain years
FIG.1. Calibration points for MEGA’SRelTime method are chosen in the Node Calibration Editor window (A), accessed via the Timetree Wizard
system (see fig. 2A). The Node Calibration Editor displays the phylogeny where individual node calibrations and probability densities can be chosen
by clicking the calibration button on the top toolbar for the selected node. A dropdown menu (B) with several calibration density types is
displayed. The Node Calibration Editor then prompts the user for required distribution parameters, depending on the distribution selected: normal
distribution (mean and standard deviation), lognormal (offset, mean and standard deviation), exponential (offset and decay parameter), uniform
(min and max) (C).
Tamura et al. .doi:10.1093/molbev/msab120 MBE
3024
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
for phylogenetic analyses. Also, the display of years would be
automatically enabled in the Tree Explorer, and the feature to
collapse sequence clusters will be done by years. Additionally,
sequences can be sorted based on years in all the input data
and result explorer displays. Therefore, a dynamic designation
of groups based on the desired meta tag will enable data
exploration and analysis more efficiently.
Memory Efficient ML Analyses
ML methods are widely used for phylogenetic inference but
place high demands on computer memory, becoming in-
creasingly burdensome for bigger sequence alignments
analyzed these days. In MEGA11, we have now completed
a long-overdue refactoring of ML calculations by adding a
step to identify common site configurations, that is, sites
where all sequences have the same bases as at some other
sites, to utilize computer memory more efficiently. The mem-
ory requirements of Maximum Likelihood and Maximum
parsimony analysis are reduced (approximately) by the factor
of m/Lwhen there are mdistinct site configurations in a
sequence alignment containing Lsites. The memory saving
can be substantial for multigene and genome-scale align-
ments. For example, the memory saving was 660 MB (209
vs. 870 MB) for a sequence alignment of 229 birds with 2,728
sites (Claramunt and Cracraft 2015)and4.5GB(2.3vs.
6.8 GB) for an alignment of 162 mammals with 11,010
sites (Meredith et al. 2011). This memory saving does not
have any detrimental impact on phylogenetic estimates
and computational times because identical site
configurations have the same likelihood value. The total
log-likelihood is simply the sum of site-configuration log-like-
lihoods weighted by their frequencies. However, this upgrade
required refactoring many different parts of MEGA’s calcula-
tion engine, including functions for phylogeny construction
and model selection.
Enhanced GUI for Exploring Large Data Sets
Using a large multiple sequence alignment containing 68,000
genomes and 30,000 bases each, we assessed MEGA GUI’s
responsiveness during input data file reading, execution of
functions in the Sequence Data Explorer,estimationofpair-
wise distances, and building of distance-based phylogenies.
We found the GUI to become intermittently unresponsive
for such large data sets, which are now common due
to resequencing and population sequencing efforts.
Consequently, we have moved all potentially long-running
operations out of the main GUI thread to background
threads in a major overhaul of the source code. Now, large
input data files are read rapidly, and calculations of pairwise
distance matrices, selection tests, and phylogeny construction
for distance-based methods are performed in a background
thread. The Sequence Data Explorer has been reprogrammed
to enable more efficient highlighting of variable sites, and
navigation of the sequence alignment has been improved.
Also added are options to automatically label sites based
on attributes, which annotates sites by providing a one-
character label and then using desired labeled sites to subset
data for any molecular phylogenetic analysis desired.
FIG.2. The Tip Dating Wizard (A) guides the user through the steps required to set up the RTDT analysis. Once a sequence alignment and/or a tree
is provided, the user is prompted to specify the outgroup by selecting a node in the Tree Explorer or specifying outgroup taxa by name (not shown).
Next, sample times are specified using the Tip Dates Editor (B) with facilities for parsing tip dates (C) encoded in taxa names, importing tip dates
from a text file, and manually entering the dates. In the next step, the Analysis Preferences dialog (not shown) is displayed, allowing the user to set
analysis options to estimate branch lengths used by RTDT. The estimated timetree is displayed in the Tree Explorer (see fig. 3).
MEGA11 .doi:10.1093/molbev/msab120 MBE
3025
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
Conclusions
Version 11 of MEGA adds many methods and tools to keep
pace with researchers’ growing needs. The addition of evolu-
tionary dating methods in MEGA make it easier to estimate
species and strain divergence times by using more informative
node calibrations and sampling times. The new CorrTest and
EP calculations will enable a more robust evaluation of
assumptions about biological characteristics of molecular
data. The reduction in memory needs of ML-based compu-
tations will allow users to analyze much larger data sets than
before. The refactoring of distance-based methods’ calcula-
tion to run in threads independent of the main graphical
interface and other GUI enhancements greatly improve
MEGA usability for very large data sets.
Acknowledgments
We thank our laboratory members and many beta testers for
providing invaluable feedback and bug reports. This study
was supported in part by research grants from the National
Institutes of Health (R35GM139504-01), National Science
Foundation (DEB-2034228, DBI-1661218), and Japan Society
for the Promotion of Science (JSPS) grants-in-aid for scientific
research (DB5) to K.T.
Data Availability
Thesoftwareanditssourcecodeareavailablefromwww.
megasoftware.net.
References
Caspermeyer J. 2018. MEGA software celebrates silver anniversary. Mol
Biol Evol. 35(6):1558–1560.
Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history’s
imprint on the evolution of modern birds. Sci Adv. 1(11):e1501005.
Hipsley CA, Mu¨ller J. 2014. Beyond fossil calibrations: realities of molec-
ular clock practices in evolutionary biology. Front Genet.5:138.
Kimura M. 1983. The neutral theory of molecular evolution. New York:
Cambridge University Press.
FIG.3.MEGA’s Tree Explorer (A) is a feature-rich, versatile viewer of phylogenies that provides many interactive exploration and customization
facilities. In MEGA11, the new side toolbar of Tree Explorer makes formatting, rearrangement, and tree exploration tools more accessible and
intuitive. Instead of a thin toolbar with nameless buttons, we have opted for a wide toolbar with text labels identifying each tool. The toolbar can be
moved to either side of the window, and it can be toggled in and out of view. To organize related tools bygroups and accommodate limited vertical
space, collapsible panels are used. With the new toolbar, formatting tools previously displayed in external dialogs are readilyaccessible, and formats
are applied instantly instead of after the user closes the external dialog. In addition to the updated toolbar, there are now options for auto-
collapsing of nodes containing clusters of taxa belonging to the same group, user-specified cluster size, or by the branch length difference. For very
large trees with many similar sequences, this feature can greatly facilitate the visualization of evolutionary events at a glance. An option has been
added to export pairwise patristic distances between taxa to a text file for phylogenies and timetrees. For maximum likelihood and maximum
parsimony trees where ancestral sequences are present, an option has been added to navigate through sites where a change in the estimated
ancestral state differs between the parent and child on the currently selected branch. The tree information box (B) has been updated for timetrees
to show branch- and node-specific information, such as earliest and latest sample times in the currently selected subtree, days elapsed between the
divergence time for a selected node and the latest sample time, the nearest and furthest tip from a selected node, clade size and clade taxa, and
spatiotemporal information if available.
Tamura et al. .doi:10.1093/molbev/msab120 MBE
3026
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular
Evolutionary Genetics Analysis across computing platforms. Mol Biol
Evol. 35(6):1547–1549.
Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: com-
puting core of molecular evolutionary genetics analysis program
for automated and iterative data analysis. Bioinformatics
28(20):2685–2686.
Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular Evolutionary
Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol.
33(7):1870–1874.
Kumar S, Tamura K, Jakobsen I, Nei M. 2001. MEGA2: molecular
evolutionary genetics analysis software. Bioinformatics
17(12):1244–1245.
Kumar S, Tamura K, Nei M. 1993. MEGA: Molecular Evolutionary
Genetics Analysis version 1.01. University Park (PA): The
Pennsylvania State University.
Kumar S, Tamura K, Nei M. 1994. MEGA—molecular evolutionary ge-
netics analysis software for microcomputers. Comput Appl Biosci.
10(2):189–191.
Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for
Molecular Evolutionary Genetics Analysis and sequence alignment.
Brief Bioinform. 5(2):150–163.
Liu L, Tamura K, Sanderford MD, Gray V, Kumar S. 2016. A molecular
evolutionary reference or the human variome. MolBiolEvol.
33(1):245–254.
Meredith RW, Jane
cka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC,
Goodbla A, Eizirik E, Sim~
ao TLL, Stadler T, et al. 2011. Impacts of the
cretaceous terrestrial revolution and KPg extinction on mammal
diversification. Science 334(6055):521–524.
Miura S, Tamura K, Tao Q, Huuki LA, Pond SLK, Priest J, Deng J, Kumar S.
2018. A new method for inferring timetrees from temporally sam-
pled molecular sequences. PLoS Comput Biol. 16:24.
Patel R, Kumar S. 2019. On estimating evolutionary probabilities of pop-
ulation variants. BMC Evol Biol. 19(1):133 (14 pp.).
Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A,
Glicksberg BS, Xu K, Dudley JT, Kumar S. 2018. Adaptive landscape of
protein variation in human exomes. Mol Biol Evol. 35(8):2015–2025.
Stecher G, Tamura K, Kumar S. 2020. Molecular Evolutionary Genetics
Analysis (MEGA) for macOS. MolBiolEvol. 37(4):1237–1239.
Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S.
2012. Estimating divergence times in large molecular phylogenies.
Proc Natl Acad Sci USA. 109(47):19333–19338.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011.
MEGA5: molecular evolutionary genetic analysis using maximum
likelihood, evolutionary distance, and maximum parsimony meth-
ods. Mol Biol Evol. 28(10):2731–2739.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6:
molecular evolutionary genetics analysis version 6.0. MolBiolEvol.
30(12):2725–2729.
Tao Q, Tamura K, Battistuzzi F, Kumar S. 2019. A machine learning
method for detecting autocorrelation of evolutionary rates in large
phylogenies. Mol Biol Evol. 36(4):811–824.
Tao Q, Tamura K, Kumar S. 2020. Efficient methods for dating evolu-
tionary divergences. In: Ho SYW, editor. The molecular evolutionary
clock. Switzerland: Springer Nature. p. 197–219.
Tao Q, Tamura K, Mello B, Kumar S. 2020. Reliable confidence intervals
for RelTime estimates of evolutionary divergence times. Mol Biol
Evol. 37(1):280–290.
MEGA11 .doi:10.1093/molbev/msab120 MBE
3027
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by Temple University user on 28 June 2021