Koichiro Tamura’s research while affiliated with Tokyo Metropolitan University and other places


Publications (11)


R3F: An R package for evolutionary dates, rates, and priors using the relative rate framework
  • Preprint
  • File available

February 2025 · 21 Reads

Qiqing Tao · Koichiro Tamura

The relative rate framework (RRF) can estimate divergence times from branch lengths in a phylogeny. It is the theoretical basis of the frequently applied RelTime method, a relaxed-clock approach for molecular dating that scales well to large phylogenies. The use of RRF has also enabled the development of computationally efficient and accurate methods for testing the autocorrelation of lineage rates in a phylogeny (CorrTest) and selecting data-driven parameters of the birth-death speciation model (ddBD), which can be used to specify priors in Bayesian molecular dating. We have developed R3F, an R package implementing RRF to estimate divergence times, infer lineage rates, conduct CorrTest, and build a ddBD tree prior for Bayesian dating in molecular phylogenies. Here, we describe R3F functionality and explain how to interpret and use its outputs in other visualization software and packages, such as MEGA, ggtree, and FigTree. Ultimately, R3F is intended to enable the dating of the Tree of Life with greater accuracy and precision, which would have important implications for studies of organism evolution, diversification dynamics, phylogeography, and biogeography. Availability and Implementation: The source code and related instructions for installing and using R3F are available from GitHub (https://github.com/cathyqqtao/R3F).


Figure 1. The paired species contrast (PSC) design. A: An example phylogeny with one set of selected species (solid blue and red lines). Extraneous lineages (black dotted lines) and shared evolutionary history (gray dotted lines). B: A schematic depiction of the four species selected for ESL-PSC analysis. In the ESL experiment, the response variable refers to the binary phenotype, where +1 represents the convergent trait, and -1 represents the ancestral trait.
Figure 3. Heat map of Model Fit Scores. 20 values for each inclusion penalty (site and protein) were sampled from a logspace ranging from 1-99% of the maximum non-trivial penalty. A higher MFS suggests a higher risk of overfitting. Models with the best (lowest) 5% of MFS are included in predictive ensembles (Fig. 4, 5).
Evolutionary sparse learning with paired species contrast reveals the shared genetic basis of convergent traits

January 2025 · 15 Reads

John Benjamin Allard · [...] · Glenn Stephen Gerhard

Cases abound in which nearly identical traits have appeared in distant species facing similar environments. These unmistakable examples of adaptive evolution offer opportunities to gain insight into their genetic origins and mechanisms through comparative analyses. Here, we present a novel comparative genomics approach to build genetic models that underlie the independent origins of convergent traits using evolutionary sparse learning. We test the hypothesis that common genes and sites are involved in the convergent evolution of two key traits: C4 photosynthesis in grasses and echolocation in mammals. Genetic models were highly predictive of independent cases of convergent evolution of C4 photosynthesis. These results support the involvement of sequence substitutions in many common genetic loci in the evolution of the convergent traits studied. Genes contributing to genetic models for echolocation were highly enriched for functional categories related to hearing, sound perception, and deafness (P << 0.01), a pattern that has eluded previous efforts applying standard molecular evolutionary approaches. We conclude that phylogeny-informed machine learning naturally excludes apparent molecular convergences due to shared species history, enhances the signal-to-noise ratio for detecting molecular convergence, and empowers the discovery of common genetic bases of trait convergences.


MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing

December 2024 · 40 Reads · 5 Citations

Molecular Biology and Evolution

We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA12) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also links in an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. MEGA12 can be downloaded from https://www.megasoftware.net.


Figure 1. Main Graphical User Interface (GUI) of MEGA12. (a) The main toolbar provides access to various analytical capabilities organized in drop-down menus. (b) One of the drop-down menus is shown on the main window. (c) The AppTile provides access to the linked DrPhylo application, which is also accessible from the Tree Explorer window (see Fig. 4a). (d) The OutputTile provides access to results from DrPhylo analysis via a drop-down menu (e). (f) Clicking the Prototype button allows for the building of a '.mao' analysis configuration file for the command line analysis using MEGA-CC. It is necessary to click Analyze to return to the standard mode to conduct analysis using the GUI.
Figure 2. Substitution model selection using MEGA12. (a) The relationships between the time required for the standard model selection and the data size: the product of the number of sequences (S) and the number of distinct site configurations (C) in the sequence alignment. The time required for model selection analysis increases linearly with the data size. (b) MEGA12's Analysis Preferences dialog box allows users to set options for model selection analysis. The newly added Filtered option is shown, which offers a setting of BIC and AICc thresholds. As the main text explains, a smaller number will result in testing fewer models. (c) Time savings are achieved using the Filtered option with default parameters, which is the greatest for datasets for which the full analysis selects a complex best-fit substitution model. (d) The relationship between the number of model parameters and the average percentage of model combinations whose ML evaluation was skipped. (e) The relationship of time taken with the Filtered and Full options for model selection for chloroplast amino acid MSAs. The slope of the regression line is 0.18, indicating that the Filtered option greatly speeds model selection.
Figure 3. Adaptive bootstrap analysis of the Drosophila Adh dataset. (a) MEGA12's Analysis
Figure 4. Adaptive bootstrap analysis in MEGA12. (a) Comparison of BS values obtained using the Adaptive determination of the number of bootstrap replicates (y-axis) and those obtained using 500 bootstrap replicates (x-axis). Results from all 240 data sets were pooled together. The slope of the linear regression through the origin is 0.99 (R² = 0.99). (b) The relationship between the minimum |BS - 50%| in phylogeny and the number of replicates needed by the Adaptive analysis. The negative trend (correlation = -0.96) confirms the inverse relationship expected theoretically. (c) The relationship of time taken between the Adaptive and Standard Bootstrap approach for estimating statistical support for clade relationships inferred for simulated DNA sequence alignments. The slope of the regression line is ~0.20, indicating that the Adaptive approach speeds up the bootstrap support estimation significantly.
Figure 5. Conducting DrPhylo analysis via the Tree Explorer in MEGA12. (a) Users select the clade of interest by clicking on its ancestral branch or node (highlighted in green) in the Tree Explorer window. (b) The context-sensitive menu, which includes the Launch DrPhylo option, is displayed. (c) The dialog box to make selections for DrPhylo analysis. (d) A graphical representation of the genetic model of the selected clade in a grid format (M-Grid) is shown along with a descriptive caption. This model and other output files are accessible from the DrP OutputTile (see Fig. 1c). (e) Caption showing the details of the DrPhylo analyses and a description of the results.
MEGA12: Molecular Evolutionary Genetic Analysis version 12 for adaptive and green computing

December 2024 · 214 Reads

We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also implements an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. The MEGA12 beta version can be downloaded from https://www.megasoftware.net/beta_download.
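The Figure 2 caption for this entry mentions BIC and AICc thresholds for model selection. As a rough illustration of those two criteria only, the sketch below computes them from a model's log-likelihood using the standard textbook formulas. It does not reproduce MEGA12's Filtered heuristic, whose details are not given here; the function names, the model names, and the numbers in the example are hypothetical.

```python
import math


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL.

    k is the number of free parameters and n the sample size
    (for an alignment, n is often taken as the number of sites;
    that choice is an assumption of this sketch).
    """
    return k * math.log(n) - 2.0 * log_likelihood


def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion (small-sample AIC)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = 2.0 * k - 2.0 * log_likelihood
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def rank_models(fits, n):
    """Rank candidate models by BIC (lower is better).

    fits maps a model name to (log_likelihood, number_of_parameters).
    Returns a list of (name, bic, aicc) tuples sorted by BIC.
    """
    scored = [
        (name, bic(lnl, k, n), aicc(lnl, k, n))
        for name, (lnl, k) in fits.items()
    ]
    return sorted(scored, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    candidate_fits = {"JC": (-1050.0, 10), "HKY": (-1000.0, 14)}
    for name, bic_value, aicc_value in rank_models(candidate_fits, n=500):
        print(f"{name}: BIC={bic_value:.2f} AICc={aicc_value:.2f}")
```

Both criteria penalize extra parameters, so a more complex model ranks first only when its likelihood gain outweighs the penalty.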


Computational Reproducibility of Molecular Phylogenies

July 2023 · 82 Reads · 2 Citations

Molecular Biology and Evolution

Repeated runs of the same program can generate different molecular phylogenies from identical datasets under the same analytical conditions. This lack of reproducibility of inferred phylogenies casts a long shadow on downstream research employing these phylogenies in areas such as comparative genomics, systematics, and functional biology. We have assessed the relative accuracies and log-likelihoods of alternative phylogenies generated for computer-simulated and empirical datasets. Our findings indicate that these alternative phylogenies reconstruct evolutionary relationships with comparable accuracy. They also have similar log-likelihoods that are not inferior to the log-likelihoods of the true tree. We determined that the direct relationship between irreproducibility and inaccuracy is due to their common dependence on the amount of phylogenetic information in the data. While computational reproducibility can be enhanced through more extensive heuristic searches for the maximum likelihood tree, this does not lead to higher accuracy. We conclude that computational irreproducibility plays a minor role in molecular phylogenetics.


FIG. 1. Calibration points for MEGA's RelTime method are chosen in the Node Calibration Editor window (A), accessed via the Timetree Wizard system (see fig. 2A). The Node Calibration Editor displays the phylogeny where individual node calibrations and probability densities can be chosen by clicking the calibration button on the top toolbar for the selected node. A dropdown menu (B) with several calibration density types is displayed. The Node Calibration Editor then prompts the user for required distribution parameters, depending on the distribution selected: normal distribution (mean and standard deviation), lognormal (offset, mean and standard deviation), exponential (offset and decay parameter), uniform (min and max) (C).
FIG. 2. The Tip Dating Wizard (A) guides the user through the steps required to set up the RTDT analysis. Once a sequence alignment and/or a tree is provided, the user is prompted to specify the outgroup by selecting a node in the Tree Explorer or specifying outgroup taxa by name (not shown). Next, sample times are specified using the Tip Dates Editor (B) with facilities for parsing tip dates (C) encoded in taxa names, importing tip dates from a text file, and manually entering the dates. In the next step, the Analysis Preferences dialog (not shown) is displayed, allowing the user to set analysis options to estimate branch lengths used by RTDT. The estimated timetree is displayed in the Tree Explorer (see fig. 3).
MEGA11: Molecular Evolutionary Genetics Analysis Version 11

April 2021 · 10,274 Reads · 12,208 Citations

Molecular Biology and Evolution

The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses, which will be supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. We have now added a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface (GUI) has been made more responsive and interactive for very big datasets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled GUI and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.


Figure 4. Examples of clone seeding scenarios used for generating simulated data (El-Kebir et al., 2018), arranged by complexity: single clones migrating from single tumor sources (mS, monoclonal single-source seeding) or from multiple tumors (pS, polyclonal single-source seeding), and multiple clones migrating from multiple sources (pM, polyclonal multi-source seeding) or migrating from metastasis back to primary (pR, polyclonal reseeding). Redrawn from Chroni et al. (2019).
Figure 7. Overall performance of PathFinder for different types of (a) migration histories and (b) datasets with small and large number of tumor sites sampled. Standard errors are also shown. (c) A tabular comparison of the difference in F1-scores of PathFinder between the seeding scenarios is shown. The z-scores and the corresponding P values are shown below and above the diagonal, respectively.
Figure 8. Scatter plot of F1 scores of PathFinder (consensus migration history, MH) and (i) of those with MACHINA's hierarchical (circles) and (ii) of those with PathFinder's most probable MH (triangles). The graph shows the results for which PathFinder found multiple alternative MH (31 datasets).
Figure 9. Comparative performance of PathFinder (black bars) and MACHINA (gray bars) for (a) different types of migration histories and (b) datasets with small and large number of tumor sites sampled. Standard errors and P values by t-test are also shown.
Figure 10. Analysis of Patient A7 with basal-like breast cancer (Hoadley et al., 2016). (a) Clone phylogeny and tumor location of each clone as reported in the original study. Nodes A1-A4 are ancestral nodes. (b) Clone migration history predicted by PathFinder. Inference of migration paths includes P→M and M→M paths, all of which have a high P = 1.0. Colors correspond to the tumor location where clones were sampled from.
PathFinder: Bayesian inference of clone migration histories in cancer

December 2020 · 105 Reads · 10 Citations

Bioinformatics

Metastases cause a vast majority of cancer morbidity and mortality. Metastatic clones are formed by dispersal of cancer cells to secondary tissues, and are not medically detected or visible until later stages of cancer development. Clone phylogenies within patients provide a means of tracing the otherwise inaccessible dynamic history of migrations of cancer cells. Here, we present a new Bayesian approach, PathFinder, for reconstructing the routes of cancer cell migrations. PathFinder uses the clone phylogeny, the number of mutational differences among clones, and the information on the presence and absence of observed clones in primary and metastatic tumors. By analyzing simulated datasets, we found that PathFinder performed well in reconstructing clone migrations from the primary tumor to new metastases as well as between metastases. It was more challenging to trace migrations from metastases back to primary tumors. We found that a vast majority of errors can be corrected by sampling more clones per tumor, and by increasing the number of genetic variants assayed per clone. We also identified situations in which phylogenetic approaches alone are not sufficient to reconstruct migration routes. In conclusion, we anticipate that the use of PathFinder will enable a more reliable inference of migration histories and their posterior probabilities, which is required to assess the relative preponderance of seeding of new metastasis by clones from primary tumors and/or existing metastases. Availability and implementation: PathFinder is available on the web at https://github.com/SayakaMiura/PathFinder.
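The figure captions for this entry report F1 scores for inferred migration histories. As a hedged illustration, the sketch below computes an F1 score by treating each directed migration path (source tumor, destination tumor) as one item and comparing the true and inferred sets. That unit of comparison is an assumption made for this sketch, not a definition taken from the paper, and the function name and tumor labels are hypothetical.

```python
def migration_f1(true_paths, inferred_paths):
    """F1 score between two collections of directed migration paths.

    Each path is a (source, destination) tuple, e.g. ("P", "M1").
    Precision = correct inferred paths / all inferred paths.
    Recall    = correct inferred paths / all true paths.
    F1 is the harmonic mean of precision and recall.
    """
    true_set = set(true_paths)
    inferred_set = set(inferred_paths)
    true_positives = len(true_set & inferred_set)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap.
        return 0.0
    precision = true_positives / len(inferred_set)
    recall = true_positives / len(true_set)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: P is the primary tumor, M1-M3 are metastases.
    truth = [("P", "M1"), ("P", "M2"), ("M1", "M3")]
    inferred = [("P", "M1"), ("P", "M2"), ("M2", "M3")]
    print(round(migration_f1(truth, inferred), 3))
```

In the example, two of the three inferred paths are correct and two of the three true paths are recovered, so precision and recall are both 2/3.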


PathFinder: Bayesian inference of clone migration histories in cancer

July 2020 · 113 Reads

Metastases form by dispersal of cancer cells to secondary tissues. They cause a vast majority of cancer morbidity and mortality. Metastatic clones are not medically detected or visible until later stages of cancer development. Thus, clone phylogenies within patients provide a means of tracing the otherwise inaccessible dynamic history of migrations of cancer cells. Here we present a new Bayesian approach, PathFinder, for reconstructing the routes of cancer cell migrations. PathFinder uses the clone phylogeny and the numbers of mutational differences among clones, along with the information on the presence and absence of observed clones in different primary and metastatic tumors. In the analysis of simulated datasets, PathFinder performed well in reconstructing migrations from the primary tumor to new metastases as well as between metastases. However, it was much more challenging to trace migrations from metastases back to primary tumors. We found that a vast majority of errors can be corrected by sampling more clones per tumor and by increasing the number of genetic variants assayed. We also identified situations in which phylogenetic approaches alone are not sufficient to reconstruct migration routes. Conclusions: We anticipate that the use of PathFinder will enable a more reliable inference of migration histories, along with their posterior probabilities, which is required to assess the relative preponderance of seeding of new metastasis by clones from primary tumors and/or existing metastases. Availability: PathFinder is available on the web at https://github.com/SayakaMiura/PathFinder.


FIG. 1. MEGA for macOS graphical user interface, developed with 64-bit Cocoa API and widgets. The close, minimize, and maximize buttons are located on the left side of the title bar, and native toolbars, scroll bars, spin edits, and other visual controls are used. (a) The main form has the same look in macOS, Windows, and Linux. (b) The Trace Editor form uses an entirely custom-drawn GUI control for data display. (c) The Distance Matrix Explorer uses a native Cocoa view for displaying the pairwise distance matrix table. (d) The Alignment Editor uses the native Cocoa tab control for switching between the DNA Sequences and Translated Protein Sequences grids. (e) The Caption Expert form uses an HTML5 compliant web browser control to display results and detailed analysis information. (f) The Tree Explorer also uses a custom-drawn graphical control for rendering trees while the display is uniform across operating systems. (g) On the main form, the Analysis and Prototype buttons are used to toggle between GUI and CLI (Command Line Interface) execution modes (see Kumar et al. 2018).
Molecular Evolutionary Genetics Analysis (MEGA) for macOS

January 2020 · 1,687 Reads · 1,500 Citations

Molecular Biology and Evolution

The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on Apple computers. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge.


Fig. 12.4 Distributions of the normalized differences between true node times (NT) and estimated times obtained from RelTime and MCMCTree for internal nodes. Comparisons of the performance of RelTime (black curve) and MCMCTree (grey curve) for data sets simulated under (a) independent branch-rates (IBR) model with low variation, (b) IBR model with high variation, and (c) autocorrelated branch-rates (ABR) model. Comparisons of the performance of RelTime (black curve) and MCMCTree (grey curve) for estimating node times (d) outside the speed-up clades and (e) inside the speed-up clades. Data and results are from Tamura et al. (2012). Dashed grey line indicates the 0% difference in NT
Differences between Bayesian dating methods, penalized likelihood, and RelTime
Efficient Methods for Dating Evolutionary Divergences

January 2020 · 312 Reads · 26 Citations

Reliable estimates of divergence times are crucial for biological studies to decipher temporal patterns of macro- and microevolution of genes and organisms. Molecular sequences have become the primary source of data for estimating divergence times. The sizes of molecular data sets have grown quickly due to the development of inexpensive sequencing technology. To deal with the increasing volumes of molecular data, many efficient dating methods are being developed. These methods not only relax the molecular clock and offer flexibility to use multiple clock calibrations, but also complete calculations much more quickly than Bayesian approaches. Here, we discuss the theoretical and practical aspects of these non-Bayesian approaches and present a guide to using these methods effectively. We suggest that the computational speed and reliability of non-Bayesian relaxed-clock methods offer opportunities for enhancing scientific rigour and reproducibility in biological research for large and small data sets.


Citations (7)


... Codon usage bias was analyzed using MEGA v12 [34] to calculate the relative synonymous codon usage (RSCU) values. Pairwise genetic distances among species were also calculated using MEGA v12 [34] under the Kimura 2-parameter model with gamma-distributed rate variation. ...

Reference:

Phylogenetics and Evolutionary Dynamics of Yunnan Acrididae Grasshoppers Inferred from 17 New Mitochondrial Genomes
MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing
  • Citing Article
  • December 2024

Molecular Biology and Evolution

... In addition, PsiPartition is better than PsiPartitionFast, which suggests that the parameters w play a vital role for partitioning. On the dataset Calisto, PsiPartitionFast is slightly better than PsiPartition, which is probably caused by the relatively small size of the dataset and the randomness in phylogenetic inference (Kumar et al. 2023). The results demonstrate that our proposed methods are more accurate than the default IQ-TREE and other existing partitioning methods, especially on the datasets with more sites and taxa. ...

Computational Reproducibility of Molecular Phylogenies

Molecular Biology and Evolution

... The average genetic distances between species were calculated for the 16S fragment flanked by primers 16Sa-L and 16Sb-H, because this fragment was available for all samples included in the analyses. Estimates were done using the alignment generated in MAFFT v.7.25 imported in MEGA v11.0.1 (Tamura et al. 2021) and using uncorrected p-distance with pairwise deletion and variance estimated by bootstrap (1000 replicates). ...

MEGA11: Molecular Evolutionary Genetics Analysis Version 11

Molecular Biology and Evolution
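The excerpt above describes uncorrected p-distances computed with pairwise deletion. A minimal sketch of that calculation follows: sites where either sequence has a gap or missing character are skipped for that pair only, and the distance is the proportion of remaining sites that differ. This is an illustration, not MEGA's implementation. The function name and the set of characters treated as missing are assumptions, and ambiguity codes other than N are compared as ordinary characters.

```python
def p_distance(seq_a, seq_b, missing="-?N"):
    """Uncorrected p-distance between two aligned sequences.

    Pairwise deletion: a site is ignored for this pair if either
    sequence has a gap or missing-data character at that site.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # skip this site for this pair only
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


if __name__ == "__main__":
    # Site 5 has a gap and is skipped; 1 of the 5 remaining sites differs.
    print(p_distance("ACGT-A", "ACTTAA"))
```

Because deletion is pairwise, the number of compared sites can differ from one pair of sequences to the next.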

... In RRF, the principle of minimum rate change among lineages and their descendants is applied, enabling an algebraic solution for estimating relative times and lineage rates using branch lengths (Tamura et al. 2018). RRF is the theoretical foundation of the RelTime method, which is statistically accurate and computationally efficient in estimating divergence times in analyzing large computer-simulated and empirical datasets (review in Tao et al., 2020). RelTime, as implemented in MEGA (Tamura et al. 2021), has been cited in hundreds of research articles. ...

Efficient Methods for Dating Evolutionary Divergences

... For instance, methods derived from phylogeography can be adapted to analyze tumor metastasis by tracing the geographical spread of tumor cells within the body. [32][33][34][35][36] Since its inception, the field of tumor phylogenetics has garnered increasing attention and has rapidly developed. Today, tumor phylogenetics has evolved into a highly diversified field. ...

PathFinder: Bayesian inference of clone migration histories in cancer

Bioinformatics

... The RelTime method was developed to estimate timetrees from molecular sequence data when evolutionary rates vary among lineages. It has been shown to be accurate in the analysis of computer simulated data that were generated with extensive rate heterogeneity throughout the tree (Tamura et al. 2012; Filipski et al. 2014; Tamura et al. 2017). In analyses of many large empirical datasets, RelTime estimated divergence times similar to those reported from Bayesian methods, as long as both methods were used under the same conditions (Mello et al. 2017). ...

Theoretical foundation of the RelTime method for estimating divergence times from variable evolutionary rates

... All positions with less than 90% site coverage were eliminated, i.e., fewer than 10% alignment gaps, missing data, and ambiguous bases were allowed at any one position (partial deletion option). Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018; Stecher et al., 2020). ...

Molecular Evolutionary Genetics Analysis (MEGA) for macOS

Molecular Biology and Evolution