Maxwell Sanderford’s research while affiliated with Temple University and other places


Publications (35)


Fig. 1. One-hot encoding and Evolutionary Sparse Learning. (a) The sequence alignment input to ESL consists of p positions (columns) belonging to g groups (e.g., genes). The one-hot representation of the alignment is shown below the sequence, where each allele present at a position gets a bit column containing a 1 when the given allele is present in the position and a 0 otherwise. Every bit column is a feature in the ESL model, which produces weights (β) for each bit column. β captures the correlation between the binary pattern in the bit column and the hypothesis specified by labels (+1 or −1) assigned to rows in the alignment. (b) Computational time comparison for constructing an ESL model on smaller datasets using MyESL and sparsegl. (c) Computational time comparison for constructing an ESL model on larger datasets using MyESL and sparsegl, where the computational time for sparsegl was projected. (d) The distribution of non-zero regression coefficients estimated in MyESL. (e) Top genes ordered by gene sparsity score (GSS) for features selected in the ESL model.
MyESL: Sparse learning in molecular evolution and phylogenetic analysis
Article · January 2025 · 13 Reads

Maxwell Sanderford · [...]
Evolutionary sparse learning (ESL) uses a supervised machine learning approach, Least Absolute Shrinkage and Selection Operator (LASSO), to build models explaining the relationship between a hypothesis and the variation across genomic features (e.g., sites) in sequence alignments. ESL employs sparsity between and within the groups of genomic features (e.g., genomic loci or genes) by using sparse-group LASSO. Although some software packages are available for performing sparse-group LASSO, we found them less well-suited for processing and analyzing genome-scale sequence data containing millions of features, such as bases. MyESL software fills the need for open-source software for conducting ESL analyses with facilities to pre-process the input hypotheses and large alignments, make LASSO flexible and computationally efficient, and post-process the output model to produce different metrics useful in functional or evolutionary genomics. MyESL takes binary responses or phylogenetic trees as the regression response, processing them into class-balanced hypotheses as required. It also processes continuous and binary features or sequence alignments that are transformed into a binary one-hot encoded feature matrix for analysis. The model outputs are processed into user-friendly text and graphical files. The computational core of MyESL is written in C++, which offers model building with or without group sparsity, while the pre- and post-processing of inputs and model outputs is performed using customized functions written in Python. One of its applications in phylogenomics showcases the utility of MyESL. Our analysis of empirical genome-scale datasets shows that MyESL can build evolutionary models quickly and efficiently on a personal desktop, whereas other computational packages could not because of their prohibitive requirements for computational resources and time. 
MyESL is available for Python environments on Linux and distributed as a standalone application for both Windows and macOS, which can be integrated into third-party software and pipelines.
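The one-hot encoding step described in this abstract can be illustrated with a minimal Python sketch. It is not MyESL's code: the function name `one_hot_encode`, the feature-name format, and the toy alignment are assumptions made for illustration only. Each allele observed at a position becomes one bit column, with a 1 in the rows that carry that allele and a 0 otherwise.

```python
def one_hot_encode(alignment):
    """Turn an alignment (list of equal-length strings, one per row)
    into bit columns, one per allele observed at each position.

    Returns (feature_names, matrix), where matrix[i][j] is 1 when
    sequence i carries the allele named by feature j, else 0.
    Hypothetical helper for illustration; not part of MyESL.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    n_positions = len(alignment[0])
    if any(len(seq) != n_positions for seq in alignment):
        raise ValueError("all sequences must have the same length")

    names, columns = [], []
    for pos in range(n_positions):
        column = [seq[pos] for seq in alignment]
        # One bit column per distinct allele seen at this position.
        for allele in sorted(set(column)):
            names.append(f"pos{pos + 1}_{allele}")
            columns.append([1 if residue == allele else 0 for residue in column])

    # Transpose so that rows correspond to sequences, columns to features.
    matrix = [list(row) for row in zip(*columns)]
    return names, matrix


# Toy alignment: three sequences, three positions (assumed example data).
names, matrix = one_hot_encode(["ACG", "ATG", "ACG"])
```

In this toy case only position 2 varies, so it alone yields two bit columns; invariant positions yield a single all-ones column, which carries no information about the row labels.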


Figure 1. The paired species contrast (PSC) design. A: An example phylogeny with one set of selected species (solid blue and red lines). Extraneous lineages (black dotted lines) and shared evolutionary history (gray dotted lines). B: A schematic depiction of the four species selected for ESL-PSC analysis. In the ESL experiment, the response variable refers to the binary phenotype, where +1 represents the convergent trait, and -1 represents the ancestral trait.
Figure 3. Heat map of Model Fit Scores. 20 values for each inclusion penalty (site and protein) were sampled from a logspace ranging from 1-99% of the maximum non-trivial penalty. A higher MFS suggests a higher risk of overfitting. Models with the best (lowest) 5% of MFS are included in predictive ensembles (Fig. 4, 5).
Evolutionary sparse learning with paired species contrast reveals the shared genetic basis of convergent traits

January 2025 · 20 Reads

Cases abound in which nearly identical traits have appeared in distant species facing similar environments. These unmistakable examples of adaptive evolution offer opportunities to gain insight into their genetic origins and mechanisms through comparative analyses. Here, we present a novel comparative genomics approach to build genetic models that underlie the independent origins of convergent traits using evolutionary sparse learning. We test the hypothesis that common genes and sites are involved in the convergent evolution of two key traits: C4 photosynthesis in grasses and echolocation in mammals. Genetic models were highly predictive of independent cases of convergent evolution of C4 photosynthesis. These results support the involvement of sequence substitutions in many common genetic loci in the evolution of the convergent traits studied. Genes contributing to genetic models for echolocation were highly enriched for functional categories related to hearing, sound perception, and deafness (P << 0.01), a pattern that has eluded previous efforts applying standard molecular evolutionary approaches. We conclude that phylogeny-informed machine learning naturally excludes apparent molecular convergences due to shared species history, enhances the signal-to-noise ratio for detecting molecular convergence, and empowers the discovery of common genetic bases of trait convergences.


MyESL: Sparse learning in molecular evolution and phylogenetic analysis

January 2025 · 31 Reads

Evolutionary sparse learning (ESL) uses a supervised machine learning approach, Least Absolute Shrinkage and Selection Operator (LASSO), to build models explaining the relationship between a hypothesis and the variation across genomic features (e.g., sites) in sequence alignments. ESL employs sparsity between and within the groups of genomic features (e.g., genomic loci) by using sparse-group LASSO. Although some software packages are available for performing sparse-group LASSO, we found them less well-suited for processing and analyzing genome-scale data containing millions of features, such as bases. MyESL software fills the need for open-source software for conducting ESL analyses with facilities to pre-process the input hypotheses and large alignments, make LASSO flexible and computationally efficient, and post-process the output model to produce different metrics useful in functional or evolutionary genomics. MyESL can take phylogenetic trees and sequence alignments as input and transform them into numeric responses and features, respectively. The model outputs are processed into user-friendly text and graphical files. The computational core of MyESL is written in C++, which offers model building with or without group sparsity, while the pre- and post-processing of inputs and model outputs is performed using customized functions written in Python. One of its applications in phylogenomics showcases the utility of MyESL. Our analysis of empirical genome-scale datasets shows that MyESL can build evolutionary models quickly and efficiently on a personal desktop, whereas other computational packages could not because of their prohibitive requirements for computational resources and time. MyESL is available for Python environments on Linux and distributed as a standalone application for Windows and macOS. It is available from https://github.com/kumarlabgit/MyESL.
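For readers unfamiliar with sparse-group LASSO, the "sparsity between and within the groups" mentioned in this abstract corresponds to two penalty terms in the generic textbook objective below. This is a sketch of the general form only: the loss function, the group weights w_j, and the symbols λ₁ and λ₂ are assumptions, since the abstract does not state the exact formulation used in MyESL.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  L(y, X\beta)
  \;+\; \lambda_1 \lVert \beta \rVert_1
  \;+\; \lambda_2 \sum_{j=1}^{g} w_j \,\lVert \beta_{G_j} \rVert_2
```

Here X is the one-hot feature matrix, y holds the +1/−1 labels, β_{G_j} is the sub-vector of weights for the features in group j (e.g., one gene), and g is the number of groups. The λ₁ term drives individual feature weights to zero (sparsity within groups), while the λ₂ term can zero out an entire group at once (sparsity between groups).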


MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing

December 2024 · 52 Reads · 17 Citations

Molecular Biology and Evolution

We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA12) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also links-in an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. MEGA12 can be downloaded from https://www.megasoftware.net.


Figure 1. Main Graphical User Interface (GUI) of MEGA12. (a) The main toolbar provides access to various analytical capabilities organized in drop-down menus. (b) One of the drop-down menus is shown on the main window. (c) The AppTile provides access to the linked DrPhylo application, which is also accessible from the Tree Explorer window (see Fig. 4a). (d) The OutputTile provides access to results from DrPhylo analysis via a drop-down menu (e). (f) Clicking the Prototype button allows for the building of a '.mao' analysis configuration file for the command line analysis using MEGA-CC. It is necessary to click Analyze to return to the standard mode to conduct analysis using the GUI.
Figure 2. Substitution model selection using MEGA12. (a) The relationships between the time required for the standard model selection and the data size: the product of the number of sequences (S) and the number of distinct site configurations (C) in the sequence alignment. The time required for model selection analysis increases linearly with the data size. (b) MEGA12's Analysis Preferences dialog box allows users to set options for model selection analysis. The newly added Filtered option is shown, which offers a setting of BIC and AICc thresholds. As the main text explains, a smaller number will result in testing fewer models. (c) Time savings are achieved using the Filtered option with default parameters, which is the greatest for datasets for which the full analysis selects a complex best-fit substitution model. (d) The relationship between the number of model parameters and the average percentage of model combinations whose ML evaluation was skipped. (e) The relationship of time taken with the Filtered and Full options for model selection for chloroplast amino acid MSAs. The slope of the regression line is 0.18, indicating that the Filtered option greatly speeds model selection.
Figure 3. Adaptive bootstrap analysis of the Drosophila Adh dataset. (a) MEGA12's Analysis
Figure 4. Adaptive bootstrap analysis in MEGA12. (a) Comparison of BS values obtained using the Adaptive determination of the number of bootstrap replicates (y-axis) and those obtained using 500 bootstrap replicates (x-axis). Results from all 240 data sets were pooled together. The slope of the linear regression through the origin is 0.99 (R² = 0.99). (b) The relationship between the minimum |BS − 50%| in phylogeny and the number of replicates needed by the Adaptive analysis. The negative trend (correlation = −0.96) confirms the inverse relationship expected theoretically. (c) The relationship of time taken between the Adaptive and Standard Bootstrap approach for estimating statistical support for clade relationships inferred for simulated DNA sequence alignments. The slope of the regression line is ~0.20, indicating that the Adaptive approach speeds up the bootstrap support estimation significantly.
Figure 5. Conducting DrPhylo analysis via the Tree Explorer in MEGA12. (a) Users select the clade of interest by clicking on its ancestral branch or node (highlighted in green) in the Tree Explorer window. (b) The context-sensitive menu, which includes the Launch DrPhylo option, is displayed. (c) The dialog box to make selections for DrPhylo analysis. (d) A graphical representation of the genetic model of the selected clade in a grid format (M-Grid) is shown along with a descriptive caption. This model and other output files are accessible from the DrP OutputTile (see Fig. 1c). (e) Caption showing the details of the DrPhylo analyses and a description of the results.
MEGA12: Molecular Evolutionary Genetic Analysis version 12 for adaptive and green computing

December 2024 · 313 Reads

We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also implements an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. The MEGA12 beta version can be downloaded from https://www.megasoftware.net/beta_download.
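The inverse relationship noted in the Figure 4 caption (clades with support near 50% need the most bootstrap replicates) follows from the binomial standard error of a proportion. The Python sketch below illustrates that arithmetic only; it is not MEGA12's stopping rule, which this page does not describe. The function names and the target standard error of 0.05 are assumptions.

```python
import math


def bootstrap_se(support, n_replicates):
    """Binomial standard error of a bootstrap support value.

    support is a proportion in [0, 1]; n_replicates is the number of
    bootstrap replicates used to estimate it.
    """
    if not 0.0 <= support <= 1.0:
        raise ValueError("support must be a proportion between 0 and 1")
    if n_replicates <= 0:
        raise ValueError("n_replicates must be positive")
    return math.sqrt(support * (1.0 - support) / n_replicates)


def replicates_needed(support, target_se):
    """Smallest replicate count whose binomial SE is at most target_se.

    target_se is an assumed precision goal, not a MEGA12 setting.
    """
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    variance = support * (1.0 - support)
    return max(1, math.ceil(variance / target_se ** 2))


# Support near 50% has the largest variance, so it needs the most replicates.
near_half = replicates_needed(0.50, 0.05)
well_supported = replicates_needed(0.95, 0.05)
```

Because the variance term p(1 − p) peaks at p = 0.5, a fixed precision goal demands more replicates as |BS − 50%| shrinks, which matches the negative trend reported in the caption.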


Improving cellular phylogenies through the integrated use of mutation order and optimality principles

August 2023 · 13 Reads · 1 Citation

Computational and Structural Biotechnology Journal

The study of tumor evolution is being revolutionized by single-cell sequencing technologies that survey the somatic variation of cancer cells. In these endeavors, reliable inference of the evolutionary relationship of single cells is a key step. However, single-cell sequences contain many errors and missing bases, which necessitate advancing standard molecular phylogenetics approaches for applications in analyzing these datasets. We have developed a computational approach that integratively applies standard phylogenetic optimality principles and patterns of co-occurrence of sequence variations to produce more expansive and accurate cellular phylogenies from single-cell sequence datasets. We found the new approach to also perform well for CRISPR/Cas9 genome editing datasets, suggesting that it can be useful for various applications. We apply the new approach to some empirical datasets to showcase its use for reconstructing recurrent mutations and mutational reversals as well as for phylodynamics analysis to infer metastatic cell migrations between tumors.


FIGURE 5 Empirical data analysis. The 40 empirical datasets from various cancer types were analyzed with the bootstrap approach. (A) Bootstrap support of consensus clones. All consensus clones from each dataset were pooled. The number in the parenthesis is the number of clones with ≤10% bootstrap support. (B) Bootstrap supports of clones for each cancer type. The cancer type is the primary tumor site. Eight datasets do not have information on the primary tumor site, so the cancer type is "unknown." The number at the top of each box plot is the number of patients, and the total number of bootstrap consensus clones with >10% bootstrap support is shown in parenthesis. (C-F) The relationship between tumor mutation burden and bootstrap support of a clone for lung (C), pancreas (D), and other cancer with low (E) and high (F) tumor mutation burden. The tumor mutation burden is the number of total variants in a dataset. The trend line was (C) y = 0.0049x + 66.03 (R² = 0.0026), (D) y = 0.097x + 48.035 (R² = 0.018), (E) y = 0.092x + 59.50 (R² = 0.024), and (F) y = 54.17 (R² = 0). Clones with ≤10% bootstrap support were excluded (B-F).
FIGURE 7 Clone phylogeny and metastatic cell migration history of a lung cancer patient. (A) Inferred clone phylogeny using CloneFinder+ without the bootstrap assessment. Grey circles represent tumor clones, and their predicted tumor sites (>0% clone frequency) are shown within boxes below the clone IDs. Tumor sites shown at internal nodes are predicted sites by PathFinder. Letters along branches are the branch ID, and branches are colored based on predicted tumor sites. All mutations are mapped at branches of the phylogeny through ancestral sequence reconstruction. When a cell migration event is inferred at a branch, the number of drivers and total mutations are shown. (B) Inferred cell migration history by PathFinder using CloneFinder+ clones without the bootstrap assessment. The numbers of drivers and total mutations are shown for each migration path. The primary and metastatic tumors are shown in blue and red boxes, respectively. (C) Bootstrap consensus migration history. The number along a path is bootstrap support (%). Dotted arrows indicate paths with <40% bootstrap support. (D) Driver mutation count and (E) driver mutation rates were compared between the paths originating from primary and metastatic tumors. The p values were computed using t-test. ATP401 patient was used. CGI was used for driver mutation prediction.
Bootstrap confidence for molecular evolutionary estimates from tumor bulk sequencing data

May 2023 · 60 Reads · 1 Citation

Frontiers in Bioinformatics

Bulk sequencing is commonly used to characterize the genetic diversity of cancer cell populations in tumors and the evolutionary relationships of cancer clones. However, bulk sequencing produces aggregate information on nucleotide variants and their sample frequencies, necessitating computational methods to predict distinct clone sequences and their frequencies within a sample. Interestingly, no methods are available to measure the statistical confidence in the variants assigned to inferred clones. We introduce a bootstrap resampling approach that combines clone prediction and statistical confidence calculation for every variant assignment. Analysis of computer-simulated datasets showed the bootstrap approach to work well in assessing the reliability of predicted clones as well as downstream inferences using the predicted clones (e.g., mapping metastatic migration paths). We found that only a fraction of inferences have good bootstrap support, which means that many inferences are tentative for real data. Using the bootstrap approach, we analyzed empirical datasets from metastatic cancers and placed bootstrap confidence on the estimated number of mutations involved in cell migration events. We found that the numbers of driver mutations involved in metastatic cell migration events sourced from primary tumors are similar to those where metastatic tumors are the source of new metastases. So, mutations with driver potential seem to keep arising during metastasis. The bootstrap approach developed in this study is implemented in software available at https://github.com/SayakaMiura/CloneFinderPlus.
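The general idea of attaching bootstrap support to inferences can be sketched generically in Python. This is not the CloneFinder+ implementation: what is resampled, the `infer` callable, and the toy assignment rule `toy_infer` are all hypothetical stand-ins chosen for illustration. The sketch resamples the input with replacement, repeats the inference on each replicate, and reports how often each inference recurs.

```python
import random
from collections import Counter


def bootstrap_support(items, infer, n_replicates=100, seed=0):
    """Percentage of bootstrap replicates in which each inference appears.

    items: the resampling units (assumed here to be variants).
    infer: hypothetical callable mapping a resampled list to a set of
           hashable inferences (e.g., variant-to-clone assignments).
    """
    if not items:
        raise ValueError("items must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter()
    for _ in range(n_replicates):
        # Draw a replicate of the same size, sampling with replacement.
        resample = [rng.choice(items) for _ in items]
        counts.update(set(infer(resample)))
    return {inference: 100.0 * n / n_replicates for inference, n in counts.items()}


def toy_infer(variants):
    """Toy stand-in for clone prediction: label each variant by whether its
    frequency is at or above the mean frequency of the replicate."""
    mean_freq = sum(freq for _, freq in variants) / len(variants)
    return {(name, "above" if freq >= mean_freq else "below")
            for name, freq in variants}


# Assumed toy data: (variant name, sample frequency).
variants = [("v1", 0.9), ("v2", 0.1), ("v3", 0.5)]
support = bootstrap_support(variants, toy_infer, n_replicates=50, seed=1)
```

Inferences that depend on which items happen to be drawn, such as the label given to `v3` here, receive lower support than those that are stable across replicates.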


Figure 5. Metastatic migration histories inferred from clone phylogenies in Figure 4. The migration maps for each patient ID (top) were classified into seeding models based on their shapes (a-d). The number of variants mapped is shown next to a path (solid: high support, dashed: low support, blue: P→M, red: M→M, brown: M→P) between sites (Primary: blue and metastatic: red) when their count is greater than zero. An abbreviation (Figure 4 legend) for each tumor site labels each box.
Figure 7. Distribution of clone migrations for the (a-e) Hu and (f-j) De Mattos cohort datasets (a,f) Histograms of migration path counts in patients. (b,g) Linear regression of the number of migrations against the number of metastatic sites sampled. (c,h) The number (center) and proportions of migrations of clones from primary tumors. (d,i) The number (center) and proportions of migrations of clones from metastatic tumors. (e,j) Proportions of path types. The De Mattos cohort does not contain primary tumor data, thus there are no M→P paths to infer.
Clone Phylogenetics Reveals Metastatic Tumor Migrations, Maps, and Models

September 2022 · 75 Reads · 5 Citations

Dispersal routes of metastatic cells are not medically detected or even visible. A molecular evolutionary analysis of tumor variation provides a way to retrospectively infer metastatic migration histories and answer questions such as whether the majority of metastases are seeded from clones within primary tumors or seeded from clones within pre-existing metastases, as well as whether the evolution of metastases is generally consistent with any proposed models. We seek answers to these fundamental questions through a systematic patient-centric retrospective analysis that maps the dynamic evolutionary history of tumor cell migrations in many cancers. We analyzed tumor genetic heterogeneity in 51 cancer patients and found that most metastatic migration histories were best described by a hybrid of models of metastatic tumor evolution. Synthesizing across metastatic migration histories, we found new tumor seedings arising from clones of pre-existing metastases as often as they arose from clones from primary tumors. There were also many clone exchanges between the source and recipient tumors. Therefore, a molecular phylogenetic analysis of tumor variation provides a retrospective glimpse into general patterns of metastatic migration histories in cancer patients.


FIG. 1. Results produced by TToL5 web portal. (A) The divergence time of mice and humans produced by the "Get Divergence Time" search function. The median time and its confidence interval are derived from the TToL5 database. Solar luminosity (red [...]
FIG. 2. Timetrees produced by the "Build a Timetree" feature in TToL5 to present an overview of the global timetree. Timetrees of (A) Animal phyla, (B) Animal classes, and (C) mammalian orders are shown. Tip labels are removed due to space constraints, and the timescale is shown in millions of years. Blue dots on nodes can be clicked to view the NCBI name, taxonomic rank, median time, and confidence interval around the median. Polytomies reflect phylogenetic uncertainties caused by conflicting resolutions of species relationships and divergence times among published studies included in TToL5. Users can interact with the timetree display in numerous ways (Kumar et al. 2017) and download the resulting timetree in a Newick or graphic format. One can also view individual timetrees and openly download individual published timetrees used in building the global timetree.
FIG. 3. Numbers of species, genera, families, classes, and phyla in TToL5. Orange pies show increased taxonomic representation in this edition compared to the previous edition published in 2017 (blue pies). The grey pies correspond to the number of taxa missing from the global timetree at the given taxonomic level. While the NCBI taxonomy database (Schoch et al. 2020) has over 1.3 million taxa, our count of 429,141 only included species whose names followed binomial nomenclature. This means that species whose whole names included abbreviations such as "sp." and were marked as environmental samples were excluded. Notably, the number of species with binomial nomenclature fluctuated considerably in NCBI over the last few months (429,141-510,722). We note that 124,654 species names follow the binomial nomenclature in TToL5.
TimeTree 5: An Expanded Resource for Species Divergence Times

August 2022 · 588 Reads · 723 Citations

Molecular Biology and Evolution

We present the fifth edition of the TimeTree of Life resource (TToL5), a product of the timetree of life project that aims to synthesize published molecular timetrees and make evolutionary knowledge easily accessible to all. Using the TToL5 web portal, users can retrieve published studies and divergence times between species, the timeline of a species’ evolution beginning with the origin of life, and the timetree for a given evolutionary group at the desired taxonomic rank. TToL5 contains divergence time information on 137,306 species, 41% more than the previous edition. The TToL5 web interface is now ADA-compliant and mobile-friendly, a result of comprehensive source code refactoring. TToL5 also offers programmatic access to species divergence times and timelines through an application programming interface, which is accessible at timetree.temple.edu/api. TToL5 is publicly available at timetree.org.


Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants

April 2022 · 132 Reads · 29 Citations

Many pathogenic missense mutations are found in protein positions that are neither well-conserved nor fall in any known functional domains. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.


Citations (21)


... The evolutionary distances were computed using the maximum composite likelihood method [52] and are in the units of the number of base substitutions per site. Evolutionary analysis were conducted in MEGA12 [53] utilizing up to 8 parallel computing threads. ...

Reference:

Isolation, characterization and screening of phosphate (P) solubilizing actinomycetes and exploring its potency in finger millet (Eleusine coracana L.)
MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing
  • Citing Article
  • December 2024

Molecular Biology and Evolution

... For instance, methods derived from phylogeography can be adapted to analyze tumor metastasis by tracing the geographical spread of tumor cells within the body. [32][33][34][35][36] Since its inception, the field of tumor phylogenetics has garnered increasing attention and has rapidly developed. Today, tumor phylogenetics has evolved into a highly diversified field. ...

Clone Phylogenetics Reveals Metastatic Tumor Migrations, Maps, and Models

... To construct a robust phylogenetic framework, we focused on single-copy orthologs, which were used to build a maximum-likelihood phylogenetic tree, providing insights into the evolutionary relationships among these species. To estimate divergence times, we incorporated reference divergence times from fossil evidence obtained from the TimeTree database (Kumar et al. 2022), ensuring that our analysis was anchored to well-established evolutionary benchmarks. The Bayesian inference tool MCMCTree, part of the PAML v4.10.7 package (Reis & Reis and Yang 2011), was then employed to estimate divergence times across all nodes within the phylogeny using only four-fold sites within all single-copy gene sequences. ...

TimeTree 5: An Expanded Resource for Species Divergence Times

Molecular Biology and Evolution

... The DCI analysis is a method that quantifies the strength of allostery coupling between residue pairs, usually between mutation sites and function important regions of the protein [11,14]. Briefly, this method firstly records the conformational displacement of a protein in response to random perturbation on residues. ...

Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants

... The consensus sequence represents the most frequent nucleotide at each position following short-read alignment. There have been attempts to align a large number of consensus sequences with each other, thus obtaining frequencies assignable to each position [3]. However, one sequence alone does not necessarily accurately represent the composition of the sample, as it represents only a single sequence. ...

TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity

Bioinformatics

... 24 31 More recent computational analyses including a very large number of complete genomes have moved the time of viral emergence to well before the major Wuhan outbreak, up to the summer of 2019. [32][33][34] Because of these disagreements, laboratory evidence for early circulation is often dismissed and labelled as a result of false-positive testing. Antibody detection results can indeed be affected by the presence in sera of antibodies which, although able to recognise SARS-CoV-2 antigens, were induced by other agents. ...

TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity

... In light of these constraints, mutations very close to a binding site are bound to have a large effect on the flexibility and geometry of the binding site. 64 On the other hand, mutations further from the binding site can have ever smaller effects, 65,66 enabling fine-tuning of mechanics (Fig. 3), [67][68][69][70][71] and structure. 72,73 While we do not model protein folding in this study, we studied the effect of single mutations in proteins in the protein data bank (PDB), finding structural perturbations far from the mutated residue (SI Fig 4). ...

Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants
  • Citing Preprint
  • September 2021

... This table summarizes the notable mutations and the first detection locations of various SARS-CoV-2 variants reported as concerning. The COVID-19 pandemic, caused by the SARS-CoV-2 virus, was first detected in Wuhan, China, in November 2019 [1][2][3]. The virus, known for its high infectivity and pathogenicity, rapidly spread across the globe, leading the World Health Organization (WHO) to declare it as a global pandemic [1]. ...

An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic

Molecular Biology and Evolution

... For instance, methods derived from phylogeography can be adapted to analyze tumor metastasis by tracing the geographical spread of tumor cells within the body. [32][33][34][35][36] Since its inception, the field of tumor phylogenetics has garnered increasing attention and has rapidly developed. Today, tumor phylogenetics has evolved into a highly diversified field. ...

PathFinder: Bayesian inference of clone migration histories in cancer

Bioinformatics