
Abstract

Genetic linkage causes the fate of new mutations in a population to be contingent on the genetic background on which they appear. This makes it challenging to identify how individual mutations affect fitness. To overcome this challenge, we developed marginal path likelihood (MPL), a method to infer selection from evolutionary histories that resolves genetic linkage. Validation on real and simulated data sets shows that MPL is fast and accurate, outperforming existing inference approaches. We found that resolving linkage is crucial for accurately quantifying selection in complex evolving populations, which we demonstrate through a quantitative analysis of intrahost HIV-1 evolution using multiple patient data sets. Linkage effects generated by variants that sweep rapidly through the population are particularly strong, extending far across the genome. Taken together, our results argue for the importance of resolving linkage in studies of natural selection.
https://doi.org/10.1038/s41587-020-0737-3
1Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China. 2Institute for Advanced Study,
Hong Kong University of Science and Technology, Hong Kong, China. 3The Kirby Institute, University of New South Wales, Sydney, New South Wales,
Australia. 4School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia. 5Department of Chemical and Biological
Engineering, Hong Kong University of Science and Technology, Hong Kong, China. 6Department of Physics and Astronomy, University of California,
Riverside, Riverside, CA, USA. 7These authors contributed equally: Muhammad Saqib Sohail, Raymond H. Y. Louie. e-mail: m.mckay@ust.hk;
john.barton@ucr.edu
Evolving populations exhibit complex dynamics. Cancers (refs. 1–6) and pathogens, such as HIV-1 (refs. 7–9) and influenza (refs. 10,11), generate multiple beneficial mutations that increase fitness or allow them to escape immunity. Subpopulations with different beneficial mutations then compete with one another for dominance, referred to as clonal interference, resulting in the loss of some mutations that increase fitness (ref. 12). Neutral or deleterious mutations can also hitchhike to high frequencies if they occur on advantageous genetic backgrounds (ref. 13). Experiments have demonstrated that these features of genetic linkage are pervasive in nature (refs. 14–16).
Linkage makes distinguishing the fitness effects of individual mutations challenging because their dynamics are contingent on the genetic background on which they appear. Lineage tracking experiments can be used to identify beneficial mutations (ref. 17), but they cannot readily be applied to evolution in natural conditions, such as in cancer or in natural infection by viruses or bacteria. Most existing computational methods to infer fitness from population dynamics ignore linkage entirely (refs. 18–25). Ignoring linkage could lead to errors when genetic hitchhiking or clonal interference is present, both of which occur frequently in nature. A few methods have attempted to incorporate linkage information, but these methods are exceptionally computationally intensive and may scale poorly to populations with many polymorphic variants (refs. 26–28).
Here we describe a method to infer selection from evolutionary histories, captured by genetic time series data, and demonstrate its ability to resolve linkage effects. Simulations show that our approach, which we call marginal path likelihood (MPL) (refs. 29,30), is faster and more accurate than current state-of-the-art methods for selection inference. As an example application, we use our method to reveal patterns of selection in intrahost HIV-1 evolution using 14 patient data sets. The genetic diversity exhibited in these data sets makes them exceptionally challenging to analyze using existing linkage-aware methods. With MPL, we observe strong selection for escape from CD8+ T cell responses, which is partially masked by linkage due to extensive clonal interference between competing escape mutants. We further quantify the influence of linkage on inferred selection across the viral genome. Our results show that most variants have negligible effects on inferred selection at other sites, but a small minority of highly influential variants have dramatic and far-reaching effects. These highly influential variants are often ones that sweep rapidly through the population. We also find modest selection for escape from antibody responses, even in an individual who develops broadly neutralizing antibodies (bnAbs). Collectively, our results argue for the importance of accounting for genetic linkage when inferring selection, while providing a practical method for achieving this for large data sets.
Results
Evolutionary model incorporating linkage. The principal idea of our inference approach is to efficiently quantify the probability of an evolutionary ‘path,’ defined by the set of all mutant allele frequencies at each time, using a path integral method derived from statistical physics (Methods). Path integrals for related evolutionary models have been derived under different assumptions in past work (refs. 31–33), but they have not been widely applied for inference. This method allows us to disentangle the effects of individual mutations from the sequence background, that is, genetic linkage, without making the likelihood function intractable. In fact, the path integral can be analytically inverted to find the parameters that are most likely to have generated a path.
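To make the notion of a path concrete, the following minimal Python sketch computes mutant allele frequencies at each time point from sampled sequences. It also computes the pairwise covariance of allele indicators, used here as one common summary of linkage. The function names `allele_frequencies` and `allele_covariance`, the 0/1 sequence encoding, and the toy data are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def allele_frequencies(samples):
    """Return a (T, L) array of mutant allele frequencies, one row per time point.

    `samples` is a list of (n_t, L) arrays of 0/1 entries (1 = mutant allele),
    one array of sampled sequences per time point (assumed encoding).
    """
    return np.array([np.asarray(s, dtype=float).mean(axis=0) for s in samples])

def allele_covariance(sample):
    """Return the L x L covariance of allele indicators at one time point.

    Off-diagonal entries x_ij - x_i * x_j summarize linkage between loci i and j;
    diagonal entries reduce to x_i * (1 - x_i) for 0/1 data.
    """
    s = np.asarray(sample, dtype=float)
    x = s.mean(axis=0)                 # single-locus frequencies x_i
    x_pair = s.T @ s / s.shape[0]      # pairwise frequencies x_ij
    return x_pair - np.outer(x, x)

# Toy data (hypothetical): 4 sequences, 2 loci, 2 time points.
samples = [
    [[0, 0], [0, 0], [1, 1], [0, 0]],
    [[1, 1], [1, 1], [1, 1], [0, 0]],
]
path = allele_frequencies(samples)     # the evolutionary "path" x(t)
cov0 = allele_covariance(samples[0])   # linkage summary at the first time point
print(path)
print(cov0)
```

In this toy data the two mutant alleles always appear together, so their covariance is as large as each locus's own variance, which is the situation in which frequency changes alone cannot say which mutation is under selection.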
To define the path integral, we consider Wright–Fisher (WF) population dynamics with selection, mutation and recombination, in the diffusion limit (ref. 34). Under an additive fitness model, the fitness of any individual is a sum of selection coefficients, $s_i$, which quantify the selective advantage of mutant allele $i$ relative to wild-type (WT). The probability of an evolutionary path is then a product of ...
MPL resolves genetic linkage in fitness inference from complex evolutionary histories
Muhammad Saqib Sohail1,7, Raymond H. Y. Louie1,2,3,4,7, Matthew R. McKay1,5 ✉ and John P. Barton6 ✉
NATURE BIOTECHNOLOGY | VOL 39 | APRIL 2021 | 472–479 | www.nature.com/naturebiotechnology
... This makes it more difficult to accurately estimate model parameters since statistics over the path must be estimated from incomplete information. A workaround used in a previous study (ref. 4) for this problem is to use linear interpolation to estimate the state of the system between the observed data points. However, this approximation may fail when gaps in time are large enough such that the behavior of the system is highly nonlinear (ref. 13). ...
... Sohail et al. solved this problem analytically in the limit that the population size $N \to \infty$ while the selection coefficients $s_i$ and mutation rate $\mu$ scale as $1/N$ (ref. 4). In this case, the maximum a posteriori vector of selection coefficients $\hat{s} = (\hat{s}_i)_{i=1}^{L}$ that best explain the data are given by ...
... The reduction in error for Bézier interpolation is more substantial for off-diagonal terms compared to diagonal ones. Consistent with this observation, Bézier interpolation yields smaller improvements in performance for a simple version of MPL in which the off-diagonal terms of the integrated covariance matrix are ignored (Methods; referred to as the single locus (SL) method in ref. 4). ...
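The excerpts above refer to an integrated covariance matrix and to a single locus (SL) variant that ignores its off-diagonal terms. A minimal Python sketch of that contrast follows. The estimator form used here, $\hat{s} = (C_{\rm int} + \gamma I)^{-1}\,[x(t_K) - x(t_0) - \mu \int (1 - 2x)\,dt]$, is an assumption about the method and is not quoted from the text on this page. The function name `infer_selection`, the trapezoidal integration of the sampled covariances (a simplification chosen for brevity), the regularization `gamma`, and the deterministic four-genotype toy population are likewise illustrative.

```python
import numpy as np

def infer_selection(times, x, C, mu=0.0, gamma=1.0, single_locus=False):
    """Sketch of a linkage-aware selection estimate (assumed form, see lead-in).

    times: (T,) sampling times; x: (T, L) allele frequencies;
    C: (T, L, L) allele covariance matrices; mu: mutation rate;
    gamma: regularization strength (hypothetical Gaussian-prior weight).
    """
    times, x, C = (np.asarray(a, dtype=float) for a in (times, x, C))
    dt = np.diff(times)
    # Trapezoidal rule for the time-integrated covariance matrix.
    C_int = 0.5 * np.tensordot(dt, C[:-1] + C[1:], axes=1)
    # Trapezoidal rule for the integrated mutational flux term (1 - 2x).
    flux = 1.0 - 2.0 * x
    mut_int = 0.5 * np.tensordot(dt, flux[:-1] + flux[1:], axes=1)
    if single_locus:
        # SL variant: drop off-diagonal (linkage) terms.
        C_int = np.diag(np.diag(C_int))
    rhs = x[-1] - x[0] - mu * mut_int
    return np.linalg.solve(C_int + gamma * np.eye(x.shape[1]), rhs)

# Toy population (hypothetical): locus 0 is beneficial, locus 1 is neutral
# but enriched on the beneficial background, so it hitchhikes.
genotypes = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
p = np.array([0.5, 0.1, 0.3, 0.1])          # initial genotype frequencies
s_true = np.array([0.05, 0.0])
fitness = 1.0 + genotypes @ s_true           # additive fitness

times = np.arange(51)
x, C = [], []
for _ in times:
    xi = p @ genotypes                       # allele frequencies x_i
    x_pair = genotypes.T @ (p[:, None] * genotypes)
    x.append(xi)
    C.append(x_pair - np.outer(xi, xi))
    p = p * fitness                          # deterministic selection step
    p = p / p.sum()

s_mpl = infer_selection(times, x, C, gamma=0.1)
s_sl = infer_selection(times, x, C, gamma=0.1, single_locus=True)
print("with off-diagonal terms:", s_mpl)
print("single locus (SL):      ", s_sl)
```

In this toy run, the estimate that keeps the off-diagonal terms should assign the neutral locus a coefficient close to zero, whereas the SL variant attributes part of the hitchhiking rise to it. The simulation is deterministic and noise-free, so it illustrates the mechanism only and says nothing about performance on finitely sampled data.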
Preprint
Many dynamical systems, from quantum many-body systems to evolving populations to financial markets, are described by stochastic processes. Parameters characterizing such processes can often be inferred using information integrated over stochastic paths. However, estimating time-integrated quantities from real data with limited time resolution is challenging. Here, we propose a framework for accurately estimating time-integrated quantities using Bézier interpolation. We applied our approach to two dynamical inference problems: determining fitness parameters for evolving populations and inferring forces driving Ornstein-Uhlenbeck processes. We found that Bézier interpolation reduces the estimation bias for both dynamical inference problems. This improvement was especially noticeable for data sets with limited time resolution. Our method could be broadly applied to improve accuracy for other dynamical inference problems using finitely sampled data.
... Most existing methods are based on single-locus models which assume independent evolution of loci (Bollback et al. 2008;Malaspinas et al. 2012;Mathieson and McVean 2013;Feder et al. 2014;Lacerda and Seoighe 2014;Steinrücken et al. 2014;Foll et al. 2015;Topa et al. 2015;Ferrer-Admetlla et al. 2016;Gompert 2016;Schraiber et al. 2016;Iranmehr et al. 2017;Taus et al. 2017;Zinger et al. 2019), thus they are unable to directly account for genetic linkage or epistasis. A few methods (Illingworth and Mustonen 2011;Terhorst et al. 2015;Sohail et al. 2021) have been developed that consider the joint evolution of multiple loci, but these assume additive fitness models. Hence, while they account for genetic linkage, they do not consider epistasis. ...
... Due to its analytical form, our approach is straightforward to implement and computationally efficient for moderate numbers of loci. Our method is based on an extension of the marginal path likelihood (MPL) framework (Sohail et al. 2021) to account for epistasis. We use a path integral method derived from statistical physics (Risken 1989) to efficiently represent the likelihood of an observed trajectory of single and double mutant allele frequencies. ...
... For fitness landscapes with epistasis, any inference model that does not explicitly account for epistasis will ascribe the effect of epistasis terms to individual selection coefficients, thereby over-or under-estimating them. To test this, we ran simulations to compare the performance of the MPL method, which accounts for both linkage and epistasis, with the one we proposed previously, which accounts only for linkage and considers a first-order fitness model with no epistasis (Sohail et al. 2021). Here we term this variant as "MPL (without epistasis)". ...
Article
Epistasis refers to fitness or functional effects of mutations that depend on the sequence background in which these mutations arise. Epistasis is prevalent in nature, including populations of viruses, bacteria, and cancers, and can contribute to the evolution of drug resistance and immune escape. However, it is difficult to directly estimate epistatic effects from sampled observations of a population. At present, there are very few methods that can disentangle the effects of selection (including epistasis), mutation, recombination, genetic drift, and genetic linkage in evolving populations. Here we develop a method to infer epistasis, along with the fitness effects of individual mutations, from observed evolutionary histories. Simulations show that we can accurately infer pairwise epistatic interactions provided that there is sufficient genetic diversity in the data. Our method also allows us to identify which fitness parameters can be reliably inferred from a particular data set and which ones are unidentifiable. Our approach therefore allows for the inference of more complex models of selection from time series genetic data, while also quantifying uncertainty in the inferred parameters.
... It is important to note that the use of diffusion processes similar to that in Eq. (3) has a long history in population genetics, including seminal work by Kimura [15] as well as recent applications that employ diffusion-based likelihoods in the context of statistical inference [16][17][18][19]. ...
Article
The global effort to sequence millions of SARS-CoV-2 genomes has provided an unprecedented view of viral evolution. Characterizing how selection acts on SARS-CoV-2 is critical to developing effective, long-lasting vaccines and other treatments, but the scale and complexity of genomic surveillance data make rigorous analysis challenging. To meet this challenge, we develop Bayesian Viral Allele Selection (BVAS), a principled and scalable probabilistic method for inferring the genetic determinants of differential viral fitness and the relative growth rates of viral lineages, including newly emergent lineages. After demonstrating the accuracy and efficacy of our method through simulation, we apply BVAS to 6.9 million SARS-CoV-2 genomes. We identify numerous mutations that increase fitness, including previously identified mutations in the SARS-CoV-2 Spike and Nucleocapsid proteins, as well as mutations in non-structural proteins whose contribution to fitness is less well characterized. In addition, we extend our baseline model to identify mutations whose fitness exhibits strong dependence on vaccination status as well as pairwise interaction effects, i.e. epistasis. Strikingly, both these analyses point to the pivotal role played by the N501 residue in the Spike protein. Our method, which couples Bayesian variable selection with a diffusion approximation in allele frequency space, lays a foundation for identifying fitness-associated mutations under the assumption that most alleles are neutral.
... For the Dryvax vaccine, 11 sequences were downloaded from NCBI, with accession IDs JN654976 through JN654986, and their consensus sequence was used in the analysis. Similar to our previous works [32,33], an in-house bioinformatics pipeline was developed to align these nucleotide sequences and then translate them into amino acid residues according to the coding sequence positions provided along the reference sequence for VACV and the location of the gene (whether on the forward or reverse strand). MAFFT software was used to perform all multiple sequence alignments [34]. ...
Article
Beginning in May 2022, a novel cluster of monkeypox virus infections was detected in humans. This virus has spread rapidly to non-endemic countries, sparking global concern. Specific vaccines based on the vaccinia virus (VACV) have demonstrated high efficacy against monkeypox viruses in the past and are considered an important outbreak control measure. Viruses observed in the current outbreak carry distinct genetic variations that have the potential to affect vaccine-induced immune recognition. Here, by investigating genetic variation with respect to orthologous immunogenic vaccinia-virus proteins, we report data that anticipates immune responses induced by VACV-based vaccines, including the currently available MVA-BN and ACAM2000 vaccines, to remain highly cross-reactive against the newly observed monkeypox viruses.
... In the molecular sciences, chemical reactions and molecular rearrangements occur on timescales many orders of magnitude longer than the timescale of individual bond vibrations [7,8]. In the biomedical sciences, it may take many mutations before a virulent strain of a pathogen emerges [9], or many heart beats before a cardiac arrhythmia becomes life-threatening [10,11]. ...
Preprint
Forecasting the likelihood, timing, and nature of events is a major goal of modeling stochastic dynamical systems. When the event is rare in comparison with the timescales of simulation and/or measurement needed to resolve the elemental dynamics, accurate forecasting from direct observations becomes challenging. In such cases a more effective approach is to cast statistics of interest as solutions to Feynman-Kac equations (partial differential equations). Here, we develop an approach to solve Feynman-Kac equations by training neural networks on short-trajectory data. Unlike previous approaches, our method avoids assumptions about the underlying model and dynamics. This makes it applicable to treating complex computational models and observational data. We illustrate the advantages of our method using a low-dimensional model that facilitates visualization, and this analysis motivates an adaptive sampling strategy that allows on-the-fly identification of and addition of data to regions important for predicting the statistics of interest. Finally, we demonstrate that we can compute accurate statistics for a 75-dimensional model of sudden stratospheric warming. This system provides a stringent test bed for our method.
... The complete genome reference sequences for the VACV and MPXV-CB were downloaded from NCBI using the GenBank accession IDs NC_006998 and NC_003310 (Zaire-96-I-16), respectively. Similar to our previous works [22,23], an in-house bioinformatics pipeline was developed to align these nucleotide sequences and then translate them into amino acid residues according to the coding sequence positions provided along the reference sequence for VACV and the location of the gene (whether on the forward or reverse strand). ...
Preprint
Starting in May 2022, a novel cluster of monkeypox virus infections was detected in humans. This has spread rapidly to non-endemic countries and sparked global concern. Vaccinia virus vaccines have demonstrated high efficacy against monkeypox viruses in the past and are considered an important outbreak control measure. Viruses observed in the current outbreak carry distinct genetic variation that has the potential to affect vaccine-induced immune recognition. Here, by investigating genetic variation with respect to orthologous immunogenic vaccinia-virus proteins, we report data that anticipates vaccine-induced immune responses to remain highly cross-reactive against the newly observed monkeypox viruses.
Article
Many dynamical systems, from quantum many-body systems to evolving populations to financial markets, are described by stochastic processes. Parameters characterizing such processes can often be inferred using information integrated over stochastic paths. However, estimating time-integrated quantities from real data with limited time resolution is challenging. Here, we propose a framework for accurately estimating time-integrated quantities using Bézier interpolation. We applied our approach to two dynamical inference problems: Determining fitness parameters for evolving populations and inferring forces driving Ornstein-Uhlenbeck processes. We found that Bézier interpolation reduces the estimation bias for both dynamical inference problems. This improvement was especially noticeable for data sets with limited time resolution. Our method could be broadly applied to improve accuracy for other dynamical inference problems using finitely sampled data.
Article
Genetic sequences collected over time provide an exciting opportunity to study natural selection. In such studies, it is important to account for linkage disequilibrium to accurately measure selection and to distinguish between selection and other effects that can cause changes in allele frequencies, such as genetic hitchhiking or clonal interference. However, most high-throughput sequencing methods cannot directly measure linkage due to short read lengths. Here we develop a simple method to estimate linkage disequilibrium from time-series allele frequencies. This reconstructed linkage information can then be combined with other inference methods to infer the fitness effects of individual mutations. Simulations show that our approach reliably outperforms inference that ignores linkage disequilibrium and, with sufficient sampling, performs similarly to inference using the true linkage information. We also introduce two regularization methods derived from random matrix theory that help to preserve its performance under limited sampling effects. Overall, our method enables the use of linkage-aware inference methods even for data sets where only allele frequency time series are available.
Preprint
The global effort to sequence millions of SARS-CoV-2 genomes has provided an unprecedented view of viral evolution. Characterizing how selection acts on SARS-CoV-2 is critical to developing effective, long-lasting vaccines and other treatments, but the scale and complexity of genomic surveillance data make rigorous analysis distinctly challenging. To meet this challenge, we develop Bayesian Viral Allele Selection (BVAS), a principled and scalable probabilistic method for inferring the genetic determinants of differential viral fitness and the relative growth rates of viral lineages, including newly emergent lineages. After demonstrating the accuracy and efficacy of our method through simulation, we apply BVAS to 6.9 million SARS-CoV-2 genomes. We identify numerous mutations that increase fitness, including previously identified mutations in the SARS-CoV-2 Spike and Nucleocapsid proteins, as well as mutations in non-structural proteins whose contribution to fitness is less well characterized. In addition, we extend our baseline model to identify mutations whose fitness exhibits strong dependence on vaccination status. Our method, which couples Bayesian variable selection with a diffusion approximation in allele frequency space, lays a foundation for identifying fitness-associated mutations under the assumption that most alleles are neutral.
Article
There still are no effective long-term protective vaccines against viruses that continuously evolve under immune pressure such as seasonal influenza, which has caused, and can cause, devastating epidemics in the human population. To find such a broadly protective immunization strategy, it is useful to know how easily the virus can escape via mutation from specific antibody responses. This information is encoded in the fitness landscape of the viral proteins (i.e., knowledge of the viral fitness as a function of sequence). Here we present a computational method to infer the intrinsic mutational fitness landscape of influenza-like evolving antigens from yearly sequence data. We test inference performance with computer-generated sequence data that are based on stochastic simulations mimicking basic features of immune-driven viral evolution. Although the numerically simulated model does create a phylogeny based on the allowed mutations, the inference scheme does not use this information. This provides a contrast to other methods that rely on reconstruction of phylogenetic trees. Our method just needs a sufficient number of samples over multiple years. With our method, we are able to infer single as well as pairwise mutational fitness effects from the simulated sequence time series for short antigenic proteins. Our fitness inference approach may have potential future use for the design of immunization protocols by identifying intrinsically vulnerable immune target combinations on antigens that evolve under immune-driven selection. In the future, this approach may be applied to influenza and other novel viruses such as SARS-CoV-2, which evolves and, like influenza, might continue to escape the natural and vaccine-mediated immune pressures.
Article
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations.
Article
The impact of clonal heterogeneity on cancer progression in chronic lymphocytic leukemia (CLL) is not well understood. We hypothesized that the evolutionary dynamics of subclonal mutations contribute to the variations in disease tempo and response to therapy that characterize CLL. We therefore carried out a large-scale analysis of subclonal and clonal point mutations and copy-number alterations in 149 CLLs, detected by whole exome sequencing (WES) and SNP arrays. We utilized a novel computational approach, which integrates purity and local ploidy information, to infer the cancer cell fraction (CCF) of each mutation from WES data, and to classify mutations as clonal or subclonal. Subclonal mutations were detected in 146/149 CLLs and were enriched with putative cancer driver events (P=0.001). Furthermore, higher numbers of subclonal mutations were associated with prior anti-leukemia therapy (P=0.017). Together, these results suggest that a strong extrinsic selection pressure, such as cytotoxic treatment, promotes the expansion of fitter subclones, driving them to above our detection threshold (CCF of ∼0.10). The order of mutation acquisition may be inferred from the aggregate frequencies at which driver events are clonal or subclonal, as clonal mutations represent earlier events and subclonal later events. Of the 149 samples, we found 3 drivers (MYD88, trisomy 12, and del(13q)) that were clonal in 80–100% of samples harboring these alterations, significantly higher than other driver events (q<0.1), suggesting that they arise earlier in typical CLL development. Other drivers (e.g., ATM, TP53 and SF3B1) were often observed at subclonal frequencies, indicating that they often arise later in leukemic development. We directly assessed the evolution of somatic mutations in 18 patients, in which data from two distant timepoints were available.
Clonal evolution was observed in 11 of 18 patients (10 of 12 who received intervening treatment, but only 1 of 6 without intervening treatment, P=0.012) and confirmed that subclonal mutations (e.g., del(11q), SF3B1 and TP53) shifted towards clonality over time. Indeed, expanding subclonal mutations were enriched in putative drivers (P=0.021), suggesting that these mutations not only mark genetic evolution but also provide the fitness advantage driving it. Changes in the genetic composition of CLL cells with clonal evolution were associated with network level changes in gene expression. If treatment-associated genetic evolution leads to expansion of a fitter subclone, we would predict a shorter time to relapse in these individuals. Indeed, presence of a detectable subclonal driver mutation was associated with a shorter time to retreatment in these 18 samples (P=0.04), indicating that the presence of subclonal drivers adversely impacts clinical outcome. In the analysis of the full cohort of 149 samples, we observed that CLLs with subclonal driver mutations were associated with shorter times from diagnosis to first therapy (P=0.001) and between sample collection to treatment (P<0.001). Moreover, in the subset of 67 of 149 patients who were treated after sampling, presence of subclonal driver mutations evident in the pre-treatment sample was associated with earlier retreatment (P=0.003). Regression models adjusting for CLL prognostic factors (IGHV status, prior therapy and high risk cytogenetics) demonstrated that the presence of a subclonal driver was an independent risk factor for earlier retreatment (adjusted hazard ratio of 4.61 (CI 1.59–13.34), P=0.005). Thus, the detection of subclonal drivers (indicative of an active evolutionary process) is associated with shorter duration of remission. In conclusion, the analysis of clonal heterogeneity in CLL provides a glimpse into the past, present and future of a patient's disease.
Through the cross-sectional analysis of 149 samples, we derived the number and genetic composition of clonal and subclonal mutations and thus uncovered footprints of the past history of CLL. Furthermore, we inferred a temporal order of genetic events implicated in CLL. Finally, our combined longitudinal and cross-sectional analyses revealed that knowledge of subclonal mutations anticipates the genetic composition of the future relapsing leukemia as well as the rapidity with which it will occur. These data challenge us to therapeutically address not only genetic targets but also their dynamic evolutionary landscape. Disclosures No relevant conflicts of interest to declare.
Article
Isolation of broadly neutralizing human monoclonal antibodies (HmAbs) targeting the E2 glycoprotein of Hepatitis C virus (HCV) has sparked hope for effective vaccine development. Nonetheless, escape mutations have been reported. Ideally, a potent vaccine should elicit HmAbs that target regions of E2 that are most difficult to escape. Here, aimed at addressing this challenge, we develop a predictive in-silico evolutionary model for E2 that identifies one such region, a specific antigenic domain, making it an attractive target for a robust antibody response. Specific broadly neutralizing HmAbs that appear difficult to escape from are also identified. By providing a framework for identifying vulnerable regions of E2 and for assessing the potency of specific antibodies, our results can aid the rational design of an effective prophylactic HCV vaccine. A good vaccine should direct the immune response to virus regions that are most difficult to escape. Here, Quadeer et al. develop a predictive in-silico evolutionary model for HCV E2 which identifies one such antigenic region and identifies multiple broadly neutralizing human antibodies that appear difficult to escape from.
Article
Significance An effective vaccine for HIV is still not available, although recent hope has emerged through the discovery of antibodies capable of neutralizing diverse HIV strains. Nonetheless, there exist mutational pathways through which HIV can evade known broadly neutralizing antibody responses. An ideal vaccine would elicit broadly neutralizing antibodies that target parts of the virus’s spike proteins where mutations severely compromise the virus’s fitness. Here, we employ a computational approach that allows estimation of the fitness landscape (fitness as a function of sequence) of the polyprotein that comprises HIV’s spike. We validate the inferred landscape through comparisons with diverse experimental measurements. The availability of this fitness landscape will aid the rational design of immunogens for effective vaccines.
Article
Checkpoint blockade immunotherapies enable the host immune system to recognize and destroy tumour cells. Their clinical activity has been correlated with activated T-cell recognition of neoantigens, which are tumour-specific, mutated peptides presented on the surface of cancer cells. Here we present a fitness model for tumours based on immune interactions of neoantigens that predicts response to immunotherapy. Two main factors determine neoantigen fitness: the likelihood of neoantigen presentation by the major histocompatibility complex (MHC) and subsequent recognition by T cells. We estimate these components using the relative MHC binding affinity of each neoantigen to its wild type and a nonlinear dependence on sequence similarity of neoantigens to known antigens. To describe the evolution of a heterogeneous tumour, we evaluate its fitness as a weighted effect of dominant neoantigens in the subclones of the tumour. Our model predicts survival in anti-CTLA-4-treated patients with melanoma and anti-PD-1-treated patients with lung cancer. Importantly, low-fitness neoantigens identified by our method may be leveraged for developing novel immunotherapies. By using an immune fitness model to study immunotherapy, we reveal broad similarities between the evolution of tumours and rapidly evolving pathogens.
The outcomes of evolution are determined by a stochastic dynamical process that governs how mutations arise and spread through a population. However, it is difficult to observe these dynamics directly over long periods and across entire genomes. Here we analyse the dynamics of molecular evolution in twelve experimental populations of Escherichia coli, using whole-genome metagenomic sequencing at 500-generation intervals through 60,000 generations. Although the rate of fitness gain declines over time, molecular evolution is characterized by signatures of rapid adaptation throughout the duration of the experiment, with multiple beneficial variants simultaneously competing for dominance in each population. Interactions between ecological and evolutionary processes play an important role, as long-term quasi-stable coexistence arises spontaneously in most populations, and evolution continues within each clade. We also present evidence that the targets of natural selection change over time, as epistasis and historical contingency alter the strength of selection on different genes. Together, these results show that long-term adaptation to a constant environment can be a more complex and dynamic process than is often assumed.
Allele frequency time series constitute a powerful resource for unravelling mechanisms of adaptation, because the temporal dimension captures important information about evolutionary forces. In particular, Evolve and Resequence (E&R), the whole-genome sequencing of replicated experimentally evolving populations, is becoming increasingly popular. Based on computer simulations, several studies have proposed experimental parameters to optimize the identification of selection targets. No such recommendations are available for the underlying parameters, selection strength and dominance. Here, we introduce a highly accurate method to estimate selection parameters from replicated time series data, which is fast enough to be applied on a genome scale. Using this new method, we evaluate how experimental parameters can be optimized to obtain the most reliable estimates for selection parameters. We show that the effective population size (Ne) and the number of replicates have the largest impact. Because the number of time points and sequencing coverage had only a minor effect, we suggest that time series analysis is feasible without a major increase in sequencing costs. We anticipate that time series analysis will become routine in E&R studies.
We explore the effect of different mechanisms of natural selection on the evolution of populations for one- and two-locus systems. We compare the effect of viability and fecundity selection in the context of the Wright-Fisher model with selection under the assumption of multiplicative fitness. We show that these two modes of natural selection correspond to different orderings of the processes of population regulation and natural selection in the Wright-Fisher model. We find that under the Wright-Fisher model these two different orderings can affect the distribution of trajectories of haplotype frequencies evolving with genetic recombination. However, the difference in the distribution of trajectories is only appreciable when the population is in significant linkage disequilibrium. We find that as linkage disequilibrium decays the trajectories for the two different models rapidly become indistinguishable. We discuss the significance of these findings in terms of biological examples of viability and fecundity selection, and speculate that the effect may be significant when factors such as gene migration maintain a degree of linkage disequilibrium.
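For readers unfamiliar with the Wright-Fisher model with selection mentioned in this abstract, the following is a minimal one-locus, haploid sketch. It is an illustration only, not the two-locus viability and fecundity models the abstract compares: it applies selection first and then random sampling, which is just one possible ordering of the two steps. The function name `wright_fisher` and all parameter names and values are assumptions made for this example.

```python
import random


def wright_fisher(n_pop, s, x0, generations, seed=0):
    """Simulate the mutant frequency at one haploid locus.

    n_pop: population size (constant every generation)
    s: selection coefficient; mutant fitness is 1 + s relative to 1 (s > -1)
    x0: initial mutant frequency, between 0 and 1
    Returns a list of frequencies of length generations + 1.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = x0
    trajectory = [x]
    for _ in range(generations):
        # Selection step: reweight the mutant frequency by relative fitness.
        p = x * (1 + s) / (1 + s * x)
        # Sampling step (genetic drift): draw n_pop offspring, each a mutant
        # with probability p.
        mutants = sum(1 for _ in range(n_pop) if rng.random() < p)
        x = mutants / n_pop
        trajectory.append(x)
    return trajectory


# Hypothetical parameter values chosen only for illustration.
traj = wright_fisher(n_pop=1000, s=0.05, x0=0.1, generations=50)
```

There is no mutation or recombination in this sketch, so a frequency of 0 or 1 is absorbing. Swapping the order of the selection and sampling steps is the kind of change the abstract associates with different modes of selection.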
Mutation rates and fitness costs of deleterious mutations are difficult to measure in vivo but essential for a quantitative understanding of evolution. Using whole genome deep sequencing data from longitudinal samples during untreated HIV-1 infection, we estimated mutation rates and fitness costs in HIV-1 from the dynamics of genetic variation. At approximately neutral sites, mutations accumulate with a rate of 1.2 × 10⁻⁵ per site per day, in agreement with the rate measured in cell cultures. We estimated the rate from G to A to be the largest, followed by the other transitions C to T, T to C, and A to G, while transversions are less frequent. At other sites, mutations tend to reduce virus replication. We estimated the fitness cost of mutations at every site in the HIV-1 genome using a model of mutation selection balance. About half of all non-synonymous mutations have large fitness costs (>10 percent), while most synonymous mutations have costs <1 percent. The cost of synonymous mutations is especially low in most of pol where we could not detect measurable costs for the majority of synonymous mutations. In contrast, we find high costs for synonymous mutations in important RNA structures and regulatory regions. The intra-patient fitness cost estimates are consistent across multiple patients, indicating that the deleterious part of the fitness landscape is universal and explains a large fraction of global HIV-1 group M diversity.
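As a rough illustration of the mutation-selection balance idea named in this abstract: in the textbook haploid model, a deleterious variant produced at rate μ and removed by selection with cost s settles near an equilibrium frequency of about μ/s when s is much larger than μ, so a cost can be estimated as s ≈ μ / (mean frequency). The sketch below assumes this simple relation; the estimator used in the cited study may differ. The function name and the example frequency are hypothetical, and only the mutation rate is taken from the abstract.

```python
def fitness_cost_from_balance(mutation_rate, mean_frequency):
    """Estimate a fitness cost s from the equilibrium relation x ~ mu / s.

    mutation_rate: rate at which the variant is generated (per site per day)
    mean_frequency: time-averaged frequency of the variant, in (0, 1]
    Returns s in the same time units as mutation_rate (here, per day).
    The approximation is only reasonable when s is much larger than mu.
    """
    if not 0 < mean_frequency <= 1:
        raise ValueError("mean_frequency must be in (0, 1]")
    return mutation_rate / mean_frequency


MU = 1.2e-5  # per site per day, the rate quoted in the abstract

# Hypothetical observation: a variant hovering around 0.1% frequency.
cost = fitness_cost_from_balance(MU, mean_frequency=1e-3)
```

With these illustrative numbers the estimated cost is about 0.012 per day. Variants held at lower average frequencies imply larger costs under the same assumption.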
To a large extent, cancer conforms to evolutionary rules defined by the rates at which clones mutate, adapt and grow. Next-generation sequencing has provided a snapshot of the genetic landscape of most cancer types, and cancer genomics approaches are driving new insights into cancer evolutionary patterns in time and space. In contrast to species evolution, cancer is a particular case owing to the vast size of tumour cell populations, chromosomal instability and its potential for phenotypic plasticity. Nevertheless, an evolutionary framework is a powerful aid to understand cancer progression and therapy failure. Indeed, such a framework could be applied to predict individual tumour behaviour and support treatment strategies.