Articles
https://doi.org/10.1038/s41587-020-0737-3
1Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China. 2Institute for Advanced Study,
Hong Kong University of Science and Technology, Hong Kong, China. 3The Kirby Institute, University of New South Wales, Sydney, New South Wales,
Australia. 4School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia. 5Department of Chemical and Biological
Engineering, Hong Kong University of Science and Technology, Hong Kong, China. 6Department of Physics and Astronomy, University of California,
Riverside, Riverside, CA, USA. 7These authors contributed equally: Muhammad Saqib Sohail, Raymond H. Y. Louie. ✉e-mail: m.mckay@ust.hk;
john.barton@ucr.edu
Evolving populations exhibit complex dynamics. Cancers1–6 and
pathogens, such as HIV-1 (refs. 7–9) and influenza10,11, gener-
ate multiple beneficial mutations that increase fitness or allow
them to escape immunity. Subpopulations with different beneficial
mutations then compete with one another for dominance, referred
to as clonal interference, resulting in the loss of some mutations
that increase fitness12. Neutral or deleterious mutations can also
hitchhike to high frequencies if they occur on advantageous genetic
backgrounds13. Experiments have demonstrated that these features
of genetic linkage are pervasive in nature14–16.
Linkage makes distinguishing the fitness effects of individual
mutations challenging because their dynamics are contingent on the
genetic background on which they appear. Lineage tracking experi-
ments can be used to identify beneficial mutations17, but they can-
not readily be applied to evolution in natural conditions, such as in
cancer or in natural infection by viruses or bacteria. Most existing
computational methods to infer fitness from population dynamics
ignore linkage entirely18–25. Ignoring linkage could lead to errors
when genetic hitchhiking or clonal interference is present, both of which
occur frequently in nature. A few methods have attempted to incorporate
linkage information, but these methods are exceptionally
computationally intensive and may scale poorly to populations with
many polymorphic variants26–28.
Here we describe a method to infer selection from evolution-
ary histories, captured by genetic time series data, and demon-
strate its ability to resolve linkage effects. Simulations show that
our approach, which we call marginal path likelihood (MPL)29,30, is
faster and more accurate than current state-of-the-art methods for
selection inference. As an example application, we use our method
to reveal patterns of selection in intrahost HIV-1 evolution using
14 patient data sets. The genetic diversity exhibited in these data
sets makes them exceptionally challenging to analyze using exist-
ing linkage-aware methods. With MPL, we observe strong selection
for escape from CD8+ T cell responses, which is partially masked
by linkage due to extensive clonal interference between competing
escape mutants. We further quantify the influence of linkage on
inferred selection across the viral genome. Our results show that
most variants have negligible effects on inferred selection at other
sites, but a small minority of highly influential variants have dra-
matic and far-reaching effects. These highly influential variants are
often ones that sweep rapidly through the population. We also find
modest selection for escape from antibody responses, even in an
individual who develops broadly neutralizing antibodies (bnAbs).
Collectively, our results argue for the importance of accounting for
genetic linkage when inferring selection, while providing a practical
method for achieving this for large data sets.
Results
Evolutionary model incorporating linkage. The principal idea of
our inference approach is to efficiently quantify the probability of an
evolutionary ‘path,’ defined by the set of all mutant allele frequencies
at each time, using a path integral method derived from statistical
physics (Methods). Path integrals for related evolutionary models
have been derived under different assumptions in past work31–33,
but they have not been widely applied for inference. This method
allows us to disentangle the effects of individual mutations from the
sequence background, that is, genetic linkage, without making the
likelihood function intractable. In fact, the path integral can be ana-
lytically inverted to find the parameters that are most likely to have
generated a path.
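To make the notion of an evolutionary path concrete, the following is a minimal sketch that simulates one such path, that is, the mutant allele frequency at every locus and every generation. It is an illustration under stated assumptions, not the authors' implementation: it assumes a haploid population of fixed size, additive fitness of the form 1 + (sum of selection coefficients of the mutant alleles carried), symmetric mutation between wild-type and mutant alleles, and no recombination. The function name `simulate_wf_path` and all of its parameters are hypothetical.

```python
import numpy as np


def simulate_wf_path(s, n_pop=1000, n_gen=100, mu=1e-3, seed=0):
    """Simulate a haploid Wright-Fisher population with additive fitness.

    Assumptions (illustrative only): fixed population size, symmetric
    mutation at rate mu per locus per generation, no recombination.
    Returns an array of shape (n_gen + 1, n_loci) holding the mutant
    allele frequency at each locus in each generation (the "path").
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    n_loci = s.size

    # Each row is one individual's sequence: 0 = wild-type, 1 = mutant.
    pop = np.zeros((n_pop, n_loci), dtype=np.int8)

    path = np.empty((n_gen + 1, n_loci))
    path[0] = pop.mean(axis=0)

    for t in range(1, n_gen + 1):
        # Additive fitness: 1 plus the sum of selection coefficients
        # of the mutant alleles carried by each individual.
        fitness = 1.0 + pop @ s
        probs = fitness / fitness.sum()

        # Selection and drift: sample parents in proportion to fitness.
        # Whole sequences are inherited, so loci remain linked.
        parents = rng.choice(n_pop, size=n_pop, p=probs)
        pop = pop[parents]

        # Mutation: flip each allele independently with probability mu.
        flips = rng.random(pop.shape) < mu
        pop = np.where(flips, 1 - pop, pop).astype(np.int8)

        path[t] = pop.mean(axis=0)

    return path
```

Calling, for example, `simulate_wf_path([0.05, 0.0, -0.02])` returns the frequency trajectories for one beneficial, one neutral and one deleterious mutation. Because whole sequences are inherited together, the neutral or deleterious allele can rise in frequency when it happens to share a background with the beneficial one, which is the linkage effect that the inference method is designed to resolve.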
To define the path integral, we consider Wright–Fisher (WF)
population dynamics with selection, mutation and recombination,
in the diffusion limit34. Under an additive fitness model, the fitness
of any individual is a sum of selection coefficients, si, which quan-
tify the selective advantage of mutant allele i relative to wild-type
(WT). The probability of an evolutionary path is then a product of
MPL resolves genetic linkage in fitness inference
from complex evolutionary histories
Muhammad Saqib Sohail1,7, Raymond H. Y. Louie1,2,3,4,7, Matthew R. McKay1,5 ✉ and
John P. Barton6 ✉
Genetic linkage causes the fate of new mutations in a population to be contingent on the genetic background on which they
appear. This makes it challenging to identify how individual mutations affect fitness. To overcome this challenge, we developed
marginal path likelihood (MPL), a method to infer selection from evolutionary histories that resolves genetic linkage. Validation
on real and simulated data sets shows that MPL is fast and accurate, outperforming existing inference approaches. We found
that resolving linkage is crucial for accurately quantifying selection in complex evolving populations, which we demonstrate
through a quantitative analysis of intrahost HIV-1 evolution using multiple patient data sets. Linkage effects generated by vari-
ants that sweep rapidly through the population are particularly strong, extending far across the genome. Taken together, our
results argue for the importance of resolving linkage in studies of natural selection.
NATURE BIOTECHNOLOGY | VOL 39 | APRIL 2021 | 472–479 | www.nature.com/naturebiotechnology