Modeling HIV-1 Drug Resistance as Episodic Directional Selection

Imperial College London, United Kingdom
PLoS Computational Biology (Impact Factor: 4.83). 05/2012; 8(5):e1002507. DOI: 10.1371/journal.pcbi.1002507
Source: PubMed

ABSTRACT Author Summary
When exposed to treatment, HIV-1 and other rapidly evolving viruses have the capacity to acquire drug resistance mutations (DRAMs), which limit the efficacy of antivirals. There are a number of experimentally well characterized HIV-1 DRAMs, but many mutations whose roles are not fully understood have also been reported. In this manuscript we construct evolutionary models that identify the locations and targets of mutations conferring resistance to antiretrovirals from viral sequences sampled from treated and untreated individuals. While the evolution of drug resistance is a classic example of natural selection, existing analyses fail to detect the majority of DRAMs. We show that, in order to identify resistance mutations from sequence data, it is necessary to recognize that in this case natural selection is both episodic (it only operates when the virus is exposed to the drugs) and directional (only mutations to a particular amino acid confer resistance while allowing the virus to continue replicating). The new class of models that allow for the episodic and directional nature of adaptive evolution performs very well at recovering known DRAMs, can be useful for identifying unknown resistance-associated mutations, and is generally applicable to a variety of biological scenarios where similar selective forces are at play.
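The two ideas in the summary, episodic and directional selection, can be illustrated with a toy calculation. The sketch below is not the model from the paper: it assumes a single site with 20 equally exchangeable residues, a hypothetical `bias` factor that multiplies the rate of substitutions toward one target residue, and a lineage made of branches flagged as treated or untreated. Under those assumptions the site reduces to a two-state process (target versus non-target) with a closed-form solution. All function and parameter names are illustrative.

```python
import math

def step(p0, t, bias=1.0, n_states=20, rate=1.0):
    """Probability that the site holds the target residue after time t,
    given probability p0 at the start of the branch (toy two-state model)."""
    to_target = bias * rate              # each non-target residue -> target
    from_target = (n_states - 1) * rate  # target -> any other residue
    total = to_target + from_target
    equilibrium = to_target / total
    # Standard two-state continuous-time Markov chain solution.
    return equilibrium + (p0 - equilibrium) * math.exp(-total * t)

def prob_target_on_lineage(branches, bias):
    """branches: list of (length, treated) pairs from root to tip.
    The directional bias is applied only on treated branches (episodic)."""
    p = 0.0  # assume the lineage starts at a non-target residue
    for length, treated in branches:
        p = step(p, length, bias=bias if treated else 1.0)
    return p

# Hypothetical lineages of equal total length; only the second is treated at the end.
untreated = [(0.05, False), (0.05, False)]
treated = [(0.05, False), (0.05, True)]
print(prob_target_on_lineage(untreated, bias=10.0))
print(prob_target_on_lineage(treated, bias=10.0))
```

With a bias above 1, the treated lineage ends with a higher probability of carrying the target residue than the untreated one, which is the kind of signal an episodic directional model is meant to pick up.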

Available from: Christopher Seebregts, Aug 10, 2015
  • "Methods that quantify the strength and type of natural selection by estimating the ratio of nonsynonymous to synonymous substitution (ω) using phylogenetic codon-substitution models, pioneered by Muse and Gaut (1994) and Goldman and Yang (1994), have proven particularly popular and useful. In the context of infectious diseases (see Aguileta et al. 2009 for a review), these models have been used successfully to study transmission (Jonges et al. 2011), zoonosis (Demogines et al. 2012), the evolution of drug resistance (Stanhope et al. 2008; Hill et al. 2009; Murrell, De Oliveira, et al. 2012), escape from host immune response (Frost et al. 2005; Cento et al. 2013), the development of pathogenicity and virulence (Brault et al. 2007), emergence of new strains (Schuh et al. 2014), and evolutionary arms-races between viruses and host antiviral defenses (Duggal et al. 2011; Daugherty et al. 2014). A key feature of natural selection is its variability."
    ABSTRACT: Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of "branch-site" models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, Adaptive Branch-Site Random Effects Likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for "branch-site" models. An aBSREL analysis of 8893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, i.e. current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.
    Molecular Biology and Evolution 02/2015; 32(5). DOI:10.1093/molbev/msv022 · 14.31 Impact Factor
  • ABSTRACT: Phylogeny-based modeling of heterogeneity across the positions of multiple sequence alignments has generally been approached from two main perspectives. The first treats site-specificities as random variables drawn from a statistical law, and the likelihood function takes the form of an integral over this law. The second assigns distinct variables to each position, and, in a maximum-likelihood context, adjusts these variables, along with global parameters, to optimize a joint likelihood function. Here, it is emphasized that while the first approach directly enjoys the statistical guarantees of traditional likelihood theory, the latter does not, and should be approached with particular caution when the site-specific variables are high-dimensional. Using a phylogeny-based mutation-selection framework, it is shown that the difference in interpretation of site-specific variables explains the incongruities in recent studies regarding distributions of selection coefficients.
    Genetics 12/2012; 193. DOI:10.1534/genetics.112.145722 · 4.87 Impact Factor
  • ABSTRACT: Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection, an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: we illustrate this on a large influenza haemagglutinin dataset (3142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution, as well as on the Datamonkey web server.
    Molecular Biology and Evolution 02/2013; 30(5). DOI:10.1093/molbev/mst030 · 14.31 Impact Factor