ABSTRACT: The genetic code is redundant, with most amino acids encoded by multiple codons. In many organisms, codon usage is biased toward particular codons. Understanding the adaptive and nonadaptive forces driving the evolution of codon usage bias (CUB) has been an area of intense focus and debate in the fields of molecular and evolutionary biology. However, their relative importance in shaping genomic patterns of CUB remains unresolved. Using a nested model of protein translation and population genetics, we show that observed gene-level variation of CUB in Saccharomyces cerevisiae can be explained almost entirely by selection for efficient ribosome usage, genetic drift, and biased mutation. The correlation between observed codon counts within individual genes and our model predictions is 0.96. Although a variety of factors shape patterns of CUB at the level of individual sites within genes, our results suggest that selection for efficient ribosome usage is a central force in shaping codon usage at the genomic scale. In addition, our model allows direct estimation of codon-specific mutation rates and elongation times and can be readily applied to any organism with high-throughput expression datasets. More generally, we have developed a natural framework for integrating models of molecular processes with population genetics models to quantitatively estimate parameters underlying fundamental biological processes, such as protein translation.
Proceedings of the National Academy of Sciences 06/2011; 108(25):10231-6.
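A minimal sketch of how selection, mutation bias, and drift can combine into codon probabilities for a single amino acid. It assumes a form that is common in mutation-selection-drift models of codon usage but is only an assumption here: the probability of synonymous codon i is proportional to exp(-dM_i - phi * dT_i), where dM_i is a mutation-bias term, dT_i an elongation-time difference (with the effect of drift absorbed into its scaling), and phi the gene's protein production rate. All parameter values are hypothetical, not estimates from the study.

```python
import math

def codon_probabilities(delta_m, delta_t, phi):
    """Probability of each synonymous codon for one amino acid.

    Assumed form: p_i proportional to exp(-delta_m[i] - phi * delta_t[i]).
    delta_m: mutation-bias terms relative to a reference codon (hypothetical)
    delta_t: elongation-time differences relative to the same reference,
             already scaled for drift (hypothetical)
    phi:     protein production rate of the gene
    """
    weights = [math.exp(-m - phi * t) for m, t in zip(delta_m, delta_t)]
    total = sum(weights)
    return [w / total for w in weights]

# Two-codon amino acid: codon 0 is favored by mutation, codon 1 is faster.
delta_m = [0.0, 0.8]
delta_t = [0.0, -0.5]

low = codon_probabilities(delta_m, delta_t, phi=0.1)   # weakly expressed gene
high = codon_probabilities(delta_m, delta_t, phi=5.0)  # highly expressed gene
print(low)   # mutation bias dominates
print(high)  # selection for the faster codon dominates
```

In this sketch a single phi per gene shifts usage from the mutationally favored codon toward the faster one, which is the kind of gene-level variation the abstract describes.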
ABSTRACT: Fucosyltransferase-IV and -VII double knockout (FtDKO) mice reveal profound impairment in T cell trafficking to lymph nodes (LNs) due to an inability to synthesize selectin ligands. We observed an increase in the proportion of memory/effector (CD44(high)) T cells in LNs of FtDKO mice. We infected FtDKO mice with lymphocytic choriomeningitis virus to generate and track Ag-specific CD44(high)CD8 T cells in secondary lymphoid organs. Although frequencies were similar, total Ag-specific effector CD44(high)CD8 T cells were significantly reduced in LNs, but not blood, of FtDKO mice at day 8. In contrast, frequencies of Ag-specific memory CD44(high)CD8 T cells were up to 8-fold higher in LNs of FtDKO mice at day 60. Because wild-type mice treated with anti-CD62L also showed increased frequencies of CD44(high) T cells in LNs, we hypothesized that memory T cells were preferentially retained in, or preferentially migrated to, FtDKO LNs. We analyzed T cell entry and egress in LNs using adoptive transfer of bona fide naive or memory T cells. Memory T cells were not retained longer in LNs than naive T cells; however, T cell exit slowed significantly as T cell numbers declined. Memory T cells were profoundly impaired in entering LNs of FtDKO mice; however, they exhibited greater homeostatic proliferation in FtDKO mice. These results suggest that memory T cells are enriched in LNs with T cell deficits by several mechanisms, including longer T cell retention and increased homeostatic proliferation.
The Journal of Immunology 10/2010; 185(10):5751-61.
ABSTRACT: Although tRNA abundances are thought to play a major role in determining translation error rates, their distribution across the genetic code and the resulting implications have received little attention. In general, studies of codon usage bias (CUB) assume that codons with higher tRNA abundance have lower missense error rates. Using a model of protein translation based on tRNA competition and intra-ribosomal kinetics, we show that this assumption can be violated when tRNA abundances are positively correlated across the genetic code. Examining the distribution of tRNA abundances across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. This work challenges a fundamental assumption made in over 30 years of research on CUB, namely that codons with higher tRNA abundances have lower missense error rates, and with it the view that missense errors are the primary selective force responsible for CUB.
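A toy illustration of how the assumption can fail; it is not the tRNA-competition and intra-ribosomal kinetic model used in the study. Here the per-codon missense probability is simply the near-cognate share of competing tRNAs, n / (c + n), with equal acceptance weights (an assumption). When near-cognate abundance n rises together with cognate abundance c, the codon with more cognate tRNA need not have the lower error rate. The abundances are made-up numbers.

```python
def missense_probability(cognate, near_cognate, w_cognate=1.0, w_near=1.0):
    """Chance that a near-cognate tRNA wins the competition at a codon.

    Toy competition model (an assumption): each tRNA class is accepted in
    proportion to its abundance times a hypothetical acceptance weight.
    """
    near = w_near * near_cognate
    return near / (w_cognate * cognate + near)

# Hypothetical abundances in which cognate and near-cognate levels co-vary.
codon_a = missense_probability(cognate=10, near_cognate=1)
codon_b = missense_probability(cognate=20, near_cognate=4)
print(round(codon_a, 4), round(codon_b, 4))  # 0.0909 0.1667
```

Codon B has twice the cognate tRNA of codon A yet the higher error rate, because its competitors grew faster than its cognate pool.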
ABSTRACT: Upstream open reading frames (uORFs) are protein-coding elements in the 5' leader of messenger RNAs. uORFs generally inhibit translation of the main ORF because ribosomes that perform translation elongation suffer either permanent or conditional loss of reinitiation competence. After conditional loss, reinitiation competence may be regained by, at a minimum, reacquisition of a fresh methionyl-tRNA. The conserved h subunit of Arabidopsis eukaryotic initiation factor 3 (eIF3) mitigates the inhibitory effects of certain uORFs. Here, we define more precisely how this occurs by combining gene expression data from mutated 5' leaders of Arabidopsis AtbZip11 (At4g34590) and yeast GCN4 with a computational model of translation initiation in wild-type and eif3h mutant plants. Of the four phylogenetically conserved uORFs in AtbZip11, three are inhibitory to translation, while one is anti-inhibitory. The mutation in eIF3h has no major effect on uORF start codon recognition. Instead, eIF3h supports efficient reinitiation after uORF translation. Modeling suggested that the permanent loss of reinitiation competence during uORF translation occurs at a faster rate in the mutant than in the wild type. Thus, eIF3h ensures that a fraction of uORF-translating ribosomes retain their competence to resume scanning. Experiments using the yeast GCN4 leader provided no evidence that eIF3h fosters tRNA reacquisition. Together, these results attribute a specific molecular function in translation initiation to an individual eIF3 subunit in a multicellular eukaryote.
ABSTRACT: A large number of studies have been dedicated to identifying the structural and sequence-based features of RNA thermometers, mRNAs that regulate their translation initiation rate with temperature. It has been shown that the melting of the ribosome-binding site (RBS) plays a prominent role in this thermosensing process. However, little is known about how widespread this melting phenomenon is, as earlier studies on the subject have worked with a small sample of known RNA thermometers. We have developed a novel method of studying the melting of RNAs with temperature by computationally sampling the distribution of RNA structures at various temperatures using the Vienna RNA folding software. In this study, we compared the thermosensing property of 100 randomly selected mRNAs and three well-known thermometers, the rpoH, ibpA and agsA sequences from E. coli. We also compared the rpoH sequences from 81 mesophilic proteobacteria. Although both rpoH and ibpA show a higher rate of melting at their RBS than the mean of non-thermometers, contrary to our expectations these higher rates are not significant. Surprisingly, we also do not find any significant differences between rpoH thermometers from other gamma-proteobacteria and E. coli non-thermometers.
PLoS ONE 01/2010; 5(7):e11308.
ABSTRACT: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE, this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power.
Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed, and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias for testing differential expression are discussed, along with how these algorithms can be applied in that context.
Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of their superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and false negative findings due to variation in tag formation probabilities across samples when testing for differential expression.
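A small calculation showing how variation in tag formation across samples can masquerade as differential expression. It assumes that every anchoring-enzyme site is cleaved independently with efficiency p and that all tags are used, so a transcript with K sites yields at least one tag with probability 1 - (1 - p)^K. The two genes, their site counts, and the per-sample efficiencies are hypothetical.

```python
def tag_probability(p, n_sites):
    """Probability that a transcript with n_sites cleavage sites yields a tag."""
    return 1.0 - (1.0 - p) ** n_sites

def tag_pool_fractions(mrna_freqs, n_sites, p):
    """Expected fraction of the tag pool contributed by each gene."""
    raw = [f * tag_probability(p, k) for f, k in zip(mrna_freqs, n_sites)]
    total = sum(raw)
    return [r / total for r in raw]

mrna = [0.5, 0.5]   # identical mRNA frequencies in both samples
sites = [1, 4]      # gene A has one site, gene B has four (hypothetical)

sample_1 = tag_pool_fractions(mrna, sites, p=0.5)
sample_2 = tag_pool_fractions(mrna, sites, p=0.9)
fold_change_a = sample_2[0] / sample_1[0]
print(round(fold_change_a, 2))  # 1.36, although gene A did not change
```

A naive comparison of tag frequencies would report a 36% increase for gene A that is produced entirely by the change in cleavage efficiency between samples.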
ABSTRACT: Codon usage bias (CUB) has been documented across a wide range of taxa and is the subject of numerous studies. While most explanations of CUB invoke some type of natural selection, most measures of CUB adaptation are heuristically defined. In contrast, we present a novel and mechanistic method for defining and contextualizing CUB adaptation to reduce the cost of nonsense errors during protein translation. Using a model of protein translation, we develop a general approach for measuring the protein production cost in the face of nonsense errors of a given allele as well as the mean and variance of these costs across its coding synonyms. We then use these results to define the nonsense error adaptation index (NAI) of the allele or a contiguous subset thereof. Conceptually, the NAI value of an allele is a relative measure of its elevation on a specific and well-defined adaptive landscape. To illustrate its utility, we calculate NAI values for the entire coding sequence and across a set of nonoverlapping windows for each gene in the Saccharomyces cerevisiae S288c genome. Our results provide clear evidence of adaptation to reduce the cost of nonsense errors and increasing adaptation with codon position and expression. The magnitude and nature of this adaptation are also largely consistent with simulation results in which nonsense errors are the only selective force driving CUB evolution. Because NAI is derived from mechanistic models, it is both easier to interpret and more amenable to future refinement than other commonly used measures of codon bias. Further, our approach can also be used as a starting point for developing other mechanistically derived measures of adaptation such as for translational accuracy.
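A minimal sketch of the idea behind placing an allele relative to the mean and variance of costs across its coding synonyms. For illustration only, it assumes that an allele's cost is a sum of per-codon costs and that every synonymous codon is equally likely at each site, so the synonym mean and variance are sums of per-site means and variances. The cost table is hypothetical, and this additive stand-in is simpler than the position-dependent nonsense-error cost on which NAI is built.

```python
import math

# Hypothetical per-codon costs, grouped by amino acid (lower is better).
SYNONYM_COSTS = {
    "L": {"CTG": 0.2, "TTA": 0.9, "CTT": 0.5},
    "K": {"AAA": 0.3, "AAG": 0.6},
}

def adaptation_score(codons, amino_acids):
    """Standardized position of an allele among its coding synonyms.

    Assumes additive costs, uniform and independent codon choice per site,
    and at least one site with more than one synonym (so variance > 0).
    Positive values mean the allele is cheaper than a typical synonym.
    """
    cost = mean = var = 0.0
    for codon, aa in zip(codons, amino_acids):
        options = list(SYNONYM_COSTS[aa].values())
        mu = sum(options) / len(options)
        cost += SYNONYM_COSTS[aa][codon]
        mean += mu
        var += sum((c - mu) ** 2 for c in options) / len(options)
    return (mean - cost) / math.sqrt(var)

best = adaptation_score(["CTG", "AAA"], ["L", "K"])   # cheapest synonyms
worst = adaptation_score(["TTA", "AAG"], ["L", "K"])  # costliest synonyms
print(best, worst)
```

Applying the same function to a sliding window of codons would give a per-window score, loosely analogous to the nonoverlapping-window analysis described above.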
ABSTRACT: Infectious pathogens compete and are subject to natural selection at multiple levels. For example, viral strains compete for access to host resources within an infected host and, at the same time, compete for access to susceptible hosts within the host population. Here we propose a novel approach for studying the interplay between within- and between-host competition. This approach allows a single host to be infected by, and transmit, two strains of the same pathogen. We do this by nesting a model for the host-pathogen dynamics within each infected host into an epidemiological model. The nesting of models allows the between-host infectivity and mortality rates suffered by infected hosts to be functions of the disease progression at the within-host level. We present a general method for computing the basic reproduction ratio of a pathogen in such a model. We then illustrate our method using a basic model for the within-host dynamics of viral infections, embedded within the simplest susceptible-infected (SI) epidemiological model. Within this nested framework, we show that the virion production rate at the level of the cell-virus interaction leads, via within-host competition, to the presence or absence of between-host competitive exclusion. In particular, we find that in the absence of mutation the strain that maximizes between-host fitness can outcompete all other strains. In the presence of mutation we observe a complex invasion landscape showing the possibility of coexistence. Although we emphasize the application to human viral diseases, we expect this methodology to be applicable to many host-parasite systems.
Theoretical Population Biology 01/2008; 72(4):576-91.
ABSTRACT: This paper explores Bayesian inference for a biased sampling model in situations where the population of interest cannot be sampled directly, but rather through an indirect and inherently biased method. Observations are viewed as the result of a multinomial sampling process from a tagged population which is, in turn, a biased sample from the original population of interest. This paper presents several Gibbs sampling techniques to estimate the joint posterior distribution of the original population based on the observed counts of the tagged population. These algorithms efficiently sample from the joint posterior distribution of a very large multinomial parameter vector. Samples from this method can be used to generate both joint and marginal posterior inferences. We also present an iterative optimization procedure based upon the conditional distributions of the Gibbs sampler which directly computes the mode of the posterior distribution. To illustrate our approach, we apply it to a tagged population of messenger RNAs (mRNAs) generated using a common high-throughput technique, Serial Analysis of Gene Expression (SAGE). Inferences for the mRNA expression levels in the yeast Saccharomyces cerevisiae are reported.
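A simplified sketch of posterior inference for the original population when the tagging probabilities are known. It is not the Gibbs sampler described above: it places a symmetric Dirichlet prior on the tagged-population frequencies, draws from the resulting Dirichlet posterior, and maps each draw back to the original population by dividing by the (assumed known) tagging probabilities and renormalizing. The counts, tagging probabilities, and prior pseudo-count are hypothetical.

```python
import numpy as np

def sample_original_frequencies(tag_counts, tag_probs, n_draws=5000,
                                prior=1.0, seed=0):
    """Posterior draws of original-population frequencies.

    tag_counts: observed counts in the tagged population
    tag_probs:  assumed-known probability that a member of each class is tagged
    prior:      symmetric Dirichlet pseudo-count (an assumption)
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(tag_counts, dtype=float)
    q = np.asarray(tag_probs, dtype=float)
    # Conjugate posterior for the tagged-population frequencies.
    tagged = rng.dirichlet(counts + prior, size=n_draws)
    # Undo the sampling bias: original frequency is proportional to f / q.
    original = tagged / q
    return original / original.sum(axis=1, keepdims=True)

draws = sample_original_frequencies([120, 60, 20], [0.9, 0.3, 0.5])
print(draws.mean(axis=0))                       # posterior means
print(np.percentile(draws[:, 1], [2.5, 97.5]))  # 95% interval for class 2
```

Because each draw is transformed whole, joint and marginal summaries (such as the interval above) come directly from the same set of samples.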
ABSTRACT: Genes are often biased in their codon usage. The degree of bias displayed often changes with expression level and intragenic position. Numerous indices, such as the codon adaptation index, have been developed to measure this bias. Although the expression level of a gene and index values are correlated, the heuristic nature of these metrics limits their ability to explain this relationship. As an alternative approach, this study integrates mechanistic models of cellular and population processes in a nested manner to develop a stochastic evolutionary model of a protein's production rate (SEMPPR). SEMPPR assumes that the evolution of codon bias is driven by selection to reduce the cost of nonsense errors and that this selection is counteracted by mutation and drift. Through the application of Bayes' theorem, SEMPPR generates a posterior probability distribution for the protein production rate of a given gene. Conceptually, SEMPPR's predictions are based on the degree of adaptation to reduce the cost of nonsense errors observed in the codon usage pattern of the gene. As an illustration, SEMPPR was parameterized using the Saccharomyces cerevisiae genome and its predictions tested using available empirical data. The results indicate that SEMPPR's predictions are as reliable as index-based ones. In addition, SEMPPR's output is more easily interpreted, and its predictions could be improved through refinements of the models upon which it is built.
Molecular Biology and Evolution 12/2007; 24(11):2362-72.
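A grid-based sketch of the Bayesian step: given a gene's codon counts, compute a posterior over its protein production rate phi. For illustration only, it assumes that codon probabilities are proportional to exp(-phi * cost) for hypothetical per-codon nonsense-error costs, that mutation is unbiased, and that the prior is flat on a grid of log(phi). SEMPPR's actual likelihood is derived from its nested translation and population-genetic models.

```python
import numpy as np

# Hypothetical per-codon nonsense-error costs for a two-codon amino acid.
COSTS = np.array([0.0, 1.0])

def log_likelihood(counts, phi):
    """Multinomial log-likelihood (up to a constant) of codon counts given phi.

    Assumes codon probabilities proportional to exp(-phi * cost).
    """
    logits = -phi * COSTS
    log_p = logits - np.logaddexp.reduce(logits)
    return float(np.dot(counts, log_p))

def phi_posterior(counts, log_phi_grid):
    """Posterior weights on a grid of log(phi) values (flat prior on the grid)."""
    log_post = np.array([log_likelihood(counts, np.exp(x)) for x in log_phi_grid])
    log_post -= log_post.max()  # rescale to avoid underflow
    post = np.exp(log_post)
    return post / post.sum()

grid = np.linspace(-4.0, 4.0, 161)
weak = phi_posterior(np.array([30, 20]), grid)   # mild preference for cheap codon
strong = phi_posterior(np.array([48, 2]), grid)  # strong preference
# Posterior means of log(phi), reported on the phi scale.
print(np.exp(np.dot(weak, grid)), np.exp(np.dot(strong, grid)))
```

The gene whose codon usage is more strongly skewed toward the cheaper codon receives the higher inferred production rate, which is the logic the abstract describes.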
ABSTRACT: Virus evolution during infection of a single individual is a well-known feature of disease progression in chronic viral diseases. However, the simplest models of virus competition for host resources show the existence of a single dominant strain that grows most rapidly during the initial period of infection and competitively excludes all other virus strains. Here, we examine the dynamics of strain replacement in a simple model that includes a convex trade-off between rapid virus reproduction and long-term host cell survival. Strains are structured according to their within-cell replication rate. Over the course of infection, we find a progression in the dominant strain from fast- to moderately-replicating virus strains featuring distinct jumps in the replication rate of the dominant strain over time. We completely analyze the model and provide estimates for the replication rate of the initial dominant strain and its successors. Our model lays the groundwork for more detailed models of HIV selection and mutation. We outline future directions and application of related models to other biological situations.
Bulletin of Mathematical Biology 11/2007; 69(7):2361-85.
ABSTRACT: There are many biological steps between viral infection of CD4(+) T cells and the production of HIV-1 virions. Here we incorporate an eclipse phase, representing the stage in which infected T cells have not started to produce new virus, into a simple HIV-1 model. Model calculations suggest that the more quickly infected T cells progress from the eclipse stage to the productively infected stage, the more likely a viral strain is to persist. Long-term treatment effectiveness of antiretroviral drugs is often hindered by the frequent emergence of drug-resistant virus during therapy. We link drug resistance to both the rate of progression of the eclipse phase and the rate of viral production of the resistant strain, and explore how the resistant strain could evolve to maximize its within-host viral fitness. We obtained the optimal progression rate and the optimal viral production rate, which maximize the fitness of a drug-resistant strain in the presence of drugs. We show that the window of opportunity for invasion of drug-resistant strains is widened for a higher level of drug efficacy provided that the treatment is not potent enough to eradicate both the sensitive and resistant virus.
Journal of Theoretical Biology 09/2007; 247(4):804-18.
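A sketch of why faster progression out of the eclipse phase favors persistence, using a standard four-compartment model that is assumed here rather than taken from the paper: target cells T, eclipse-phase cells L, productively infected cells I, and virus V, with dT/dt = lam - d*T - beta*T*V, dL/dt = beta*T*V - (mu + k)*L, dI/dt = k*L - delta*I, dV/dt = p*I - c*V. Its basic reproductive number is R0 = (beta*lam/d) * (k/(k + mu)) * (p/(delta*c)), which increases with the progression rate k. The parameter values are illustrative, not fitted.

```python
def eclipse_r0(lam, d, beta, k, mu, p, delta, c):
    """Basic reproductive number of the assumed eclipse-phase HIV-1 model.

    lam, d: target-cell production and death rates (so T0 = lam / d)
    beta:   infection rate constant
    k:      rate of progression from eclipse to productive infection
    mu:     death rate of eclipse-phase cells
    p:      viral production rate per productively infected cell
    delta:  death rate of productively infected cells
    c:      viral clearance rate
    """
    t0 = lam / d
    # Virions -> infected cells -> survive the eclipse -> virions produced.
    return (beta * t0 / c) * (k / (k + mu)) * (p / delta)

# Illustrative values only.
base = dict(lam=1e4, d=0.01, beta=2.4e-8, mu=0.5, p=2000.0, delta=1.0, c=23.0)
for k in (0.1, 0.5, 1.0, 2.0):
    print(k, round(eclipse_r0(k=k, **base), 2))
```

With these values R0 is below 1 for the slowest progression rate and above 1 for the faster ones, so the same strain can fail or persist depending only on k.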
ABSTRACT: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome.
Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA.
Using a mechanistic, model-based approach to SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.
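A minimal sketch of such an adjustment. It assumes each anchoring-enzyme (AE) site is cleaved independently with efficiency p and that the tag comes from the 3'-most cleaved site, so the site in position j (counting from the 3' end, starting at 0) produces the tag with probability p(1 - p)^j. Only tags from informative sites identify the gene, so a gene's detection probability is the sum over its informative sites; dividing observed counts by that probability and renormalizing gives bias-adjusted mRNA frequencies. The site layouts, counts, and efficiency are hypothetical, and this point estimate stands in for the full Bayesian treatment.

```python
def informative_tag_probability(site_informative, p):
    """Probability that a transcript yields a tag from an informative site.

    site_informative: booleans for AE sites ordered from the 3' end
    p:                assumed AE cleavage efficiency
    """
    return sum(p * (1.0 - p) ** j
               for j, informative in enumerate(site_informative)
               if informative)

def adjusted_frequencies(tag_counts, layouts, p):
    """Rescale tag counts by detection probability and renormalize."""
    scaled = [n / informative_tag_probability(sites, p)
              for n, sites in zip(tag_counts, layouts)]
    total = sum(scaled)
    return [s / total for s in scaled]

layouts = [
    [True],               # gene 1: a single informative site
    [False, True, True],  # gene 2: its 3'-most site is uninformative
]
print(adjusted_frequencies([100, 100], layouts, p=0.7))
```

Equal tag counts translate into unequal mRNA frequencies here because gene 2 is detected with probability 0.273 rather than 0.7.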
ABSTRACT: Filamentous fungi are ubiquitous and ecologically important organisms with rich and varied life histories; however, there is no consensus on how to identify or measure their fitness. In the first part of this study we adapt a general epidemiological model to identify the appropriate fitness metric for a saprophytic filamentous fungus. We find that fungal fitness is inversely proportional to the equilibrium density of uncolonized fungal resource patches which, in turn, is a function of the expected spore production of a fungus. In the second part of this study we use a simple life history model of the same fungus within a resource patch to show that a bang-bang resource allocation strategy maximizes the expected spore production, a critical fitness component. Unlike bang-bang strategies identified in other life-history studies, we find that the optimal allocation strategy for saprophytes does not entail the use of all of the resources within a patch.
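A toy version of a bang-bang allocation problem; it is a classic growth-then-reproduction example, not the saprophyte patch model of the study. Biomass grows at rate r while all resources go to growth, then spores are produced at rate s per unit biomass until a fixed horizon T. Restricting attention to single-switch strategies, spore output for a switch time t is s * exp(r*t) * (T - t), which is maximized at t = T - 1/r. The rates and horizon are hypothetical, and because this toy has no finite resource it cannot reproduce the study's finding that the optimal strategy leaves some patch resources unused.

```python
import math

def spore_output(switch_time, r, s, horizon, m0=1.0):
    """Spores produced by growing until switch_time, then only sporulating."""
    biomass = m0 * math.exp(r * switch_time)
    return s * biomass * (horizon - switch_time)

r, s, horizon = 0.5, 1.0, 10.0  # hypothetical rates and season length
grid = [i * 0.01 for i in range(int(horizon * 100) + 1)]
best = max(grid, key=lambda t: spore_output(t, r, s, horizon))
print(best, horizon - 1.0 / r)  # numeric optimum vs. analytic T - 1/r
```

The brute-force search over switch times recovers the analytic optimum, which is a quick check when the objective becomes too complex for a closed form.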
ABSTRACT: We present and analyse a model of protein translation at the scale of an individual messenger RNA (mRNA) transcript. The model we develop is unique in that it incorporates the phenomena of ribosome recycling and nonsense errors. The model conceptualizes translation as a probabilistic wave of ribosome occupancy traveling down a heterogeneous medium, the mRNA transcript. Our results show that the heterogeneity of the codon translation rates along the mRNA results in short-scale spikes and dips in the wave. Nonsense errors attenuate this wave on a longer scale while ribosome recycling reinforces it. We find that the combination of nonsense errors and codon usage bias can have a large effect on the probability that a ribosome will completely translate a transcript. We also elucidate how these forces interact with ribosome recycling to determine the overall translation rate of an mRNA transcript. We derive a simple cost function for nonsense errors using our model and apply this function to the yeast (Saccharomyces cerevisiae) genome. Using this function we are able to detect position-dependent selection on codon bias which correlates with gene expression levels as predicted a priori. These results indirectly validate our underlying model assumptions and confirm that nonsense errors can play an important role in shaping codon usage bias.
Journal of Theoretical Biology 05/2006; 239(4):417-34.
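A minimal sketch of how nonsense errors attenuate translation along a transcript. It assumes that at each codon a ribosome either elongates at a codon-specific rate c_i or terminates prematurely at a constant nonsense-error rate b, so the probability of surviving codon i is c_i / (c_i + b) and the probability of completing the transcript is the product over all codons. The elongation rates and b are hypothetical, and the sketch ignores ribosome recycling, which the full model includes.

```python
import math

def completion_probability(elongation_rates, nonsense_rate):
    """Probability a ribosome translates every codon without dropping off."""
    return math.prod(c / (c + nonsense_rate) for c in elongation_rates)

def survival_profile(elongation_rates, nonsense_rate):
    """Probability that a ribosome is still translating after each codon."""
    profile, survive = [], 1.0
    for c in elongation_rates:
        survive *= c / (c + nonsense_rate)
        profile.append(survive)
    return profile

fast = [10.0] * 300                # a 300-codon ORF of fast codons
slow = [10.0] * 150 + [2.0] * 150  # same length, slow codons downstream
b = 0.005                          # hypothetical nonsense-error rate
print(completion_probability(fast, b), completion_probability(slow, b))
```

The decaying survival profile is the long-scale attenuation of the ribosome wave described above, and slower codons, which give errors more time to occur, deepen it.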
ABSTRACT: Natural selection acts on virus populations at two distinct but interrelated levels: within individual hosts and between them. Studies of the evolution of virulence typically focus on selection acting at the epidemiological or between-host level and demonstrate the importance of trade-offs between disease transmission and virulence rates. Within-host studies reach similar conclusions regarding trade-offs between transmission and virulence at the level of individual cells. Studies which examine selection at both scales assume that between- and within-host selection are necessarily in conflict. We explicitly examine these ideas and assumptions using a model of within-host viral dynamics nested within a model of between-host disease dynamics. Our approach allows us to evaluate the direction of selection at the within- and between-host levels and identify situations leading to conflict and accord between the two levels of selection.
Theoretical Population Biology 04/2006; 69(2):145-53.
ABSTRACT: Mathematical models of HIV-1 infection can help interpret drug treatment experiments and improve our understanding of the interplay between HIV-1 and the immune system. We develop and analyze an age-structured model of HIV-1 infection that allows for variations in the death rate of productively infected T cells and the production rate of viral particles as a function of the length of time a T cell has been infected. We show that this model is a generalization of the standard differential equation and delay models previously used to describe HIV-1 infection, and that it provides a means for exploring fundamental issues of viral production and death. We show that the model has uninfected and infected steady states, linked by a transcritical bifurcation. We perform a local stability analysis of the nontrivial equilibrium solution and provide a general stability condition for models with age structure. We then use numerical methods to study solutions of our model, focusing on the analysis of primary HIV infection. We show that the time to reach peak viral levels in the blood depends not only on initial conditions but also on the way in which viral production ramps up. If viral production ramps up slowly, we find that the time to peak viral load is delayed compared to results obtained using the standard (constant viral production) model of HIV infection. We find that data on viral load over time are insufficient to identify the functions specifying the dependence of the viral production rate or infected cell death rate on infected cell age. These functions must be determined through new quantitative experiments.
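A sketch of how the ramp-up of viral production with infected-cell age a enters an age-structured model. It assumes a constant infected-cell death rate delta and a production rate that saturates as p(a) = p_max * (1 - exp(-kappa*a)), a hypothetical functional form. The expected number of virions produced per infected cell is then N = integral from 0 to infinity of p(a) * exp(-delta*a) da = p_max * kappa / (delta * (delta + kappa)), compared with p_max / delta for constant production. The code checks the closed form against a trapezoid-rule integral; parameter values are illustrative.

```python
import math

def burst_size_numeric(p_max, kappa, delta, a_max=60.0, steps=60000):
    """Trapezoid-rule estimate of the integral of p(a) * exp(-delta * a)."""
    h = a_max / steps

    def integrand(a):
        return p_max * (1.0 - math.exp(-kappa * a)) * math.exp(-delta * a)

    total = 0.5 * (integrand(0.0) + integrand(a_max))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h

def burst_size_closed_form(p_max, kappa, delta):
    """Virions per infected cell for the saturating production profile."""
    return p_max * kappa / (delta * (delta + kappa))

p_max, delta = 1000.0, 0.5        # virions/day and 1/day, illustrative
for kappa in (0.5, 2.0, 50.0):    # slow, moderate, near-instant ramp-up
    print(kappa,
          round(burst_size_numeric(p_max, kappa, delta), 1),
          round(burst_size_closed_form(p_max, kappa, delta), 1))
print("constant production:", p_max / delta)
```

A slow ramp-up (small kappa) lowers the virions each infected cell contributes, and as kappa grows the result approaches the constant-production value p_max / delta.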