Effects of GC content and mutational pressure on the lengths of exons and coding sequences

Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
Journal of Molecular Evolution (Impact Factor: 1.86). 03/2003; 56(3):362-70. DOI: 10.1007/s00239-002-2406-1
Source: PubMed

ABSTRACT It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, the internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
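The two statistics named in the abstract, the Spearman correlation between length and %GC and a partial correlation that controls for a third variable, can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not taken from the paper. The abstract does not say how the partial correlation was computed; the sketch assumes the standard first-order formula built from Pearson coefficients.

```python
from math import sqrt
from statistics import mean


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (assumed formula)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy sequences, for illustration only.
toy_exons = ["ATGAAATTT", "ATGGCCGCGGGC", "ATGGCGCCCGGGTGCCGC", "ATGAAAGCTTTTAAC"]
lengths = [len(s) for s in toy_exons]
gc_values = [gc_percent(s) for s in toy_exons]
print(spearman(lengths, gc_values))
```

In a per-genome analysis of the kind described, `x` would hold the average CDS length, `y` the genomic GC content, and `z` the genome size, with one value per genome.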

Available from: Xuhua Xia, Jun 16, 2015
  • Source
    ABSTRACT: Previous studies have argued that, given the AT-rich nature of stop codons, the length and GC% of coding sequences (CDSs) should be positively correlated. This prediction is generally supported empirically by prokaryotic genomes. However, the correlation is weak for a number of species, with 4 species showing a negative correlation. Here we formulate a more general hypothesis incorporating selection against cytosine (C) usage to explain the lack of strong positive correlation between the length and GC% of CDSs. Two factors contribute to the selection against C usage in long CDSs. First, C is the least abundant nucleotide in the cell, and a long CDS should have fewer Cs to increase transcription efficiency. Second, C is prone to mutation to U/T, and selection for increased reliability should reduce C usage in long CDSs. Empirical data from prokaryotic genomes lend strong support to this new hypothesis.
    Molecular Biology and Evolution 08/2006; 23(7):1450-4. DOI:10.1093/molbev/msl012 · 14.31 Impact Factor
  • Source
    ABSTRACT: Studies have indicated that exon and intron size and intergenic distance are correlated with gene expression levels and expression breadth. Previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequence data, which have been shown to be more sensitive than previous expression profiling technologies, were generated and analyzed from 14 tissues. Our results revealed a novel dichotomy. At the low expression level, an increase in expression breadth correlated with an increase in transcript size because of an increase in the number of exons and introns. No significant changes in intron or exon sizes were noted. Conversely, genes expressed at the intermediate to high expression levels displayed a decrease in transcript size as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying the correlations between the physical parameters of individual genes could lead to uncovering the role of regulation owing to nucleotide composition, which might have potential impacts in discerning the role of the noncoding regions.
    Genome 12/2010; 54(1):10-18. · 1.56 Impact Factor
  • Source
    ABSTRACT: Surprisingly, in several multi-cellular eukaryotes optimal codon use correlates negatively with gene length. This contrasts with the expectation under selection for translational accuracy. While suggested explanations focus on variation in strength and efficiency of translational selection, it has rarely been noticed that the negative correlation is reported only in organisms whose optimal codons are biased towards codons that end with G or C (-GC). This raises the question of whether forces that affect base composition, such as GC-biased gene conversion, contribute to the negative correlation between optimal codon use and gene length. Yeast is a good organism in which to study this, as equal numbers of optimal codons end in -GC and -AT, and one may hence compare frequencies of optimal GC-ending with optimal AT-ending codons to disentangle the forces. The results of this study demonstrate that, in yeast, the frequencies of GC-ending (optimal and non-optimal) codons decrease with gene length and increase with recombination. A decrease of GC-ending codons along genes contributes to the negative correlation with gene length. Correlations with recombination and gene expression differentiate between GC-ending and optimal codons, and substitution patterns also support effects of GC-biased gene conversion. While the general effect of GC-biased gene conversion is well known, the negative correlation of optimal codon use with gene length has not been considered in this context before. Initiation of gene conversion events in promoter regions and the presence of a gene conversion gradient most likely explain the observed decrease of GC-ending codons with gene length and gene position.
    BMC Evolutionary Biology 04/2011; 11:93. DOI:10.1186/1471-2148-11-93 · 3.41 Impact Factor
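
The quantity at the centre of the last abstract above, the frequency of codons ending in G or C, can be computed with a short sketch. The function name `gc3_fraction` and the example sequences are hypothetical, and the sketch assumes the input is an in-frame coding sequence; it counts every complete codon passed in, including a stop codon if one is present.

```python
def gc3_fraction(cds):
    """Fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical toy sequences, for illustration only.
for toy_cds in ("ATGGCCTAA", "ATGAAATTTGCATAA"):
    print(len(toy_cds), gc3_fraction(toy_cds))
```

Pairing these fractions with gene lengths and applying a rank correlation would give the kind of length-versus-composition comparison the abstract describes.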