Effects of GC content and mutational pressure on the lengths of exons and coding sequences.

Department of Biology, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5.
Journal of Molecular Evolution (Impact Factor: 2.15). 03/2003; 56(3):362-70. DOI: 10.1007/s00239-002-2406-1
Source: PubMed

ABSTRACT It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, the internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
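The two statistics named in the abstract, the Spearman correlation between length and %GC and the partial correlation after controlling for a third variable, can be illustrated with a minimal sketch. It uses only the Python standard library. The function names (gc_percent, spearman, partial_corr) and the toy sequences are assumptions made for illustration, not the authors' code or data, and the abstract does not say which correlation coefficients were fed into the partial correlation, so the textbook first-order formula is used here.

```python
from math import sqrt


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    exons = ["ATGAAATTTTAA", "ATGGCCGGCGCGTGA", "ATGGCGCCGGGCCCGGCGTAG"]
    lengths = [len(s) for s in exons]
    gc_values = [gc_percent(s) for s in exons]
    print("Spearman(length, %GC):", spearman(lengths, gc_values))
```

For the genome-level analysis, x would be the average CDS length, y the genomic GC content, and z the genome size, with the three pairwise correlations passed to partial_corr.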

  • ABSTRACT: The mechanisms that dictate whether a particular mRNA is exported from the nucleus are still poorly defined. However, it has become increasingly clear that these mechanisms act to promote the expression of protein-coding mRNAs over the high levels of spurious transcription that is endemic to most eukaryotic genomes. For example, mRNA processing events that are associated with protein-coding transcripts, such as splicing, act as mRNA identity elements that promote nuclear export of these transcripts. Six years ago, we made the serendipitous discovery that regions within the open reading frame of an mRNA that encode short secretory or mitochondrial-targeting peptides can also act as an mRNA identity element which promotes an alternative mRNA nuclear export (ALREX) pathway. These regions are enriched in protein coding genes and have particular features that can be used to identify this class of protein-coding mRNA. In this article we review our current knowledge of how mRNA export evolved in response to particular events that occurred at the base of the eukaryotic tree. We will then focus on our current understanding of ALREX and compare its features to splicing-dependent export, the main mRNA export pathway in metazoans. WIREs RNA 2013. doi: 10.1002/wrna.1176
    WIREs RNA 05/2013; · 4.19 Impact Factor
  • ABSTRACT: The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders inserted between functional segments or domains and, apart from increasing protein length, play no role in the specific function or structure of a protein (the conventional phenotype). We studied the genomes of two malarial parasites and 521 prokaryotes (144 complete) that differ widely in GC% and optimum growth temperature, comparing the base compositions of the protein coding regions and corresponding lengths (kilobases). Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% is generally less in open reading frames (ORFs) encoding long proteins. The increased GC% in long ORFs increases as species' GC% increases, and decreases as species' AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role. In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
    Applied Bioinformatics 02/2005; 4(2):117-30.
  • ABSTRACT: Studies have indicated that exon and intron size and intergenic distance are correlated with gene expression levels and expression breadth. Previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequence data, which have been shown to be more sensitive than previous expression profiling technologies, were generated and analyzed from 14 tissues. Our results revealed a novel dichotomy. At the low expression level, an increase in expression breadth correlated with an increase in transcript size because of an increase in the number of exons and introns. No significant changes in intron or exon sizes were noted. Conversely, genes expressed at the intermediate to high expression levels displayed a decrease in transcript size as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying the correlations between the physical parameters of individual genes could lead to uncovering the role of regulation owing to nucleotide composition, which might have potential impacts in discerning the role of the noncoding regions.
    Genome 01/2011; 54(1):10-8. · 1.65 Impact Factor
