Comparative Proteomics Reveals a Significant Bias Toward Alternative Protein Isoforms with Conserved Structure and Function

Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
Molecular Biology and Evolution (Impact Factor: 9.11). 03/2012; 29(9):2265-83. DOI: 10.1093/molbev/mss100
Source: PubMed


Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, as part of a comprehensive analysis of experimental spectra from two large, publicly available mass spectrometry databases, we identified peptides covering 35% of the genes annotated by the GENCODE consortium for the human genome. We also detected the translation to protein of “novel” and “putative” protein-coding transcripts, as well as of transcripts annotated as pseudogenes and nonsense-mediated decay targets.
We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms, the largest set of reliably confirmed alternatively spliced proteins reported to date. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of a possible 25 heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in the human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts.
Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. Strikingly, very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This strong bias toward subtle differences in CDS, and hence toward likely conservation of cellular function and structure, is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.
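The abstract relies on detected peptides to confirm translation and to tell alternative isoforms apart. The following is a minimal Python sketch of one common way to do this. It assumes, and this is our assumption rather than necessarily the authors' pipeline, that an isoform counts as detected only when a peptide maps to it and to no other isoform of the same gene. The isoform sequences, peptides and helper names (`normalise`, `map_peptides`, `isoform_specific_evidence`) are hypothetical toy examples.

```python
from collections import defaultdict


def normalise(seq: str) -> str:
    """Upper-case a sequence and treat I and L as equivalent.

    Leucine and isoleucine have identical masses, so tandem MS cannot
    distinguish them; matching is therefore done on I->L normalised sequences.
    """
    return seq.upper().replace("I", "L")


def map_peptides(peptides, isoforms):
    """Return {peptide: set of isoform ids whose sequence contains it}."""
    norm = {iso_id: normalise(seq) for iso_id, seq in isoforms.items()}
    return {
        pep: {iso_id for iso_id, seq in norm.items() if normalise(pep) in seq}
        for pep in peptides
    }


def isoform_specific_evidence(peptides, isoforms):
    """Group peptides that map to exactly one isoform (hypothetical detection rule)."""
    evidence = defaultdict(list)
    for pep, hits in map_peptides(peptides, isoforms).items():
        if len(hits) == 1:  # peptide discriminates a single isoform
            (iso_id,) = hits
            evidence[iso_id].append(pep)
    return dict(evidence)


# Invented toy data: iso2 differs from iso1 by a one-residue deletion (a short indel).
isoforms = {
    "iso1": "MAEGKPLSDEVRRSTTEKAGLLNDYKQ",
    "iso2": "MAEGKPLSDEVRSTTEKAGLLNDYKQ",
}
peptides = ["LSDEVRR", "LSDEVRSTTEK", "AGILNDYK"]

evidence = isoform_specific_evidence(peptides, isoforms)
print(evidence)  # {'iso1': ['LSDEVRR'], 'iso2': ['LSDEVRSTTEK']}
print("multiple isoforms detected:", len(evidence) >= 2)
```

In this toy case each isoform is supported by one peptide spanning the indel, so the gene would count as expressing multiple isoforms. The shared peptide `AGILNDYK` (which matches both sequences after I/L normalisation) supports the gene but neither isoform specifically.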



Available from: Jose Manuel Rodríguez
    • "Alternative splicing of transcripts has the potential to expand the repertoire of proteins. Recent studies have estimated that all multi-exonic human genes are able to produce at least two alternatively spliced mRNA transcripts by alternative splicing, generating different protein isoforms with altered structures and biological functions [32]. In trypanosomatids, mature mRNAs are generated after two processing events: trans-splicing to add the spliced leader (SL) sequence to the 5′ end of transcripts and subsequent polyadenylation [17]. "
    ABSTRACT: Trypanosoma cruzi, the causative agent of Chagas disease, is extremely resistant to ionizing radiation, enduring up to 1.5 kGy of gamma rays. Ionizing radiation can damage the DNA molecule both directly, resulting in double-strand breaks, and indirectly, as a consequence of reactive oxygen species production. After a dose of 500 Gy of gamma rays, the parasite genome is fragmented, but the chromosomal bands are restored within 48 hours. Under such conditions, cell growth arrests for up to 120 hours and the parasites resume normal growth after this period. To better understand the parasite response to ionizing radiation, we analyzed the proteome of irradiated (4, 24, and 96 hours after irradiation) and non-irradiated T. cruzi using two-dimensional differential gel electrophoresis followed by mass spectrometry for protein identification. A total of 543 spots were found to be differentially expressed, of which 215 were identified. These identified protein spots represent different isoforms of only 53 proteins. We observed a tendency for overexpression of proteins with molecular weights below those predicted, indicating that these may be processed, yielding shorter polypeptides. The presence of shorter protein isoforms after irradiation suggests the occurrence of post-translational modifications and/or processing in response to gamma radiation stress. Our results also indicate that active translation is essential for the recovery of parasites from ionizing radiation damage. This study therefore reveals the peculiar response of T. cruzi to ionizing radiation, raising questions about how this organism can change its protein expression to survive such a harmful stress.
    Full-text · Article · May 2014 · PLoS ONE
    • "Along the same lines, Hon et al. [92] have used RNA-seq in E. histolytica to show that a majority of alternative splicing and polyadenylation isoforms are the result of stochastic processes and therefore unlikely to play a functional role. Reinforcing these results, recent proteomics studies [105] show that a fraction of transcripts do not reach the protein level, and for this reason are less likely to be functional. "
    ABSTRACT: At present we know that phenotypic differences between organisms arise from a variety of sources, such as protein sequence divergence, regulatory sequence divergence and alternative splicing. However, we do not yet have a complete view of how these sources are related. Here we address this problem by studying the relationship between protein divergence and the ability of genes to express multiple isoforms. We used three genome-wide datasets of human-mouse orthologs to study the relationship between protein divergence and isoform multiplicity co-occurrence between orthologs (i.e., both orthologs having more than one isoform). In all cases, our results showed a monotonic dependence between these two properties. We could explain this relationship in terms of a more fundamental one, between the exon number of the largest isoform and protein divergence. We found that this latter relationship was also present, although with variations, in other species (chimpanzee, cow, rat, chicken, zebrafish and fruit fly). In summary, we have identified a relationship between protein divergence and isoform multiplicity co-occurrence and explained its origin in terms of a simple gene-level property. Finally, we discuss the biological implications of these findings for our understanding of inter-species phenotypic differences.
    Full-text · Article · Aug 2013 · PLoS ONE
    • "The evolution of alternative transcripts in mammals has been studied in a number of gene families, showing that some are conserved across species and others are truly species-specific [26], [44], [45], [46], [47], [48], [49]. Two recent papers conclude that global alternative splicing patterns are species-specific [50], [51], and that the changes are often associated with the availability of splicing factor binding sites in introns. "
    ABSTRACT: Kallikreins are secreted serine proteases with important roles in human physiology. Human plasma kallikrein, encoded by the KLKB1 gene on locus 4q34-35, functions in the blood coagulation pathway, and in regulating blood pressure. The human tissue kallikrein and kallikrein-related peptidases (KLKs) have diverse expression patterns and physiological roles, including cancer-related processes such as cell growth regulation, angiogenesis, invasion, and metastasis. Prostate-specific antigen (PSA), the product of the KLK3 gene, is the most widely used biomarker in clinical practice today. A total of 15 KLKs are encoded by the largest contiguous cluster of protease genes in the human genome (19q13.3-13.4), which makes them ideal for evolutionary analysis of gene duplication events. Previous studies on the evolution of KLKs have traced mammalian homologs as well as a probable early origin of the family in aves, amphibia and reptilia. The aim of this study was to address the evolutionary and functional relationships between tissue KLKs and plasma kallikrein, and to examine the evolution of alternative splicing isoforms. Sequences of plasma and tissue kallikreins and their alternative transcripts were collected from the NCBI and Ensembl databases, and comprehensive phylogenetic analysis was performed by Bayesian as well as maximum likelihood methods. Plasma and tissue kallikreins exhibit high sequence similarity in the trypsin domain (>50%). Phylogenetic analysis indicates an early divergence of KLKB1, which groups closely with plasminogen, chymotrypsin, and complement factor D (CFD), in a monophyletic group distinct from trypsin and the tissue KLKs. Reconstruction of the earliest events leading to the diversification of the tissue KLKs is not well resolved, indicating rapid expansion in mammals. Alternative transcripts of each KLK gene show species-specific divergence, while examination of sequence conservation indicates that many annotated human KLK isoforms are missing the catalytic triad that is crucial for protease activity.
    Full-text · Article · Jul 2013 · PLoS ONE