This study was conducted as part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes and contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing breast cancer cell line models of hormone receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (over-expression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (over-expression)/ER-/PR-; inflammatory breast cancer) and SUM149 (ERBB2 (low expression)/ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes in the signaling pathways downstream of ERBB2, along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell line-specific variants pointed to distinct key mechanisms, including amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids, in SUM190; and cell-to-cell adhesion, integrin and ERK1/ERK2 signaling, and translational control in SUM149. The analyses indicated an enrichment in electron transport chain processes in the ERBB2 over-expressing cell line models, and an association of nucleotide binding, RNA splicing and translation processes with the inflammatory breast cancer (IBC) models, SUM190 and SUM149.
Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.
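The two counts described in the abstract (genes with more than one splice variant in at least one cell line, and variants specific to a single cell line) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the gene names, variant identifiers and the `observations` table below are hypothetical placeholders, the reading of "more than one splice variant ... in at least one of the three cell lines" is one possible interpretation, and the comparison against normal mammary cells is omitted.

```python
from collections import defaultdict

# Hypothetical input: one (cell_line, gene, variant_id) tuple per expressed variant.
# Gene and variant names are placeholders, not data from the study.
observations = [
    ("SKBR3", "GENE_A", "A-201"),
    ("SKBR3", "GENE_A", "A-202"),
    ("SUM190", "GENE_A", "A-201"),
    ("SUM149", "GENE_B", "B-201"),
    ("SUM149", "GENE_B", "B-203"),
    ("SUM190", "GENE_C", "C-201"),
    ("SKBR3", "GENE_C", "C-201"),
]


def multi_variant_genes(obs):
    """Genes with more than one variant expressed in at least one cell line."""
    variants_per_line = defaultdict(set)  # (cell_line, gene) -> variant ids
    for cell_line, gene, variant in obs:
        variants_per_line[(cell_line, gene)].add(variant)
    return {gene for (_, gene), variants in variants_per_line.items()
            if len(variants) > 1}


def cell_line_specific_variants(obs):
    """Variants observed in exactly one cell line, grouped by that cell line."""
    lines_by_variant = defaultdict(set)  # variant id -> cell lines expressing it
    for cell_line, _, variant in obs:
        lines_by_variant[variant].add(cell_line)
    specific = defaultdict(set)
    for variant, lines in lines_by_variant.items():
        if len(lines) == 1:
            specific[next(iter(lines))].add(variant)
    return dict(specific)


genes = multi_variant_genes(observations)
specific = cell_line_specific_variants(observations)
```

In a real analysis the tuples would come from filtered RNA-Seq and peptide evidence rather than a hand-written list, and the expression thresholds would need to be chosen explicitly.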
"A method for accounting of the missing information in the protein database using a coding genome polymorphism data has been explored recently. In general, a use of one or more custom DNA and/or mRNA sequence databases for LC–MS/MS data search is becoming a current trend in identifying the encoded variants of amino acid sequence that originated from single amino acid polymorphism or alternative splicing. These areas of research, as well as the studies on genome re-annotations using proteomics data, are often referred to as proteogenomics."
"The SKBR3 proteomic analyses of chromosome 17 reported 217 distinct protein isoforms from 108 genes with transcript and peptide evidence of novel, alternative splicing; these 108 genes are annotated in the Ensembl database as genes with more than one protein-coding transcript. All of the 108 genes were identified as having novel spliced regions by RSW (Table 4)."
ABSTRACT: Setting
During endoplasmic reticulum (ER) stress, the endoribonuclease (RNase) Ire1α initiates removal of a 26 nt region from the mRNA encoding the transcription factor Xbp1 via an unconventional mechanism (atypically within the cytosol). This causes an open reading frame-shift that leads to altered transcriptional regulation of numerous downstream genes in response to ER stress as part of the unfolded protein response (UPR). Strikingly, other examples of targeted, unconventional splicing of short mRNA regions have yet to be reported.
Our goal was to develop an approach to identify non-canonical, possibly very short, splicing regions using RNA-Seq data and apply it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines to identify additional Ire1α targets.
We developed a bioinformatics approach called the Read-Split-Walk (RSW) pipeline, and evaluated it using two Ire1α heterozygous and two Ire1α-null samples. The 26 nt non-canonical splice site in Xbp1 was detected as the top hit by our RSW pipeline in heterozygous samples but not in the negative control Ire1α knockout samples. We compared the Xbp1 results from our approach with results obtained using the alignment programs BWA, Bowtie2, STAR and Exonerate, and the Unix “grep” command. We then applied our RSW pipeline to RNA-Seq data from the SKBR3 human breast cancer cell line. RSW reported a large number of non-canonical spliced regions for 108 genes in chromosome 17, which were identified by an independent study.
We conclude that our RSW pipeline is a practical approach for identifying non-canonical splice junction sites on a genome-wide level. We demonstrate that our pipeline can detect novel splice sites in RNA-Seq data generated under similar conditions for multiple species, in our case mouse and human.
PLoS ONE 07/2014; 9(7):e100864. DOI:10.1371/journal.pone.0100864 · 3.23 Impact Factor
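The general idea of detecting a short spliced-out region from a read that does not align contiguously can be shown with a toy Python sketch. It is not the published RSW pipeline, whose details are not given in the abstract: the function `find_short_splice`, the `min_anchor` parameter and all sequences below are invented for illustration. The sketch anchors a prefix of the read on the reference, walks the split point, and looks for the remaining suffix further downstream. It also checks the arithmetic behind the frame shift: a 26 nt removal is not a multiple of 3, so the downstream reading frame changes.

```python
def find_short_splice(reference, read, min_anchor=8):
    """Toy split-read search (illustrative only, not the RSW algorithm).

    Returns (start, length) of the reference region missing from the read,
    or None if the read maps contiguously or cannot be placed.
    """
    if reference.find(read) != -1:
        return None  # the read aligns without any gap
    # Walk the split point from the longest prefix to the shortest,
    # keeping at least min_anchor bases on each side of the split.
    for split in range(len(read) - min_anchor, min_anchor - 1, -1):
        prefix, suffix = read[:split], read[split:]
        left = reference.find(prefix)  # first occurrence only (toy simplification)
        if left == -1:
            continue
        prefix_end = left + split
        right = reference.find(suffix, prefix_end)
        if right > prefix_end:
            return prefix_end, right - prefix_end
    return None


# Invented sequences: two flanks around an artificial 26 nt region.
left_flank = "ACGTTGCAAGCTTAGGCTAACCGT"
removed = "GA" * 13  # 26 nt, artificial
right_flank = "TTCCATGGACCTAGTCAATCGGAT"
reference = left_flank + removed + right_flank

# A read spanning the junction after the 26 nt region has been removed.
spliced_read = left_flank[-20:] + right_flank[:20]
hit = find_short_splice(reference, spliced_read)

# A removal whose length is not a multiple of 3 shifts the reading frame.
causes_frame_shift = hit is not None and hit[1] % 3 != 0
```

A real pipeline would rely on an aligner, tolerate mismatches, handle repeated sequence and aggregate support over many reads; the exact string matching here only conveys the principle.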
ABSTRACT: The Chromosome-centric Human Proteome Project (C-HPP) aims at differentiating chromosome-based and tissue-specific protein compositions in terms of protein expression, quantification and modification. We previously found that the analysis of translating mRNA (mRNA attached to the ribosome-nascent chain complex, RNC-mRNA) can explain over 94% of mRNA-protein abundance. Therefore, we propose here to use full-length RNC-mRNA information to illustrate protein expression both qualitatively and quantitatively. We performed RNA-seq on RNC-mRNA (RNC-seq) and detected 12,758 and 14,113 translating genes in human normal bronchial epithelial (HBE) cells and human colorectal adenocarcinoma Caco-2 cells, respectively. We found that most of these genes were mapped with greater than 80% coding sequence coverage. In Caco-2 cells, we provided translation evidence for 4,180 significant single-nucleotide variations. Using RNC-mRNA data as a standard for proteomic data integration, both translation and protein evidence for 7,876 genes could be acquired from 4 inter-laboratory datasets with different MS platforms. In addition, we detected 1,397 non-coding mRNAs that were attached to ribosomes, suggesting a potential source for new protein exploration. By comparing the two cell lines, a total of 677 differentially translated genes were found to be unevenly distributed across chromosomes. In addition, 2,105 genes in Caco-2 and 750 genes in HBE cells are expressed in a cell-specific manner. These genes are significantly and specifically clustered on multiple chromosomes, such as chromosome 19. We conclude that HPP/C-HPP investigations can be considerably improved by integrating RNC-mRNA analysis with MS, bioinformatics and antibody-based verifications.
Journal of Proteome Research 11/2013; 13(1). DOI:10.1021/pr4007409 · 4.25 Impact Factor