Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes

Article in Journal of Proteome Research 13(1) · October 2013
DOI: 10.1021/pr400773v · Source: PubMed
This study was conducted as part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes on chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes and contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing cell-line models of hormone receptor-negative breast cancer by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (over-expression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (over-expression)/ER-/PR-; inflammatory breast cancer (IBC)), and SUM149 (ERBB2 (low expression)/ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes in the signaling pathways downstream of ERBB2, along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line-specific variants pointed to distinct key mechanisms, including amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids, in SUM190; and cell-to-cell adhesion, integrin and ERK1/ERK2 signaling, and translational control in SUM149. The analyses indicated an enrichment of electron transport chain processes in the ERBB2-overexpressing cell-line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149.
Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.
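The two variant tallies described in the abstract (genes with more than one splice variant, and variants specific to a single cell line) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input format, the gene and transcript names, and the function names `multi_variant_genes` and `line_specific_variants` are all hypothetical, and the sketch assumes that variants have already been called per cell line and that "more than one variant" is counted across all three lines pooled together.

```python
from collections import defaultdict

# Hypothetical toy input: one (gene, transcript_id, cell_line) tuple per
# variant observed in a cell line. Names are placeholders, not real data.
observations = [
    ("GENE_A", "tx1", "SKBR3"),
    ("GENE_A", "tx2", "SKBR3"),
    ("GENE_A", "tx1", "SUM190"),
    ("GENE_B", "tx1", "SUM149"),
    ("GENE_B", "tx2", "SUM149"),
    ("GENE_B", "tx2", "SUM190"),
    ("GENE_C", "tx1", "SKBR3"),
]


def multi_variant_genes(obs):
    """Return genes with more than one distinct variant across all cell lines."""
    variants_by_gene = defaultdict(set)
    for gene, transcript, _line in obs:
        variants_by_gene[gene].add(transcript)
    return {gene for gene, txs in variants_by_gene.items() if len(txs) > 1}


def line_specific_variants(obs):
    """Map each cell line to the (gene, transcript) pairs seen only in that line."""
    lines_by_variant = defaultdict(set)
    for gene, transcript, line in obs:
        lines_by_variant[(gene, transcript)].add(line)
    specific = defaultdict(set)
    for variant, lines in lines_by_variant.items():
        if len(lines) == 1:  # observed in exactly one cell line
            specific[next(iter(lines))].add(variant)
    return dict(specific)


if __name__ == "__main__":
    print(sorted(multi_variant_genes(observations)))
    for line, variants in sorted(line_specific_variants(observations).items()):
        print(line, sorted(variants))
```

In this toy input, SUM190 has no variant of its own, so it does not appear in the result of `line_specific_variants`. A real analysis would also need the comparison against normal mammary cells that the abstract mentions, which this sketch omits.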
    • "However, the loss of linkage information during the generation of short reads limits their utility. In particular, short reads are insufficient to phase the haplotypes of individuals within mixtures of similar sequences, including homeologous and homologous chromosomes in polyploids [3, 4], viral quasispecies [5], multiply or alternatively spliced mRNA [6], genes from metagenomic samples containing related organisms [7, 8], and immune antibody gene repertoires [9]. In these cases, additional information is required to determine whether mutations separated by distances longer than the read length are present in the same individual. "
    ABSTRACT: Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise.
    Full-text · Article · Jan 2016
    • "A method for accounting of the missing information in the protein database using a coding genome polymorphism data has been explored recently [12]. In general, a use of one or more custom DNA and/or mRNA sequence databases for LC–MS/MS data search is becoming a current trend in identifying the encoded variants of amino acid sequence that originated from single amino acid polymorphism or alternative splicing [13]. These areas of research, as well as the studies on genome re-annotations using proteomics data, are often referred to as proteogenomics [14]. "
    ABSTRACT: Searching deep proteome data for 9 NCI-60 cancer cell lines obtained earlier by Moghaddas Gholami et al. (Cell Reports, 2013) against a database from cancer genomes returned a variant of the tryptic peptide fragment 57-72 of the molecular chaperone HSC70, in which the methionine residue at position 61 is replaced by a threonine or isothreonine (homoserine) residue. However, no trace of the corresponding genetic alteration was found in the cell-line genomes reported by Abaan et al. (Cancer Research, 2013). Investigating the origin of this modification led us to conclude that the conversion of methionine into isothreonine resulted from iodoacetamide treatment during sample preparation. We found that up to 10% of methionine-containing peptides underwent this conversion in the datasets under study. The artifact was confirmed by a model experiment with bovine albumin, in which three of four methionine residues were partly converted to isothreonine by conventional iodoacetamide treatment. This experimental side reaction has to be taken into account when searching for genetically encoded peptide variants in proteogenomics studies. A lot of effort is currently put into the proteogenomics of cancer. Studies detect non-synonymous cancer mutations at the protein level by searching high-throughput LC-MS/MS data against customized genomic databases. In such studies, much attention is paid to potential false-positive identifications. Here we describe one possible cause of such false identifications: a sample preparation artifact that mimics a genetically encoded methionine-to-threonine variant. The methionine-to-isothreonine conversion should be taken into consideration for correct interpretation of proteogenomic data. Copyright © 2015. Published by Elsevier B.V.
    Full-text · Article · Mar 2015
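    The reason the artifact above can pass for a genetic variant is that isothreonine (homoserine) and threonine are isomers, so a mass spectrometer sees the same mass shift for both. The short Python sketch below works through that arithmetic from elemental compositions. It is an illustration only, not part of the cited study: the helper names `residue_mass` and `mass_shift` are hypothetical, and the atomic masses are standard monoisotopic values rounded to about seven decimal places.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}

# Elemental compositions of amino acid residues (free amino acid minus H2O).
RESIDUES = {
    "Met": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "Thr": {"C": 4, "H": 7, "N": 1, "O": 2},
    "Hse": {"C": 4, "H": 7, "N": 1, "O": 2},  # homoserine, an isomer of Thr
}


def residue_mass(name):
    """Monoisotopic mass of a residue, summed from its elemental composition."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in RESIDUES[name].items())


def mass_shift(old, new):
    """Mass change observed when residue `old` is replaced by residue `new`."""
    return residue_mass(new) - residue_mass(old)


if __name__ == "__main__":
    print(f"Met -> Thr: {mass_shift('Met', 'Thr'):.4f} Da")
    print(f"Met -> Hse: {mass_shift('Met', 'Hse'):.4f} Da")
```

    Both substitutions give a shift of about -29.9928 Da, so a database search that allows a methionine-to-threonine variant will also match a peptide carrying the chemically produced homoserine.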
    • "The SKBR3 proteomic analyses of chromosome 17 reported 217 distinct protein isoforms from 108 genes with transcript and peptide evidence of novel, alternative splicing; these 108 genes are annotated in the Ensembl database as genes with more than one protein-coding transcript [23]. All of the 108 genes were identified as having novel spliced regions by RSW (Table 4). "
    ABSTRACT: Setting: During endoplasmic reticulum (ER) stress, the endoribonuclease (RNase) Ire1α initiates removal of a 26 nt region from the mRNA encoding the transcription factor Xbp1 via an unconventional mechanism (atypically within the cytosol). This causes an open reading frame-shift that leads to altered transcriptional regulation of numerous downstream genes in response to ER stress as part of the unfolded protein response (UPR). Strikingly, other examples of targeted, unconventional splicing of short mRNA regions have yet to be reported. Objective: Our goal was to develop an approach to identify non-canonical, possibly very short, splicing regions using RNA-Seq data and apply it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines to identify additional Ire1α targets. Results: We developed a bioinformatics approach called the Read-Split-Walk (RSW) pipeline and evaluated it using two Ire1α heterozygous and two Ire1α-null samples. The 26 nt non-canonical splice site in Xbp1 was detected as the top hit by our RSW pipeline in heterozygous samples but not in the negative control Ire1α knockout samples. We compared the Xbp1 results from our approach with results obtained using the alignment programs BWA, Bowtie2, STAR, and Exonerate and the Unix "grep" command. We then applied our RSW pipeline to RNA-Seq data from the SKBR3 human breast cancer cell line. RSW reported a large number of non-canonical spliced regions for 108 genes in chromosome 17, which were identified by an independent study. Conclusions: We conclude that our RSW pipeline is a practical approach for identifying non-canonical splice junction sites on a genome-wide level. We demonstrate that our pipeline can detect novel splice sites in RNA-Seq data generated under similar conditions for multiple species, in our case mouse and human.
    Full-text · Article · Jul 2014
