Fangfei Yan

Publications (4) · 47.61 Total impact

  • ABSTRACT: As part of the chromosome-centric human proteome project (C-HPP) initiative, we report our progress on annotation of chromosome 22. Chromosome 22, spanning 51 million base pairs, was the first chromosome to be sequenced. Gene dosage alterations on this chromosome have been shown to be associated with a number of congenital anomalies. In addition, several rare but aggressive tumors have been associated with this chromosome. A number of important gene families, including the immunoglobulin lambda locus, the crystallin beta family and the APOBEC gene family, are located on this chromosome. Based on proteomic profiling of 30 histologically normal tissues and cells using high resolution mass spectrometry, we show protein evidence of 367 genes on chromosome 22. Importantly, this includes 47 proteins that are currently annotated as "missing" proteins. We also confirmed the translation start sites of 120 chromosome 22-encoded proteins. Employing a comprehensive proteogenomics analysis pipeline, we provide evidence of novel coding regions on this chromosome, including upstream ORFs and novel exons, in addition to correcting existing gene structures. We describe tissue-wise expression of the proteins and the distribution of gene families on this chromosome. The data have been deposited to ProteomeXchange with the identifier PXD000561.
    Journal of Proteome Research 03/2014; · 5.06 Impact Factor
  • ABSTRACT: As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer (CRC). The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (~9800 transcripts per cell line) versus the protein observations (~1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer-associated proteins with differential expression patterns and protein networks and pathways that appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.
    Journal of Proteome Research 03/2013; · 5.06 Impact Factor
  • ABSTRACT: We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial-derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
    Journal of Proteome Research 12/2012; · 5.06 Impact Factor
  • Nature Biotechnology 01/2012; 30(3):221-3. · 32.44 Impact Factor