Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316-1323 (2009).

National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA.
Genome Research (Impact Factor: 14.63). 07/2009; 19(7):1316-23. DOI: 10.1101/gr.080531.108


Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, they use different methods that can result in similar but not identical representations of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates the manual review of protein annotations that are inconsistent between sites, as well as of annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups that are not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated protein-coding regions.
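The matching step at the heart of the CCDS idea, in which a coding region is accepted only when independent annotation groups place an identical CDS on the reference genome, can be sketched as an exact-key comparison. The Python below is a minimal illustration under stated assumptions, not the project's actual pipeline: each annotation is assumed to be reduced to a chromosome, a strand and its CDS exon coordinates, and the transcript IDs and coordinates are hypothetical. The quality checks and manual review the project applies on top of such a comparison are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: (chromosome, strand, sorted tuple of (start, end) CDS exon intervals).
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def cds_key(chrom: str, strand: str, exons: List[Tuple[int, int]]) -> CdsKey:
    """Normalize a CDS so identical structures compare equal regardless of input exon order."""
    return (chrom, strand, tuple(sorted(exons)))


def consensus_cds(source_a: Dict[str, CdsKey], source_b: Dict[str, CdsKey]) -> List[Tuple[str, str]]:
    """Pair transcript IDs from two sources whose CDS structures are exactly identical."""
    index_b: Dict[CdsKey, List[str]] = defaultdict(list)
    for tid_b, key in source_b.items():
        index_b[key].append(tid_b)
    pairs = []
    for tid_a, key in source_a.items():
        # .get() on a defaultdict does not insert missing keys.
        for tid_b in index_b.get(key, []):
            pairs.append((tid_a, tid_b))
    return sorted(pairs)


# Hypothetical transcript IDs and coordinates, for illustration only.
source_a = {
    "TX_A1": cds_key("chr1", "+", [(1000, 1200), (1500, 1650)]),
    "TX_A2": cds_key("chr1", "+", [(1000, 1200), (1800, 1950)]),
}
source_b = {
    # Same CDS as TX_A1, with exons listed in a different order.
    "TX_B1": cds_key("chr1", "+", [(1500, 1650), (1000, 1200)]),
}
print(consensus_cds(source_a, source_b))  # [('TX_A1', 'TX_B1')]
```

Because matching is exact, a CDS that differs by a single base at one exon boundary does not pair; in the CCDS workflow, such inconsistencies between sites are the cases that go to manual review.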



Available from: Barbara Ruef
    • "Because querying predictions from different databases/Webservers for different algorithms is both tedious and time consuming, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions) to facilitate the process. We first compiled a collection of all possible NSs in the human genome (a total of 75,931,005) based on the annotation of the Consensus Coding Sequence (CCDS) project [Pruitt et al., 2009]. We next collected their corresponding prediction scores from four new and popular prediction algorithms (SIFT [Kumar et al., 2009], Polyphen2 [Adzhubei et al., 2010], LRT [Chun and Fay, 2009], and MutationTaster [Schwarz et al., 2010]). "

    Full-text · Dataset · Aug 2014
    • "The H. sapiens and S. scrofa mRNAs and repeat-associated RNAs were downloaded from the NCBI database (April 2013) and Repbase (17.11 release), respectively. Additionally, the human coding sequences (CDS) were obtained from the NCBI CCDS Database (release 11.0) [54]. After this procedure, the remaining tags were verified in a second step, wherein the reads were mapped to the human and pig genomes, respectively."
    ABSTRACT: MicroRNAs (miRNAs) are a class of small RNA molecules that regulate gene expression by inhibiting protein translation or by targeting mRNAs for cleavage. They play many important roles in living cells, and knowledge of miRNA functions has expanded with their identification in biological fluids and with recent reports of abundant plant-origin miRNAs in human plasma and serum. Considering these findings, we performed a rigorous bioinformatics analysis of publicly available raw data from high-throughput sequencing studies of miRNA composition in human and porcine breast milk exosomes to identify the fraction of food-derived miRNAs. Several processing and filtering steps were applied to increase accuracy and to avoid false positives. Through this analysis, 35 and 17 miRNA species, belonging to 25 and 11 MIR families, were identified in the human and porcine samples, respectively. In the human samples, the most abundant were ath-miR166a, pab-miR951, ptc-miR472a and bdi-miR168, while in the porcine breast milk exosomes, zma-miR168a, zma-miR156a and ath-miR166a were identified in the largest amounts. The consensus prediction and annotation of potential human targets for selected plant miRNAs suggest that these molecules may interact with mRNAs coding for several transcription factors, protein receptors, transporters and immune-related proteins, thus potentially influencing the human organism. Taken together, the presented analysis provides evidence of abundant plant miRNAs in mammalian breast milk exosomes, while pointing to new possibilities arising from this discovery.
    Full-text · Article · Jun 2014 · PLoS ONE
    • "During the upload process, every SNV is automatically functionally annotated using our in-house software tool snpActs. snpActs identifies whether an SNV causes a protein-coding substitution and which amino acid is affected, using the gene annotations from CCDS [10] and RefSeq [11]. The amino acid changes in all isoforms of the affected gene are classified and ranked in the following order: "nonsense" (most likely to be damaging), "readthrough", "start-lost", "splice site", "missense", "synonymous" (least likely to be damaging)."
    ABSTRACT: BACKGROUND: Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. Unlike other existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur, human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and, in particular, phenotypic information for the SNV of interest, provided that this information was shared.
    Full-text · Article · May 2014 · BMC Genomics
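The first step in the dbNSFP excerpt, compiling all possible nonsynonymous SNVs from CCDS annotations, reduces to trying each of the three alternative bases at every coding position and keeping the substitutions that change the encoded amino acid. The Python below is a minimal sketch of that enumeration for a single CDS string, assuming the standard genetic code and a sequence already assembled 5' to 3' on the coding strand; extracting CCDS sequences from genomic exon coordinates is not shown. Substitutions to or from a stop codon (marked '*') are kept here, which is a choice of this sketch rather than a documented dbNSFP rule.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def nonsynonymous_snvs(cds: str) -> List[Tuple[int, str, str, str, str]]:
    """List every single-base substitution in a CDS that changes the encoded residue.

    Returns (1-based CDS position, ref base, alt base, ref residue, alt residue);
    '*' marks a stop codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0 or set(cds) - set("ACGT"):
        raise ValueError("CDS must contain only A/C/G/T and have a length divisible by 3")
    variants = []
    for pos, ref in enumerate(cds):
        offset = pos % 3  # position within the codon (0, 1 or 2)
        codon = cds[pos - offset:pos - offset + 3]
        ref_aa = CODON_TABLE[codon]
        for alt in "ACGT":
            if alt == ref:
                continue
            alt_codon = codon[:offset] + alt + codon[offset + 1:]
            alt_aa = CODON_TABLE[alt_codon]
            if alt_aa != ref_aa:
                variants.append((pos + 1, ref, alt, ref_aa, alt_aa))
    return variants


# Toy CDS (Met-Ala): 6 positions x 3 alternative bases = 18 possible SNVs.
toy = nonsynonymous_snvs("ATGGCT")
print(len(toy))  # 15 (the three changes at the Ala codon's third position are synonymous)
print(toy[0])    # (1, 'A', 'C', 'M', 'L')
```

Running the same loop over every CCDS coding region would yield the kind of genome-wide candidate list the excerpt describes; joining prediction scores (SIFT, PolyPhen-2, LRT, MutationTaster) onto these records is a separate step this sketch does not attempt.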
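The snpActs excerpt describes classifying the amino acid change in every isoform and then ranking the results by how likely they are to be damaging. The Python below is a minimal sketch of that ranking, assuming consequence labels spelled as in the quoted list. The helper classify_codon_change is hypothetical, not part of snpActs, and covers only codon-level labels; a "splice site" call needs exon-boundary coordinates that this sketch does not model.

```python
from typing import Iterable

# Severity order as quoted in the snpActs description: index 0 is most likely to be damaging.
SEVERITY_ORDER = ["nonsense", "readthrough", "start-lost", "splice site", "missense", "synonymous"]
RANK = {label: i for i, label in enumerate(SEVERITY_ORDER)}


def classify_codon_change(ref_aa: str, alt_aa: str, codon_index: int) -> str:
    """Hypothetical codon-level rules; '*' denotes a stop codon, codon_index is 0-based."""
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"       # stop gained
    if ref_aa == "*":
        return "readthrough"    # stop lost
    if codon_index == 0 and ref_aa == "M":
        return "start-lost"
    return "missense"


def most_severe(consequences: Iterable[str]) -> str:
    """Return the highest-ranked consequence among an SNV's per-isoform effects."""
    labels = list(consequences)
    if not labels:
        raise ValueError("no consequences given")
    unknown = [c for c in labels if c not in RANK]
    if unknown:
        raise ValueError(f"unrecognized consequence(s): {unknown}")
    return min(labels, key=lambda c: RANK[c])


# One SNV seen through three hypothetical isoforms of the same gene.
isoform_effects = [
    classify_codon_change("R", "Q", 41),  # missense in one isoform
    classify_codon_change("R", "*", 12),  # stop gained in another
    classify_codon_change("L", "L", 7),   # synonymous in a third
]
print(most_severe(isoform_effects))  # nonsense
```

Reporting the worst consequence across isoforms is one reading of "classified and ranked"; a tool could equally keep the full per-isoform list and only sort it.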