Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316-1323 (2009).

National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA.
Genome Research (Impact Factor: 14.63). 07/2009; 19(7):1316-23. DOI: 10.1101/gr.080531.108
Source: PubMed


Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

    • "Because querying predictions from different databases/Webservers for different algorithms is both tedious and time consuming, we developed dbNSFP (database for nonsynonymous SNPs' functional predictions) to facilitate the process. We first compiled a collection of all possible NSs in the human genome (a total of 75,931,005) based on the annotation of the Consensus Coding Sequence (CCDS) project [Pruitt et al., 2009]. We next collected their corresponding prediction scores from four new and popular prediction algorithms (SIFT [Kumar et al., 2009], Polyphen2 [Adzhubei et al., 2010], LRT [Chun and Fay, 2009], and MutationTaster [Schwarz et al., 2010]). "
    • "The H. sapiens and S. scrofa mRNAs and repeat-associated RNAs were downloaded from the NCBI database (April 2013) and Repbase (17.11 release), respectively. Additionally, the human coding sequences (CDS) were obtained from the NCBI CCDS Database (release 11.0) [54]. After this procedure, the remaining tags were verified in a second step, wherein the reads were mapped to the human and pig genomes, respectively."
    ABSTRACT: MicroRNAs (miRNAs) are a class of small RNA molecules that regulate gene expression by inhibiting the protein translation or targeting the mRNA cleavage. They play many important roles in living organism cells; however, the knowledge on miRNAs functions has become more extensive upon their identification in biological fluids and recent reports on plant-origin miRNAs abundance in human plasma and serum. Considering these findings, we performed a rigorous bioinformatics analysis of publicly available, raw data from high-throughput sequencing studies on miRNAs composition in human and porcine breast milk exosomes to identify the fraction of food-derived miRNAs. Several processing and filtering steps were applied to increase the accuracy, and to avoid false positives. Through aforementioned analysis, 35 and 17 miRNA species, belonging to 25 and 11 MIR families, were identified, respectively. In the human samples the highest abundance levels yielded the ath-miR166a, pab-miR951, ptc-miR472a and bdi-miR168, while in the porcine breast milk exosomes, the zma-miR168a, zma-miR156a and ath-miR166a have been identified in the largest amounts. The consensus prediction and annotation of potential human targets for select plant miRNAs suggest that the aforementioned molecules may interact with mRNAs coding several transcription factors, protein receptors, transporters and immune-related proteins, thus potentially influencing human organism. Taken together, the presented analysis shows proof of abundant plant miRNAs in mammal breast milk exosomes, pointing at the same time to the new possibilities arising from this discovery.
    PLoS ONE 06/2014; 9(6):e99963. DOI:10.1371/journal.pone.0099963 · 3.23 Impact Factor
    • "During the upload-process, every SNV is automatically functionally annotated using our in-house software tool snpActs. snpActs identifies whether an SNV causes a protein coding substitution and which amino acid is affected using the gene annotations from CCDS [10] and RefSeq [11]. The amino acid changes in all isoforms of the affected gene are classified and ranked in the following order: "nonsense" (most likely to be damaging), "readthrough", "start-lost", "splice site", "missense", "synonymous" (least likely to be damaging)."
    ABSTRACT: Background: Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. In contrast to other currently existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and in particular phenotypic information - given that this information was shared - for the SNV of interest. Tool description: GrabBlur facilitates the combination of phenotypic and NGS data (VCF files) via a local interface or command line operations. Data submissions may include HPO (Human Phenotype Ontology) terms, other trait descriptions, NGS technology information and the identity of the submitter. Most of this information is optional and its provision at the discretion of the submitter. Upon initial intake, GrabBlur merges and aggregates all sample-specific data. If a certain SNV is rare, the sample-specific information is replaced with the submitter identity. Generally, all data in GrabBlur are highly aggregated so that they can be shared with others while ensuring maximum privacy. Thus, it is impossible to reconstruct complete exomes or genomes from the database or to re-identify single individuals. After the individual information has been sufficiently "blurred", the data can be uploaded into a publicly accessible domain where aggregated genotypes are provided alongside phenotypic information. A web interface allows querying the database and the extraction of gene-wise SNV information. If an interesting SNV is found, the interrogator can get in contact with the submitter to exchange further information on the carrier and clarify, for example, whether the latter's phenotype matches with phenotype of their own patient.
    BMC Genomics 05/2014; 15 Suppl 4(Suppl 4). DOI:10.1186/1471-2164-15-S4-S8 · 3.99 Impact Factor