Improvements in the Protein Identifier Cross-Reference service

EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Nucleic Acids Research (Impact Factor: 9.11). 04/2012; 40(Web Server issue):W276-80. DOI: 10.1093/nar/gks338


The Protein Identifier Cross-Reference (PICR) service is a tool that allows users to map protein identifiers, protein sequences
and gene identifiers across over 100 different source databases. PICR takes input through an interactive website as well as
Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) services. It returns the results as HTML pages,
XLS and CSV files. It has been in production since 2007 and has been recently enhanced to add new functionality and increase
the number of databases it covers. Protein subsequences can be searched with the Basic Local Alignment Search Tool (BLAST) against the UniProt
Knowledgebase (UniProtKB) to provide an entry point to the standard PICR mapping algorithm. In addition, gene identifiers
from UniProtKB and Ensembl can now be submitted as input or mapped to as output from PICR. We have also implemented a ‘best-guess’
mapping algorithm for UniProt. In this article, we describe the usefulness of PICR, how these changes have been implemented,
and the corresponding additions to the web services. Finally, we explain that the number of source databases covered by PICR
has increased from the initial 73 to the current 102. New resources include several new species-specific Ensembl databases
as well as the Ensembl Genome ones. PICR can be accessed at



Available from: Juan Antonio Vizcaino
  • "Several services have been developed by the PRIDE team, which are heavily used by external users but also by PRIDE itself, especially the ‘Protein Identifier Cross-Reference’ (PICR) service (a protein identifier mapping resource) (18) and the ‘Ontology Lookup Service’ (OLS) (to query, browse and navigate biomedical ontologies) (19). In addition, ‘Database on Demand’ is a service to generate tailored databases for performing proteomics searches (20)."
    ABSTRACT: The PRoteomics IDEntifications (PRIDE) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.
    Full-text · Article · Nov 2012 · Nucleic Acids Research
  • ABSTRACT: The Human Proteome Project was launched in September 2010 with the goal of characterizing at least one protein product from each protein-coding gene. Here we assess how much of the proteome has been detected to date via tandem mass spectrometry by analyzing PeptideAtlas, a compendium of human-derived LC-MS/MS proteomics data from many laboratories around the world. All data sets are processed with a consistent set of parameters using the Trans-Proteomic Pipeline and subjected to a 1% protein FDR filter before inclusion in PeptideAtlas. Therefore, PeptideAtlas contains only high-confidence protein identifications. To increase proteome coverage, we explored new comprehensive public data sources for data likely to add new proteins to the Human PeptideAtlas. We then folded these data into a Human PeptideAtlas 2012 build and mapped it to Swiss-Prot, a protein sequence database curated to contain one entry per human protein-coding gene. We find that this latest PeptideAtlas build includes at least one peptide for each of ∼12500 Swiss-Prot entries, leaving ∼7500 gene products yet to be confidently cataloged. We characterize these "PA-unseen" proteins in terms of tissue localization, transcript abundance, and Gene Ontology enrichment, and propose reasons for their absence from PeptideAtlas and strategies for detecting them in the future.
    No preview · Article · Dec 2012 · Journal of Proteome Research
  • ABSTRACT: One approach to infer functions of new proteins from their homologs utilizes visualization of an all-against-all pairwise similarity network (A2ApsN) that exploits the speed of BLAST and avoids the complexity of multiple sequence alignment. However, identifying functions of the protein clusters in A2ApsN is never trivial, because current software packages do not link characterized proteins to their relevant information. Given the database errors introduced by automatic annotation transfer, functional deduction should be made from proteins with experimental studies, i.e. "reference proteins". Here, we present a web server, termed Pclust, which provides a user-friendly interface to visualize the A2ApsN, placing emphasis on such "reference proteins" and providing access to their full information in source databases, e.g. articles in PubMed. The identification of "reference proteins" and the ease of cross-database linkage will facilitate understanding the functions of protein clusters in the network, thus promoting interpretation of proteins of interest. The Pclust server is freely available at
    Full-text · Article · Aug 2013 · Bioinformatics