New developments in the InterPro database

EMBL Outstation-European Bioinformatics Institute Hinxton, Cambridge, UK.
Nucleic Acids Research (Impact Factor: 9.11). 02/2007; 35(Database issue):D224-8. DOI: 10.1093/nar/gkl841
Source: PubMed

ABSTRACT InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.

Available from: Alberto Labarga, Aug 22, 2015
  • Source
    • "Sequence alignments were performed using ClustalW [47] from BioEdit. Similarity searches were performed from InterProScan v4.3 [48] against InterPro (IPR) v32.0 [49] "
    ABSTRACT: As part of the Sec translocase, the accessory ATPase SecA2 is present in some pathogenic Gram-positive bacteria. In Listeria monocytogenes, deletion of secA2 results in filamentous cells that form rough colonies and have lower virulence. However, only a few proteins have been identified that are secreted by this pathway. This investigation aims to provide the first exoproteomic analysis of the SecA2-dependent secretion in L. monocytogenes EGD-e. By using media and temperatures relevant to bacterial physiology, we demonstrated that the rough colony and elongated bacterial cell morphotypes are highly dependent on growth conditions. Subsequently, comparative exoproteomic analyses of the ΔsecA2 versus wt strains were performed in chemically defined medium at 20°C and 37°C. Analyzing the proteomic data following the secretomics-based method, part of the proteins appeared routed towards the Sec pathway and exhibited an N-terminal signal peptide. For another significant part, they were primarily cytoplasmic proteins, thus lacking signal peptide and with no predictable secretion pathway. In total, 13 proteins were newly identified as secreted via SecA2, which were essentially associated with cell-wall metabolism, adhesion and/or biofilm formation. From this comparative exoproteomic analysis, new insights into the L. monocytogenes physiology are discussed in relation to its saprophytic and pathogenic lifestyle.
    Journal of proteomics 01/2013; 80. DOI:10.1016/j.jprot.2012.11.027 · 3.93 Impact Factor
  • Source
    • "In the case of homology data, the STRING scoring system uses the E-value obtained from sequence similarity searches. However, there are also protein signature databases such as InterPro [26], which is an integrated database for protein families and domains [27] and can be used to increase the reliability and coverage of these homology data. Therefore, there is a need for an effective scoring system to fill gaps found in homology and microarray data in STRING for this specific organism to produce a more complete MTB functional interaction network. "
    ABSTRACT: Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast amounts of data generated by these technologies provides a strategy for identifying potential drug targets within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets, functional interaction networks between proteins are used to identify proteins essential to the survival, growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses to dissect the functional interaction network produced for identifying drug targets using network topological properties. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.
    Advances in Bioinformatics 11/2011; 2011:801478. DOI:10.1155/2011/801478
  • Source
    • "We only consider genes for which all four experiments are reported, which leaves 5878 of the total of 6178 genes. Item data were extracted from the InterPro database, which contains information on many types of sequence signatures, including protein domains and motifs [25]. For simplicity, we refer to all sequence signatures as domains. "
    ABSTRACT: An algorithm is presented for finding patterns between sets of continuous attributes and item sets. In contrast to most pattern mining approaches, the algorithm considers multiple continuous attributes as a single vector attribute. This approach results in a separate abstraction level and allows multiple vector attributes to be considered. We show that the pattern mining process can uncover relationships between the vector data and item sets. Filtering according to these patterns can be seen as feature selection at the level of the vector attributes as opposed to individual continuous attributes. In the evaluation, we show that the pattern mining algorithm can more effectively and efficiently achieve this filtering than a direct application of classification algorithms. Patterns are identified by relating item data to the distribution of objects within the vector space that is spanned by the sets of continuous attributes. The Kullback–Leibler divergence provides a quantitative measure that establishes whether the subset defined by an item set differs from the overall distribution of data points. The set-subset relationship of data points, which violates i.i.d. assumptions, requires an adaptation of standard algorithms for computing the Kullback–Leibler divergence. The algorithm is evaluated on gene expression data and on a classification example problem that is constructed from time series data.