Interactive InterPro-based comparisons of proteins in whole genomes.

EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
Bioinformatics (Impact Factor: 4.98). 03/2002; 18(2):374-5.
Source: PubMed


MOTIVATION: The SWISS-PROT group at the EBI has developed the Proteome Analysis Database utilizing existing resources and providing comprehensive and integrated comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. The Proteome Analysis Database is accompanied by a program that has been designed to carry out interactive InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.



Available from: Alexander Kanapin
  • Source
    • "The InterPro set of domains was constructed by building groups of homologous polypeptide patterns, including patterns, if any such exist at least in one of the integrated databases. Each such group is recorded as a unique domain in the InterPro database by the InterPro criteria [24]. All the sequences in a domain group must be sufficiently long and must be homologous enough to warrant this grouping."
    ABSTRACT: We study statistical distributions appearing in various genome-related phenomena, including the distribution of the transcript copy number in the transcriptome of eukaryotic cells and the distribution of the number of proteins containing a protein domain in proteomes of species. We found that the empirical distributions for all studied data sets are well fitted by a family of Pareto-like distribution functions whose shape depends in a predictable manner on the sample size. Such distributions are generated as limiting distributions in a Markov random process where the birth and death intensities are linear functions of events. We also propose a novel model of progressive evolution of a population in terms of the increase of the numbers of distinct components and their links in the system and we study evolution of the probability distribution of these links. Estimating two unknown parameters of this model allows us to describe the progressive evolution of the number of distinct protein domain sets and the number of proteins containing a given protein domain in the proteomes of 70 fully sequenced genome organisms. This model also predicts trends in proteome complexity evolution.
    Signal Processing 05/2003; 83(4-83):889-910. DOI:10.1016/S0165-1684(02)00481-4 · 2.21 Impact Factor
  • Source
    ABSTRACT: The dissertation is submitted for the degree of Doctor of Philosophy.
  • Source
    ABSTRACT: The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as candidates for functional annotation. Rules based on the InterPro characterisation are stored and operated through a database called RuleBase. RuleBase is used as the main tool in the sequence database group at the EBI to apply automatic annotation to unknown sequences. The annotated sequences are stored and distributed in the TrEMBL protein sequence database. InterPro also provides a means to carry out statistical and comparative analyses of whole genomes. In the Proteome Analysis Database, InterPro analyses have been combined with other analyses based on CluSTr, the Gene Ontology (GO) and structural information on the proteins.
    Briefings in Bioinformatics 10/2002; 3(3):285-95. · 9.62 Impact Factor