The PROSITE database

Swiss Institute of Bioinformatics (SIB), Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland.
Nucleic Acids Research (Impact Factor: 9.11). 02/2006; 34(Database issue):D227-30. DOI: 10.1093/nar/gkj063
Source: PubMed


The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or
profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain
or functional site identified by the signature. The PROSITE database is now complemented by a series of rules that can give
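more precise information about specific residues. During the last 2 years, the documentation and the ScanProsite web pages
were redesigned to add functionality. The latest version of PROSITE (release 19.11 of September 27, 2005) contains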
1329 patterns and 552 profile entries. Over the past 2 years more than 200 domains have been added, and now 52% of UniProtKB/Swiss-Prot
entries (release 48.1 of September 27, 2005) have a cross-reference to a PROSITE entry. The database is accessible at



Available from: Amos Bairoch,
  • Source
    • "Using the Swiss-Prot classification (Bairoch and Boeckmann, 1991), these proteins were classified into seven categories: (1) cytoplasmic proteins, (2) membrane proteins, (3) mitochondrial proteins, (4) secreted proteins, (5) nuclear proteins, (6) endoplasmic reticulum proteins and (7) 'unknown' proteins. With high-quality data from the OPHID, Swiss-Prot (Bairoch and Boeckmann, 1991), OGEE (Chen et al., 2012), OMIM (Hamosh et al., 2005), GO (Ashburner et al., 2000), KEGG (Kanehisa et al., 2004), Prosite (Hulo et al., 2006), Pfam (Bateman et al., 2004), TRANSFAC (Matys et al., 2003), CORUM (Ruepp et al., 2008), Ensembl (Hubbard et al., 2002) and BioGPS (Wu et al., 2013) databases, we presented a computational analysis workflow aimed at characterizing human proteins at a subcellular localization level, which included the use of differential topological properties, biological properties, codon usage indices, gene expression levels, protein complexity and physicochemical properties. Based on the analysis, significant differences were found in all properties in the seven categories. "
    ABSTRACT: Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell’s survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
    Journal of Theoretical Biology 10/2014; 358:61–73. DOI:10.1016/j.jtbi.2014.05.008 · 2.12 Impact Factor
  • Source
    • "from the Trypanosoma cruzi CL Brener non-Esmeraldo-like genome from TriTrypDB v3.3 database was taken as a reference for the domain search. The protein domain order and characterization of the reference protein was performed using the protein domain database PROSITE (Hulo et al., 2006) and the stand-alone ScanProsite tool (Gattiker et al., 2002) with default parameters. The protein domains (PS50178 and PS50290) found in the initial survey were used to search among all annotated proteins in the TriTrypDB v3.3 database to obtain the proteins sharing the same domains and the same order in other kinetoplastid organisms. "
    ABSTRACT: Chagas disease is caused by the protozoan Trypanosoma cruzi, which affects 10 million people worldwide. Very few kinases have been characterized in this parasite, including the phosphatidylinositol kinases (PIKs) that are at the heart of one of the major pathways of intracellular signal transduction. Recently, we have classified the PIK family in T. cruzi using five different models based on the presence of PIK conserved domains. In this study, we have mapped PIK genes to the chromosomes of two different T. cruzi lineages (G and CL Brener) and determined the cellular localization of two PIK members. The kinases have crucial roles in metabolism and are assumed to be conserved throughout evolution. For this reason, they should display a conserved localization within the same eukaryotic species. In spite of this, there is extensive polymorphism in PIK localization at both the genomic and cellular levels, among different T. cruzi isolates and between T. cruzi and T. brucei, respectively. We showed in this study that the cellular localization of two PIK-related proteins (TOR1 and 2) in the T. cruzi lineage is distinct from that previously observed in T. brucei. In addition, we identified a new PIK gene with a peculiar feature: it encodes a FYVE domain at the N-terminal position. FYVE-PIK genes are phylogenetically distant from the groups containing exclusively the FYVE or PIK domain. The FYVE-PIK architecture is only present in trypanosomatids and in viruses such as Acanthamoeba mimivirus, suggesting a horizontal acquisition. Our Bayesian phylogenetic inference supports this hypothesis. The exact functions of this FYVE-PIK gene are unknown, but the presence of the FYVE domain suggests a role in membranous compartments, such as the endosome. Taken together, the data presented here strengthen the possibility that trypanosomatids are characterized by extensive genomic plasticity that may be considered in designing drugs and vaccines for prevention of Chagas disease.
    Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases 04/2014; 25. DOI:10.1016/j.meegid.2014.03.022 · 3.02 Impact Factor
  • Source
    • ") allows to scan the PROSITE database (Hulo et al., 2006) using either patterns (regular expressions) or profiles (tables of position-specific amino acid weights and gap costs), but it does not allow to combine both a regular expression and a weight matrix. Finally ELM (Gould et al., 2010) is a database of eukaryotic linear motifs described as regular expressions, including the "TRG_NES_CRM1_1" that allows prediction of CRM1-dependent NES motifs. "
    ABSTRACT: Leucine-rich nuclear export signals (NESs) are short amino acid motifs that mediate binding of cargo proteins to the nuclear export receptor CRM1, and thus contribute to regulate the localization and function of many cellular proteins. Computational prediction of NES motifs is of great interest, but remains a significant challenge. We have developed a novel approach for amino acid motif searching that can be used for NES prediction. This approach, termed Wregex (weighted regular expression), combines regular expressions with a Position-Specific Scoring Matrix (PSSM), and has been implemented in a web-based, freely available, software tool. By making use of a PSSM, Wregex provides a score to prioritize candidates for experimental testing. Key features of Wregex include its flexibility, which makes it useful for searching other types of protein motifs, and its fast execution time, which makes it suitable for large-scale analysis. In comparative tests with previously available prediction tools, Wregex is shown to offer a good rate of true positive motifs, while keeping a smaller number of potential candidates. Wregex is free, open-source software available from CONTACT:
    Bioinformatics 01/2014; 30(9). DOI:10.1093/bioinformatics/btu016 · 4.98 Impact Factor
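
    The excerpt above contrasts patterns (regular expressions) with position-specific weights and describes combining the two. The following is a minimal sketch of that idea, not the Wregex or ScanProsite implementation: it translates a pattern written in PROSITE-style syntax into a Python regular expression and then ranks each match with a small weight table. The function names, the example pattern, and the weight values are all hypothetical and chosen only for illustration; only a subset of the pattern syntax is handled.

```python
import re

# One pattern element: x, a residue, [allowed], or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        core = element.lstrip("<").rstrip(">")
        m = _ELEMENT.match(core)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":                    # any amino acid
            residue = "."
        elif residue.startswith("{"):         # any amino acid except these
            residue = "[^" + residue[1:-1] + "]"
        if lo is None:
            count = ""
        elif hi is None:
            count = "{%s}" % lo
        else:
            count = "{%s,%s}" % (lo, hi)
        regex += prefix + residue + count + suffix
    return regex


def score_match(segment, weights):
    """Sum per-position weights; residues missing from the table score 0."""
    return sum(col.get(aa, 0.0) for aa, col in zip(segment, weights))


def scan(sequence, pattern, weights):
    """Return (start, matched segment, score) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group(), score_match(m.group(), weights))
            for m in regex.finditer(sequence)]


# Hypothetical fixed-length pattern and made-up weights (illustration only).
TOY_PATTERN = "[LIV]-x(2)-[LIVF]-x-[LIV]"
TOY_WEIGHTS = [
    {"L": 2.0, "I": 1.0, "V": 1.0},
    {},
    {},
    {"L": 2.0, "F": 1.5},
    {},
    {"L": 2.0, "I": 1.0},
]

if __name__ == "__main__":
    print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
    print(scan("MKLAALELDG", TOY_PATTERN, TOY_WEIGHTS))
```

    In this sketch the regular expression decides where a candidate motif occurs and the weight table only ranks the candidates, which mirrors the prioritization role the abstract attributes to the PSSM. A fixed-length pattern is used so that each matched residue lines up with exactly one weight column; variable-length gaps would need an explicit mapping from matched positions to columns.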