SWISSPROT protein sequence data bank, recent developments

Department of Medical Biochemistry, University of Geneva, Switzerland.
Nucleic Acids Research (Impact Factor: 9.11). 08/1993; 21(13):3093-6. DOI: 10.1093/nar/21.13.3093


Available from: Amos Bairoch
    • "For the functional annotation of unassembled metagenome reads, the Metanor analysis pipeline was applied with four different BLAST tools: Blast2n vs. the MvirDB nucleotide database (E-value cut-off of 10^-10), Blast2x vs. the MvirDB protein database (E-value cut-off of 10^-10) and Blast2n vs. the NCBI Plasmid Database (E-value cut-off of 10^-10). For the functional annotation of assembled contigs from the plasmid metagenome, the Metanor analysis pipeline was employed with six tools: Blast2n vs. the NCBI nucleotide database (E-value cut-off of 10^-4); Blast2x vs. the NCBI protein database (E-value cut-off of 10^-4); Blast2x vs. the KEGG database (Kanehisa et al., 2006) (E-value cut-off of 10^-4); Blast2x vs. the Swissprot database (Bairoch and Boeckmann, 1993) (E-value cut-off of 10^-4); Blast2x vs. COG (Tatusov et al., 2003) (E-value cut-off of 10^-4) and InterPro. "
    ABSTRACT: Wastewater treatment plants (WWTPs) are a reservoir for bacteria harbouring antibiotic resistance plasmids. To get a comprehensive overview on the plasmid metagenome of WWTP bacteria showing reduced susceptibility to certain antimicrobial drugs an ultrafast sequencing approach applying the 454-technology was carried out. One run on the GS 20 System yielded 346,427 reads with an average read length of 104 bases resulting in a total of 36,071,493 bases sequence data. The obtained plasmid metagenome was analysed and functionally annotated by means of the Sequence Analysis and Management System (SAMS) software package. Known plasmid genes could be identified within the WWTP plasmid metagenome data set by BLAST searches using the NCBI Plasmid Database. Most abundant hits represent genes involved in plasmid replication, stability, mobility and transposition. Mapping of plasmid metagenome reads to completely sequenced plasmids revealed that many sequences could be assigned to the cryptic pAsa plasmids previously identified in Aeromonas salmonicida subsp. salmonicida and to the accessory modules of the conjugative IncU resistance plasmid pFBAOT6 of Aeromonas punctata. Matches of sequence reads to antibiotic resistance genes indicate that plasmids from WWTP bacteria encode resistances to all major classes of antimicrobial drugs. Plasmid metagenome sequence reads could be assembled into 605 contigs with a minimum length of 500 bases. Contigs predominantly encode plasmid survival functions and transposition enzymes.
    Full-text · Article · Jul 2008 · Journal of Biotechnology
    • "First, the glycoprotein dataset used in [22] is extracted from SWISS-PROT/UniProt6.1 [34] and contains only mammalian glycoprotein sequences that have "mucin-type" O-linked glycosylation annotations. We use a glycoprotein dataset extracted from O-GlycBase v6.00 [35], a resource containing experimentally verified glycosylation sites compiled from protein databases and literature. "
    ABSTRACT: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences. We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction. Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.
    Full-text · Article · Feb 2007 · BMC Bioinformatics
    • "Dataset construction 2.1.1. Positive dataset The glycoprotein sequences come from Swiss-Prot/UniProt6.1 (Bairoch, 1993 "
    ABSTRACT: O-glycosylation is one of the most important, frequent and complex post-translational modifications. This modification can activate and affect protein functions. Here, we present three support vector machines models based on physical properties, 0/1 system, and the system combining the above two features. The prediction accuracies of the three models have reached 0.82, 0.85 and 0.85, respectively. The accuracies of the three SVMs methods were evaluated by 'leave-one-out' cross validation. This approach provides a useful tool to help identify the O-glycosylation sites in mammalian proteins. An online prediction web server is available at
    Preview · Article · Jul 2006 · Computational Biology and Chemistry