Article
Automatically extracting functionally equivalent proteins from SwissProt.
Research Department of Structural & Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
BMC Bioinformatics (impact factor: 2.75). 11/2008; 9:418. DOI:10.1186/1471-2105-9-418
Source: PubMed
- Citations (6)
Article: Predicting gene function by conserved co-expression.
ABSTRACT: We show that gene co-expression, which generally provides only a very weak signal for the prediction of functional interactions, can provide a reliable signal by exploiting evolutionary conservation. The encoded proteins of conserved co-expressed gene pairs are highly likely to be part of the same pathway not only after speciation (98%), but also after parallel gene duplication (97%). Conserved co-expression combined with homology data enables us to predict specific gene functions. The use of conservation between parallel duplicated gene pairs to predict function is especially promising given that gene duplication is common in eukaryotes, and that data from only a single organism can be used.
Trends in Genetics 06/2003; 19(5):238-42. · 10.06 Impact Factor
Article: Mining protein function from text using term-based support vector machines.
ABSTRACT: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2.
BMC Bioinformatics 02/2005; 6 Suppl 1:S22. · 2.75 Impact Factor
Article: Basic local alignment search tool.
ABSTRACT: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
Journal of Molecular Biology 11/1990; 215(3):403-10. · 4.00 Impact Factor
Keywords
automated analysis
candidate list
different species
extracting FEPs
functional annotations
functional equivalence
functional equivalence datasets
functionally diverged proteins
functionally equivalent
functionally equivalent homologous proteins
generated database
gold-standard dataset
good performance
large-scale evaluation
large-scale analysis
manual analysis
one-off basis
orthologous proteins
possible future extensions
UniProtKB/Swiss-Prot functional annotations