Sequence-based feature prediction and annotation of proteins

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Genome Biology (Impact Factor: 10.47). 03/2009; 10(2):206. DOI: 10.1186/gb-2009-10-2-206
Source: PubMed

ABSTRACT A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.

Available from: Alfonso Valencia, Jul 05, 2015
  • Source
    ABSTRACT: Proteomics experiments often generate a vast amount of data. However, the simple identification and quantification of proteins from a cell proteome or subproteome is not sufficient for a full understanding of the complex mechanisms occurring in biological systems. Therefore, the functional annotation analysis of protein datasets using bioinformatics tools is essential for interpreting the results of high-throughput proteomics. Although large-scale proteomics data have rapidly increased, the biological interpretation of these results remains a challenging task. Here we reviewed basic concepts and different programs that are commonly used in the functional annotation of proteomics data, emphasizing the main strategies based on gene ontology annotations. Furthermore, we explored the characteristics of some tools developed for functional annotation analysis, with regard to ease of use and typical caveats of ontology annotations. The utility of the tools and the variations between them were assessed by comparing the outputs they generated for an example proteomics dataset.
    Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 01/2015; 1854(1):46–54. DOI:10.1016/j.bbapap.2014.10.019 · 3.19 Impact Factor
  • Source
    ABSTRACT: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar data sets. We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled data sets. As shown using simulated gene sets with simulated data and MSigDB collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome, resulting in increased gene set enrichment power and better replication of enrichment results. http://cran.r-project.org/web/packages/EMVC/index.html Contact: jason.h.moore@dartmouth.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 02/2014; 30(12). DOI:10.1093/bioinformatics/btu110 · 4.62 Impact Factor
  • Source
    Machine Learning 07/2011; DOI:10.1007/s10994-011-5271-6 · 1.69 Impact Factor