Sequence-based feature prediction and annotation of proteins

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Genome Biology (Impact Factor: 10.81). 03/2009; 10(2):206. DOI: 10.1186/gb-2009-10-2-206
Source: PubMed


A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.



Available from: Alfonso Valencia
    • "Many tools have been developed to mine several databases of biological information to finally predict a protein function based on sequence similarities. Detailed strategies on genomics and proteomics sequence annotation can be found in previous publications [11] [12] [13] [14] [15] [16] [17]. Nevertheless, once the genome and proteome are annotated, one of the most disseminated strategies of proteomics data functional annotation includes the use of ontologies, which can be understood as an explicit specification of a conceptualization [18]. "
    ABSTRACT: Proteomics experiments often generate a vast amount of data. However, the simple identification and quantification of proteins from a cell proteome or subproteome is not sufficient for a full understanding of the complex mechanisms occurring in biological systems. Therefore, functional annotation analysis of protein datasets using bioinformatics tools is essential for interpreting the results of high-throughput proteomics. Although large-scale proteomics data have rapidly increased, the biological interpretation of these results remains a challenging task. Here we review basic concepts and different programs that are commonly used in functional annotation of proteomics data, emphasizing the main strategies focused on the use of gene ontology annotations. Furthermore, we explore the characteristics of some tools developed for functional annotation analysis, with regard to ease of use and typical caveats of ontology annotations. The utility of, and variations between, different tools were assessed by comparing the outputs generated for an example proteomics dataset.
    Full-text · Article · Jan 2015 · Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics
    • "The number of instances for each class should be balanced, as some classifiers like SVM tend to produce reduced accuracies for imbalanced datasets. Appropriate representation of the informative experimental data available and its conversion into datasets relevant to machine learning denotes the critical step of generating an efficient classifier (Juncker et al., 2009). "

    Full-text · Chapter · Mar 2014
    • "The annotation of most genes and gene products is incomplete with only a sparse set of annotations to generic high-level categories available (Faria et al., 2012). For those annotations that do exist, the overwhelming majority are automatically generated on the basis of sequence or structural similarity without any curatorial review (du Plessis et al., 2011; Juncker et al., 2009). Such automatically generated annotations have known quality issues relative to manually curated annotations, especially those based on published experimental findings (Bell et al., 2012; Dolan et al., 2005; Faria et al., 2012; Park et al., 2011; Schnoes et al., 2009; Skunca et al., 2012). "
    ABSTRACT: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar data sets. We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled data sets. As shown using simulated gene sets with simulated data and MSigDB collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome, resulting in increased gene set enrichment power and better replication of enrichment results. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Full-text · Article · Feb 2014 · Bioinformatics