Sequence-based feature prediction and annotation of proteins.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Genome Biology (Impact Factor: 10.47). 03/2009; 10(2):206. DOI: 10.1186/gb-2009-10-2-206
Source: PubMed

ABSTRACT A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.

  • ABSTRACT: Proteomics experiments often generate a vast amount of data. However, the simple identification and quantification of proteins from a cell proteome or subproteome is not sufficient for a full understanding of the complex mechanisms occurring in biological systems. Therefore, functional annotation analysis of protein datasets using bioinformatics tools is essential for interpreting the results of high-throughput proteomics. Although the volume of large-scale proteomics data has increased rapidly, the biological interpretation of these results remains a challenging task. Here we review basic concepts and different programs that are commonly used in functional annotation of proteomics data, emphasizing the main strategies based on gene ontology annotations. Furthermore, we explore the characteristics of some tools developed for functional annotation analysis, with regard to ease of use and typical caveats of ontology annotations. The utility of the different tools and the variation between them were assessed by comparing the outputs generated for an example proteomics dataset.
    Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 01/2015; 1854(1):46–54. · 3.19 Impact Factor
  • ABSTRACT: While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the predicted structural information could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences is large even for small genomes such as those of prokaryotes, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
    BioMed Research International 01/2014; 2014:348725. · 2.71 Impact Factor
  • ABSTRACT: We present a generic Bayesian framework for peptide and protein identification in proteomics, and provide a unified interpretation of the database searching and de novo peptide sequencing approaches that are used in peptide identification. We describe several probabilistic graphical models and a variety of prior distributions that can be incorporated into the Bayesian framework to model different types of prior information, such as known protein sequences, known protein abundances, peptide precursor masses, estimated peptide retention times, and peptide detectabilities. Various applications of the Bayesian framework are discussed theoretically, including its application to the identification of peptides containing mutations and post-translational modifications.
    Statistics and its interface 01/2012; 5(1):21-37. · 0.46 Impact Factor
