Sequence-based feature prediction and annotation of proteins

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.
Genome Biology (Impact Factor: 10.47). 03/2009; 10(2):206. DOI: 10.1186/gb-2009-10-2-206
Source: PubMed

ABSTRACT A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.


Available from: Alfonso Valencia, Apr 18, 2015
  • ABSTRACT: While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict can uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes, such as those of prokaryotes, typically contain many thousands of protein sequences, which prohibits the use of threading as a genome-wide structural systems biology tool. To extend its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based on task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The pipeline is easily extensible to other types of distributed cyberinfrastructure.
    BioMed Research International 06/2014; 2014:348725. DOI:10.1155/2014/348725 · 2.71 Impact Factor
  • ABSTRACT: Stress tolerance in plants is a coordinated action of multiple stress response genes that also cross-talk with other components of the stress signal transduction pathways. The expression of stress-induced genes is largely regulated by specific transcription factors, families of which have been reported in several plant species, such as Arabidopsis, rice and Populus. In sorghum, the majority of such factors remain unexplored. We used 2DE refined with MALDI-TOF techniques to analyze drought stress-induced proteins in sorghum. A total of 176 transcription factors from the MYB, AUX_ARF, bZIP, AP2 and WRKY families of drought-induced proteins were identified. We developed a method based on semantic similarity of gene ontology terms (GO terms) to identify the transcription factors. A threshold value (≥ 90%) was applied to retrieve a total of 1,493 transcription factors with high semantic similarity from selected plant species. It could be concluded that the identified transcription factors regulate their target proteins in response to endogenous signals and environmental cues, such as light, temperature and drought stress. The regulatory network and cis-acting elements of the identified transcription factors in distinct families are involved in responsiveness to auxin, abscisic acid, defense, stress and light. These responses may be highly important in the modulation of plant growth and development.
    Cellular & Molecular Biology Letters 12/2014; DOI:10.2478/s11658-014-0223-3 · 1.78 Impact Factor
  • ABSTRACT: Proteomics experiments often generate a vast amount of data. However, the simple identification and quantification of proteins from a cell proteome or subproteome is not sufficient for a full understanding of the complex mechanisms occurring in biological systems. Therefore, functional annotation analysis of protein datasets using bioinformatics tools is essential for interpreting the results of high-throughput proteomics. Although the volume of large-scale proteomics data has increased rapidly, the biological interpretation of these results remains a challenging task. Here we reviewed basic concepts and different programs that are commonly used in the functional annotation of proteomics data, emphasizing the main strategies focused on the use of gene ontology annotations. Furthermore, we explored the characteristics of some tools developed for functional annotation analysis, with regard to ease of use and typical caveats of ontology annotations. The utility of the tools and the variation between them were assessed by comparing the outputs they generated for an example proteomics dataset.
    Biochimica et Biophysica Acta (BBA) - Proteins & Proteomics 01/2015; 1854(1):46–54. DOI:10.1016/j.bbapap.2014.10.019 · 3.19 Impact Factor