DomSVR: domain boundary prediction with support vector regression from sequence information alone.

Department of Systems and Computer Science, Howard University, 2400 Sixth Street, NW, Washington, DC 20059, USA.
Amino Acids (Impact Factor: 3.91). 02/2010; 39(3):713-26. DOI: 10.1007/s00726-010-0506-6
Source: PubMed

ABSTRACT Protein domains are the fundamental structural and functional units of proteins. Information about protein domain boundaries is helpful in understanding the evolution, structures and functions of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method to address the problem of protein domain boundary identification, using novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of approximately 36.5% and an average specificity of approximately 81% for multi-domain protein chains, which is better overall than the performance of published approaches for identifying domain boundaries. Because it uses sequence information alone, our method is also simpler and faster.
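
The sensitivity and specificity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes per-residue binary labels (1 = boundary, 0 = non-boundary) and a fixed cutoff applied to regression scores; the abstract does not state the exact evaluation protocol, so the function names, the cutoff of 0.5, and the example data below are all hypothetical and not taken from the paper.

```python
def threshold_scores(scores, cutoff=0.5):
    """Turn real-valued per-residue scores into 0/1 boundary labels.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary per-residue labels.

    sensitivity = TP / (TP + FN): fraction of true boundary residues found.
    specificity = TN / (TN + FP): fraction of non-boundary residues kept clean.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    # Guard against chains with no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up toy chain of 10 residues with a 3-residue boundary region.
true_labels = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.1, 0.6, 0.7, 0.4, 0.9, 0.2, 0.1, 0.3, 0.0, 0.2]
predicted = threshold_scores(scores)
sens, spec = sensitivity_specificity(true_labels, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under this definition, raising the cutoff generally trades sensitivity for specificity, which is one way to read a result that pairs modest sensitivity with high specificity.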

  • Source
    ABSTRACT: Inter-domain linkers (IDLs) bridge flanking domains and support inter-domain communication in multi-domain proteins. Their sequence and conformational preferences enable them to carry out varied functions. They also provide sufficient flexibility to facilitate domain motions and, in conjunction with the interacting interfaces, regulate the inter-domain geometry (IDG). Despite a basic intuitive understanding of how inter-domain orientations relate to linker conformations and interfaces, we still do not entirely understand the precise relationship among the three. We show that IDG is evolutionarily well conserved and is constrained by the domain-domain interface interactions. The IDLs modulate the interactions by varying their lengths, conformations and local structure, thereby affecting the overall IDG. The results of our analysis provide guidelines for modelling multi-domain proteins from the tertiary structures of their constituent domains.
    Journal of biomolecular structure & dynamics 12/2012; · 4.99 Impact Factor
  • Source
    ABSTRACT: Protein function prediction is one of the most challenging problems in the post-genomic era. With advances in high-throughput techniques, the number of newly identified proteins has been increasing exponentially. However, the functional characterization of these new proteins has not increased in the same proportion. To fill this gap, a large number of computational methods have been proposed in the literature. Early approaches explored homology relationships to associate known functions with newly discovered proteins. Nevertheless, these approaches tend to fail when a new protein is considerably different (divergent) from other known ones. Accordingly, more accurate approaches that use expressive data representations and exploit sophisticated computational techniques are urgently required. With these points in mind, this review provides a comprehensible description of machine learning approaches that are currently applied to protein function prediction problems. We start by defining several problems involved in understanding aspects of protein function, and describing how machine learning can be applied to these problems. We aim to present, in a systematic framework, the role of these techniques in protein function inference, which is sometimes difficult to follow because of the rapid evolution of the field. To this end, we highlight the most representative contributions and recent advances, and provide an insightful categorization and classification of machine learning methods in functional proteomics.
    Recent patents on biotechnology. 06/2013; 7.
  • Source
    ABSTRACT: The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved. In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB features and predicted shape strings as features, and a conditional random field as the classification algorithm. In predicted hinge regions, each residue was assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy. DomHR is available at
    PLoS ONE 01/2013; 8(4):e60559. · 3.73 Impact Factor

