Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease

BMC Bioinformatics (Impact Factor: 3.02). 01/2010; DOI: 10.1186/1471-2105-11-S10-O4
Source: DOAJ

ABSTRACT Motivation: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. Results: We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically though the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. Availability and Implementation: Source code for the structural kernel is available at Contact: Supplementary information:Supplementary data are available at Bioinformatics online.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The report summarizes the scientific content of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB) held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) conference in Long Beach, California on July 13, 2012.
    BMC Bioinformatics 12/2012; 13(Suppl 18):A1. · 3.02 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at
    PLoS ONE 01/2012; 7(7):e41370. · 3.53 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
    Genomics 08/2011; 98(4):233-41. · 3.01 Impact Factor


Available from