Z R Li

Sichuan University, Chengdu, Sichuan Sheng, China

Publications (19) · 55.43 Total impact

  • Y. Z. Chen, Z. R. Li, C. Y. Ung
    ABSTRACT: Ligand-protein inverse docking has recently been introduced as a computer method for identification of potential protein targets of a drug. A protein structure database is searched to find proteins to which a drug can bind or weakly bind. Examples of potential applications of this method in facilitating drug discovery include: (1) identification of unknown and secondary therapeutic targets of a drug, (2) prediction of potential toxicity and side effect of an investigative drug, and (3) probing molecular mechanism of bioactive herbal compounds such as those extracted from plants used in traditional medicines. This method and recent results on its applications in solving various drug discovery problems are reviewed.
    Journal of Theoretical and Computational Chemistry 01/2012; 01(01). DOI:10.1142/S0219633602000166 · 0.52 Impact Factor
  • ABSTRACT: Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT has been developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of proteins and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein-protein and protein-small molecule interactions, segment descriptors for local properties of protein sequences, topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, and a feature group of more than 400 descriptors for small molecules plus the derived features for protein-protein and protein-small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein-protein interactions, peptides of specific functions and quantitative structure activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.
    Nucleic Acids Research 05/2011; 39(Web Server issue):W385-90. DOI:10.1093/nar/gkr284 · 8.81 Impact Factor
  • N X Tan, H B Rao, Z R Li, X Y Li
    ABSTRACT: In this paper we report a successful application of machine learning approaches to the prediction of chemical carcinogenicity. Two different approaches, namely a support vector machine (SVM) and artificial neural network (ANN), were evaluated for predicting chemical carcinogenicity from molecular structure descriptors. A diverse set of 844 compounds, including 600 carcinogenic (CG+) and 244 noncarcinogenic (CG-) molecules, was used to estimate the accuracies of these approaches. The database was divided into two sets: the model construction set and the independent test set. Relevant molecular descriptors were selected by a hybrid feature selection method combining Fisher's score and Monte Carlo simulated annealing from a wide set of molecular descriptors, including physicochemical properties, constitutional, topological, and geometrical descriptors. The first model validation method was based on five-fold cross-validation, in which the model construction set is split into five subsets. The five-fold cross-validation was used to select descriptors and optimise the model parameters by maximising the averaged overall accuracy. The final SVM model gave an averaged prediction accuracy of 90.7% for CG+ compounds, 81.6% for CG- compounds and 88.1% for the overall accuracy, while the corresponding ANN model provided an averaged prediction accuracy of 86.1% for CG+ compounds, 79.3% for CG- compounds and 84.2% for the overall accuracy. These results indicate that the hybrid feature selection method is very efficient and the selected descriptors are truly relevant to the carcinogenicity of compounds. Another model validation method, i.e. a hold-out method, was used to build the classification model using the selected descriptors and the optimised model parameters, in which the whole model construction set was used to build the classification model and the independent test set was used to test the predictive ability of the model. The SVM model gave a prediction accuracy of 87.6% for CG+ compounds, 79.1% for CG- compounds and 85.0% for the overall accuracy. The ANN model gave a prediction accuracy of 85.6% for CG+ compounds, 79.1% for CG- compounds and 83.6% for the overall accuracy. The results indicate that the built models are potentially useful for facilitating the prediction of chemical carcinogenicity of untested compounds.
    SAR and QSAR in environmental research 02/2009; 20(1-2):27-75. DOI:10.1080/10629360902724085 · 1.92 Impact Factor
  • ChemInform 08/2008; 39(35). DOI:10.1002/chin.200835216
  • ABSTRACT: Virtual screening performance of support vector machines (SVM) depends on the diversity of training active and inactive compounds. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained by sparsely distributed actives in six MDDR biological target classes composed of a high number of known actives (983-1645) of high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained by regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained by very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5-39.5%, 23.0-48.1%, and 70.2-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997 M PUBCHEM and 167K remaining MDDR compounds as inactive and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These results suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates.
    Journal of Chemical Information and Modeling 07/2008; 48(6):1227-37. DOI:10.1021/ci800022e · 4.07 Impact Factor
  • ABSTRACT: Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit selection performance, in screening large compound libraries, these methods tend to produce lower hit-rates than those of the best performing VS tools, partly because their training-sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training-sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared to those of 62-95%, 0.65-35%, and 20-1200 by structure-based VS and 55-81%, 0.2-0.7%, and 110-795 by other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable and the enrichment factors are substantially better than the best results of other VS tools. 24.3-87.6% of the predicted hits are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries.
    Journal of Molecular Graphics and Modelling 07/2008; 26(8):1276-86. DOI:10.1016/j.jmgm.2007.12.002 · 2.02 Impact Factor
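
The yield, hit-rate, and enrichment factor quoted in the abstract above can be illustrated with a minimal sketch. It assumes the definitions commonly used in virtual screening (yield = known actives recovered / all known actives; hit-rate = known actives recovered / compounds selected; enrichment factor = hit-rate / fraction of actives in the library); the abstract itself does not spell these out, and the function name and example numbers are hypothetical.

```python
def screening_metrics(true_hits, predicted_hits, n_actives, library_size):
    """Return (yield, hit_rate, enrichment_factor) for one screening run.

    true_hits      -- known actives among the compounds selected by the model
    predicted_hits -- total number of compounds selected by the model
    n_actives      -- known actives present in the screened library
    library_size   -- total number of compounds screened
    """
    if min(predicted_hits, n_actives, library_size) <= 0:
        raise ValueError("counts must be positive")
    recovered = true_hits / n_actives        # yield: share of actives recovered
    hit_rate = true_hits / predicted_hits    # share of selected compounds that are active
    random_rate = n_actives / library_size   # hit-rate expected from random picking
    return recovered, hit_rate, hit_rate / random_rate


# Hypothetical run: 300 compounds selected from a 1,000,000-compound library
# that contains 100 known actives, 60 of which are among the selected ones.
recovered, hit_rate, enrichment = screening_metrics(60, 300, 100, 1_000_000)
print(f"yield={recovered:.1%} hit-rate={hit_rate:.1%} enrichment={enrichment:.0f}")
```

Because the enrichment factor divides by the random hit-rate, it grows with library size for a fixed hit-rate, which is one reason values from libraries of different sizes are hard to compare directly.
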
  • ABSTRACT: Computational methods for predicting compounds of specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are useful for facilitating drug discovery and evaluation. Recently, machine learning methods such as neural networks and support vector machines have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic and ADMET properties. These methods are particularly useful for compounds of diverse structures to complement QSAR methods, and for cases of unavailable receptor 3D structure to complement structure-based methods. A number of studies have demonstrated the potential of these methods for predicting such compounds as substrates of P-glycoprotein and cytochrome P450 CYP isoenzymes, inhibitors of protein kinases and CYP isoenzymes, and agonists of serotonin receptor and estrogen receptor. This article is intended to review the strategies, current progress and underlying difficulties in using machine learning methods for predicting these protein binders and as potential virtual screening tools. Algorithms for proper representation of the structural and physicochemical properties of compounds are also evaluated.
    Journal of Pharmaceutical Sciences 11/2007; 96(11):2838-60. DOI:10.1002/jps.20985 · 3.01 Impact Factor
  • ABSTRACT: Molecular descriptors represent structural and physicochemical features of compounds. They have been extensively used for developing statistical models, such as quantitative structure activity relationship (QSAR) and artificial neural networks (NN), for computer prediction of the pharmacodynamic, pharmacokinetic, or toxicological properties of compounds from their structure. While computer programs have been developed for computing molecular descriptors, there is a lack of a freely accessible one. We have developed a web-based server, MODEL (Molecular Descriptor Lab), for computing a comprehensive set of 3,778 molecular descriptors, which is significantly more than the ∼1,600 molecular descriptors computed by other software. Our computational algorithms have been extensively tested and the computed molecular descriptors have been used in a number of published works of statistical models for predicting a variety of pharmacodynamic, pharmacokinetic, and toxicological properties of compounds. Several testing studies on the computed molecular descriptors are discussed. MODEL is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi free of charge for academic use. Biotechnol. Bioeng. 2007;97: 389–396. © 2006 Wiley Periodicals, Inc.
    Biotechnology and Bioengineering 06/2007; 97(2):389-396. DOI:10.1002/bit.21214 · 4.16 Impact Factor
  • ABSTRACT: Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial-agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these are not designed for identifying ER agonists and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested by using 243 ER agonists and 463 ER non-agonists, respectively, which are significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.
    Journal of Molecular Graphics and Modelling 12/2006; 25(3):313-23. DOI:10.1016/j.jmgm.2006.01.007 · 2.02 Impact Factor
  • ABSTRACT: Sequence-derived structural and physicochemical features have frequently been used in the development of statistical learning models for predicting proteins and peptides of different structural, functional and interaction profiles. PROFEAT (Protein Features) is a web server for computing commonly-used structural and physicochemical features of proteins and peptides from amino acid sequence. It computes six feature groups composed of ten features that include 51 descriptors and 1447 descriptor values. The computed features include amino acid composition, dipeptide composition, normalized Moreau-Broto autocorrelation, Moran autocorrelation, Geary autocorrelation, sequence-order-coupling number, quasi-sequence-order descriptors and the composition, transition and distribution of various structural and physicochemical properties. In addition, it can also compute the preceding autocorrelation descriptors based on user-defined properties. Our computational algorithms were extensively tested and the computed protein features have been used in a number of published works for predicting proteins of functional classes, protein-protein interactions and MHC-binding peptides. PROFEAT is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/prof/prof.cgi.
    Nucleic Acids Research 08/2006; 34(Web Server issue):W32-7. DOI:10.1093/nar/gkl305 · 8.81 Impact Factor
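
Two of the simplest features named in the abstract above, amino acid composition and dipeptide composition, can be sketched as follows. This is an illustrative reimplementation under textbook definitions (fractions over the 20 standard residues and the 400 ordered residue pairs), not PROFEAT's own code; the function names are hypothetical, and the server may scale its values differently (for example as percentages).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Non-standard letters are not counted but still contribute to the length.
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among the N-1 adjacent pairs (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = len(seq) - 1
    return {pair: n / total for pair, n in counts.items()}


aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AACD")
print(aac["A"], len(dpc))
```

Both functions return fixed-length vectors regardless of sequence length, which is what makes such descriptors usable as inputs to statistical learning models.
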
  • ABSTRACT: Computational methods for predicting compounds of specific pharmacodynamic, pharmacokinetic, or toxicological properties are useful for facilitating drug discovery and drug safety evaluation. The quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) methods are the most successfully used statistical learning methods for predicting compounds of specific properties. More recently, other statistical learning methods such as neural networks and support vector machines have been explored for predicting compounds of higher structural diversity than those covered by QSAR and QSPR. These methods have shown promising potential in a number of studies. This article is intended to review the strategies, current progress and underlying difficulties in using statistical learning methods for predicting compounds of specific properties. It also evaluates algorithms commonly used for representing structural and physicochemical properties of compounds.
    Mini Reviews in Medicinal Chemistry 05/2006; 6(4):449-59. DOI:10.2174/138955706776361501 · 3.19 Impact Factor
  • C W Yap, Z R Li, Y Z Chen
    ABSTRACT: Quantitative structure-pharmacokinetic relationships (QSPkR) have increasingly been used for the prediction of the pharmacokinetic properties of drug leads. Several QSPkR models have been developed to predict the total clearance (CL(tot)) of a compound. These models give good prediction accuracy but they are primarily based on a limited number of related compounds which are significantly lesser in number and diversity than the 503 compounds with known CL(tot) described in the literature. It is desirable to examine whether these and other statistical learning methods can be used for predicting the CL(tot) of a more diverse set of compounds. In this work, three statistical learning methods, general regression neural network (GRNN), support vector regression (SVR) and k-nearest neighbour (KNN) were explored for modeling the CL(tot) of all of the 503 known compounds. Six different sets of molecular descriptors, DS-MIXED, DS-3DMoRSE, DS-ATS, DS-GETAWAY, DS-RDF and DS-WHIM, were evaluated for their usefulness in the prediction of CL(tot). GRNN-, SVR- and KNN-developed models have average-fold errors in the range of 1.63 to 1.96, 1.66-1.95 and 1.90-2.23, respectively. For the best GRNN-, SVR- and KNN-developed models, the percentage of compounds with predicted CL(tot) within two-fold error of actual values are in the range of 61.9-74.3% and are comparable or slightly better than those of earlier studies. QSPkR models developed by using DS-MIXED, which is a collection of constitutional, geometrical, topological and electrotopological descriptors, generally give better prediction accuracies than those developed by using other descriptor sets. These results suggest that GRNN, SVR, and their consensus model are potentially useful for predicting QSPkR properties of drug leads.
    Journal of Molecular Graphics and Modelling 04/2006; 24(5):383-95. DOI:10.1016/j.jmgm.2005.10.004 · 2.02 Impact Factor
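
The two accuracy measures reported in the abstract above, average-fold error and the percentage of compounds predicted within two-fold of the actual value, can be sketched as below. The sketch assumes the definition of average-fold error widely used in pharmacokinetic prediction, 10 raised to the mean of |log10(predicted/observed)|; the abstract does not state the formula, and the function names and sample values are hypothetical.

```python
import math


def average_fold_error(predicted, observed):
    """10 ** mean(|log10(pred / obs)|); 1.0 means perfect agreement."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def fraction_within_fold(predicted, observed, fold=2.0):
    """Share of predictions whose ratio to the observed value lies in [1/fold, fold]."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for p, o in zip(predicted, observed) if 1 / fold <= p / o <= fold)
    return inside / len(predicted)


# Hypothetical clearance values in arbitrary but matching units.
pred = [2.0, 5.0, 40.0]
obs = [1.0, 10.0, 10.0]
print(average_fold_error(pred, obs), fraction_within_fold(pred, obs))
```

Taking the absolute value of the log ratio means over- and under-predictions do not cancel, so an average-fold error near 2 indicates that predictions are typically off by about a factor of two in either direction.
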
  • ABSTRACT: The odds of drug hit identification in screening are closely related to the diversity of libraries or the availability of focused libraries. There are no truly diverse libraries and it is difficult to design focused libraries without sufficient information. Hence alternative approaches need to be explored for enhancing the odds of hit discovery from existing libraries. Protein homologs have been collectively targeted in inhibitor design and other discovery applications by exploiting the correlation between protein homologs and their ligands from specific compound classes. A receptor-homolog-based screening scheme may be derived as a strategy to potentially increase the odds of hit identification.
    Letters in Drug Design & Discovery 04/2006; 3(3):200-204. DOI:10.2174/157018006776286970 · 0.96 Impact Factor
  • C W Yap, Y Xue, Z R Li, Y Z Chen
    ABSTRACT: Cytochrome P450 enzymes are responsible for phase I metabolism of the majority of drugs and xenobiotics. Identification of the substrates and inhibitors of these enzymes is important for the analysis of drug metabolism, prediction of drug-drug interactions and drug toxicity, and the design of drugs that modulate cytochrome P450 mediated metabolism. The substrates and inhibitors of these enzymes are structurally diverse. It is thus desirable to explore methods capable of predicting compounds of diverse structures without over-fitting. Support vector machine is an attractive method with these qualities, which has been employed for predicting the substrates and inhibitors of several cytochrome P450 isoenzymes as well as compounds of various other pharmacodynamic, pharmacokinetic, and toxicological properties. This article introduces the methodology, evaluates the performance, and discusses the underlying difficulties and future prospects of the application of support vector machines to in silico prediction of cytochrome P450 substrates and inhibitors.
    Current Topics in Medicinal Chemistry 02/2006; 6(15):1593-607. DOI:10.2174/156802606778108942 · 3.45 Impact Factor
  • ABSTRACT: Analysis of the energetics of small molecule ligand-protein, ligand-nucleic acid, and protein-nucleic acid interactions facilitates the quantitative understanding of molecular interactions that regulate the function and conformation of proteins. It has also been extensively used for ranking potential new ligands in virtual drug screening. We developed a Web-based software tool, PEARLS (Program for Energetic Analysis of Ligand-Receptor Systems), for computing interaction energies of ligand-protein, ligand-nucleic acid, protein-nucleic acid, and ligand-protein-nucleic acid complexes from their 3D structures. AMBER molecular force field, Morse potential, and empirical energy functions are used to compute the van der Waals, electrostatic, hydrogen bond, metal-ligand bonding, and water-mediated hydrogen bond energies between the binding molecules. The change in the solvation free energy of molecular binding is estimated by using an empirical solvation free energy model. Contribution from ligand conformational entropy change is also estimated by a simple model. The computed free energies for a number of PDB ligand-receptor complexes were studied and compared to experimental binding affinity. A substantial degree of correlation between the computed free energy and experimental binding affinity was found, which suggests that PEARLS may be useful in facilitating energetic analysis of ligand-protein, ligand-nucleic acid, and protein-nucleic acid interactions. PEARLS can be accessed at http://ang.cz3.nus.edu.sg/cgi-bin/prog/rune.pl.
    Journal of Chemical Information and Modeling 01/2006; 46(1):445-50. DOI:10.1021/ci0502146 · 4.07 Impact Factor
  • ABSTRACT: Pharmaceutical agents have been developed and tested for possessing desirable pharmacodynamic, pharmacokinetic, and minimal level of toxicological properties. Computational methods have been explored for predicting these properties aimed at the discovery of promising leads and the elimination of unsuitable ones in early stages of drug development. Statistical learning methods have shown their potential for predicting these properties for structurally diverse sets of agents by using both conventional (quantitative structure–activity and structure–property relationships) and more recently explored (such as neural networks and support vector machines) statistical models. These methods have been used for predicting agents of a variety of pharmacodynamic (such as inhibitors or agonists of a therapeutic target), pharmacokinetic (such as P-glycoprotein substrates, human intestine absorption, and blood–brain barrier penetrating capabilities), and toxicological (such as genotoxicity) properties. The strategies, current progress, and the underlying difficulties and future prospects of the application of the recently explored statistical learning methods are discussed. Drug Dev. Res. 66:245–259, 2006. © 2006 Wiley-Liss, Inc.
    Drug Development Research 12/2005; 66(4):245-259. DOI:10.1002/ddr.20044 · 0.73 Impact Factor
  • ABSTRACT: Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature affect the accuracy of data mining. Methods for solving this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched with those from the medicinal chemistry literature by using this algorithm at different string identity levels (80-100%). The optimum performance is at string identity of 88%, at which the recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval.
    Computers in Biology and Medicine 11/2005; 35(8):717-24. DOI:10.1016/j.compbiomed.2004.06.002 · 1.48 Impact Factor
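
The matching approach described in the abstract above can be sketched as a Smith-Waterman local alignment with affine gap penalties (the Gotoh recurrence) applied to two name strings. The scoring values, the function names, and the normalised score used here as a stand-in for "string identity" are all assumptions made for illustration; the abstract does not give the paper's parameters or its exact identity measure.

```python
def smith_waterman_affine(a, b, match=1.0, mismatch=-1.0,
                          gap_open=-2.0, gap_extend=-0.5):
    """Best local alignment score of strings a and b.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Only the score is computed (no traceback), using two rows of memory.
    """
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0.0] * cols  # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg] * cols  # F: as H, but ending in a gap that consumes a[i-1]
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg] * cols
        e = neg            # E: ending in a gap that consumes b[j-1]
        for j in range(1, cols):
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0.0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def name_similarity(a, b):
    """Assumed proxy for string identity: score divided by the longer length.

    Uses the default match score of 1.0, so identical strings give 1.0.
    """
    if not a or not b:
        return 0.0
    return smith_waterman_affine(a.lower(), b.lower()) / max(len(a), len(b))


# A single character error lowers the similarity but keeps it high.
print(name_similarity("Panax ginseng", "Panax ginsemg"))
```

Names would then be treated as the same herb when the similarity exceeds a chosen threshold; the abstract reports that a threshold of 88% string identity gave the best balance of recall and precision for its own identity measure.
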
  • ABSTRACT: Various toxicological profiles, such as genotoxic potential, need to be studied in drug discovery processes and submitted to the drug regulatory authorities for drug safety evaluation. As part of the effort for developing low cost and efficient adverse drug reaction testing tools, several statistical learning methods have been used for developing genotoxicity prediction systems with an accuracy of up to 73.8% for genotoxic (GT+) and 92.8% for nongenotoxic (GT-) agents. These systems have been developed and tested by using fewer than 400 known GT+ and GT- agents, which is significantly less in number and diversity than the 860 GT+ and GT- agents known at present. There is a need to examine if a similar level of accuracy can be achieved for the more diverse set of molecules and to evaluate other statistical learning methods not yet applied to genotoxicity prediction. This work is intended for testing several statistical learning methods by using 860 GT+ and GT- agents, which include support vector machines (SVM), probabilistic neural network (PNN), k-nearest neighbor (k-NN), and C4.5 decision tree (DT). A feature selection method, recursive feature elimination, is used for selecting molecular descriptors relevant to genotoxicity study. The overall accuracies of SVM, k-NN, and PNN are comparable to, and those of DT lower than, the results from earlier studies, with SVM giving the highest accuracies of 77.8% for GT+ and 92.7% for GT- agents. Our study suggests that statistical learning methods, particularly SVM, k-NN, and PNN, are useful for facilitating the prediction of genotoxic potential of a diverse set of molecules.
    Chemical Research in Toxicology 07/2005; 18(6):1071-80. DOI:10.1021/tx049652h · 4.19 Impact Factor
  • ABSTRACT: Statistical-learning methods have been developed for facilitating the prediction of pharmacokinetic and toxicological properties of chemical agents. These methods employ a variety of molecular descriptors to characterize structural and physicochemical properties of molecules. Some of these descriptors are specifically designed for the study of a particular type of properties or agents, and their use for other properties or agents might generate noise and affect the prediction accuracy of a statistical learning system. This work examines to what extent the reduction of this noise can improve the prediction accuracy of a statistical learning system. A feature selection method, recursive feature elimination (RFE), is used to automatically select molecular descriptors for support vector machines (SVM) prediction of P-glycoprotein substrates (P-gp), human intestinal absorption of molecules (HIA), and agents that cause torsades de pointes (TdP), a rare but serious side effect. RFE significantly reduces the number of descriptors for each of these properties thereby increasing the computational speed for their classification. The SVM prediction accuracies of P-gp and HIA are substantially increased and that of TdP remains unchanged by RFE. These prediction accuracies are comparable to those of earlier studies derived from a selective set of descriptors. Our study suggests that molecular feature selection is useful for improving the speed and, in some cases, the accuracy of statistical learning methods for the prediction of pharmacokinetic and toxicological properties of chemical agents.
    Journal of Chemical Information and Computer Sciences 09/2004; 44(5):1630-8. DOI:10.1021/ci049869h

Publication Stats

606 Citations
55.43 Total Impact Points

Institutions

  • 2004–2011
    • Sichuan University
      • College of Chemistry
      • State Key Laboratory of Biotherapy
      • Department of Chemistry
      Chengdu, Sichuan Sheng, China
  • 2006
    • Xiamen University
      Amoy, Fujian, China