ABSTRACT: Although significant progress has been made in experimental high-throughput screening (HTS) of ADME (absorption, distribution, metabolism, excretion) and pharmacokinetic properties, in silico modeling of ADME and toxicity (ADME-Tox) remains indispensable in drug discovery because it can guide the selection of drug candidates prior to expensive ADME screening and clinical trials. Compared with other ADME-Tox properties, human oral bioavailability (HOBA) is particularly important but extremely difficult to predict. In this paper, advances in human oral bioavailability modeling are reviewed. Moreover, our insights into how to construct more accurate and reliable HOBA QSAR and classification models are also discussed.
ABSTRACT: Tyrosine kinases are regarded as excellent targets for chemical drug therapy of carcinomas. However, under strong purifying selection, drug resistance usually emerges in cancer cells within a short time. Many cases of drug resistance have been found to be associated with secondary mutations in the drug target, which attenuate the drug-target interactions. For example, an acquired secondary mutation, G2032R, was recently detected in the drug target ROS1 tyrosine kinase from a crizotinib-resistant patient who responded poorly to crizotinib within a very short therapeutic term. It was supposed that the mutation was located at the solvent front and might hinder drug binding. However, the simulations reported in this study suggest a different picture. Here, free energy surfaces of the crizotinib-ROS1 complex were characterized as a function of the drug-target distance and the conformational change of the phosphate-binding loop (P-loop) through advanced molecular dynamics techniques. The results revealed that the more rigid P-loop region in the G2032R-mutated ROS1 was primarily responsible for the crizotinib resistance: on the one hand, it directly impaired the binding of crizotinib and, on the other hand, it shortened the residence time as a result of the flattened free energy surface. Therefore, both the binding affinity and the drug residence time should be emphasized in rational drug design to overcome kinase resistance.
ABSTRACT: Background
A foundational library called MORT (Molecular Objects and Relevant Templates) for the development of new software packages and tools employed in computational biology and computer-aided drug design (CADD) is described here.
MORT offers several advantages over other libraries. First, MORT is written in C++ and natively supports the paradigm of object-oriented design, so it can be understood and extended easily. Second, MORT employs the relational model to represent a molecule, which is more convenient and flexible than the traditional hierarchical model employed by many other libraries. Third, the library includes a large number of functions, so a molecule can be manipulated easily at different levels. For example, it can parse a variety of popular molecular formats (MOL/SDF, MOL2, PDB/ENT, SMILES/SMARTS, etc.), create the topology and coordinate files for the simulations supported by AMBER, and calculate the energy of a specific molecule based on the AMBER force fields.
We believe that MORT can be used as a foundational library for programmers to develop new programs and applications for computational biology and CADD. Source code of MORT is available at
Journal of Cheminformatics 06/2014; 6(1):36-36. DOI:10.1186/1758-2946-6-36 · 4.55 Impact Factor
ABSTRACT: Here, we systematically investigated how the force fields and the partial charge models for ligands affect the ranking performance of the binding free energies predicted by the Molecular Mechanics/Poisson Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) approaches. A total of 46 small molecules targeted to 5 different protein receptors were employed to test the following issues: (1) the impact of five AMBER force fields (ff99, ff99SB, ff99SB-ILDN, ff03 and ff12SB) on the performance of MM/GBSA, (2) the influence of the timescale of molecular dynamics (MD) simulations on the performance of MM/GBSA with different force fields, (3) the impact of five AMBER force fields on the performance of MM/PBSA, and (4) the impact of four different charge models (RESP, ESP, AM1-BCC and Gasteiger) for small molecules on the performance of MM/PBSA or MM/GBSA. Based on our simulation results, the following conclusions can be drawn: (1) for short time-scale MD simulations (1 ns or less), the ff03 force field gives the best predictions by both MM/GBSA and MM/PBSA; (2) for middle time-scale MD simulations (2-4 ns), MM/GBSA based on the ff99 force field yields the best predictions, while MM/PBSA based on the ff99SB force field performs best; however, longer MD simulations, for example, 5 ns or more, may not be necessary; (3) in most cases, MM/PBSA with Tan's parameters shows better ranking capability than MM/GBSA (GB^OBC1); (4) the RESP charges show the best performance for both MM/PBSA and MM/GBSA, and the AM1-BCC and ESP charges also give fairly satisfactory predictions. Our results provide useful guidance for the practical applications of the MM/GBSA and MM/PBSA approaches.
The Journal of Physical Chemistry B 07/2013; 117(28):8408-8421. DOI:10.1021/jp404160y · 3.30 Impact Factor
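For readers unfamiliar with the end-point methods compared above, the decomposition commonly used for MM/PBSA and MM/GBSA can be written as follows. This is the standard textbook form, not an equation quoted from the paper; the symbol names are the usual ones and are assumptions with respect to the authors' own notation.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
G &= E_{\mathrm{MM}} + G_{\mathrm{PB/GB}} + G_{\mathrm{SA}} - TS \\
\Delta G_{\mathrm{bind}} &= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{PB/GB}} + \Delta G_{\mathrm{SA}} - T\Delta S
\end{aligned}
```

Here E_MM is the molecular mechanics energy, G_PB/GB the polar solvation term from the Poisson-Boltzmann or generalized Born model, G_SA the nonpolar surface-area term, and TS the conformational entropy contribution.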
ABSTRACT: Conformational entropy, usually computed by normal mode analysis (NMA) or quasi-harmonic analysis (QHA), is extremely time-consuming to calculate. Here, instead of NMA or QHA, a solvent accessible surface area (SASA) based model was employed to compute the conformational entropy, and a new fast GPU-based method called MURCIA (Molecular Unburied Rapid Calculation of Individual Areas) was implemented to accelerate the calculation of SASA for each atom. MURCIA employs two different kernels to determine the neighbours of each atom. The first kernel (K1) uses brute force to calculate the neighbours of each atom, while the second one (K2) uses an advanced algorithm involving hardware interpolations via the GPU texture memory unit for the same purpose. These two kernels yield very similar results. Each kernel has its own advantages depending on the protein size: K1 performs better than K2 when the protein is small, and vice versa. The algorithm was extensively evaluated on three protein datasets and achieved good results for all of them. This GPU-accelerated version is ~600 times faster than the former sequential algorithm when the number of atoms in a protein is up to 10^5.
Journal of Chemical Information and Modeling 07/2013; 53(8). DOI:10.1021/ci400263t · 3.74 Impact Factor
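The brute-force neighbour search attributed to kernel K1 can be illustrated with a small CPU-side Python sketch. This is not the MURCIA code: the function name `brute_force_neighbours`, the 1.4 Å probe radius, and the overlap criterion (two atoms are neighbours when their probe-expanded spheres intersect) are assumptions chosen for illustration.

```python
import math


def brute_force_neighbours(coords, radii, probe=1.4):
    """Return, for every atom, the indices of atoms whose probe-expanded
    spheres overlap with it. All pairs are tested, hence O(N^2) cost."""
    n = len(coords)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Expanded radii are r_i + probe and r_j + probe (assumed criterion).
            cutoff = radii[i] + radii[j] + 2.0 * probe
            if math.dist(coords[i], coords[j]) < cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    radii = [1.7, 1.7, 1.7]
    print(brute_force_neighbours(coords, radii))
```

The quadratic pair loop is simple and has no set-up cost, which is consistent with the abstract's remark that the brute-force kernel is preferable for small proteins.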
ABSTRACT: In this study, in order to elucidate the action mechanism of Traditional Chinese Medicines (TCM) that exhibit clinical efficacy for type II diabetes mellitus (T2DM), an integrated protocol that combines molecular docking and pharmacophore mapping was employed to find potential inhibitors from TCM for the T2DM-related targets and to establish the compound-target interaction network. First, the capabilities of molecular docking and pharmacophore mapping to distinguish inhibitors from non-inhibitors for the selected T2DM-related targets were evaluated. The results show that molecular docking or pharmacophore mapping can give satisfactory predictions for most targets, but validation is still necessary because the prediction accuracies of these two methods vary across targets. Then, Bayesian classifiers integrating the predictions from molecular docking and pharmacophore mapping were developed, and the well-validated Bayesian classifiers for 15 targets were utilized to find the potential inhibitors from TCM and establish the compound-target interaction network. The analysis of the compound-target network demonstrates that a small portion (18.6%) of the predicted inhibitors can interact with multiple targets. The pharmacological activities of some potential inhibitors have been experimentally confirmed, highlighting the reliability of the Bayesian classifiers. In addition, a considerable number of the predicted multi-target inhibitors have free radical scavenging/antioxidant activities, which are closely related to T2DM. It appears that the pharmacological effect of the TCM formulae is determined not only by the compounds that interact directly with one or more T2DM-related targets, but also by the compounds with other supplementary bioactivities important for relieving T2DM, such as free radical scavenging/antioxidant effects.
The mechanism uncovered by this study may offer a deep insight for understanding the theory of the classical TCM formulae for combating T2DM. Moreover, the predicted inhibitors for the T2DM-related targets may provide a good source to find new lead compounds against T2DM.
Journal of Chemical Information and Modeling 06/2013; 53(7). DOI:10.1021/ci400146u · 3.74 Impact Factor
ABSTRACT: Background
In order to better understand the structural features of natural compounds from traditional Chinese medicines, the scaffold architectures of drug-like compounds in MACCS-II Drug Data Report (MDDR), non-drug-like compounds in Available Chemical Directory (ACD), and natural compounds in Traditional Chinese Medicine Compound Database (TCMCD) were explored and compared.
First, the different scaffolds were extracted from ACD, MDDR and TCMCD by using three scaffold representations, including Murcko frameworks, Scaffold Tree, and ring systems with different complexity and side chains. Then, by examining the accumulative frequency of the scaffolds in each dataset, we observed that the Level 1 scaffolds of the Scaffold Tree offer advantages over the other scaffold architectures to represent the scaffold diversity of the compound libraries. By comparing the similarity of the scaffold architectures presented in MDDR, ACD and TCMCD, structural overlaps were observed not only between MDDR and TCMCD but also between MDDR and ACD. Finally, Tree Maps were used to cluster the Level 1 scaffolds of the Scaffold Tree and visualize the scaffold space of the three datasets.
The analysis of the scaffold architectures of MDDR, ACD and TCMCD shows that, on average, drug-like molecules in MDDR have the highest diversity while natural compounds in TCMCD have the highest complexity. According to the Tree Maps, it can be observed that the Level 1 scaffolds present in MDDR have higher diversity than those presented in TCMCD and ACD. However, some representative scaffolds in MDDR with high frequency show structural similarities to those in TCMCD and ACD, suggesting that some scaffolds in TCMCD and ACD may be potentially drug-like fragments for fragment-based and de novo drug design.
Journal of Cheminformatics 01/2013; 5(1):5. DOI:10.1186/1758-2946-5-5 · 4.55 Impact Factor
ABSTRACT: Background
In this work, we analyzed and compared the distribution profiles of a wide variety of molecular properties for three compound classes: drug-like compounds in MDL Drug Data Report (MDDR), non-drug-like compounds in Available Chemical Directory (ACD), and natural compounds in Traditional Chinese Medicine Compound Database (TCMCD).
The comparison of the property distributions suggests that, when all compounds in MDDR, ACD and TCMCD with molecular weight lower than 600 were used, MDDR and ACD are substantially different while TCMCD is much more similar to MDDR than ACD. However, when the three subsets of ACD, MDDR and TCMCD with similar molecular weight distributions were examined, the distribution profiles of the representative physicochemical properties for MDDR and ACD do not differ significantly anymore, suggesting that after the dependence of molecular weight is removed drug-like and non-drug-like molecules cannot be effectively distinguished by simple property-based filters; however, the distribution profiles of several physicochemical properties for TCMCD are obviously different from those for MDDR and ACD. Then, the performance of each molecular property on predicting drug-likeness was evaluated. No single molecular property shows good performance to discriminate between drug-like and non-drug-like molecules. Compared with the other descriptors, fractional negative accessible surface area (FASA-) performs the best. Finally, a PCA-based scheme was used to visually characterize the spatial distributions of the three classes of compounds with similar molecular weight distributions.
When FASA- was used as a drug-likeness filter, more than 80% of the molecules in TCMCD were predicted to be drug-like. Moreover, the principal component plots show that natural compounds in TCMCD have different and even more diverse distributions than either drug-like compounds in MDDR or non-drug-like compounds in ACD.
Journal of Cheminformatics 11/2012; 4(1):31. DOI:10.1186/1758-2946-4-31 · 4.55 Impact Factor
ABSTRACT: Assigning bond orders is an essential step for characterizing a chemical structure correctly in force field based simulations. Several methods have been developed to do this, each with its own advantages and limitations. Here, an automatic algorithm for assigning chemical connectivity and bond orders for organic molecules, regardless of whether hydrogen atoms are present, is provided; only three-dimensional coordinates and element identities are needed. The algorithm uses hard rules, length rules and conjugation rules to fix the structures. The hard rules determine bond orders based on basic chemical rules; the length rules determine bond orders from the distance between two atoms based on a set of predefined values for different bond types; the conjugation rules determine bond orders by using the length information derived from the previous rule, the bond angles and some small structural patterns. The algorithm was extensively evaluated on three datasets and achieved good prediction accuracy for all of them. Finally, the limitations and future improvements of the algorithm are discussed.
Journal of Cheminformatics 10/2012; 4(1):26. DOI:10.1186/1758-2946-4-26 · 4.55 Impact Factor
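The "length rules" step described above can be pictured with a minimal Python sketch that maps an interatomic distance to a bond order. The thresholds below are illustrative values for carbon-carbon bonds only and are assumptions, not the predefined values used in the paper; the function name `guess_cc_bond_order` is likewise hypothetical.

```python
import math

# Illustrative upper bounds in angstroms for carbon-carbon bonds.
# These are assumed values for demonstration, NOT the paper's parameters.
CC_LENGTH_RULES = [(1.27, 3), (1.38, 2), (1.65, 1)]


def guess_cc_bond_order(a, b):
    """Return a tentative C-C bond order from two 3D positions.

    Returns 0 when the atoms are too far apart to be bonded."""
    d = math.dist(a, b)
    for upper_bound, order in CC_LENGTH_RULES:
        if d <= upper_bound:
            return order
    return 0


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    for length in (1.20, 1.34, 1.54, 2.50):
        print(length, guess_cc_bond_order(origin, (length, 0.0, 0.0)))
```

A distance-only rule cannot resolve aromatic or conjugated systems, whose bond lengths fall between the single and double bond ranges, which is presumably why the algorithm adds a separate conjugation step.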
ABSTRACT: Quantitative or qualitative characterization of the drug-like features of known drugs may help medicinal and computational chemists to select higher quality drug leads from a huge pool of compounds and to improve the efficiency of drug design pipelines. For this purpose, theoretical models that discriminate between drug-like and non-drug-like molecules based on molecular physicochemical properties and structural fingerprints were developed by using the naive Bayesian classification (NBC) and recursive partitioning (RP) techniques, and then the drug-likeness of the compounds from the Traditional Chinese Medicine Compound Database (TCMCD) was evaluated. First, the impact of molecular physicochemical properties and structural fingerprints on the prediction accuracy of drug-likeness was examined. We found that, compared with simple molecular properties, structural fingerprints were more essential for the accurate prediction of drug-likeness. Then, a variety of Bayesian classifiers were constructed by changing the ratio of drug-like to non-drug-like molecules and the size of the training set. The results indicate that the prediction accuracy of the Bayesian classifiers was closely related to the size and the degree of balance of the training set. When a balanced training set was used, the best Bayesian classifier based on 21 physicochemical properties and the LCFP_6 fingerprint set yielded an overall leave-one-out (LOO) cross-validated accuracy of 91.4% for the 140,000 molecules in the training set and 90.9% for the 40,000 molecules in the test set. In addition, RP classifiers with different maximum depths were constructed and compared with the Bayesian classifiers, and we found that the best Bayesian classifier outperformed the best RP model with respect to overall prediction accuracy.
Moreover, the Bayesian classifier employing structural fingerprints highlights the important substructures favorable or unfavorable for drug-likeness, offering extra valuable information for getting high quality lead compounds in the early stage of the drug design/discovery process. Finally, the best Bayesian classifier was used to predict the drug-likeness of 33,961 compounds in TCMCD. Our calculations show that 59.37% of the molecules in TCMCD were identified as drug-like molecules, indicating that traditional Chinese medicines (TCMs) are therefore an excellent source of drug-like molecules. Furthermore, the important structural fingerprints in TCMCD were detected and analyzed. Considering that the pharmacology of TCMCD and MDDR (MDL Drug Data Report) was linked by the important common structural features, the potential pharmacology of the compounds in TCMCD may therefore be annotated by these important structural signatures identified from Bayesian analysis, which may be valuable to promote the development of TCMs.
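To make the idea of a fingerprint-based naive Bayesian classifier concrete, the following Python sketch trains on sets of fingerprint bit identifiers and scores a new molecule by its log-odds of being drug-like. It is a generic naive Bayes with add-one smoothing that only uses the bits present in a molecule; it is not the classifier or software used in the study, and the names `train` and `log_odds` as well as the toy data are assumptions.

```python
import math
from collections import Counter


def train(samples):
    """samples: iterable of (set_of_bits, label); label True means drug-like."""
    n = {True: 0, False: 0}
    counts = {True: Counter(), False: Counter()}
    for bits, label in samples:
        n[label] += 1
        counts[label].update(bits)
    return n, counts


def log_odds(model, bits):
    """Log-odds that a molecule with the given bits is drug-like."""
    n, counts = model
    # Smoothed class prior ratio.
    score = math.log((n[True] + 1) / (n[False] + 1))
    for b in bits:
        # Add-one smoothed frequency of the bit in each class.
        p_pos = (counts[True][b] + 1) / (n[True] + 2)
        p_neg = (counts[False][b] + 1) / (n[False] + 2)
        score += math.log(p_pos / p_neg)
    return score


if __name__ == "__main__":
    toy = [({1, 2}, True), ({1, 3}, True), ({4}, False), ({4, 5}, False)]
    model = train(toy)
    print(log_odds(model, {1}), log_odds(model, {4}))
```

The per-bit terms of the score show directly which substructures push a prediction toward or away from drug-likeness, which is the kind of interpretability the abstract attributes to the fingerprint-based classifier.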
ABSTRACT: As an integrated step toward a coherent polarizable force field for biomolecular modeling, we analyzed four polarizable water models to evaluate their consistencies with the Thole polarization screening schemes utilized in our latest Amber polarizable force field. Specifically, we studied the performance of both the Thole linear and exponential schemes in these water models to assess their abilities to reproduce experimental water properties. The analysis shows that the tested water models reproduce most of the room-temperature properties of liquid water reasonably well but fall short of reproducing the dynamic properties and temperature-dependent properties. This study demonstrates the necessity to further fine-tune water polarizable potentials for more robust polarizable force fields for biomolecular simulations.
The Journal of Physical Chemistry B 06/2012; 116(28). DOI:10.1021/jp212117d · 3.30 Impact Factor
ABSTRACT: In the previous publications of this series, we presented a set of Thole induced dipole interaction models using four types of screening functions. In this work, we document our effort to refine the van der Waals parameters for the Thole polarizable models. Following the philosophy of AMBER force field development, the van der Waals (vdW) parameters were tuned for the Thole model with linear screening function to reproduce both the ab initio interaction energies and the experimental densities of pure liquids. An in-house genetic algorithm was applied to maximize the fitness of "chromosomes", which is a function of the root-mean-square errors (RMSE) of interaction energy and liquid density. To efficiently explore the vdW parameter space, a novel approach was developed to estimate the liquid densities for a given vdW parameter set using the mean residue-residue interaction energies through interpolation/extrapolation. This approach allowed the costly molecular dynamics simulations to be performed only at the end of each optimization cycle and eliminated the simulations during the cycle. Test results show notable improvements over the original AMBER FF99 vdW parameter set, as indicated by the reduction in errors of the calculated pure liquid densities (d), heats of vaporization (H_vap), and hydration energies. The average percent error (APE) of the densities of 59 pure liquids was reduced from 5.33 to 2.97%; the RMSE of H_vap was reduced from 1.98 to 1.38 kcal/mol; the RMSE of solvation free energies of 15 compounds was reduced from 1.56 to 1.38 kcal/mol. For the interaction energies of 1639 dimers, the overall performance of the optimized vdW set is slightly better than the original FF99 vdW set (RMSE of 1.56 versus 1.63 kcal/mol). The optimized vdW parameter set was also evaluated for the exponential screening function used in the Amoeba force field to assess its applicability for different types of screening functions.
Encouragingly, comparable performance was observed when the optimized vdW set was combined with the Thole Amoeba-like polarizable model, particularly for the interaction energy and liquid density calculations. Thus, the optimized vdW set is applicable to both types of Thole models with either linear or Amoeba-like screening functions.
The Journal of Physical Chemistry B 05/2012; 116(24):7088-101. DOI:10.1021/jp3019759 · 3.30 Impact Factor
ABSTRACT: Good and extensive experimental ADMET (absorption, distribution, metabolism, excretion, and toxicity) data are critical for developing reliable in silico ADMET models. Here we develop a PharmacoKinetics Knowledge Base (PKKB) to compile comprehensive information about ADMET properties into a single electronic repository. We incorporate more than 10 000 experimental ADMET measurements of 1685 drugs into the PKKB. The ADMET properties in the PKKB include octanol/water partition coefficient, solubility, dissociation constant, intestinal absorption, Caco-2 permeability, human bioavailability, plasma protein binding, blood-plasma partitioning ratio, volume of distribution, metabolism, half-life, excretion, urinary excretion, clearance, toxicity, half lethal dose in rat or mouse, etc. The PKKB provides the most extensive collection of freely available data for ADMET properties to date. All these ADMET properties, as well as the pharmacological information and the calculated physicochemical properties, are integrated into a web-based information system. Eleven separate data sets for octanol/water partition coefficient, solubility, blood-brain partitioning, intestinal absorption, Caco-2 permeability, human oral bioavailability, and P-glycoprotein inhibitors have been provided for free download and can be used directly for ADMET modeling. The PKKB is available online at http://cadd.suda.edu.cn/admet.
Journal of Chemical Information and Modeling 05/2012; 52(5):1132-7. DOI:10.1021/ci300112j · 3.74 Impact Factor
ABSTRACT: It is of great interest in modern drug design to accurately calculate the free energies of protein-ligand or nucleic acid-ligand binding. MM-PBSA (molecular mechanics Poisson-Boltzmann surface area) and MM-GBSA (molecular mechanics generalized Born surface area) have gained popularity in this field. For both methods, the conformational entropy, which is usually calculated through normal-mode analysis (NMA), is needed to calculate the absolute binding free energies. Unfortunately, NMA is computationally demanding and becomes a bottleneck of the MM-PB/GBSA-NMA methods. In this work, we have developed a fast approach to estimate the conformational entropy based upon solvent accessible surface area calculations. In our approach, the conformational entropy of a molecule, S, is obtained by summing the contributions of all atoms, regardless of whether they are buried or exposed. Each atom has two types of surface areas, solvent accessible surface area (SAS) and buried SAS (BSAS). The two types of surface areas are weighted to estimate the contribution of an atom to S. Atoms having the same atom type share the same weight, and a general parameter k is applied to balance the contributions of the two types of surface areas. This entropy model was parametrized using a large set of small molecules whose conformational entropies were calculated at the B3LYP/6-31G* level taking the solvent effect into account. The weighted solvent accessible surface area (WSAS) model was extensively evaluated in three tests. For convenience, TS values, the product of temperature T and conformational entropy S, were calculated in those tests. T was set to 298.15 K throughout.
First of all, good correlations were achieved between WSAS TS and NMA TS for 44 protein or nucleic acid systems sampled with molecular dynamics simulations (10 snapshots were collected for postentropy calculations): the mean squared correlation coefficient (R²) was 0.56. For the 20 complexes, the TS changes upon binding (TΔS) were also calculated, and the mean R² between NMA and WSAS was 0.67. In the second test, TS values were calculated for 12 protein decoy sets (each set has 31 conformations) generated by the Rosetta software package. Again, good correlations were achieved for all decoy sets: the mean, maximum, and minimum of R² were 0.73, 0.89, and 0.55, respectively. Finally, binding free energies were calculated for 6 protein systems (the numbers of inhibitors range from 4 to 18) using four scoring functions. Compared to the measured binding free energies, the mean R² values of the six protein systems were 0.51, 0.47, 0.40, and 0.43 for MM-GBSA-WSAS, MM-GBSA-NMA, MM-PBSA-WSAS, and MM-PBSA-NMA, respectively. The mean rms errors of prediction were 1.19, 1.24, 1.41, and 1.29 kcal/mol for the four scoring functions, respectively. Therefore, the two scoring functions employing WSAS achieved a prediction performance comparable to that of the scoring functions using NMA. It should be emphasized that no minimization was performed prior to the WSAS calculation in the last test. Although WSAS is not as rigorous as physical models such as quasi-harmonic analysis and thermodynamic integration (TI), it is computationally very efficient because only surface area calculation is involved and no structural minimization is required. Moreover, WSAS has achieved a performance comparable to that of normal-mode analysis. We expect that this model could find applications in fields such as high throughput screening (HTS), molecular docking, and rational protein design.
In those fields, efficiency is crucial since there are a large number of compounds, docking poses, or protein models to be evaluated. A list of acronyms and abbreviations used in this work is provided for quick reference.
Journal of Chemical Information and Modeling 04/2012; 52(5):1199-212. DOI:10.1021/ci300064d · 3.74 Impact Factor
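The per-atom summation described in the WSAS abstract can be sketched in a few lines of Python. The functional form used here, S = Σ w(type) · (SAS + k · BSAS), is one plausible reading of the description (a shared weight per atom type and a single parameter k balancing the two surface areas) and should be treated as an assumption; the weights, k value and atom data below are made up for illustration.

```python
def wsas_entropy(atoms, weights, k):
    """Estimate a conformational entropy from per-atom surface areas.

    atoms   : iterable of (atom_type, sas, bsas) tuples
    weights : dict mapping atom_type to its (hypothetical) weight
    k       : balance between exposed (SAS) and buried (BSAS) area
    """
    return sum(weights[atom_type] * (sas + k * bsas)
               for atom_type, sas, bsas in atoms)


if __name__ == "__main__":
    # Toy values only; real weights would come from the parametrization.
    toy_atoms = [("C", 10.0, 5.0), ("O", 4.0, 0.0)]
    toy_weights = {"C": 0.1, "O": 0.2}
    print(wsas_entropy(toy_atoms, toy_weights, k=0.5))
```

Because the estimate needs only surface areas for a single structure, it avoids the minimization and Hessian diagonalization that make normal-mode analysis expensive.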
ABSTRACT: Inhibition of the human ether-a-go-go related gene (hERG) potassium channel may result in QT interval prolongation, which causes severe cardiac side effects and is a major problem in clinical studies of drug candidates. The development of in silico tools to filter out potential hERG potassium channel blockers in early stages of the drug discovery process is of considerable interest. Here, a diverse set of 806 compounds with hERG inhibition data was assembled, and the binary hERG classification models using naive Bayesian classification and recursive partitioning (RP) techniques were established and evaluated. The naive Bayesian classifier based on molecular properties and the ECFP_8 fingerprints yielded 84.8% accuracy for the training set using the leave-one-out (LOO) cross-validation procedure and 85% accuracy for the test set of 120 molecules. For the two additional test sets, the model achieved 89.4% accuracy for the WOMBAT-PK test set, and 86.1% accuracy for the PubChem test set. The naive Bayesian classifiers gave better predictions than the RP classifiers. Moreover, the Bayesian classifier, employing molecular fingerprints, highlights the important structural fragments favorable or unfavorable for hERG potassium channel blockage, which offers extra valuable information for the design of compounds avoiding undesirable hERG activity.
ABSTRACT: In this work, we have evaluated how well the general assisted model building with energy refinement (AMBER) force field performs in studying the dynamic properties of liquids. Diffusion coefficients (D) have been predicted for 17 solvents, five organic compounds in aqueous solutions, four proteins in aqueous solutions, and nine organic compounds in nonaqueous solutions. An efficient sampling strategy has been proposed and tested in the calculation of the diffusion coefficients of solutes in solutions. There are two major findings of this study. First, the diffusion coefficients of organic solutes in aqueous solution can be well predicted: the average unsigned error and the root mean square error are 0.137 and 0.171 × 10^-5 cm^2 s^-1, respectively. Second, although the absolute values of D cannot be predicted, good correlations with experimental data have been achieved for eight organic solvents (R² = 0.784), four proteins in aqueous solutions (R² = 0.996), and nine organic compounds in nonaqueous solutions (R² = 0.834). The temperature-dependent behaviors of three solvents, namely, TIP3P water, dimethyl sulfoxide, and cyclohexane, have been studied. The major molecular dynamics (MD) settings, such as the size of the simulation box and whether the coordinates of MD snapshots are wrapped into the primary simulation box, have been explored. We conclude that our sampling strategy of averaging the mean square displacement collected in multiple short MD simulations is efficient in predicting diffusion coefficients of solutes at infinite dilution.
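The sampling strategy summarized above, averaging the mean square displacement (MSD) over several short simulations before extracting D, can be sketched in Python using the Einstein relation for three dimensions, MSD(t) ≈ 6Dt. The abstract does not spell out the fitting procedure, so the least-squares slope used here, the function names, and the toy numbers are all assumptions.

```python
def average_msd(runs):
    """Average MSD curves from several runs sampled at the same time points."""
    n_runs = len(runs)
    return [sum(values) / n_runs for values in zip(*runs)]


def diffusion_coefficient(times, msd):
    """Einstein relation in 3D: D = slope(MSD vs. t) / 6.

    The slope comes from an ordinary least-squares line fit. Units follow
    the inputs (for example, angstrom^2 and ps give angstrom^2/ps)."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den / 6.0


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    runs = [[0.0, 5.0, 12.0, 19.0], [0.0, 7.0, 12.0, 17.0]]  # toy MSD curves
    print(diffusion_coefficient(times, average_msd(runs)))
```

Averaging over independent short runs reduces the statistical noise of a single-solute MSD curve, which is the usual motivation for this kind of protocol at infinite dilution.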
ABSTRACT: Molecular mechanical force field (FF) methods are useful in studying condensed phase properties. They are complementary to experiment and can often go beyond experiment in atomic details. Even if a FF is designed specifically for studying the structures, dynamics and functions of biomolecules, it is still important for the FF to accurately reproduce the experimental liquid properties of small molecules that represent the chemical moieties of biomolecules. Otherwise, the force field may not describe the structures and energies of macromolecules in aqueous solutions properly. In this work, we have carried out a systematic study to evaluate the General AMBER Force Field (GAFF) in studying densities and heats of vaporization for a large set of organic molecules that covers the most common chemical functional groups. The latest techniques, such as the particle mesh Ewald (PME) for calculating electrostatic energies, and Langevin dynamics for scaling temperatures, have been applied in the molecular dynamics (MD) simulations. For density, the average percent error (APE) of 71 organic compounds is 4.43% when compared to the experimental values. More encouragingly, the APE drops to 3.43% after the exclusion of two outliers and four other compounds for which the experimental densities have been measured at pressures higher than 1.0 atm. For heat of vaporization, several protocols have been investigated and the best one, P4/ntt0, achieves an average unsigned error (AUE) and a root-mean-square error (RMSE) of 0.93 and 1.20 kcal/mol, respectively. How to reduce the prediction errors through proper van der Waals (vdW) parameterization is discussed. An encouraging finding in vdW parameterization is that both densities and heats of vaporization approach their "ideal" values in a synchronous fashion when vdW parameters are tuned. The subsequent hydration free energy calculation using thermodynamic integration further justifies the vdW refinement.
We conclude that simple vdW parameterization can significantly reduce the prediction errors. We believe that GAFF can greatly improve its performance in predicting liquid properties of organic molecules after a systematic vdW parameterization, which will be reported in a separate paper.
Journal of Chemical Theory and Computation 07/2011; 7(7):2151-2165. DOI:10.1021/ct200142z · 5.50 Impact Factor
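The three error measures quoted in this abstract (APE, AUE and RMSE) follow their usual definitions, which the Python sketch below spells out. The function names and the toy density values are illustrative assumptions; none of the numbers are taken from the paper.

```python
import math


def average_percent_error(calc, expt):
    """APE: mean of |calc - expt| / |expt|, expressed in percent."""
    return 100.0 * sum(abs(c - e) / abs(e) for c, e in zip(calc, expt)) / len(calc)


def average_unsigned_error(calc, expt):
    """AUE: mean absolute deviation between calculation and experiment."""
    return sum(abs(c - e) for c, e in zip(calc, expt)) / len(calc)


def root_mean_square_error(calc, expt):
    """RMSE: square root of the mean squared deviation."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


if __name__ == "__main__":
    calc = [1.1, 1.8]   # toy calculated densities, g/cm^3
    expt = [1.0, 2.0]   # toy experimental densities, g/cm^3
    print(average_percent_error(calc, expt))
    print(average_unsigned_error(calc, expt))
    print(root_mean_square_error(calc, expt))
```

APE is scale-free and therefore convenient for densities that span a range of values, while AUE and RMSE keep the physical unit (kcal/mol for heats of vaporization).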
ABSTRACT: Oral bioavailability is an essential parameter in drug screening cascades and a good indicator of the capability of the delivery of a given compound to the systemic circulation by oral administration. In the present work, we report a database of oral bioavailability of 1014 molecules determined in humans. A systematic examination of the relationships between various physicochemical properties and oral bioavailability was carried out to investigate the influence of these properties on oral bioavailability. A number of property-based rules for bioavailability classification were generated and evaluated. We found that no rule was an effective predictor for oral bioavailability because these simple rules cannot characterize the influence of important metabolic processes on bioavailability. Finally, the genetic function approximation (GFA) technique was employed to construct multiple linear regression models for oral bioavailability using structural fingerprints as the basic parameters, together with several important molecular properties. The best model is able to predict human oral bioavailability with an r of 0.79, a q of 0.72, and an RMSE (root-mean-square error) of 22.30% for the compounds in the training set. The analysis of the descriptors chosen by GFA shows that the important structural fingerprints are primarily related to important intestinal absorption and well-known metabolic processes. The predictive power of the models was further evaluated using a separate test set of 80 compounds, and the consensus model can predict the oral bioavailability with r(test) = 0.71 and RMSE = 23.55% for the tested compounds. Since the necessary molecular properties and structural fingerprints can be calculated easily and quickly, the models we proposed here may help speed up the process of finding or designing compounds with improved oral bioavailability.
ABSTRACT: In molecular docking, it is challenging to develop a scoring function that is accurate enough for high-throughput screening. Most scoring functions implemented in popular docking software packages were developed with many approximations for computational efficiency, which sacrifices the accuracy of prediction. With advanced technology and powerful computational hardware nowadays, it is feasible to use rigorous scoring functions, such as molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) and molecular mechanics/generalized Born surface area (MM/GBSA), in molecular docking studies. Here, we systematically investigated the performance of MM/PBSA and MM/GBSA in identifying the correct binding conformations and predicting the binding free energies for 98 protein-ligand complexes. Comparison studies showed that MM/GBSA (69.4%) outperformed MM/PBSA (45.5%) and many popular scoring functions in identifying the correct binding conformations. Moreover, we found that molecular dynamics simulations are necessary for some systems to identify the correct binding conformations. Based on our results, we proposed a guideline for using MM/GBSA to predict the binding conformations. We then tested the performance of MM/GBSA and MM/PBSA in reproducing the binding free energies of the 98 protein-ligand complexes. The best MM/GBSA model, with an internal dielectric constant of 2.0, produced a Spearman's correlation coefficient of 0.66, which is better than MM/PBSA (0.49) and almost all scoring functions used in molecular docking. In summary, MM/GBSA performs well for both binding pose prediction and binding free-energy estimation and is efficient for re-scoring the top-hit poses produced by other, less accurate scoring functions.
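The ranking quality reported above is measured with Spearman's correlation coefficient, which is the Pearson correlation computed on ranks. A minimal, dependency-free Python sketch follows; the function names and the toy energies are assumptions for illustration and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    experimental = [-9.1, -7.4, -6.0, -5.2]    # toy binding free energies
    predicted = [-40.0, -30.0, -25.0, -10.0]   # toy MM/GBSA-style scores
    print(spearman(experimental, predicted))
```

A rank-based measure suits end-point free energy methods because their absolute values are often offset from experiment even when the ordering of ligands is reproduced.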