PubChem3D: Conformer generation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Journal of Cheminformatics (Impact Factor: 4.55). 01/2011; 3(1):4. DOI: 10.1186/1758-2946-3-4
Source: PubMed


PubChem, an open archive for the biological activities of small molecules, provides search and analysis tools to assist users in locating desired information. Many of these tools focus on the notion of chemical structure similarity at some level. PubChem3D adds similarity between 3-D conformers of chemical structures to augment the existing similarity between 2-D chemical structure graphs. It is also desirable to relate theoretical 3-D descriptions of chemical structures to experimental biological activity. As such, it is important to be assured that the theoretical conformer models can reproduce experimentally determined bioactive conformations. In the present study, we investigated the effects of three primary conformer generation parameters (the fragment sampling rate, the energy window size, and the force field variant) upon the accuracy of theoretical conformer models, and determined optimal settings for PubChem3D conformer model generation and conformer sampling.
Using the software package OMEGA from OpenEye Scientific Software, Inc., theoretical 3-D conformer models were generated for 25,972 small-molecule ligands whose 3-D structures were experimentally determined. Different values for the primary conformer generation parameters were systematically tested to find optimal settings. Employing a greater fragment sampling rate than the default did not improve the accuracy of the theoretical conformer model ensembles. An ever-increasing energy window did increase the overall average accuracy, with rapid convergence observed at 10 kcal/mol and 15 kcal/mol for model building and torsion search, respectively; however, subsequent study showed that an energy threshold of 25 kcal/mol for torsion search resulted in slightly improved results for larger and more flexible structures. Exclusion of Coulomb terms from the 94s variant of the Merck molecular force field (MMFF94s) in the torsion search stage gave more accurate conformer models at lower energy windows. Overall average accuracy of reproduction of bioactive conformations was remarkably linear with respect to both non-hydrogen atom count ("size") and effective rotor count ("flexibility"). Using these as independent variables, a regression equation was developed to predict the RMSD accuracy with which a theoretical ensemble reproduces bioactive conformations. The equation was modified to give a minimum RMSD conformer sampling value to help ensure that 90% of the sampled theoretical models contain at least one conformer within the RMSD sampling value of a "bioactive" conformation.
Optimal parameters for conformer generation using OMEGA were explored and determined. An equation was developed that provides an RMSD sampling value based on the relative accuracy with which bioactive conformations are reproduced. The optimal conformer generation parameters and RMSD sampling values determined are used by the PubChem3D project to generate theoretical conformer models.
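
The accuracy measure described in the abstract (how closely a theoretical ensemble reproduces a bioactive conformation, and what fraction of ensembles do so within a given sampling value) can be sketched as follows. This is a minimal illustration, not the authors' implementation or the OMEGA API. It assumes each conformer is a list of non-hydrogen (x, y, z) coordinates in matching atom order and already superimposed on the reference, and the function names `rmsd`, `ensemble_accuracy`, and `coverage` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-aligned, identically ordered coordinate lists.

    Assumption: superposition has already been done; no alignment happens here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_accuracy(conformers, reference):
    """Smallest RMSD between any conformer in the ensemble and the reference."""
    return min(rmsd(conformer, reference) for conformer in conformers)


def coverage(accuracies, threshold):
    """Fraction of ensembles whose best conformer lies within `threshold`."""
    if not accuracies:
        raise ValueError("need at least one ensemble accuracy")
    return sum(1 for value in accuracies if value <= threshold) / len(accuracies)
```

Under these assumptions, a sampling value would be considered adequate for a set of molecules when `coverage(accuracies, threshold)` reaches 0.9, mirroring the 90% target stated in the abstract.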

    • "(Bolton et al., 2011a). The first ten most diverse conformations (of which the first is the theoretically most energy-minimized conformation in a vacuum) (Bolton et al., 2011b; Hawkins et al., 2010) of each fatty acid were compared to those ten of punicic acid, which was chosen as reference molecule (Gasmi and Sanderson, 2010). The two types of resultant similarity scores (we use the terminology of Bolton et al. throughout), ST (shape-Tanimoto) and CT (color-Tanimoto), representing 'shape-similarity' based on overlap of molecular volume and 'feature-similarity' based on alignment of 'compatible' functional groups, respectively, were used to evaluate the degree of similarity of the fatty acids to punicic acid. "
    ABSTRACT: Plant-derived non-essential fatty acids are important dietary nutrients, and some are purported to have chemopreventive properties against various cancers, including that of the prostate. In this study, we determined the ability of seven dietary C-18 fatty acids to cause cytotoxicity and induce apoptosis in various types of human prostate cancer cells. These fatty acids included jacaric and punicic acid found in jacaranda and pomegranate seed oil, respectively, three octadecatrienoic geometric isomers (alpha- and beta-calendic and catalpic acid) and two mono-unsaturated C-18 fatty acids (trans- and cis-vaccenic acid). Jacaric acid and four of its octadecatrienoic geoisomers selectively induced apoptosis in hormone-dependent (LNCaP) and -independent (PC-3) human prostate cancer cells, whilst not affecting the viability of normal human prostate epithelial cells (RWPE-1). Jacaric acid induced concentration- and time-dependent LNCaP cell death through activation of intrinsic and extrinsic apoptotic pathways resulting in cleavage of PARP-1, modulation of pro- and antiapoptotic proteins of the Bcl-2 family and increased cleavage of caspase-3, -8 and -9. Moreover, activation of a cell death-inducing signalling cascade involving death receptor 5 was observed. Jacaric acid induced apoptosis in PC-3 cells by activation of the intrinsic pathway only. The spatial conformation cis, trans, cis of jacaric and punicic acid was shown to play a key role in the increased potency and efficacy of these two fatty acids in comparison to the five other C-18 fatty acids tested. Three-dimensional conformational analysis using the PubChem Database showed that the cytotoxic potency of the C-18 fatty acids was related to their degree of conformational similarity to our cytotoxic reference compound, punicic acid, based on optimized shape (ST) and feature (CT) similarity scores, with jacaric acid being most 'biosimilar' (STST-opt=0.81; CTCT-opt=0.45). This 3-D analysis of structural similarity enabled us to rank geoisomeric fatty acids according to cytotoxic potency, whereas a 2-D positional assessment of cis/trans structure did not. Our findings provide mechanistic evidence that nutrition-derived non-essential fatty acids have chemopreventive biological activities and exhibit 3-D structure-activity relationships that could be exploited to develop new strategies for the prevention or treatment of prostate cancer regardless of hormone dependency.
    Phytomedicine: international journal of phytotherapy and phytopharmacology 02/2013; 20(8). DOI:10.1016/j.phymed.2013.01.012 · 3.13 Impact Factor
    • "PubChem3D generates a 3-D conformer model description for each record in the PubChem Compound database, when it satisfies the following conditions [13]: (1) not too large (with 50 or fewer non-hydrogen atoms); (2) not too flexible (with no more than 15 rotatable bonds); (3) has only a single covalent unit (i.e., not a salt or mixture); (4) consists of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I); (5) contains only atom-types recognized by the Merck Molecular Force Field (MMFF94s) [16,17]; and (6) five or fewer undefined atom (R,S) and bond (E,Z) stereo centers. This 3-D description can be employed to enhance existing PubChem search and analysis methodologies by means of 3-D similarity [10], helping the user identify useful structure-activity relationships that might go unrecognized by the PubChem 2-D similarity method. "
    ABSTRACT: Background: PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally-derived three-dimensional (3-D) structures of small molecules. All the tools and services that are a part of PubChem3D rely upon the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage allows one to downsize the conformer models to a more manageable size, it may result in a loss of the ability to reproduce experimentally determined “bioactive” conformations, for example, those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules. Results: Conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each conformer model were clustered to reduce the size of the conformer model to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five different measures: root-mean-square distance (RMSD), shape-optimized shape-Tanimoto (STST-opt) and combo-Tanimoto (ComboTST-opt), and color-optimized color-Tanimoto (CTCT-opt) and combo-Tanimoto (ComboTCT-opt). On average, clustering decreased the conformer model accuracy, increasing the conformer ensemble’s RMSD to the bioactive conformer (by 0.18 ± 0.12 Å), and decreasing the STST-opt, ComboTST-opt, CTCT-opt, and ComboTCT-opt scores (by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively). Conclusion: This study shows the RMSD accuracy performance of the PubChem3D conformer models is operating as designed. In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that there is a linear degradation of average accuracy with respect to molecular size and flexibility. Generally speaking, one can likely expect the worst-case minimum accuracy of 90% or more of the PubChem3D ensembles to be 0.75, 1.09, 0.43, and 1.13, in terms of STST-opt, ComboTST-opt, CTCT-opt, and ComboTCT-opt, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
    Journal of Cheminformatics 01/2013; 5(1):1. DOI:10.1186/1758-2946-5-1 · 4.55 Impact Factor
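
The six eligibility conditions quoted for this entry can be expressed as a simple predicate. The sketch below is illustrative only and is not PubChem's code: the `CompoundSummary` record and its field names are hypothetical, the per-compound values are assumed to have been computed elsewhere, and the stereo limit is treated as a single combined count of undefined atom and bond stereo centers, which is one possible reading of the quoted condition.

```python
from dataclasses import dataclass

# Elements listed in the quoted conditions.
SUPPORTED_ELEMENTS = frozenset(
    {"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}
)


@dataclass(frozen=True)
class CompoundSummary:
    """Hypothetical precomputed properties of one compound record."""
    heavy_atom_count: int          # non-hydrogen atoms
    rotatable_bond_count: int
    covalent_unit_count: int       # 1 means not a salt or mixture
    elements: frozenset            # element symbols present
    mmff94s_typed: bool            # all atom types recognized by MMFF94s
    undefined_stereo_count: int    # assumed: atom + bond centers combined


def is_eligible(compound):
    """Return True if the record meets all six quoted conditions."""
    return (
        compound.heavy_atom_count <= 50
        and compound.rotatable_bond_count <= 15
        and compound.covalent_unit_count == 1
        and compound.elements <= SUPPORTED_ELEMENTS
        and compound.mmff94s_typed
        and compound.undefined_stereo_count <= 5
    )
```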
    • "The PubChem3D project [5-11] augments the utility of PubChem, by adding computed three-dimensional (3-D) descriptions to about 90% of the small molecules contained in the PubChem Compound database [6,11]. Each of these may include multiple 3-D conformations that are sampled to remove redundancy, guaranteeing a minimum (non-hydrogen atom pair-wise) root-mean-square distance (RMSD) between conformers. "
    ABSTRACT: Background: To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results: The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the “best-conformer-pair” approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs. Conclusion: These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces “on average”, although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step toward understanding the effects of conformational diversity of the molecules upon the 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may be helpful to build search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries in a more efficient way.
    Journal of Cheminformatics 11/2012; 4(1):28. DOI:10.1186/1758-2946-4-28 · 4.55 Impact Factor
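
The "best-conformer-pair" approach described in this abstract takes the greatest similarity score over all conformer pairs formed from two compounds. A minimal sketch follows, under stated assumptions: the pairwise `score` callable stands in for whichever 3-D similarity measure is used and is supplied by the caller, and the function names are hypothetical rather than taken from any PubChem tool.

```python
from itertools import product


def best_conformer_pair_score(conformers_a, conformers_b, score):
    """Greatest score over every (conformer of A, conformer of B) pair.

    `score` is any caller-supplied function taking two conformers and
    returning a similarity value; it is a placeholder, not a real API.
    """
    if not conformers_a or not conformers_b:
        raise ValueError("each compound needs at least one conformer")
    return max(score(a, b) for a, b in product(conformers_a, conformers_b))


def single_conformer_score(conformers_a, conformers_b, score):
    """Baseline: score using only the first conformer of each compound."""
    if not conformers_a or not conformers_b:
        raise ValueError("each compound needs at least one conformer")
    return score(conformers_a[0], conformers_b[0])
```

Because the maximum is taken over a set of pairs that includes the first-conformer pair, the best-conformer-pair score can never be lower than the single-conformer baseline, which is consistent with the increases in average scores reported above.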