PubChem3D: Conformer generation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Journal of Cheminformatics (Impact Factor: 4.54). 01/2011; 3(1):4. DOI: 10.1186/1758-2946-3-4
Source: PubMed

ABSTRACT: PubChem, an open archive for the biological activities of small molecules, provides search and analysis tools to help users locate desired information. Many of these tools rely on the notion of chemical structure similarity at some level. PubChem3D adds similarity of 3-D chemical structure conformers to the existing similarity of 2-D chemical structure graphs. It is also desirable to relate theoretical 3-D descriptions of chemical structures to experimental biological activity. It is therefore important to be assured that the theoretical conformer models can reproduce experimentally determined bioactive conformations. In the present study, we investigate the effects of three primary conformer generation parameters (the fragment sampling rate, the energy window size, and the force field variant) on the accuracy of theoretical conformer models, and determine optimal settings for PubChem3D conformer model generation and conformer sampling.
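The energy window is a cutoff on conformer energy relative to the lowest-energy conformer found. The sketch below shows only that filtering step; it is not OMEGA's implementation, the helper name `apply_energy_window` is hypothetical, and the energies are made up. The window sizes are examples of the kind the study compared (10, 15, and 25 kcal/mol).

```python
from collections.abc import Sequence


def apply_energy_window(energies: Sequence[float], window: float) -> list[int]:
    """Return indices of conformers whose energy is within `window` of the minimum.

    Hypothetical helper for illustration; `energies` and `window` must share
    units (e.g. kcal/mol). Absolute energies are fine, only differences matter.
    """
    if not energies:
        return []
    e_min = min(energies)
    # Keep conformer i only if E_i - E_min <= window.
    return [i for i, e in enumerate(energies) if e - e_min <= window]


if __name__ == "__main__":
    # Made-up relative energies in kcal/mol, not data from the study.
    energies = [0.0, 4.2, 9.8, 12.5, 24.0, 31.7]
    for window in (10.0, 15.0, 25.0):
        print(window, apply_energy_window(energies, window))
```

A wider window admits more high-energy conformers, which is why accuracy can keep improving with window size until it converges.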
Using the software package OMEGA from OpenEye Scientific Software, Inc., theoretical 3-D conformer models were generated for 25,972 small-molecule ligands whose 3-D structures had been experimentally determined. Different values for the primary conformer generation parameters were systematically tested to find optimal settings. Employing a greater fragment sampling rate than the default did not improve the accuracy of the theoretical conformer model ensembles. An ever-increasing energy window did increase the overall average accuracy, with rapid convergence observed at 10 kcal/mol and 15 kcal/mol for model building and torsion search, respectively; however, subsequent study showed that an energy threshold of 25 kcal/mol for torsion search gave slightly improved results for larger and more flexible structures. Excluding Coulomb terms from the 94s variant of the Merck molecular force field (MMFF94s) in the torsion search stage gave more accurate conformer models at lower energy windows. The overall average accuracy of reproducing bioactive conformations was remarkably linear with respect to both non-hydrogen atom count ("size") and effective rotor count ("flexibility"). Using these as independent variables, a regression equation was developed to predict the RMSD accuracy with which a theoretical ensemble reproduces bioactive conformations. The equation was then modified to give a minimum RMSD conformer sampling value, to help ensure that 90% of the sampled theoretical models contain at least one conformer within that RMSD of a "bioactive" conformation.
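As the abstract frames it, an ensemble reproduces a bioactive conformation if at least one of its conformers lies within some RMSD of the experimental structure. A minimal sketch of that test follows, assuming NumPy, coordinates for the same non-hydrogen atoms in the same order, and best-fit superposition by the Kabsch algorithm; it ignores molecular symmetry (equivalent atom permutations), which a real comparison would need to handle. The function names and toy coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.

    Rows of `p` and `q` must refer to the same atoms in the same order;
    symmetry-equivalent atom permutations are not considered.
    """
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def min_ensemble_rmsd(ensemble, reference: np.ndarray) -> float:
    """Smallest RMSD of any conformer in `ensemble` to `reference`."""
    return min(kabsch_rmsd(conf, reference) for conf in ensemble)


def reproduces(ensemble, reference: np.ndarray, threshold: float) -> bool:
    """True if at least one conformer lies within `threshold` RMSD of `reference`."""
    return min_ensemble_rmsd(ensemble, reference) <= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bioactive = rng.normal(size=(12, 3))  # toy stand-in for experimental coordinates
    theta = 0.7
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    close = bioactive @ rz.T + np.array([1.0, -2.0, 3.0])  # rigid copy
    distorted = bioactive + rng.normal(scale=1.0, size=bioactive.shape)
    ensemble = [distorted, close]
    print(min_ensemble_rmsd(ensemble, bioactive))  # ~0: the rigid copy matches
    print(reproduces(ensemble, bioactive, 0.5))
```

Taking the minimum over the ensemble is what makes larger, more diverse ensembles score better: only the single best conformer has to match.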
Optimal parameters for conformer generation using OMEGA were explored and determined. An equation was developed that provides an RMSD sampling value based on how accurately bioactive conformations can be reproduced for a molecule of a given size and flexibility. The optimal conformer generation parameters and RMSD sampling values determined here are used by the PubChem3D project to generate theoretical conformer models.
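The abstract reports a linear relation between RMSD accuracy and both size (non-hydrogen atom count) and flexibility (effective rotor count), but gives neither the fitted coefficients nor the exact way the equation was shifted to reach 90% coverage. The sketch below is therefore only an assumed stand-in: ordinary least squares for the linear fit, plus an offset equal to the 90th percentile of the residuals, so that roughly 90% of the fitted molecules fall at or below the resulting value. Function names and the synthetic data are hypothetical, not the study's.

```python
import numpy as np


def _design(heavy_atoms, rotors) -> np.ndarray:
    """Columns: intercept, non-hydrogen atom count (size), effective rotors (flexibility)."""
    return np.column_stack(
        [np.ones(len(heavy_atoms)), heavy_atoms, rotors]
    ).astype(float)


def fit_rmsd_model(heavy_atoms, rotors, rmsd) -> np.ndarray:
    """Ordinary least squares for rmsd ~ b0 + b1*heavy_atoms + b2*rotors."""
    coef, *_ = np.linalg.lstsq(
        _design(heavy_atoms, rotors), np.asarray(rmsd, dtype=float), rcond=None
    )
    return coef


def coverage_offset(coef, heavy_atoms, rotors, rmsd, coverage=0.9) -> float:
    """Residual quantile: adding it to the fit puts about `coverage` of points at or below."""
    residuals = np.asarray(rmsd, dtype=float) - _design(heavy_atoms, rotors) @ coef
    return float(np.quantile(residuals, coverage))


def rmsd_sampling_value(coef, offset, n_heavy, n_rotors) -> float:
    """Predicted RMSD for one molecule, shifted up by the coverage offset."""
    b0, b1, b2 = coef
    return float(b0 + b1 * n_heavy + b2 * n_rotors + offset)


if __name__ == "__main__":
    # Synthetic toy data with invented coefficients, NOT the study's measurements.
    rng = np.random.default_rng(1)
    heavy = rng.integers(5, 60, size=200)
    rot = rng.integers(0, 12, size=200)
    rmsd = 0.3 + 0.015 * heavy + 0.08 * rot + rng.normal(scale=0.2, size=200)
    coef = fit_rmsd_model(heavy, rot, rmsd)
    off = coverage_offset(coef, heavy, rot, rmsd)
    print(coef, off, rmsd_sampling_value(coef, off, 30, 5))
```

The offset trades tightness for coverage: a higher `coverage` gives a larger sampling value and a coarser but safer ensemble.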



Available from: Evan Bolton, Aug 27, 2015
    • "(Bolton et al., 2011a). The first ten most diverse conformations (of which the first is the theoretically most energy-minimized conformation in a vacuum) (Bolton et al., 2011b; Hawkins et al., 2010) of each fatty acid were compared to those ten of punicic acid, which was chosen as reference molecule (Gasmi and Sanderson, 2010). The two types of resultant similarity scores (we use the terminology of Bolton et al. throughout), ST (shape-Tanimoto) and CT (color-Tanimoto), representing 'shape-similarity' based on overlap of molecular volume and 'feature-similarity' based on alignment of 'compatible' functional groups, respectively, were used to evaluate the degree of similarity of the fatty acids to punicic acid. "
    ABSTRACT: Plant-derived non-essential fatty acids are important dietary nutrients, and some are purported to have chemopreventive properties against various cancers, including that of the prostate. In this study, we determined the ability of seven dietary C-18 fatty acids to cause cytotoxicity and induce apoptosis in various types of human prostate cancer cells. These fatty acids included jacaric and punicic acid, found in jacaranda and pomegranate seed oil, respectively, three octadecatrienoic geometric isomers (alpha- and beta-calendic and catalpic acid) and two mono-unsaturated C-18 fatty acids (trans- and cis-vaccenic acid). Jacaric acid and four of its octadecatrienoic geoisomers selectively induced apoptosis in hormone-dependent (LNCaP) and -independent (PC-3) human prostate cancer cells, whilst not affecting the viability of normal human prostate epithelial cells (RWPE-1). Jacaric acid induced concentration- and time-dependent LNCaP cell death through activation of intrinsic and extrinsic apoptotic pathways, resulting in cleavage of PARP-1, modulation of pro- and antiapoptotic Bcl-2 family proteins and increased cleavage of caspase-3, -8 and -9. Moreover, activation of a cell death-inducing signalling cascade involving death receptor 5 was observed. Jacaric acid induced apoptosis in PC-3 cells by activation of the intrinsic pathway only. The spatial conformation cis, trans, cis of jacaric and punicic acid was shown to play a key role in the increased potency and efficacy of these two fatty acids in comparison to the five other C-18 fatty acids tested. Three-dimensional conformational analysis using the PubChem Database showed that the cytotoxic potency of the C-18 fatty acids was related to their degree of conformational similarity to our cytotoxic reference compound, punicic acid, based on optimized shape (ST) and feature (CT) similarity scores, with jacaric acid being most 'biosimilar' (ST^ST-opt = 0.81; CT^CT-opt = 0.45).
This 3-D analysis of structural similarity enabled us to rank geoisomeric fatty acids according to cytotoxic potency, whereas a 2-D positional assessment of cis/trans structure did not. Our findings provide mechanistic evidence that nutrition-derived non-essential fatty acids have chemopreventive biological activities and exhibit 3-D structure-activity relationships that could be exploited to develop new strategies for the prevention or treatment of prostate cancer regardless of hormone dependency.
    Phytomedicine: international journal of phytotherapy and phytopharmacology 02/2013; 20. DOI:10.1016/j.phymed.2013.01.012 · 2.88 Impact Factor
    ABSTRACT: The shape diversity of 16.4 million biologically relevant molecules from the PubChem Compound database and their 1.46 billion diverse conformers was explored as a function of molecular volume. The diversity of shape space was investigated by determining the shape similarity threshold at which the count of reference shapes per unit of conformer volume reaches a maximum. The rate of growth in shape space, as represented by a decreasing shape similarity threshold, was found to be remarkably smooth as a function of volume. There was no apparent correlation between the count of conformers per unit volume and their diversity, meaning that a single reference shape can describe the shape space of many chemical structures. The ability of a volume to describe the shape space of lesser volumes was also examined. For the majority of the volume range considered in this study, a given volume was able to describe 40-70% of the shape diversity of lesser volumes. The relative growth of shape diversity as a function of volume and shape similarity is surprisingly uniform. Given the distribution of chemicals in PubChem versus what is theoretically synthetically possible, the results from this analysis should be considered a conservative estimate of the true diversity of shape space.
    Journal of Cheminformatics 03/2011; 3(1):9. DOI:10.1186/1758-2946-3-9 · 4.54 Impact Factor
    ABSTRACT: PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features. The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers.
Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly as a function of the number of diverse conformers considered, with only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers at the scale performed, if implemented naively, is an intractable problem for a modest-sized compute cluster. The methodology developed in this work relies on a series of filters to avoid performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; others consider a preliminary superposition between conformers using reference shapes. The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology on a wide scale, such as the PubChem contents, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
    Journal of Cheminformatics 05/2011; 3:13. DOI:10.1186/1758-2946-3-13 · 4.54 Impact Factor
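The ST and CT scores used in these works share the Tanimoto form ST = V_AB / (V_AA + V_BB - V_AB), where V_AB is the overlap of conformers A and B and V_AA, V_BB are their self-overlaps; we assume CT takes the same form over pharmacophore-feature ("color") overlaps. The sketch below shows the "Similar Conformers" test (ST >= 0.8 and CT >= 0.5, as stated in the abstract) and one volume-based prefilter of the kind the abstract alludes to. It rests on an assumption: if the overlap can never exceed the smaller volume (true for hard-sphere volumes, assumed here), ST can never exceed min(V_A, V_B) / max(V_A, V_B), so pairs whose volume ratio falls below the cutoff can be skipped without any superposition. This is a hypothetical illustration, not PubChem's code, and all function names are made up.

```python
from itertools import combinations


def tanimoto(v_ab: float, v_aa: float, v_bb: float) -> float:
    """Tanimoto form: overlap / (self_A + self_B - overlap)."""
    return v_ab / (v_aa + v_bb - v_ab)


def st_upper_bound(v_a: float, v_b: float) -> float:
    """Best possible ST if the overlap can never exceed the smaller volume.

    ST increases with the overlap, so it peaks at V_AB = min(V_A, V_B),
    giving min / (min + max - min) = min / max.
    """
    return min(v_a, v_b) / max(v_a, v_b)


def passes_volume_filter(v_a: float, v_b: float, st_cutoff: float = 0.8) -> bool:
    """Cheap prefilter: False means no superposition can reach `st_cutoff`."""
    return st_upper_bound(v_a, v_b) >= st_cutoff


def is_similar_conformer(st: float, ct: float,
                         st_cutoff: float = 0.8, ct_cutoff: float = 0.5) -> bool:
    """Neighbor criterion from the abstract: ST >= 0.8 and CT >= 0.5."""
    return st >= st_cutoff and ct >= ct_cutoff


if __name__ == "__main__":
    volumes = {"a": 300.0, "b": 330.0, "c": 420.0, "d": 500.0}  # toy volumes
    survivors = [(x, y) for x, y in combinations(volumes, 2)
                 if passes_volume_filter(volumes[x], volumes[y])]
    print(survivors)  # [('a', 'b'), ('c', 'd')]
```

Only the surviving pairs would go on to the expensive 3-D superposition and full ST/CT scoring; here four of the six toy pairs are rejected by a single division each.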