Article

PubChem3D: Similar conformers

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
Journal of Cheminformatics (Impact Factor: 4.54). 05/2011; 3:13. DOI: 10.1186/1758-2946-3-13
Source: PubMed

ABSTRACT PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features.
The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity.
In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly as a function of the number of diverse conformers considered, with only a 70% increase for a ten-fold growth in conformers per compound (a 68-fold increase in the conformer pairs considered).
Neighboring 3-D conformers on the scale performed, if implemented naively, is an intractable problem using a modest-sized compute cluster. The methodology developed in this work relies on a series of filters to avoid performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; however, others consider preliminary superposition between conformers using reference shapes.
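As a rough illustration of the neighboring criteria and of a volume-based filter of the kind described above, the following minimal sketch may help. It is not the authors' implementation. It assumes the usual overlap-volume form of the Tanimoto equation, T = V_AB / (V_AA + V_BB - V_AB), and it assumes that the overlap volume can never exceed the smaller of the two self-overlap volumes, so that the best achievable ST is the ratio of the smaller to the larger volume. All function names are hypothetical.

```python
def tanimoto(v_aa: float, v_bb: float, v_ab: float) -> float:
    """Tanimoto similarity from two self-overlap volumes and the mutual overlap."""
    return v_ab / (v_aa + v_bb - v_ab)


def max_possible_shape_tanimoto(v_aa: float, v_bb: float) -> float:
    """Upper bound on ST, assuming the overlap cannot exceed the smaller volume.

    Setting V_AB = min(V_AA, V_BB) in the Tanimoto equation leaves
    min(V_AA, V_BB) / max(V_AA, V_BB).
    """
    return min(v_aa, v_bb) / max(v_aa, v_bb)


def can_skip_superposition(v_aa: float, v_bb: float,
                           st_threshold: float = 0.8) -> bool:
    """True when no superposition could reach the ST threshold (hypothetical filter)."""
    return max_possible_shape_tanimoto(v_aa, v_bb) < st_threshold


def is_similar_conformer(st: float, ct: float,
                         st_threshold: float = 0.8,
                         ct_threshold: float = 0.5) -> bool:
    """Neighbor test using the thresholds quoted in the abstract (ST >= 0.8, CT >= 0.5)."""
    return st >= st_threshold and ct >= ct_threshold


if __name__ == "__main__":
    # Illustrative volumes only (arbitrary units), not PubChem data.
    print(can_skip_superposition(70.0, 100.0))  # volumes too different to reach ST >= 0.8
    print(can_skip_superposition(90.0, 100.0))  # superposition still needed
    print(is_similar_conformer(0.85, 0.55))
```

Under these assumptions, a pair whose volumes differ by more than 20% of the larger volume can be rejected from the two stored self-overlap volumes alone, without any superposition optimization.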
The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology on a wide scale, such as PubChem contents, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.

Available from: Evan Bolton, Jun 23, 2015
  • ABSTRACT: Promiscuous inhibition of the human ether-à-go-go-related gene (hERG) potassium channel by drugs poses a major risk for life-threatening arrhythmia and costly drug withdrawals. Current knowledge of this phenomenon is derived from a limited number of known drugs and tool compounds. However, in a diverse, naïve chemical library, it remains unclear which chemical motifs or scaffolds might be enriched for hERG inhibition, and to what degree. Here we report electrophysiology measurements of hERG inhibition and computational analyses of >300,000 diverse small molecules. We identify chemical 'communities' with high hERG liability, containing both canonical scaffolds and structurally distinctive molecules. These data enable the development of more effective classifiers to computationally assess hERG risk. The resultant predictive models now accurately classify naïve compound libraries for tendency of hERG inhibition. Together these results provide a more complete reference map of characteristic chemical motifs for hERG liability and advance a systematic approach to rank chemical collections for cardiotoxicity risk.
    PLoS ONE 02/2015; 10(2):e0118324. DOI:10.1371/journal.pone.0118324 · 3.53 Impact Factor
  • ABSTRACT: Plant-derived non-essential fatty acids are important dietary nutrients, and some are purported to have chemopreventive properties against various cancers, including that of the prostate. In this study, we determined the ability of seven dietary C-18 fatty acids to cause cytotoxicity and induce apoptosis in various types of human prostate cancer cells. These fatty acids included jacaric and punicic acid found in jacaranda and pomegranate seed oil, respectively, three octadecatrienoic geometric isomers (alpha- and beta-calendic and catalpic acid) and two mono-unsaturated C-18 fatty acids (trans- and cis-vaccenic acid). Jacaric acid and four of its octadecatrienoic geoisomers selectively induced apoptosis in hormone-dependent (LNCaP) and -independent (PC-3) human prostate cancer cells, whilst not affecting the viability of normal human prostate epithelial cells (RWPE-1). Jacaric acid induced concentration- and time-dependent LNCaP cell death through activation of intrinsic and extrinsic apoptotic pathways, resulting in cleavage of PARP-1, modulation of pro- and antiapoptotic Bcl-2 family proteins and increased cleavage of caspase-3, -8 and -9. Moreover, activation of a cell death-inducing signalling cascade involving death receptor 5 was observed. Jacaric acid induced apoptosis in PC-3 cells by activation of the intrinsic pathway only. The spatial conformation cis, trans, cis of jacaric and punicic acid was shown to play a key role in the increased potency and efficacy of these two fatty acids in comparison to the five other C-18 fatty acids tested. Three-dimensional conformational analysis using the PubChem Database (http://pubchem.ncbi.nlm.nih.gov) showed that the cytotoxic potency of the C-18 fatty acids was related to their degree of conformational similarity to our cytotoxic reference compound, punicic acid, based on optimized shape (ST) and feature (CT) similarity scores, with jacaric acid being most 'biosimilar' (ST^ST-opt = 0.81; CT^CT-opt = 0.45). This 3-D analysis of structural similarity enabled us to rank geoisomeric fatty acids according to cytotoxic potency, whereas a 2-D positional assessment of cis/trans structure did not. Our findings provide mechanistic evidence that nutrition-derived non-essential fatty acids have chemopreventive biological activities and exhibit 3-D structure-activity relationships that could be exploited to develop new strategies for the prevention or treatment of prostate cancer regardless of hormone dependency.
    Phytomedicine: international journal of phytotherapy and phytopharmacology 02/2013; DOI:10.1016/j.phymed.2013.01.012 · 2.88 Impact Factor
  • ABSTRACT: Background: PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally-derived three-dimensional (3-D) structures of small molecules. All the tools and services that are a part of PubChem3D rely upon the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage allows one to downsize the conformer models to a more manageable size, it may result in a loss of the ability to reproduce experimentally determined “bioactive” conformations, for example, those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules. Results: Conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each conformer model were clustered to reduce the size of the conformer model to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five different measures: root-mean-square distance (RMSD), shape-optimized shape-Tanimoto (ST^ST-opt) and combo-Tanimoto (ComboT^ST-opt), and color-optimized color-Tanimoto (CT^CT-opt) and combo-Tanimoto (ComboT^CT-opt). On average, clustering decreased the conformer model accuracy, increasing the conformer ensemble’s RMSD to the bioactive conformer (by 0.18 ± 0.12 Å), and decreasing the ST^ST-opt, ComboT^ST-opt, CT^CT-opt, and ComboT^CT-opt scores (by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively). Conclusion: This study shows the RMSD accuracy performance of the PubChem3D conformer models is operating as designed. In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that there is a linear degradation of average accuracy with respect to molecular size and flexibility. Generally speaking, one can likely expect the worst-case minimum accuracy of 90% or more of the PubChem3D ensembles to be 0.75, 1.09, 0.43, and 1.13, in terms of ST^ST-opt, ComboT^ST-opt, CT^CT-opt, and ComboT^CT-opt, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
    Journal of Cheminformatics 01/2013; 5(1):1. DOI:10.1186/1758-2946-5-1 · 4.54 Impact Factor