PubChem as a public resource for drug discovery

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Drug Discovery Today (Impact Factor: 5.96). 10/2010; 15(23-24):1052-7. DOI: 10.1016/j.drudis.2010.10.003
Source: PubMed

ABSTRACT: PubChem is a public repository of small molecules and their biological properties. It currently contains more than 25 million unique chemical structures and 90 million bioactivity outcomes associated with several thousand macromolecular targets. To assess the potential utility of this public resource for drug discovery, we systematically summarized the protein targets in PubChem by function, 3D structure and biological pathway. We also analyzed the potency, selectivity and promiscuity of the bioactive compounds identified for these targets, including the chemical probes generated by the NIH Molecular Libraries Program. As a public resource, PubChem lowers the barrier for researchers to develop chemical tools for modulating biological processes and drug candidates for treating disease.
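The abstract names potency, selectivity and promiscuity as the properties analyzed but does not define them. The sketch below assumes simple working definitions, not necessarily the ones used in the paper: promiscuity is the number of distinct targets a compound is active against, and the selectivity ratio is the compound's best off-target potency divided by its best on-target potency. The `Outcome` record layout, the identifiers `T1`-`T3` and all values are hypothetical toy data, not PubChem's schema or results.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    """One bioactivity outcome (hypothetical record layout, not PubChem's schema)."""
    cid: int                      # compound identifier
    target: str                   # target identifier
    active: bool                  # True if the assay called the compound active
    potency_um: Optional[float]   # e.g. IC50/EC50 in micromolar, if reported


def promiscuity(outcomes):
    """Number of distinct targets each compound is active against."""
    targets = defaultdict(set)
    for o in outcomes:
        if o.active:
            targets[o.cid].add(o.target)
    return {cid: len(ts) for cid, ts in targets.items()}


def _best_potency(outcomes, cid, keep):
    """Lowest (most potent) active potency for `cid` over targets accepted by `keep`."""
    values = [o.potency_um for o in outcomes
              if o.cid == cid and o.active and o.potency_um is not None and keep(o.target)]
    return min(values) if values else None


def selectivity_ratio(outcomes, cid, target):
    """Best off-target potency / best on-target potency; > 1 means selective for `target`."""
    on = _best_potency(outcomes, cid, lambda t: t == target)
    off = _best_potency(outcomes, cid, lambda t: t != target)
    if on is None or off is None:
        return None  # not enough potency data to compare
    return off / on


# Toy outcomes with made-up identifiers and values, for illustration only.
data = [
    Outcome(1, "T1", True, 0.5),
    Outcome(1, "T2", True, 50.0),
    Outcome(1, "T3", False, None),
    Outcome(2, "T1", True, 1.0),
    Outcome(2, "T2", True, 2.0),
    Outcome(2, "T3", True, 0.5),
]

print(promiscuity(data))                 # {1: 2, 2: 3}
print(selectivity_ratio(data, 1, "T1"))  # 100.0
```

Here compound 1 is active on two targets but is 100-fold more potent on `T1`, while compound 2 hits all three targets with similar potency, the pattern a promiscuity count alone would not distinguish.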


Available from: Yanli Wang, May 29, 2015
  • ABSTRACT: Target identification is one of the most critical steps following cell-based phenotypic chemical screens aimed at identifying compounds with potential uses in cell biology and for developing novel disease therapies. Current in silico target identification methods, including chemical similarity database searches, are limited to single or sequential ligand analysis, which restricts their ability to accurately deconvolute large numbers of compounds with diverse chemical structures. Here, we present CSNAP (Chemical Similarity Network Analysis Pulldown), a new computational target identification method that uses chemical similarity networks for large-scale chemotype (consensus chemical pattern) recognition and drug target profiling. Our benchmark study showed that, for representative chemotypes in large (>200-compound) sets, CSNAP achieves higher overall target prediction accuracy (>80%) than the SEA approach (60-70%). Additionally, CSNAP can be integrated with biological knowledge-based databases (UniProt, GO) and high-throughput biology platforms (proteomic, genetic, etc.) for system-wide drug target validation. To demonstrate the utility of the CSNAP approach, we combined CSNAP's target prediction with experimental ligand evaluation to identify the major mitotic targets of hit compounds from a cell-based chemical screen, and we highlight novel compounds targeting microtubules, an important cancer therapeutic target. The CSNAP method is freely available and can be accessed from the CSNAP web server (
    PLoS Computational Biology 03/2015; 11(3):e1004153. DOI:10.1371/journal.pcbi.1004153 · 4.83 Impact Factor
  • ABSTRACT: The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications released shortly after each experiment concludes. We propose that this trend be accompanied by a thorough examination of data-sharing priorities. We argue that the most significant immediate beneficiary of open data is in fact chemical algorithms, which can absorb vast quantities of data and use it to present concise insights to working chemists on a scale that traditional publication methods cannot achieve. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits that can be gained when open data is published correctly using unambiguous machine-readable formats. Graphical Abstract: Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms.
    Journal of Cheminformatics 12/2015; 7(1):9. DOI:10.1186/s13321-015-0057-7 · 4.54 Impact Factor
  • ABSTRACT: A histone deacetylase (HDAC) inhibitor database (HDACiDB) was constructed to enable rapid access to data relevant to the development of epigenetic modulators (HDAC inhibitors [HDACi]), helping bring precision cancer medicine a step closer. Thousands of HDACi are in various stages of development and are being tested in clinical trials, both as monotherapy and in combination with other cancer agents. Despite the abundance of HDACi, information resources are limited. Tools are needed for in silico prediction of specific HDACi and for designing experiments and analyzing the data they generate, as are custom-made tools and interactive databases. We have developed HDACiDB, a composite collection of HDACi that currently comprises 1,445 chemical compounds (419 natural and 1,026 synthetic) with the potential to inhibit histone deacetylation. Most importantly, it allows the inhibitors to be screened by Lipinski's rule-of-five drug-likeness criteria and other physicochemical properties. It also provides easy access to information on their source of origin, molecular properties, drug-likeness and bioavailability, with relevant references cited. As the first comprehensive database on HDACi containing all known natural and synthetic HDACi, HDACiDB may help improve our understanding of the mechanisms of action of available HDACi, enable selective targeting of individual HDAC isoforms, and establish a new paradigm for intelligent epigenetic cancer drug design. The database is freely available on the website.
    Drug Design, Development and Therapy 04/2015; 9:2257-2264. DOI:10.2147/DDDT.S78276 · 3.03 Impact Factor
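The CSNAP abstract above describes building a chemical similarity network, grouping compounds into chemotypes, and profiling targets from the network. The sketch below illustrates only that general idea under loudly stated assumptions; it is not CSNAP's actual scoring. Fingerprints are toy sets of feature bits rather than fingerprints computed from structures, the 0.5 Tanimoto threshold is arbitrary, chemotypes are taken to be connected components, and a query's target is predicted by a simple majority vote over its annotated neighbours. All compound names and the label `kinase_X` are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(fps: dict, threshold: float = 0.5) -> dict:
    """Adjacency sets linking compounds whose Tanimoto similarity >= threshold."""
    names = list(fps)
    adj = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if tanimoto(fps[u], fps[v]) >= threshold:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def chemotypes(adj: dict) -> list:
    """Connected components of the network, each treated as one chemotype cluster."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def predict_target(query: str, adj: dict, annotations: dict):
    """Majority vote over the known targets of the query's annotated neighbours."""
    votes = Counter(annotations[n] for n in adj[query] if n in annotations)
    return votes.most_common(1)[0][0] if votes else None


# Toy fingerprints and annotations (hypothetical).
fps = {
    "ref_a": frozenset({1, 2, 3, 4}),
    "ref_b": frozenset({1, 2, 3, 5}),
    "ref_c": frozenset({10, 11, 12}),
    "query": frozenset({1, 2, 3, 4, 5}),
}
annotations = {"ref_a": "tubulin", "ref_b": "tubulin", "ref_c": "kinase_X"}

adj = similarity_network(fps)
print(chemotypes(adj))
print(predict_target("query", adj, annotations))  # tubulin
```

Voting over a neighbourhood rather than a single nearest hit is what lets a network approach draw on a whole chemotype instead of one reference ligand, which is the limitation of single-ligand searches the abstract points to.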
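The HDACiDB abstract above mentions screening inhibitors by Lipinski's rule of five. The rule flags likely poor oral absorption when molecular weight exceeds 500 Da, calculated logP exceeds 5, there are more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; a common convention tolerates at most one violation. The sketch below assumes the descriptors are already computed (in practice a cheminformatics toolkit would supply them). The compound names and values are hypothetical, and whether HDACiDB uses the one-violation threshold is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors for one compound (hypothetical input)."""
    name: str
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria the compound violates."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Drug-like under the common 'at most one violation' convention (assumed)."""
    return ro5_violations(d) <= max_violations


# Hypothetical compounds with made-up descriptor values.
compounds = [
    Descriptors("cmpd_1", 264.3, 1.9, 3, 3),  # no violations
    Descriptors("cmpd_2", 620.0, 5.8, 2, 8),  # MW and logP violations
    Descriptors("cmpd_3", 512.0, 4.1, 4, 9),  # MW violation only
]

print([d.name for d in compounds if passes_ro5(d)])  # ['cmpd_1', 'cmpd_3']
```

Passing `max_violations=0` gives the stricter variant in which any single violation excludes a compound.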