PubChem as a public resource for drug discovery

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Drug Discovery Today (Impact Factor: 5.96). 10/2010; 15(23-24):1052-7. DOI: 10.1016/j.drudis.2010.10.003
Source: PubMed

ABSTRACT: PubChem is a public repository of small molecules and their biological properties. Currently, it contains more than 25 million unique chemical structures and 90 million bioactivity outcomes associated with several thousand macromolecular targets. To address the potential utility of this public resource for drug discovery, we systematically summarized the protein targets in PubChem by function, 3D structure and biological pathway. Moreover, we analyzed the potency, selectivity and promiscuity of the bioactive compounds identified for these biological targets, including the chemical probes generated by the NIH Molecular Libraries Program. As a public resource, PubChem lowers the barrier for researchers to advance the development of chemical tools for modulating biological processes and drug candidates for disease treatments.
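Records like those described in the abstract can be retrieved programmatically through PubChem's PUG REST interface. A minimal sketch using only the Python standard library (the helper function names here are our own; the URL pattern follows the public PUG REST scheme for looking up a compound by name and requesting computed properties):

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def compound_property_url(name: str, properties: list[str]) -> str:
    """Build a PUG REST URL that looks up a compound by name and
    returns the requested computed properties as CSV."""
    return (f"{PUG_REST}/compound/name/{quote(name)}"
            f"/property/{','.join(properties)}/CSV")

def fetch(url: str) -> str:
    """Retrieve the response body (requires network access)."""
    with urlopen(url) as resp:
        return resp.read().decode()

if __name__ == "__main__":
    url = compound_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
    print(url)
    # print(fetch(url))  # uncomment to query the live service
```

The same URL scheme extends to other inputs (CID, SMILES, InChIKey) and outputs (JSON, SDF); consult the PUG REST documentation for the full grammar.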

Related publications:
  • ABSTRACT: The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications, released shortly after the conclusion of each experiment. We propose that this trend be accompanied by a thorough examination of data-sharing priorities. We argue that the most significant immediate beneficiary of open data is in fact chemical algorithms, which are capable of absorbing vast quantities of data and using them to present concise insights to working chemists on a scale that could not be achieved by traditional publication methods. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits gained when open data is published correctly in unambiguous machine-readable formats. Graphical abstract: Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms.
    Journal of Cheminformatics 01/2015; 7:9. DOI: 10.1186/s13321-015-0057-7 (Impact Factor: 4.54)
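The abstract's point is that each experiment should be released in a form algorithms can consume directly, not just read by humans. A toy illustration of such a record (the schema and field names are invented for this sketch, not any standard): structures as SMILES strings, quantities with explicit units, and a numeric outcome, serialized so any program can recover the fields.

```python
import json

# Hypothetical machine-readable record for one experiment. Every field
# is typed and unambiguous: structures are SMILES, units are explicit.
experiment = {
    "reactants": [{"smiles": "c1ccccc1Br", "equiv": 1.0}],
    "reagents":  [{"smiles": "CC(=O)O[Pd]", "equiv": 0.05}],
    "product":   {"smiles": "c1ccccc1-c1ccccc1"},
    "conditions": {"temperature_C": 80, "time_h": 12},
    "yield_percent": 73.5,
}

serialized = json.dumps(experiment, sort_keys=True)
record = json.loads(serialized)  # a downstream algorithm recovers the fields
print(record["yield_percent"])  # prints 73.5
```

A free-text notebook entry ("heated overnight, decent yield") carries the same information for a human but none of it for a machine; the difference is the whole argument of the paper.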
  • ABSTRACT: The use of cheminformatics in choosing appropriate leads is becoming essential in the academic environment. The role of Aurora-A kinase as a major regulator of cellular processes, and the potential of its inhibitors in the treatment of cancer, make their study a top priority in oncology research. In this paper, we performed a structural profiling of Aurora-A inhibitors in the search for new lead structures for the development of future Aurora-A kinase inhibitors. Based on the statistical analyses performed, the descriptors important for Aurora-A affinity were identified, and a set of rules of thumb was elaborated for screening structural databases for new Aurora-A inhibitors. Hydrogen-bonding capacity and the presence of nitrogen atoms in a pyrazole, pyrimidine, piperazine or piperidine scaffold are key prerequisites for increasing the target affinity of Aurora-A inhibitors.
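Rules of thumb like the ones described are typically applied as a coarse pre-screen over a descriptor table. A sketch in plain Python (the descriptor fields, thresholds and candidate data are illustrative placeholders, not the paper's actual values; real descriptors would come from a cheminformatics toolkit):

```python
# Scaffolds the abstract names as favourable for Aurora-A affinity.
PREFERRED_SCAFFOLDS = {"pyrazole", "pyrimidine", "piperazine", "piperidine"}

def passes_rules(candidate: dict) -> bool:
    """Keep candidates with hydrogen-bonding capacity and at least one
    nitrogen-containing scaffold from the preferred set.
    The threshold of 3 H-bond donors+acceptors is a placeholder."""
    has_hbond = candidate["h_bond_donors"] + candidate["h_bond_acceptors"] >= 3
    has_scaffold = bool(PREFERRED_SCAFFOLDS & set(candidate["scaffolds"]))
    return has_hbond and has_scaffold

# Toy descriptor table standing in for a screened structural database.
candidates = [
    {"name": "cpd-1", "h_bond_donors": 2, "h_bond_acceptors": 3,
     "scaffolds": ["pyrimidine"]},
    {"name": "cpd-2", "h_bond_donors": 0, "h_bond_acceptors": 1,
     "scaffolds": ["benzene"]},
]
hits = [c["name"] for c in candidates if passes_rules(c)]
print(hits)  # prints ['cpd-1']
```

In practice the scaffold test would be a real substructure match and the hydrogen-bond counts computed descriptors, but the filtering logic has this shape.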
  • ABSTRACT: Many scientific questions are best approached by sharing data, collected by different groups or across large collaborative networks, in a combined analysis. Unfortunately, some of the most interesting and powerful datasets, such as health records, genetic data and drug-discovery data, cannot be freely shared because they contain sensitive information. In many situations, knowing whether private datasets overlap determines whether it is worthwhile to navigate the institutional, ethical and legal barriers that govern access to sensitive private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which cannot be cracked even with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with stable accuracy, and secure.
    PLoS ONE 02/2015; 10(2):e0117898. DOI: 10.1371/journal.pone.0117898 (Impact Factor: 3.53)
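The core idea can be illustrated with a toy version: each party publishes only a histogram of hashed record identifiers, and overlap is estimated from the inner product of two histograms after subtracting the chance-collision baseline. This sketch captures the gist, not the paper's exact cryptoset construction or its security analysis:

```python
import hashlib

L = 10_000  # histogram length; larger L improves accuracy, reveals more

def cryptoset(items) -> list[int]:
    """Public summary: per-bin counts of hashed items.
    Individual identifiers are not recoverable from the counts."""
    bins = [0] * L
    for item in items:
        bins[int(hashlib.sha256(item.encode()).hexdigest(), 16) % L] += 1
    return bins

def estimate_overlap(a: list[int], b: list[int]) -> float:
    """Inner product of the two histograms, minus the number of
    collisions expected by chance between unrelated items."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sum(a), sum(b)
    return dot - na * nb / L

# Two private sets with a true overlap of 500 identifiers.
private_a = {f"patient-{i}" for i in range(1000)}
private_b = {f"patient-{i}" for i in range(500, 1500)}
est = estimate_overlap(cryptoset(private_a), cryptoset(private_b))
print(round(est))  # close to the true overlap of 500
```

Only the two count vectors ever leave the institutions; the estimate's noise shrinks as L grows, which is exactly the accuracy/privacy trade-off the paper analyzes.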
