Journal of Chemical Information and Modeling (J CHEM INF MODEL)

Publisher: American Chemical Society

Description

Papers reporting new methodology or important applications in the fields of chemical informatics or molecular modeling are appropriate for submission to this Journal. Specific topics include: representation and computer-based searching of chemical databases; computer-aided molecular design; development of new computational methods or efficient algorithms for chemical software; biopharmaceutical chemistry, including analyses of biological activity and other issues related to drug discovery.

  • Impact factor
    4.30
  • 5-year impact
    4.07
  • Cited half-life
    6.60
  • Immediacy index
    0.80
  • Eigenfactor
    0.02
  • Article influence
    0.89
  • Website
    Journal of Chemical Information and Modeling website
  • Other titles
    Journal of chemical information and modeling (Online), Journal of chemical information and modeling
  • ISSN
    1549-9596
  • OCLC
    54952610
  • Material type
    Document, Periodical, Internet resource
  • Document type
    Internet Resource, Computer File, Journal / Magazine / Newspaper

Publisher details

American Chemical Society

  • Pre-print
    • Author cannot archive a pre-print version
  • Restrictions
    • Must obtain written permission from Editor
    • Must not violate ACS ethical Guidelines
  • Post-print
    • Author cannot archive a post-print version
  • Restrictions
    • If mandated by funding agency or employer/institution
    • If mandated to deposit before 12 months, must obtain waiver from Institution/Funding agency or use AuthorChoice
    • 12 months embargo
  • Conditions
    • On author's personal website, pre-print servers, institutional website, institutional repositories or subject repositories
    • Non-Commercial
    • Must be accompanied by set statement (see policy)
    • Must link to publisher version
    • Publisher's version/PDF cannot be used
    • If mandated sooner than 12 months, must obtain waiver from Editors or use AuthorChoice
    • Reviewed on 07/08/2014
  • Classification
    white

Publications in this journal

  •
    ABSTRACT: The formation of a covalent bond with the target is essential for a number of successful drugs, yet tools for covalent docking without significant restrictions regarding warhead or receptor classes are rare and limited in use. In this work we present DOCKTITE, a highly versatile workflow for covalent docking in MOE combining automated warhead screening, nucleophilic side chain attachment, pharmacophore-based docking and a novel consensus scoring approach. The comprehensive validation study includes pose predictions of 35 protein/ligand complexes which resulted in a mean RMSD of 1.74 Å and a prediction rate of 71.4% with an RMSD below 2 Å, a virtual screening with an area under the curve (AUC) for the Receiver Operating Characteristics (ROC) of 0.81 and a significant correlation between predicted and experimental binding affinities (rho = 0.806, R² = 0.649, p < 0.005).
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physico-chemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted US patents from the last 40 years which have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification which correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis which recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all python scripts required to reproduce the analysis are provided in the Supplementary Information.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: The ability of the insulin-degrading enzyme (IDE) to degrade amyloid-β 42 (Aβ42), a process regulated by ATP, has been studied as an alternative path in the development of drugs against Alzheimer's disease. In this study, we calculated the potential of mean force for the degradation of Aβ42 by IDE in the presence and absence of ATP by umbrella sampling with hybrid quantum mechanics and molecular mechanics (QM/MM) calculations, using the SCC-DFTB QM Hamiltonian and Amber ff99SB force field. Results indicate that the reaction occurs in two steps: the first step is characterized by the formation of the intermediate and the second by breaking the peptide bond of the substrate, the latter being the rate-determining step. In our simulations, the activation energy barrier in the absence of ATP is 15 ± 2 kcal·mol⁻¹, which is 7 kcal·mol⁻¹ lower than in the presence of ATP, indicating that the presence of the nucleotide decreases the reaction rate by about 10⁵ times.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Molecular similarity methods have played a crucial role in the success of structure-based and computer-assisted drug design. However, with the exception of CoMSIA, the current approaches for estimating molecular similarity yield a global picture, thereby providing limited information about the local spatial molecular features responsible for the variation of activity with the 3D structure. Application of molecular similarity measures, each related to the functional "pieces" of a ligand-receptor complex, is advantageous over a composite molecular similarity alone and will provide more insights to rationally interpret the activity based on the receptor and ligand structural features. Building on the ideas of our previously published methodologies, CoRIA and LISA, we present here a local molecular similarity based, receptor-dependent QSAR method termed CoRILISA, which is a hybrid of the two approaches. The method improves on previous techniques by inclusion of receptor attributes for the calculation and comparison of similarity between molecules. For validation studies, the CoRILISA methodology was applied to three large and diverse data sets: glycogen phosphorylase b (GPb), human immunodeficiency virus-1 protease (HIV PR) and cyclin dependent kinase 2 (CDK2) inhibitors. The statistics of the CoRILISA models were benchmarked against the standard CoRIA approach and against other published approaches. The CoRILISA models were found to be significantly better, especially in terms of the predictivity for the test set. CoRILISA is able to identify the thermodynamic properties associated with residues that define the active site and modulate the variation in the activity of the molecules. It is a useful tool in the fragment-based drug discovery approach for ligand activity prediction.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: As part of the CSAR 2013 benchmark exercise, we have implemented a hybrid docking and scoring workflow to rank 10 steroid ligands of an engineered digoxigenin-binding protein. Schrödinger's Glide docking software was used to generate poses for each steroid ligand and rank them according to both standard docking precision (SP) and extra docking precision (XP) scoring functions. The unique component of our approach was the use of a target-specific pose classifier trained to discriminate nativelike from decoy poses. To build the classifier, a single cognate ligand with a known native pose (PDB code 4J8T) was docked multiple times into its target protein, and the generated poses were divided into two classes (nativelike and decoy) using a root-mean-square deviation threshold of 2 Å. All of the poses were characterized by the MCT-Tess descriptors of the protein-ligand interface, and random forest (RF) models were trained to discriminate the two classes of poses on the basis of their descriptors. The consensus pose classifier was then applied to the Glide-generated poses of each CSAR ligand in order to filter out those poses predicted as decoys and rerank the remaining ones using both XP and SP scoring functions. The best-scoring pose for each ligand following this filtering step was used for final ligand ranking. Overall, the ranking accuracy for the 10 ligands evaluated by the Spearman correlation coefficient was 0.64 for SP and 0.52 for XP but reached 0.75 for SP/RF consensus scoring (ranked third in the CSAR 2013 benchmark exercise). This study reconfirms that target-specific pose scoring models are capable of enhancing the reliability of structure-based molecular docking by discarding decoy poses.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: The study of chromatographic retention of natural products can be used to increase their identification speed in complex biological matrices. In this work, six variables were used to study the retention behavior in reversed phase liquid chromatography of 39 sesquiterpene lactones (SL) from an in-house database using chemoinformatics tools. To evaluate the retention of the SL, retention parameters on an ODS C-18 Shimadzu column, in two different solvent systems were experimentally obtained, namely MeOH-H2O 55:45 and MeCN-H2O 35:75. The chemoinformatics approach involved three descriptor type sets (one 2D and two 3D) comprising three groups of each (four, five and six descriptors), two different training and test sets, four algorithms for variable selection (best-first, linear forward, greedy stepwise and genetic algorithm), and two modeling methods [partial least square regression (PLS) and back-propagation artificial neural network (ANN)]. The influence of the six variables used in this study was assessed in a holistic context, and influences on the best model for each solvent system were analyzed. The best set for MeOH-H2O showed acceptable correlation statistics with a training R² = 0.91, cross-validation Q² = 0.88, and external validation P² = 0.80 and the best MeCN-H2O model showed much higher correlation statistics with a training R² = 0.96, cross-validation Q² = 0.92, and external validation P² = 0.91. Consensus models were built for each chromatographic system and although all of them showed an improved statistical performance, only one for the MeCN-H2O system was able to separate isomers as well as to improve the performance. The approach described herein can therefore be used to generate reproducible and robust models for QSRR studies of natural products as well as an aid for dereplication of complex biological matrices using plant metabolomics based techniques.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Intelligent Automatic Design (IADE) is an expert system developed at Novartis to identify non-classical bioisosteres. In addition to bioisostere search, one could also use IADE to grow a fragment bound to a protein. Here, we report an evaluation of IADE as a tool for fragment growing. Three examples from the literature served as test cases. In each case, IADE could generate close analogs of the published compounds and reproduce their crystallographic binding mode. This exercise validated the use of the IADE system for fragment growing. We have also gained experience in optimizing the performance of IADE for this type of application.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: The substrate specificity is a key feature of enzymes determining their applicability in biomaterials and biotechnologies. Experimental testing of activities with novel substrates is a time-consuming and inefficient process, typically resulting in many failures. Here, we present an experimentally validated in silico method for the discovery of novel substrates of enzymes with known reaction mechanism. The method was developed for a model system of biotechnologically relevant enzymes, haloalkane dehalogenases. Based on the parameterization of six different haloalkane dehalogenases with 30 halogenated substrates, mechanism-based geometric criteria for reactivity approximation were defined. These criteria were subsequently applied to the previously experimentally uncharacterized haloalkane dehalogenase DmmA. The enzyme was computationally screened against 42,000 compounds, yielding 548 structurally unique compounds as potential substrates. Eight out of sixteen experimentally tested top-ranking compounds were active with DmmA, indicating a 50% success rate for the prediction of substrates. The remaining eight compounds were able to bind to the active site and inhibit enzymatic activity. These results confirmed good applicability of the method for prioritizing active compounds - true substrates and binders - for experimental testing. All validated substrates were large compounds often containing polyaromatic moieties, which have never before been considered as potential substrates for this enzyme family. Whereas four of these novel substrates were specific to DmmA, two substrates showed activity with three other tested haloalkane dehalogenases, i.e., DhaA, DbjA and LinB. Additional validation of the developed screening strategy with the dataset of over 200 known substrates of Candida antarctica lipase B confirmed its applicability for the identification of novel substrates of other biotechnologically relevant enzymes with available tertiary structure and known reaction mechanism.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on Amazon Elastic Cloud. We train models on open datasets of varying sizes for the endpoints logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large datasets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Glycosaminoglycans (GAGs) represent a class of anionic periodic linear polysaccharides, which mediate cell communication processes by interactions with their protein targets in the extracellular matrix. Due to their high flexibility, charged nature, periodicity, and polymeric nature GAGs are challenging systems for computational approaches. To deal with the length challenge, coarse-grained (CG) modeling could be a promising approach. In this work, we develop AMBER compatible CG parameters for GAGs using all-atomic (AA) molecular dynamics (MD) simulations in explicit solvent and the Boltzmann conversion approach. We compare both global and local properties of GAGs obtained in the simulations with AA and CG approaches, and we conclude that our CG model is appropriate for the MD approach of long GAG molecules at long time scales.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Alzheimer's disease is a neurodegenerative pathology with unmet clinical needs. A highly desirable approach to this syndrome would be to find a single lead that could bind to some or all of the selected biomolecules that participate in the amyloid cascade, the most accepted route for Alzheimer disease genesis. In order to circumvent the challenge posed by the sizable differences in the binding sites of the molecular targets we propose a computer assisted protocol based on a pharmacophore and a set of required interactions with the targets, which allows for the automated screening of candidates. We used a combination of docking and molecular dynamics protocols in order to discard non-binders, optimize the best candidates and provide a rationale for their potential as inhibitors. To provide a proof of concept we proceeded to screen the literature and databases, a task that allowed us to identify a set of carbazole containing compounds that initially showed affinity only for the cholinergic targets in our experimental assays. Two cycles of design based on our protocol led to a new set of analogues that were synthesized and assayed. The assay results revealed that the designed inhibitors improve their affinity for BACE-1 by more than three orders of magnitude, as well as displaying amyloid aggregation inhibition and affinity for AChE and BuChE, a result that led us to a group of multitarget amyloid cascade inhibitors that also could have a positive effect at the cholinergic level.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: Determination of structural similarities between protein binding pockets is an important challenge in in silico drug design. It can help to understand selectivity considerations, predict unexpected ligand cross-reactivity and support the putative annotation of function to orphan proteins. To this end, Cavbase was developed as a tool for the automated detection, storage and classification of putative protein binding sites. In this context, binding sites are characterized as sets of pseudocenters, which denote surface-exposed physicochemical properties, and can be used to enable mutual binding-site comparisons. However, these comparisons tend to be computationally very demanding and often lead to very slow computations of the similarity measures. In this study, we propose RAPMAD (RApid Pocket MAtching using Distances), a new evaluation formalism for Cavbase entries which allows for ultrafast similarity comparisons. Protein binding sites are represented by sets of distance histograms that are both generated and compared with linear complexity. Attaining a speed of more than 20 000 comparisons per second, screenings across large datasets and even entire databases become easily feasible. We demonstrate the discriminative power and the short runtime by performing several classification and retrieval experiments. RAPMAD attains better success rates than the comparison formalism originally implemented into Cavbase or several alternative approaches developed in recent time, while requiring only a fraction of their runtime. The practical use of our method is finally proven by a successful prospective virtual screening study that aims for the identification of novel inhibitors of the NMDA receptor.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: We present the ioChem-BD platform as a multi-headed tool aimed to manage large volumes of quantum chemistry results from a diverse group of already common simulation packages. The platform has an extensible structure, the key modules managing the main tasks: (i) upload of output files from common computational chemistry packages, (ii) extract meaningful data from the results, (iii) generate output summaries in user-friendly formats. A heavy use of the Chemical Mark-up Language (CML) is made in the intermediate files used by ioChem-BD. From them and using XSL techniques, we will manipulate and transform such chemical datasets to fulfill researchers' needs in the form of HTML5 reports, supporting information and other research media.
    Journal of Chemical Information and Modeling 12/2014;
  •
    ABSTRACT: This paper is devoted to the analysis and visualization in 2-dimensional space of large datasets of millions of compounds using the incremental version of Generative Topographic Mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program has been applied to a database of more than 2 million compounds combining datasets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis have been proposed. Thus, to evaluate the chemical space coverage, the normalized Shannon entropy has been used. Different views of the data (property landscapes) can be obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helps to identify the regions in the chemical space populated by compounds with a desirable physico-chemical profile and to identify the suppliers providing them. Dataset similarity in the latent space has been assessed by applying different metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data subsets may be compared by considering them as individual objects on a meta-GTM map built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way of analysis and visualization of large chemical databases.
    Journal of Chemical Information and Modeling 11/2014;
  •
    ABSTRACT: The VSviewer3D is a simple Java tool for visual exploration of 3D virtual screening data. The VSviewer3D brings together the ability to explore numerical data, such as calculated properties and virtual screening scores, structure depiction, interactive topological and 3D similarity searching, and 3D visualization. By doing so the user is better able to quickly identify outliers, assess tractability of large numbers of compounds, visualize hits of interest, annotate hits, and mix and match interesting scaffolds. We demonstrate the utility of the VSviewer3D by describing a use case in a docking based virtual screen.
    Journal of Chemical Information and Modeling 11/2014;
  •
    ABSTRACT: The drive toward more transparency in research, the growing willingness to make data openly available, and the reuse of data to maximize the return on research investment all increase the importance of being able to find information and make links to the underlying data. The use of metadata in Electronic Laboratory Notebooks (ELNs) to curate experiment data is an essential ingredient for facilitating discovery. The University of Southampton has developed a Web browser-based ELN that enables users to add their own metadata to notebook entries. A survey of these notebooks was completed to assess user behavior and patterns of metadata usage within ELNs, while user perceptions and expectations were gathered through interviews and user-testing activities within the community. The findings indicate that while some groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users are making little attempt to use it, thereby endangering their ability to recover data in the future. A survey of patterns of metadata use in these notebooks, together with feedback from the user community, indicated that while a few groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users adopt a "minimum required" approach to metadata. To investigate whether the patterns of metadata use in LabTrove were unusual, a series of surveys was undertaken to investigate metadata usage in a variety of platforms supporting user-defined metadata. These surveys also provided the opportunity to investigate whether interface designs in these other environments might inform strategies for encouraging metadata creation and more effective use of metadata in LabTrove.
    Journal of Chemical Information and Modeling 11/2014;
  •
    ABSTRACT: A compound's synthetic accessibility (SA) is an important aspect of drug design, since in some cases automatically designed compounds cannot be synthesized. There have been several reports on SA prediction, most of which have focused on the difficulties of synthetic reactions based on retro-synthesis analyses and reaction databases. We developed a new method of predicting SA using commercially available compound databases and molecular descriptors. SA was estimated from the probability of existence of substructures consisting of the compound in question, the number of symmetry atoms, the graph complexity, and the number of chiral centers of the compound. The probability of substructure existence was estimated based on a compound library. The predicted SA results reproduced the human-eye inspections with a correlation coefficient of 0.56. Since our method required a compound database and not a reaction database, it should be easy to customize prediction for compound vendors. The correlation between price and SA was also examined and found to be weak. The price should depend on the total cost of development and the other aspects.
    Journal of Chemical Information and Modeling 11/2014;
  •
    ABSTRACT: Positive allosteric modulation of the ionotropic glutamate receptor GluA2 presents a potential treatment of cognitive disorders, e.g. Alzheimer's disease. In the present study, we describe the synthesis, pharmacology, and thermodynamic studies of a series of monofluoro-substituted 3,4-dihydro-2H-1,2,4-benzothiadiazine 1,1-dioxides. Measurements of ligand binding by isothermal titration calorimetry (ITC) showed similar binding affinities for the modulator series at the GluA2 LBD, but differences in the thermodynamic driving forces. Binding of 5c (7-F) and 6 (no-F) is enthalpy driven, 5a (5-F) and 5b (6-F) are entropy driven, and for 5d (8-F) both quantities were equal in size. Thermodynamic integration (TI) and one step perturbation (OSP) were used to calculate the relative binding affinity of the modulators. The OSP calculations had a higher predictive power than those from TI, and combined with the shorter total simulation time, we found the OSP method to be more effective for this setup. Furthermore, from the molecular dynamics simulations we extracted the enthalpies and entropies and along with the ITC data, this suggested that the differences in binding free energies are largely explained by the direct ligand-surrounding enthalpies. Furthermore, we used the OSP setup to predict binding affinities for a series of polysubstituted fluorine compounds and monosubstituted methyl compounds, and used these predictions to characterize the modulator binding pocket for this scaffold of positive allosteric modulators.
    Journal of Chemical Information and Modeling 11/2014;