About
Publications: 134
Reads: 24,863
Citations: 57,119
Introduction
Garrett M. Morris currently works at the Department of Statistics, University of Oxford. Garrett does research in Machine Learning & AI, Cheminformatics, Computational Chemistry, Medicinal Chemistry, and Chemical Biology.
Education
October 1988 - July 1991
University of Oxford
Field of study
- Computational Chemistry
October 1984 - July 1988
University of Oxford
Field of study
- Chemistry
Publications (134)
Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-) lead optimisation, such as ranking the binding affinity of a congeneric series of ligands, thereby limiting their application in drug discovery. Here, we address these is...
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of g...
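The core idea behind ECFPs can be sketched as follows: iteratively hash each atom's identifier together with its neighbours' identifiers, then fold the collected identifiers into a fixed-length bit vector. This is a simplified illustration under stated assumptions, not RDKit's Morgan implementation or the method of the paper: plain atom labels stand in for the usual atom invariants, bond orders and duplicate-substructure removal are omitted, and `ecfp_like` and `_stable_hash` are hypothetical names.

```python
import hashlib


def _stable_hash(obj) -> int:
    # Python's built-in hash() is salted per process, so use SHA-1 for reproducible identifiers.
    digest = hashlib.sha1(repr(obj).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def ecfp_like(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-style fingerprint (hypothetical helper, not RDKit's Morgan code).

    atoms: list of atom labels, e.g. element symbols, indexed 0..n-1
    bonds: list of (i, j) index pairs of an undirected molecular graph
    Returns the set of 'on' bit positions after folding to n_bits.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Iteration 0: each identifier depends only on the atom's own label.
    ids = [_stable_hash(("atom", label)) for label in atoms]
    features = set(ids)
    for _ in range(radius):
        # Combine each atom's identifier with the sorted identifiers of its neighbours,
        # so the result does not depend on atom numbering.
        ids = [
            _stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbours[i]))))
            for i in range(len(atoms))
        ]
        features.update(ids)
    # Fold the variable-size identifier set into a fixed-length bit vector.
    return {f % n_bits for f in features}
```

For example, `ecfp_like(["C", "C", "O"], [(0, 1), (1, 2)])` encodes the heavy-atom graph of ethanol; renumbering the atoms yields the same bit set because neighbour identifiers are sorted before hashing.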
Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-valu...
Graph neural networks (GNNs) are a natural choice to represent chemical data, due to their inherent ability to handle arbitrary input topologies. They avoid the need to convert molecules into molecular fingerprints with a fixed vector length. However, like most deep learning models, GNNs are not interpretable and common explainability methods fail...
Machine learning offers a promising approach for fast and accurate binding affinity predictions. However, current models often fail to generalise beyond their training data and are not robustly evaluated on a diverse range of benchmarks, limiting their application in drug discovery projects. In this work, we address these issues by introducing...
A novel class of protein misfolding characterized by either the formation of non-native noncovalent lasso entanglements in the misfolded structure or loss of native entanglements has been predicted to exist and found circumstantial support through biochemical assays and limited-proteolysis mass spectrometry data. Here, we examine whether it is poss...
The last few years have seen the development of numerous deep learning-based protein–ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often pr...
We report the results of the COVID Moonshot, a fully open-science, crowdsourced, and structure-enabled drug discovery campaign targeting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. We discovered a noncovalent, nonpeptidic inhibitor scaffold with lead-like properties that is differentiated from current main protea...
Pairs of almost identical molecules that exhibit a large activity difference against a given biological target are called activity cliffs and form an important source of pharmacological information. We computationally investigate the capabilities of current machine-learning-based activity-prediction models to detect activity cliffs. The models are...
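The definition of an activity cliff (high structural similarity combined with a large potency difference) can be turned into a simple pair filter. A minimal sketch, assuming fingerprints given as sets of on-bits, Tanimoto similarity, and hypothetical cut-offs of 0.9 similarity and 2 log units (a 100-fold potency difference); the criteria used in the paper may differ.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(compounds, sim_threshold=0.9, min_delta=2.0):
    """Return pairs that are highly similar yet differ strongly in potency.

    compounds: dict mapping compound name -> (fingerprint_set, pActivity)
    sim_threshold, min_delta: illustrative cut-offs (assumptions, not from the paper);
    a pActivity difference of 2.0 corresponds to a 100-fold potency difference.
    """
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        sim = tanimoto(fp1, fp2)
        if sim >= sim_threshold and abs(p1 - p2) >= min_delta:
            cliffs.append((n1, n2, round(sim, 3), abs(p1 - p2)))
    return cliffs
```

With toy data in which compounds A and B share 10 of 11 bits (similarity about 0.909) but differ by 3 log units, only the A/B pair is reported.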
CC and CXC-chemokines are the primary drivers of chemotaxis in inflammation, but chemokine network redundancy thwarts pharmacological intervention. Tick evasins promiscuously bind CC and CXC-chemokines, overcoming redundancy. Here we show that short peptides that promiscuously bind both chemokine classes can be identified from evasins by phage-disp...
Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that po...
Here we introduce a novel method to interpret the predictions of graph neural networks (GNNs) based on Myerson values from cooperative game theory. Myerson values are closely related to Shapley values and thus provide an interpretability approach similar to the SHAP values. We developed the technique for applications in drug discovery, but it can b...
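Myerson values are Shapley values of a graph-restricted game in which a coalition is worth the sum of the worths of its connected components. A minimal exact sketch, assuming players are atoms, edges are bonds, and `v` is a hypothetical scoring function on atom subsets (for a GNN explanation, for instance the model output on the corresponding subgraph). Exact enumeration scales exponentially with the number of players, so this only suits small graphs; how the paper handles larger molecules is not stated in the excerpt.

```python
from itertools import combinations
from math import factorial


def components(coalition, edges):
    """Connected components of the subgraph induced by `coalition`."""
    members = set(coalition)
    adjacency = {p: set() for p in members}
    for a, b in edges:
        if a in members and b in members:
            adjacency[a].add(b)
            adjacency[b].add(a)
    comps, seen = [], set()
    for start in members:
        if start in seen:
            continue
        stack, comp = [start], {start}
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adjacency[u] - seen:
                seen.add(w)
                comp.add(w)
                stack.append(w)
        comps.append(frozenset(comp))
    return comps


def myerson_values(players, edges, v):
    """Exact Shapley values of the graph-restricted game v_G(S) = sum of v(C) over components C of S."""
    def v_graph(coalition):
        return sum(v(c) for c in components(coalition, edges))

    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that do not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v_graph(s | {i}) - v_graph(s))
        values[i] = total
    return values
```

On a path 0-1-2 where `v` rewards only the presence of atoms 0 and 2 together, plain Shapley values would give the bridging atom 1 nothing; Myerson values split the credit equally (1/3 each), because 0 and 2 are connected only through 1.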
Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that quantitative structure-activity relationship (QSAR) models struggle to predict ACs and that ACs thus form a major source of predi...
TRIM33 is a member of the tripartite motif (TRIM) family of proteins, some of which possess E3 ligase activity and are involved in the ubiquitin-dependent degradation of proteins. Four of the TRIM family proteins, TRIM24 (TIF1α), TRIM28 (TIF1β), TRIM33 (TIF1γ) and TRIM66, contain C-terminal plant homeodomain (PHD) and bromodomain (BRD) modules, whi...
Drug resistance caused by mutations is a public health threat for existing and emerging viral diseases. A wealth of evidence about these mutations and their clinically associated phenotypes is scattered across the literature, but a comprehensive perspective is usually lacking. This work aimed to produce a clinically relevant view for the case of He...
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand com...
Graph Neural Networks (GNNs) have recently gained in popularity, challenging molecular fingerprints or SMILES-based representations as the predominant way to represent molecules for binding affinity prediction. Although simple ligand-based graphs alone are already useful for affinity prediction, better performance on multi-target datasets has been...
The SARS-CoV-2 coronavirus is the causal agent of the current global pandemic. SARS-CoV-2 belongs to an order, Nidovirales, with very large RNA genomes. It is proposed that the fidelity of coronavirus (CoV) genome replication is aided by an RNA nuclease complex, comprising the non-structural proteins 14 and 10 (nsp14–nsp10), an attractive target fo...
Protein folding is a central challenge in computational biology, with important applications in molecular biology, drug discovery and catalyst design. As a hard combinatorial optimisation problem, it has been studied as a potential target problem for quantum annealing. Although several experimental implementations have been discussed in the literat...
We present a novel Siamese deep-learning model for the computational prediction of activity cliffs in chemical space. Activity cliffs are pairs of molecular compounds that are structurally very similar but exhibit an unexpectedly high difference in their potency against a given biological target. Activity cliffs thus reveal small structural compoun...
The main protease (Mpro) of SARS-CoV-2 is central to viral maturation and is a promising drug target, but little is known about structural aspects of how it binds to its 11 natural cleavage sites. We used biophysical and crystallographic data and an array of biomolecular simulation techniques, including automated docking, molecular dynamics (MD) an...
Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked...
Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years when novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of...
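AEVs are built from symmetry functions of the kind used in ANI neural-network potentials. A sketch of the radial part for a single atom: each entry sums a Gaussian centred at a shift R_s, damped by a cosine cutoff, over neighbouring atoms. The values of `eta`, `r_cut` and the shifts are illustrative assumptions, angular terms are omitted, and in practice one such vector is computed per neighbour element type (some implementations also apply a constant prefactor).

```python
import math


def cutoff(r, r_cut):
    """Cosine cutoff: 0.5*cos(pi*r/r_cut) + 0.5 inside r_cut, 0 beyond."""
    return 0.5 * math.cos(math.pi * r / r_cut) + 0.5 if r <= r_cut else 0.0


def radial_aev(centre, neighbours, shifts, eta=16.0, r_cut=5.2):
    """Radial part of an ANI-style atomic environment vector for one atom.

    centre: (x, y, z) of the central atom; neighbours: list of (x, y, z).
    shifts: radial shift values R_s in Angstrom; eta and r_cut are illustrative values.
    Returns one term per shift: sum_j exp(-eta (R_ij - R_s)^2) * f_c(R_ij).
    """
    dists = [math.dist(centre, n) for n in neighbours]
    return [
        sum(math.exp(-eta * (r - rs) ** 2) * cutoff(r, r_cut) for r in dists)
        for rs in shifts
    ]
```

Because every term depends only on interatomic distances, the vector is invariant to rotation and translation of the complex.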
The main protease (Mpro) of SARS-CoV-2 is central to its viral lifecycle and is a promising drug target, but little is known concerning structural aspects of how it binds to its 11 natural cleavage sites. We used biophysical and crystallographic data and an array of classical molecular mechanics and quantum mechanical techniques, including automate...
The calculation of the entropy of flexible molecules can be challenging, since the number of possible conformers can grow exponentially with molecule size and many low-energy conformers may be thermally accessible. Different methods have been proposed to approximate the contribution of conformational entropy to the molecular standard entropy, inclu...
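One of the simplest estimators of conformational entropy treats the conformer ensemble as a discrete Boltzmann distribution, S = -R Σ p_i ln p_i. The sketch below assumes a list of relative conformer energies in kJ/mol and ignores symmetry numbers and vibrational contributions; it is not necessarily any of the methods compared in the paper, whose list is truncated above.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def conformational_entropy(energies_kj_mol, temperature=298.15):
    """Gibbs-Shannon conformational entropy S = -R * sum(p_i ln p_i) in J/(mol K),
    with Boltzmann populations p_i computed from relative conformer energies (kJ/mol)."""
    rt = R * temperature / 1000.0  # RT in kJ/mol
    e_min = min(energies_kj_mol)
    # Shift by the minimum energy for numerical stability before exponentiating.
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kj_mol]
    z = sum(weights)
    populations = [w / z for w in weights]
    return -R * sum(p * math.log(p) for p in populations if p > 0)
```

Two degenerate conformers give R ln 2, about 5.76 J/(mol K); raising the energy of one conformer lowers its population and therefore the entropy.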
The geometry of a molecule plays a significant role in determining its physical and chemical properties. Despite its importance, there are relatively few studies on ring puckering and conformations, often focused on small cycloalkanes, 5- and 6-membered carbohydrate rings, and specific macrocycle families. We lack a general understanding of the puc...
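A common general-purpose way to quantify ring puckering for rings of any size is the Cremer-Pople formalism, which measures out-of-plane displacements from a mean plane. The sketch below computes only the total puckering amplitude Q for ring atom coordinates listed in ring order; the phase angles are omitted, the helper names are hypothetical, and whether the paper uses this coordinate system is not stated in the excerpt.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def cremer_pople_amplitude(ring):
    """Total puckering amplitude Q (Cremer & Pople, 1975) for ring atoms given in
    sequential order as (x, y, z) coordinates."""
    n = len(ring)
    centre = [sum(p[k] for p in ring) / n for k in range(3)]
    r = [_sub(p, centre) for p in ring]
    # Two vectors spanning the mean plane, weighted by sin/cos of each atom's ring position.
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = _cross(r1, r2)
    length = math.sqrt(_dot(normal, normal))
    normal = [c / length for c in normal]
    # Out-of-plane displacements z_j; Q is their root sum of squares.
    z = [_dot(rj, normal) for rj in r]
    return math.sqrt(sum(zj * zj for zj in z))
```

A planar hexagon gives Q = 0, while an idealised chair with atoms alternately displaced by ±h gives Q = √6·h.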
Herein we provide a living summary of the data generated during the COVID Moonshot project focused on the development of SARS-CoV-2 main protease (Mpro) inhibitors. Our approach uniquely combines crowdsourced medicinal chemistry insights with high throughput crystallography, exascale computational chemistry infrastructure for simulations, and machi...
The SARS-CoV-2 coronavirus (CoV) causes COVID-19, a current global pandemic. SARS-CoV-2 belongs to an order of Nidovirales with very large RNA genomes. It is proposed that the fidelity of CoV genome replication is aided by an RNA nuclease complex, formed of non-structural proteins 14 and 10 (nsp14-nsp10), an attractive target for antiviral inhibiti...
Quantum computers can in principle solve certain problems exponentially more quickly than their classical counterparts. We have not yet reached the advent of useful quantum computation, but when we do, it will affect nearly all scientific disciplines. In this review, we examine how current quantum algorithms could revolutionize computational biolog...
Protein folding, the determination of the lowest-energy configuration of a protein, is an unsolved computational problem. If protein folding could be solved, it would lead to significant advances in molecular biology, and technological development in areas such as drug discovery and catalyst design. As a hard combinatorial optimisation problem, pro...
A key challenge in conformer sampling is finding low-energy conformations with a small number of energy evaluations. We recently demonstrated that the Bayesian Optimization Algorithm (BOA) is an effective method for finding the lowest energy conformation of a small molecule. Our approach balances exploitation and exploration, and is more efficie...
A key challenge in conformer sampling is to find low-energy conformations with a small number of energy evaluations. We have recently demonstrated Bayesian optimization as an effective method to search for energetically favorable conformations. This approach balances exploitation and exploration, and leads to superior performance when c...
Machine learning scoring functions for protein-ligand binding affinity prediction have been found to consistently outperform classical scoring functions. Structure-based scoring functions for universal affinity prediction typically use features describing interactions derived from the protein-ligand complex, with limited information about the chemi...
We present Ligity: a hybrid ligand-structure-based, non-superpositional method for virtual screening of large databases of small molecules. Ligity uses the relative spatial distribution of Pharmacophoric Interaction Points (PIPs) derived from the conformations of small molecules. These are compared with the PIPs derived from key interaction feature...
Generating low-energy molecular conformers is a key task for many areas of computational chemistry, molecular modeling and cheminformatics. Most current conformer generation methods primarily focus on generating geometrically diverse conformers rather than finding the most probable or energetically lowest minima. Here, we present a new stochastic s...
One of the fundamental assumptions of fragment-based drug discovery is that the fragment’s binding mode will be conserved upon elaboration into larger compounds. The most common way of quantifying binding mode similarity is Root Mean Square Deviation (RMSD), but Protein Ligand Interaction Fingerprint (PLIF) similarity and shape-based metrics are so...
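For reference, the RMSD used to compare binding modes is the root mean square of per-atom displacements between two poses. The sketch below assumes both poses list the same atoms in the same order and are already in the receptor frame, so no superposition is performed; it does not account for symmetry-equivalent atoms, which dedicated tools usually handle, and `pose_rmsd` is a hypothetical name.

```python
import math


def pose_rmsd(pose_a, pose_b):
    """In-place RMSD between two poses of the same ligand (lists of (x, y, z)),
    with atoms in the same order and both poses already in the receptor frame."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum((a[k] - b[k]) ** 2 for a, b in zip(pose_a, pose_b) for k in range(3))
    return math.sqrt(squared / len(pose_a))
```

Shifting every atom of a pose by 1 Å along one axis gives an RMSD of exactly 1 Å, even though the binding interactions may be largely preserved; this is one reason interaction-fingerprint and shape-based metrics are used alongside RMSD.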
Novel drugs to treat tuberculosis are required and the identification of potential targets is important. Piperidinols have been identified as potential antimycobacterial agents (MIC < 5 μg/mL), which also inhibit mycobacterial arylamine N-acetyltransferase (NAT), an enzyme essential for mycobacterial survival inside macrophages. The NAT inhibition...
The potassium efflux system, Kef, protects bacteria against the detrimental effects of electrophilic compounds via acidification of the cytoplasm. Kef is inhibited by glutathione (GSH), but activated by glutathione-S-conjugates (GS-X) formed in the presence of electrophiles. GSH and GS-X bind to overlapping sites on Kef, which are located in a cyto...
A major goal in computational chemistry has been to discover the set of rules that can accurately predict the binding affinity of any protein-drug complex, using only a single snapshot of its three-dimensional structure. Despite the continual development of structure-based models, predictive accuracy remains low, and the fundamental factors that in...
Some computational methods currently exist that are employed to infer protein targets of small molecules and can therefore be used to find new targets for existing drugs, with the goals of repositioning the molecule for a different therapeutic purpose or explaining off-target effects due to multiple targeting. Inherent limitations, however, arise f...
There is a growing recognition of the importance of cloud computing for large-scale and data-intensive applications. The distinguishing features of cloud computing and their relationship to other distributed computing paradigms are described, as are the strengths and weaknesses of the approach. We review the use made to date of cloud computing for...
Shape similarity is a key concept and requirement for molecular recognition. As a result, much research has been undertaken to develop methods to represent molecular shape and to quantify the shape similarity between molecules. A great variety of shape descriptions and similarity comparison approaches have been developed, ranging from explicit repr...
PfSUB1, a subtilisin-like protease of the human malaria parasite Plasmodium falciparum, is known to play important roles during the life cycle of the parasite and has emerged as a promising antimalarial drug target. In order to provide a detailed understanding of the origin of binding determinants of PfSUB1 substrates, we performed molecular dynami...
The use of computer-aided structure-based drug design prior to synthesis has proven to be generally valuable in suggesting improved binding analogues of existing ligands. Here we describe the application of the program AutoDock to the design of a focused library that was used in the "click chemistry in-situ" generation of the most potent non-covale...
Molecular modeling and simulation play a central role in academic and industrial research focused on physico-chemical properties and processes. The efforts carried out in this field have crystallized in a variety of models, simulation methods, and computational techniques that are examining the relationship between the structure, dynamics and funct...
A wide variety of structure- and ligand-based virtual screening approaches have been developed that aim at finding potential leads to initiate drug discovery efforts. Since each method has its strengths and weaknesses, combining the outcome of different structure- and ligand-based approaches can be expected to decrease the number of false positive pr...
Conformer generation has important implications in cheminformatics, particularly in computational drug discovery where the quality of conformer generation software may affect the outcome of a virtual screening exercise. We examine the performance of four freely available small molecule conformer generation tools (Balloon, Confab, Frog2, and RDKit)...
Water plays a critical role in ligand-protein interactions. However, it is still challenging to predict accurately not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. The latter is often seen as a route to optimizing affinity of potential drug candidates. Using a protocol we call WaterDock,...
Water molecules used in the OppA test set. (DOC)
Classification accuracies for conserved and displaced waters. Three water scoring energy terms were established to describe a water molecule's binding energy (AutoDock Vina's hydrogen-bonding term) and its local environment (our hydrophilic and hydrophobic terms). These scores were used in two bagged-tree classifiers that predicted...
Classification accuracies for waters displaced by polar and non-polar groups. The probabilistic classifiers were fit using all combinations of the water scores, as in Table S5. (DOC)
Results of different water docking methods (performed on the structures in Table 2 of the main manuscript). The final WaterDock method was chosen as the one that predicted the largest number of consensus water molecules at the lowest false positive rate. Various docking parameters and clustering methods were tested. To demonst...
A gzipped archive of the water-placement scripts used. (ZIP)
Full details of the molecular dynamics procedure and the docking procedures used for specific proteins. (DOC)
The X-ray crystal structures of OppA used as the test set for the water placement method. Fourteen crystal structures of OppA were used as the WaterDock test set. This test set was chosen to match the test set used by the water prediction method Acqua Alta. The water molecules used in our study are shown in Table S4. These water molecules bridge th...
Book Description
The book covers a wide range of topics relevant to the development of drugs. Though this might prevent an in-depth analysis of very specific issues relevant in the pharmaceutical arena, it has the advantage of providing in a single volume a rather comprehensive description of the major methodological strategies available for ration...
In a previous paper, we presented the ElectroShape method, which we used to achieve successful ligand-based virtual screening. It extended classical shape-based methods by applying them to the four-dimensional shape of the molecule where partial charge was used as the fourth dimension to capture electrostatic information. This paper extends the app...
Aminoacyl tRNA synthetases, components of the translation apparatus, have alternative functions outside of translation. The structural and mechanistic basis of these alternative functions is of great interest. As an example, reverse transcription of the HIV genome is primed by a human lysine-specific tRNA (tRNA(Lys3)) that is packaged (into the vir...
We present ElectroShape, a novel ligand-based virtual screening method, that combines shape and electrostatic information into a single, unified framework. Building on the ultra-fast shape recognition (USR) approach for fast non-superpositional shape-based virtual screening, it extends the method by representing partial charge information as a four...
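ElectroShape builds on USR, whose descriptor summarises a conformer by the distribution of atomic distances to four reference points: the centroid (ctd), the atom closest to it (cst), the atom farthest from it (fct), and the atom farthest from fct (ftf). A plain-USR sketch follows, assuming the first three moments of each distance distribution (mean, standard deviation, and a signed cube root of the third central moment; implementations differ in how they normalise this last term) and an inverse mean-absolute-difference similarity. It does not include ElectroShape's partial-charge dimension, and the function names are hypothetical.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))


def _moments(distances):
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    # Signed cube root keeps the third moment in distance units.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """12 shape descriptors from atom coordinates (list of (x, y, z))."""
    ctd = [sum(c[k] for c in coords) / len(coords) for k in range(3)]
    cst = min(coords, key=lambda c: _dist(c, ctd))  # atom closest to the centroid
    fct = max(coords, key=lambda c: _dist(c, ctd))  # atom farthest from the centroid
    ftf = max(coords, key=lambda c: _dist(c, fct))  # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([_dist(c, ref) for c in coords]))
    return desc


def usr_similarity(d1, d2):
    """Inverse of (1 + mean absolute difference) over the descriptors; 1.0 means identical."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1))
```

Because every descriptor is derived from interatomic distances, it is invariant to rotation and translation, so no superposition is needed, which is what makes the approach non-superpositional.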
Human immunodeficiency virus type 1 (HIV-1) integrase is one of three virally encoded enzymes essential for replication and, therefore, a rational choice as a drug target for the treatment of HIV-1-infected individuals. In 2007, raltegravir became the first integrase inhibitor approved for use in the treatment of HIV-infected patients, more than a...
We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV prote...
This paper presents CSR, or Chiral Shape Recognition, a novel method to compute molecular similarity that builds on the Ultra-fast Shape Recognition (USR) method, but distinguishes enantiomers. It has great potential for generalisation, and was tested on the DUD dataset, where it was found a significant improvement in enrichment over USR having scr...
We describe a strategy for including ligand and protein polarization in docking that is based on the conversion of induced dipoles to induced charges. Induced charges have a distinct advantage in that they are readily implemented into a number of different computer programs, including many docking programs and hybrid QM/MM programs; induced charges...
Nitrilases are a large and diverse family of nonpeptidic C-N hydrolases. The mammalian genome encodes eight nitrilase enzymes, several of which remain poorly characterized. Prominent among these are nitrilase-1 (Nit1) and nitrilase-2 (Nit2), which, despite having been shown to exert effects on cell growth and possibly serving as tumor suppressor ge...
This unit describes how to set up and analyze ligand-protein docking calculations using AutoDock and the graphical user interface, AutoDockTools (ADT). The AutoDock scoring function is a subset of the AMBER force field that treats molecules using the United Atom model. The unit uses an X-ray crystal structure of Indinavir bound to HIV-1 protease ta...
Molecular docking is a key tool in structural molecular biology and computer-assisted drug design. The goal of ligand-protein docking is to predict the predominant binding mode(s) of a ligand with a protein of known three-dimensional structure. Successful docking methods search high-dimensional spaces effectively and use a scoring function that cor...
Feline immunodeficiency virus (FIV) shares with T-cell tropic strains of human immunodeficiency virus type 1 (HIV-1) the use of the chemokine receptor CXCR4 for cellular entry. In order to map the interaction of the FIV envelope surface unit (SU) with CXCR4, full-length FIV SU-Fc as well as constructs with deletions of extended loop L2, V3, V4, or...
We recently reported the pharmacological screening of a natural products-inspired library of spiroepoxide probes, resulting in the discovery of an agent MJE3 that displayed anti-proliferative effects in human breast cancer cells. MJE3 was found to covalently inactivate phosphoglycerate mutase-1 (PGAM1), a glycolytic enzyme with postulated roles in...
The authors describe the development and testing of a semiempirical free energy force field for use in AutoDock4 and similar grid-based docking methods. The force field is based on a comprehensive thermodynamic model that allows incorporation of intramolecular energies into the predicted free energy of binding. It also incorporates a charge-based m...
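For orientation, the AutoDock4 force field, as we understand its published form, evaluates pairwise terms for dispersion/repulsion, directional hydrogen bonding, electrostatics and desolvation, and combines bound and unbound states into a binding free energy. The equations below are a summary to be checked against the paper: the W are empirically fitted weights, E(t) is a directional weight depending on the hydrogen-bond angle t, ε(r) is a distance-dependent dielectric, S and V are atomic solvation parameters and volumes, and ΔS_conf is a torsional entropy penalty taken as proportional to the number of rotatable bonds.

```latex
V = W_{\mathrm{vdW}} \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + W_{\mathrm{hbond}} \sum_{i,j} E(t) \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right)
  + W_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
  + W_{\mathrm{sol}} \sum_{i,j} \left( S_i V_j + S_j V_i \right) e^{-r_{ij}^{2} / 2\sigma^{2}}

\Delta G = \left( V^{L\text{-}L}_{\mathrm{bound}} - V^{L\text{-}L}_{\mathrm{unbound}} \right)
         + \left( V^{P\text{-}P}_{\mathrm{bound}} - V^{P\text{-}P}_{\mathrm{unbound}} \right)
         + \left( V^{P\text{-}L}_{\mathrm{bound}} - V^{P\text{-}L}_{\mathrm{unbound}} + \Delta S_{\mathrm{conf}} \right),
\qquad \Delta S_{\mathrm{conf}} = W_{\mathrm{conf}} N_{\mathrm{tors}}
```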
The development of resistance to anti-retroviral drugs targeted against HIV is an increasing clinical problem in the treatment of HIV-1-infected individuals. Many patients develop drug-resistant strains of the virus after treatment with inhibitor cocktails (HAART therapy), which include multiple protease inhibitors. Therefore, it is imperative that...
Structure models for the interaction of curcumin with HIV-1 integrase (IN) and protease (PR) were investigated using computational docking. Curcumin was found to bind preferentially in similar ways to the active sites of both IN and PR. For IN, the binding site is formed by residues Asp64, His67, Thr66, Glu92, Thr93, Asp116, Ser119, Asn120, and Lys...
CD134 is a primary binding receptor for feline immunodeficiency virus (FIV), and with CXCR4 facilitates infection of CD4(+) T cells. Human CD134 fails to support FIV infection. To delineate the regions important for defining virus specificity of CD134, we exchanged domains between human and feline CD134. The binding site for FIV surface glycoprotei...
Grid-based methods are widely used for evaluation of conformations in automated docking and other techniques of structure-based drug design. Traditional non-directional and directional methods for evaluating hydrogen bonds in these methods, however, yield improper interactions in cases with adjacent hydrogen bonds, such as those that mediate base p...
The W191G cavity of cytochrome c peroxidase is useful as a model system for introducing small molecule oxidation in an artificially created cavity. A set of small, cyclic, organic cations was previously shown to bind in the buried, solvent-filled pocket created by the W191G mutation. We docked these ligands and a set of non-binders in the W191G cav...
We used feline immunodeficiency virus (FIV) protease (PR) as a mutational framework to define determinants for the observed substrate and inhibitor specificity distinctions between FIV and human immunodeficiency virus (HIV) PRs. Multiple-substitution mutants were constructed by replacing the residues in and around the active site of FIV PR with the...
Based on the substrate transition state and our strategy to tackle the problem of drug resistance, a series of HIV/FIV protease (HIV/FIV PR) monocyclic inhibitors incorporating a 15- or 17-membered macrocycle with an equivalent P3 or P3' group and a unique unnatural amino acid, (2R,3S)-3-amino-2-hydroxy-4-phenylbutyric acid, have been designed an...
Antibodies link antigen and immunological effector systems through the use of highly flexible linkers that connect the hypervariable antigen binding sites (Fabs) to the effector domain (Fc). The extensive flexibility of the antibody molecule permits antibodies to adapt to a vast array of antigen shapes and sizes while retaining a covalent link betw...