Ola Engkvist

AstraZeneca | AZ · iMED, Discovery Sciences

PhD

About

258
Publications
34,225
Reads
6,054
Citations
Introduction
Ola Engkvist currently works at Discovery Sciences, IMED Biotech Unit, AstraZeneca. Ola does research in Chemo-informatics, Medicinal Chemistry and Machine Learning.
Additional affiliations
October 2004 - present
AstraZeneca
Position
  • Group Leader

Publications (258)
Preprint
New scientific knowledge is needed more urgently than ever, to address global challenges such as climate change, sustainability, health and societal well-being. Could artificial intelligence (AI) accelerate the scientific process to meet global challenges in time? AI is already revolutionizing individual scientific disciplines, but we argue here...
Article
Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising as it provides a more intuitive representation and is suitable for inferen...
Article
Full-text available
PROteolysis TArgeting Chimeras (PROTACs) use the ubiquitin-proteasome system to degrade a protein of interest for therapeutic benefit. Advances made in targeted protein degradation technology have been remarkable, with several molecules having moved into clinical studies. However, robust routes to assess and better understand the safety risks of PR...
Preprint
Recent developments in artificial intelligence and automation could potentially enable a new drug design paradigm: autonomous drug design. Under this paradigm, generative models provide suggestions on thousands of molecules with specific properties. However, since only a limited number of molecules can be synthesized and tested, an obvious challeng...
Preprint
A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expr...
Article
Full-text available
Reinforcement learning is a powerful paradigm that has gained popularity across multiple domains. However, applying reinforcement learning may come at the cost of multiple interactions between the agent and the environment. This cost can be especially pronounced when the single feedback from the environment is slow or computationally expensive, cau...
Article
Computer aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collecti...
Article
Full-text available
Despite the intuitive value of adopting the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in both academic and industrial sectors, challenges exist in resourcing, balancing long- versus short-term priorities, and achieving technical implementation. This situation is exacerbated by the unclear mechanisms by which costs and bene...
Article
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (...
Article
Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being perfo...
Preprint
In this work, we present Link-INVENT as an extension to the existing de novo molecular design platform REINVENT. We provide illustrative examples of how Link-INVENT can be applied to fragment linking, scaffold hopping, and PROTAC design case studies where the desirable molecules should satisfy a combination of different criteria. With the help of...
Article
Full-text available
Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking...
Article
Full-text available
The therapeutic and research potentials of oligonucleotides (ONs) have been hampered in part by their inability to effectively escape endosomal compartments to reach their cytosolic and nuclear targets. Splice-switching ONs (SSOs) can be used with endosomolytic small molecule compounds to increase functional delivery. So far, development of these c...
Preprint
Full-text available
Reinforcement learning (RL) is a powerful paradigm that has gained popularity across multiple domains. However, applying RL may come at a cost of multiple interactions between the agent and the environment. This cost can be especially pronounced when the single feedback from the environment is slow or computationally expensive, causing extensive pe...
Article
We present machine learning models for predicting the chemical context for Buchwald‐Hartwig coupling reactions, i.e., what chemicals to add to the reactants to give a productive reaction. Using reaction data from in‐house electronic lab notebooks, we train two models: one based on single‐label data and one based on multi‐label data. Both models sho...
Preprint
We present Icolos, a workflow manager written in Python as a tool for automating complex structure-based workflows. Icolos can be used as a standalone tool, for example in virtual screening campaigns, or can be used in conjunction with deep learning-based molecular generation facilitated for example by REINVENT, a previously published de novo desig...
Preprint
Full-text available
PROTACs (PROteolysis TArgeting Chimeras) use the ubiquitin-proteasome system to degrade a protein of interest for therapeutic benefit. Advances in targeted protein degradation technology have been remarkable with several molecules moving into clinical studies. However, robust routes to assess and better understand the safety risks of PROTACs need t...
Article
Full-text available
We expand the recent work on clustering of synthetic routes and train a deep learning model to predict the distances between arbitrary routes. The model is based on a long short-term memory (LSTM) representation of a synthetic route and is trained as a twin network to reproduce the tree edit distance (TED) between two routes. The ML approach is ap...
Preprint
Full-text available
Identifying synthetic routes for molecules of interest is a crucial step when discovering new drugs or materials. To find synthetic routes, we can use computer-assisted synthesis planning using expansion policy networks trained on reaction templates extracted from patents and the literature. However, experience has shown that these networks are bia...
Preprint
Improving on the standard of care for diseases is predicated on better treatments, which in turn rely on finding and developing new drugs. However, drug discovery is a complex and costly process. Adoption of methods from machine learning has given rise to the creation of drug discovery knowledge graphs which utilize the inherent interconnected nature...
Preprint
Full-text available
Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking...
Article
Full-text available
Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compou...
Conference Paper
Artificial Intelligence has become impactful during the last few years in chemistry and the life sciences, pushing the scientific boundaries forward as exemplified by the recent success of AlphaFold2. In this presentation, I will provide an overview of how AI has impacted drug design in the last few years, where we are now and what progress we ca...
Chapter
Artificial intelligence (AI) tools find increasing application in drug discovery supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is the application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the...
Preprint
Full-text available
We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy around 90%, which suggests strong predictivity. T...
Preprint
We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy around 90%, which suggests strong predictivity. T...
Preprint
Reinforcement learning (RL) is a powerful paradigm that has gained popularity across multiple domains. However, applying RL may come at a cost of multiple interactions between the agent and the environment. This cost can be especially pronounced when the single feedback from the environment is slow or computationally expensive, causing extensive pe...
Chapter
Capsule Networks (CapsNets) are a machine learning architecture proposed to overcome some of the shortcomings of convolutional neural networks (CNNs). However, CapsNets have mainly outperformed CNNs in datasets where images are small and/or the objects to identify have minimal background noise. In this work, we present a new architecture, parallel C...
Preprint
Full-text available
Here we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined via either a breadth- or depth-first search algorithm. What we observe is that using a breadth-first traversal leads to better covera...
Preprint
Here we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined via either a breadth- or depth-first search algorithm. What we observe is that using a breadth-first traversal leads to better covera...
Preprint
Full-text available
Due to the strong relationship between a molecule's desired activity and its structural core, screening of focused, core-sharing chemical libraries is a key step in lead optimisation. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been prop...
Preprint
Due to the strong relationship between a molecule's desired activity and its structural core, screening of focused, core-sharing chemical libraries is a key step in lead optimisation. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been prop...
Article
Full-text available
Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements...
Preprint
Full-text available
Capsule Networks (CapsNets) are a machine learning architecture proposed to overcome some of the shortcomings of convolutional neural networks (CNNs). However, CapsNets have mainly outperformed CNNs in datasets where images are small and/or the objects to identify have minimal background noise. In this work, we present a new architecture, parallel C...
Preprint
Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compou...
Preprint
Machine learning methods have proven to be effective tools for molecular design, allowing for efficient exploration of the vast chemical space via deep molecular generative models. Here, we propose a graph-based deep generative model for de novo molecular design using reinforcement learning. We demonstrate how the reinforcement learning framework c...
Article
We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using Dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol 1 as a starting point in a ligand based design context, we have shown in a retrospectiv...
Preprint
Full-text available
Computer aided synthesis planning is a rapidly growing field for suggesting synthetic routes for molecules of interest. The methods used are usually dependent on access to large datasets for training, but with a finite experimental budget there are limitations on how much data can be obtained from experiments. Active learning, which has been used i...
Preprint
Computer aided synthesis planning is a rapidly growing field for suggesting synthetic routes for molecules of interest. The methods used are usually dependent on access to large datasets for training, but with a finite experimental budget there are limitations on how much data can be obtained from experiments. Active learning, which has been used i...
Preprint
We expand our recent work on clustering of synthesis routes and train a deep learning model to predict the distances between arbitrary routes. The model is based on a long short-term memory (LSTM) representation of a synthesis route and is trained as a twin network to reproduce the tree edit distance (TED) between two routes. The ML approach is ap...
Preprint
Full-text available
Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being perfo...
Preprint
We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using Dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol 1 as a starting point in a ligand based design context, we have shown in a retrospect...
Preprint
In the context of small molecule property prediction, experimental errors are usually a neglected aspect during model generation. The main caveat to binary classification approaches is that they weight minority cases close to the threshold boundary equivalently in distinguishing between activity classes. For example, a pXC50 activity value of 5.1...
Preprint
Finding molecules with a desirable balance of multiple properties is a main challenge in drug discovery. Here, we focus on the task of molecular optimization, where a starting molecule with promising properties needs to be further optimized towards the desirable properties. Typically, chemists would apply chemical transformations to the starting mo...
Article
Full-text available
A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our cas...
Article
The understanding of the mechanism-of-action (MoA) of compounds and the prediction of potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison was performed using bioactivity data from the ExCAPE database, i...
Article
Full-text available
Malaria is a disease affecting hundreds of millions of people across the world, mainly in developing countries and especially in sub-Saharan Africa. It is the cause of hundreds of thousands of deaths each year and there is an ever-present need to identify and develop effective new therapies to tackle the disease and overcome increasing drug resista...
Preprint
Drug discovery and development is an extremely complex process, with high attrition contributing to the costs of delivering new medicines to patients. Recently, various machine learning approaches have been proposed and investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these techniques...
Article
Collaborative efforts between public and private entities such as academic institutions, governments, and pharmaceutical companies form an integral part of scientific research, and notable instances of such initiatives have been created within the life science community. Several examples of alliances exist with the broad goal of collaborating towar...