All content in this area was uploaded by Alexander Zhavoronkov on Apr 05, 2020
Multimodal AI Engine for Clinical Trials Outcome
Prediction: Prospective Case Study of Big Pharma
for Q2 2020
Alex Zhavoronkov1*, Roman Kudrin1, Elena Tutubalina1, Anna Kuzmina1,
Daniil Korbut1, Artur Kadurin1, Daniil Polykovskiy1, and Alexander Aliper1
1Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong.
*Correspondence: alex@insilico.com
Abstract
The high failure rate of drugs in clinical trials is the main reason for the rapidly increasing
costs of drug development. Accurate prediction of clinical trial outcomes for programs in early
discovery stages may help save billions of dollars and prioritize the programs that are
more likely to benefit patients. Pharmaceutical companies usually have
substantially more information about the drugs in their clinical pipelines than is available
to external parties. However, pharmaceutical R&D is a lengthy and fragmented process
with asymmetry of information and reliance on clinical trial design, human
experience, and the competitive landscape. Since 2014, our team has been developing clinical
trial outcome prediction engines that use artificial intelligence and focus primarily on
early-stage preclinical data, performing retrospective and prospective validation internally and
with pharmaceutical companies and financial institutions. While no validation
methodology can guarantee consistent performance in the future, systems validated
using retrospective data must also be tested using prospective validation, where predictions
are made before the readouts are known.
We have built and tested a novel multimodal AI engine for the prediction of clinical trial
success on multiple types of features, including small molecule descriptors,
transcriptomic data, text-mined target and indication representations, and clinical
trial protocols. The predictor achieved 0.88 ROC AUC on predicting phase II to phase III
transition in a quasi-prospective validation setting. In this paper, we used our model to
predict transitions for all Novartis phase II trials with expected readouts in Q2 2020. We
hope that such validation will show that our model is accurate in a truly prospective setting.
This paper is not peer-reviewed and is not intended for making clinical trial adjustments. It
will be deposited on a preprint server so that the predictions can be compared with
the actual clinical trial readouts in the future.
Introduction
It is widely accepted that the clinical development of a drug is a costly and
time-consuming process. As early as 2012, the declining trend in drug development
productivity was recognized in a seminal paper by Scannell et al. [1]. The phenomenon
was termed “Eroom’s Law”, a reference to Moore’s Law, which describes the
exponential growth of computational power; in the case of clinical development,
unfortunately, it is the expenses per drug approval that grow exponentially.
This exponential cost growth makes a satisfactory return on investment in
the pharmaceutical industry difficult to sustain. Accurate prediction of clinical trial
outcomes may help optimize the pipelines of pharmaceutical companies and
guide hedge funds and investment banks in managing their investment portfolios.
Since deep learning systems started
outperforming humans in multiple tasks, including image recognition, in 2014, deep
learning techniques have been rapidly propagating into biomedicine [2]. One of the common
applications of deep learning is drug purposing and repurposing [3, 4].
Several tools and techniques for clinical trial scoring already exist; we briefly
describe the most notable ones. PrOCTOR [5] considered a small dataset consisting of
successfully launched drugs and drugs that failed due to toxic side effects. PrOCTOR used
a machine learning scoring ensemble based on several simple drug descriptors and
drug target features. Lo et al. [6] analyzed a large dataset of clinical trials and built
a machine learning model predicting drug development program phase transition on the
basis of features mostly reflecting clinical trial design such as the number of endpoints,
masking (open-label, double-blind, etc.), and usage of biomarker analysis. Recently,
Feijoo et al. [7] extracted structured data from eligibility criteria using text mining
techniques to predict drug development programs’ phase transitions; this model was
also based on clinical trial design characteristics. The study by Qi et al. [8] focused on
the translation of phase II trial results to phase III trial results using a recurrent neural
network model that took phase II trial results as an input. Artemov et al. [9] developed a
pipeline with deep learning techniques and biology analytical tools to predict the
outcomes of phase I/II clinical trials. This pipeline predicts the side effects of a drug and
estimates drug-induced pathway activation. It then uses the predicted side effect
probabilities and pathway activation scores as an input to train a classifier that predicts
clinical trial outcomes.
That said, [5] and [9] used structural and target-based information about drugs, while
[6–8] relied mostly on trial design protocols or readouts from the previous phases of
clinical trials. We sought to combine the strengths of these approaches: an extensive
dataset, multimodal data sources, and a biological background.
Besides building a predictive model, we analyzed its predictions using SHapley Additive
exPlanations (SHAP) [10, 11] to discover clinical trials’ weakest points. Such
explanations are most important when the model predicts that a clinical trial will fail.
In our experience, retrospective performance on clinical trials does not earn substantial
credibility with experts. Our previous attempts at prospective validation using the very
early legacy predictors, which demonstrated unprecedented performance in
cross-validation [9], did not meet internal expectations and resulted in a complete
redesign of the pipeline, multiple subsequent experiments, and a new clinical trial
prediction philosophy.
We designed our engine based on feedback received from collaborations with several
large banks, hedge funds, and big pharma representatives on clinical trial scoring. Our
clinical trials’ outcome prediction pipeline shown in Figure 1 is a part of Insilico
Medicine’s comprehensive drug discovery engine. In this work, we describe our
approach and make prospective predictions for Novartis-sponsored clinical trials with
expected readouts in Q2 2020.
Figure 1. Insilico Medicine drug discovery pipeline. The clinical trials engine was used
for this study. The modules from the target discovery platform were used
for target scoring; the virtual screening modules and medicinal chemistry filters from the
chemistry platform were used for predicting the properties of small molecules.
Clinical Trials Scoring Approach
Dataset and Model
Our work builds upon a machine learning model with multiple groups of features:
molecular descriptors; ADMETox features; drug, target and indication representations
derived from biomedical documents; statistical features based on large textual datasets
(biomedical literature, patents, and government grants); various protocol features
extracted from clinical trials; drug and target omics features (Figure 2). The dataset for
training a machine learning model consists of 3,802 unique small molecules, 1,350
unique indications, and 10,922 unique drug-indication pairs in total. Using features
specified below, we built a model for predicting the drug-indication pair phase transition.
Each molecule is associated with a SMILES string and a textual representation containing the
compound name. In this work, we focused on predicting whether a given drug
development program will advance from phase I to phase II and from phase II to phase
III.
Figure 2. Data sources upon which the predictor of clinical trials’ outcome was built.
We summed up time-dependent features over the seven years preceding the year when
the previous clinical trial phase started (e.g., phase II start year for phase II to phase III
transition). Then, we concatenated these features with time-independent features and
the final feature matrix was fed to the LightGBM [12, 13] model.
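The aggregation step above can be sketched as follows. This is a minimal illustration, not the authors' code: the feature names, the toy values, and the choice to exclude the phase start year itself from the seven-year window are all assumptions. Rows built this way would be stacked into the matrix passed to a gradient-boosting model such as LightGBM.

```python
from typing import Dict, List

WINDOW_YEARS = 7  # window length stated in the text


def sum_over_window(yearly: Dict[int, float], phase_start_year: int,
                    window: int = WINDOW_YEARS) -> float:
    """Sum a yearly feature over the `window` years before phase_start_year.

    Assumption: the start year itself is excluded from the window.
    """
    years = range(phase_start_year - window, phase_start_year)
    return sum(yearly.get(year, 0.0) for year in years)


def build_feature_row(time_dependent: Dict[str, Dict[int, float]],
                      time_independent: Dict[str, float],
                      phase_start_year: int) -> List[float]:
    """Aggregate time-dependent features, then append the static ones."""
    dynamic = [sum_over_window(time_dependent[name], phase_start_year)
               for name in sorted(time_dependent)]
    static = [time_independent[name] for name in sorted(time_independent)]
    return dynamic + static


# Hypothetical program whose phase II started in 2016 (window: 2009-2015).
row = build_feature_row(
    time_dependent={"n_publications": {2008: 5, 2009: 3, 2012: 4, 2015: 10, 2016: 99}},
    time_independent={"mol_weight": 350.0},
    phase_start_year=2016,
)
print(row)
```

Only the counts from 2009, 2012, and 2015 fall inside the window here, so values from 2008 and from the start year 2016 do not contribute to the aggregated feature.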
Chemoinformatics and ADMETox features
To extract features from molecules, we preprocessed molecules by neutralizing them
according to internal ad-hoc rules. Then we calculated molecular descriptors using
publicly available tools (Mordred [14], RDKit [15], MCE-18 [16]) and added ADMETox
properties predicted by proprietary tools.
Temporal drug, target and indication representations
We trained a word2vec model [17] using the Gensim library [18] on PubMed abstracts
(4.5B words). Unsupervised word embeddings capture latent chemical and
pharmaceutical knowledge from large corpora of texts [19, 20]. We set the embedding
size to 250 and left the other parameters at their defaults. First, we divided the dataset
into yearly time slices from 2012 to 2018 and trained a word2vec model on each time slice. We
then applied a linear transformation to map the word2vec models onto a common space
[21].
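The text specifies only that a linear transformation maps the yearly models onto a common space. One common choice for such a map is an orthogonal Procrustes solution fitted on words shared between two slices; the sketch below assumes that choice and uses random matrices in place of real embeddings, so the function name and the anchor-word setup are illustrative rather than the authors' procedure.

```python
import numpy as np


def fit_orthogonal_map(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    Rows of `source` and `target` are embeddings of the same anchor words
    taken from two different time slices.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
dim = 250        # embedding size from the text
n_shared = 1000  # hypothetical number of words shared by both slices

# Stand-in for the reference slice (e.g. the latest year).
emb_reference = rng.normal(size=(n_shared, dim))

# Simulate another slice as a rotated copy of the reference space.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_other = emb_reference @ rotation

w = fit_orthogonal_map(emb_other, emb_reference)
aligned = emb_other @ w
max_error = float(np.abs(aligned - emb_reference).max())
print(max_error)
```

An orthogonal map preserves distances and cosine similarities within each slice, which is why it is often preferred over an unconstrained linear map for aligning embedding spaces.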
Biomedical document statistics
We utilized statistics from various biomedical textual databases using Pharmacognitive¹.
This system provides access to databases of grants, publications, patents, and clinical
trials. We focused on features from three large domains:
(i) scientific literature from PubMed,
(ii) USPTO patents,
(iii) projects from the grant-funding agencies of the USA, Canada, the EU, and
Australia.
The Pharmacognitive system allows for the retrieval of statistics such as the number of
documents or overall funding per year matching a query. As queries, we used drugs
and indications. We extended all queries with synonyms provided by Pharmacognitive
and computed the following values for each query:
● the number of publications/patents/projects published in the particular year
● the number of publications/patents/projects published before the particular year
● the average and total funding of grants published in the particular year
● the average and total funding of grants published before the particular year
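The four statistics listed above can be illustrated with a short sketch. It operates on plain in-memory lists; the `Grant` record, the function names, and the toy values are hypothetical and do not reflect the Pharmacognitive interface, which is not described in this paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Grant:
    year: int
    funding: float  # hypothetical currency units


def count_stats(doc_years: List[int], year: int) -> Dict[str, int]:
    """Number of documents published in `year` and strictly before it."""
    return {
        "n_in_year": sum(1 for y in doc_years if y == year),
        "n_before_year": sum(1 for y in doc_years if y < year),
    }


def summarize(values: List[float]) -> Tuple[float, float]:
    """Return (sum, average); the average of an empty list is taken as 0.0."""
    return (sum(values), mean(values) if values else 0.0)


def funding_stats(grants: List[Grant], year: int) -> Dict[str, float]:
    """Sum and average of grant funding in `year` and strictly before it."""
    sum_in, avg_in = summarize([g.funding for g in grants if g.year == year])
    sum_before, avg_before = summarize([g.funding for g in grants if g.year < year])
    return {
        "sum_in_year": sum_in, "avg_in_year": avg_in,
        "sum_before_year": sum_before, "avg_before_year": avg_before,
    }


# Toy data for one query (a drug or an indication plus its synonyms).
publication_years = [2012, 2013, 2013, 2015]
grants = [Grant(2012, 100.0), Grant(2013, 200.0), Grant(2013, 400.0)]

counts = count_stats(publication_years, 2013)
funding = funding_stats(grants, 2013)
print(counts, funding)
```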
1 https://pharmacognitive.com/
Clinical trial protocols
The set of protocol features included enrollment, patient age, ethnicity, number of endpoints,
masking (open-label, double-blind, etc.), and contract research organization locations.
Target omics
We derived the target network scores based on interactomic and transcriptomic data
and pathway activation/inhibition scores from Insilico’s Pandomics platform². For
dimensionality reduction and pathway perturbation analysis, we used the In silico
Pathway Activation Network Decomposition Analysis (iPANDA) algorithm [22], which is
commonly used for target identification and for evaluating the synergistic effect of
dual-target inhibition [23] and combinations.
Quasi-prospective validation
We utilized time-stamped data to ensure that each trial representation included only the
information available before the clinical trial started. This prevents information leakage,
which could lead to overestimation of the model performance.
Using our pipeline, we performed quasi-prospective validation by splitting all drug
development programs into a training set consisting of projects that ended by the specified
year and a validation set consisting of projects that ended after the specified year.
We report the performance for experiments with 2015 as the splitting year.
Each record in the training and testing sets is associated with the earliest year for
each phase. For the predictor of phase II to phase III transition, we achieved a ROC AUC
of 0.88; at a prediction threshold of 0.5, the accuracy is 0.81 and the
F1-scores are 0.85 and 0.75 for the negative and positive classes, respectively, on the
test set described above. We also provide ROC AUC for our predictor on specific
therapeutic areas (Figure 3).
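The split and the reported metrics can be sketched in a few lines. The program records, the `end_year` field, and the toy labels and scores below are hypothetical; the sketch shows only how a year-based split, ROC AUC, accuracy, and per-class F1 at a 0.5 threshold are computed, and it does not reproduce the reported figures.

```python
from typing import Dict, List, Sequence, Tuple


def temporal_split(programs: List[Dict], split_year: int) -> Tuple[List[Dict], List[Dict]]:
    """Programs that ended by split_year go to training, later ones to validation."""
    train = [p for p in programs if p["end_year"] <= split_year]
    valid = [p for p in programs if p["end_year"] > split_year]
    return train, valid


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the class of interest."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical programs, split at the year used in the text.
programs = [{"id": "a", "end_year": 2013}, {"id": "b", "end_year": 2015},
            {"id": "c", "end_year": 2016}, {"id": "d", "end_year": 2019}]
train, valid = temporal_split(programs, split_year=2015)

# Toy validation labels (1 = transition happened) and model scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]
y_pred = [int(s >= 0.5) for s in scores]  # threshold of 0.5 from the text

auc = roc_auc(y_true, scores)
acc = accuracy(y_true, y_pred)
f1_negative = f1_for_class(y_true, y_pred, cls=0)
f1_positive = f1_for_class(y_true, y_pred, cls=1)
print(auc, acc, f1_negative, f1_positive)
```

ROC AUC does not depend on the threshold, while accuracy and the per-class F1-scores do, which is why the text quotes the 0.5 threshold only for the latter.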
2 https://pandomics.com/
Figure 3. Performance of the predictor in terms of ROC AUC for phase II to phase III
transition by therapeutic category. The vertical axis depicts the number of projects for each
category.
Prospective Validation
In this paper, we make prospective predictions for the upcoming clinical trials for one
pharmaceutical company using the described multimodal clinical trials prediction model.
The pharmaceutical company was selected using the following principles:
1. The company must have many phase I and phase II clinical trials reading out in
Q2 2020 for targeted small-molecule drugs.
2. Only single-drug trials are counted; combination therapies are excluded.
3. Only small molecules with a clear target are included.
4. There must be no current engagement between the company and Insilico
Medicine.
Using these criteria we selected Novartis, which has 8 clinical trials meeting the
specified criteria. We show our model’s predictions in Table 1. We predict failure of one
clinical trial (NCT03650400) and explain the model's prediction in the following section.
Table 1. Predictions of phase transitions for the analyzed Novartis clinical trials (asterisk
indicates the probability of success for the current trial).
| NCT ID | Conditions | Intervention | Phases | Primary Completion Date | Completion Date | Probability of reaching Phase II | Probability of reaching Phase III |
|---|---|---|---|---|---|---|---|
| NCT04109313 | Chronic Spontaneous Urticaria | Remibrutinib | Phase II | May 12, 2020 | April 5, 2022 | 0.98 | 0.77* |
| NCT03650400 | Childhood-onset asthma | Fevipiprant | Phase II | June 23, 2020 | July 23, 2020 | 0.49 | 0.11* |
| NCT01760525 | Solid Tumor With p53 Wild Type Status | CGM097 | Phase I | July 1, 2020 | July 1, 2020 | 0.92* | 0.62 |
| NCT02381886 | Advanced Malignancies That Harbor IDHR132 Mutations | IDH305 | Phase I | June 11, 2020 | June 12, 2020 | 0.95* | 0.70 |
| NCT01677741 | Neoplasms, Brain | Dabrafenib | Phase I | May 29, 2020 | May 29, 2020 | 0.63* | 0.45 |
| NCT03896152 | Paroxysmal Nocturnal Hemoglobinuria | LNP023 | Phase II | April 27, 2020 | March 21, 2022 | 0.98 | 0.79* |
| NCT02855164 | Non-alcoholic Steatohepatitis (NASH) | Tropifexor | Phase II | April 3, 2020 | April 3, 2020 | 0.98 | 0.82* |
| NCT02991807 | PTEN Gene Mutation; PTEN Hamartoma Tumor Syndrome | Everolimus | Phase I/II | June, 2020 | December, 2021 | 0.66 | 0.58* |
Prediction explanation
We used SHAP values to measure the impact of the feature groups listed above and to gain
insight into the predictions. We computed five scores by adding up the SHAP values for
all features in each group and adding the predictor’s average output over the test set.
The Target score summarizes the contributions of omics features; the Drug structure score
summarizes the contributions of molecular descriptors and ADMETox features. We provide
the scores for the phase II clinical trial of Fevipiprant for childhood-onset asthma
(NCT03650400) in Figure 4. For this trial, the target and drug structure scores are low.
These scores influenced the probabilities of reaching both phase II and phase III, even
though Fevipiprant is currently in phase II for childhood-onset asthma.
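The group scores described above reduce to a sum per feature group plus a shared baseline. The sketch below assumes per-feature SHAP values are already available (for tree models they are typically obtained from a tree explainer); the feature names, group assignments, and numbers are hypothetical, and only two of the five groups are shown.

```python
from collections import defaultdict
from typing import Dict


def group_scores(shap_values: Dict[str, float],
                 feature_group: Dict[str, str],
                 base_value: float) -> Dict[str, float]:
    """Sum SHAP values within each feature group and add the average model output."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, value in shap_values.items():
        totals[feature_group[feature]] += value
    return {group: base_value + total for group, total in totals.items()}


# Hypothetical per-feature SHAP values for a single trial.
shap_values = {
    "pathway_score": -0.20,  # omics feature
    "network_score": -0.05,  # omics feature
    "logp": -0.10,           # molecular descriptor
    "admet_tox": -0.05,      # ADMETox feature
}
feature_group = {
    "pathway_score": "target",
    "network_score": "target",
    "logp": "drug_structure",
    "admet_tox": "drug_structure",
}
base_value = 0.5  # hypothetical average predictor output over the test set

scores = group_scores(shap_values, feature_group, base_value)
print(scores)
```

Because SHAP values are additive, a group score far below the baseline indicates that the group as a whole pushed the prediction down, which is how a weak point such as the target choice can be read off.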
Figure 4. Reasoning behind negative prediction for clinical trial NCT03650400. The
landscape suggests that the target choice may be the weakest point for this clinical trial.
The full set of predictions and more detailed explanations are integrated into Insilico’s
clinical trial outcome prediction dashboard (Figure 5).
Figure 5. Screenshot of Insilico’s clinical trial dashboard interface.
Conclusion
Using our proprietary engine, we computed the probabilities of success for 8 current
clinical trials by Novartis expected to read out in Q2 2020. With a threshold of 0.5, we
expect 7 of them to meet their primary endpoints; the exception is the trial of Fevipiprant for
childhood-onset asthma, which we expect to fail. Our model predicts that the target
choice may lead to Fevipiprant’s failure in the current clinical trial. We recognize the
possibility that the COVID-19 pandemic may delay the readouts but still hope that the
predictions will provide insight into our engine’s real-world performance and prove its
applicability.
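The count in the conclusion can be checked directly against Table 1 by applying the 0.5 threshold to the asterisked probability of each trial, which is the probability for the trial's current phase. The values below are copied from Table 1; treating a probability equal to the threshold as a predicted success is an assumption that does not affect this data.

```python
THRESHOLD = 0.5  # decision threshold stated in the text

# (NCT ID, intervention, asterisked probability from Table 1)
current_trial_probability = [
    ("NCT04109313", "Remibrutinib", 0.77),
    ("NCT03650400", "Fevipiprant", 0.11),
    ("NCT01760525", "CGM097", 0.92),
    ("NCT02381886", "IDH305", 0.95),
    ("NCT01677741", "Dabrafenib", 0.63),
    ("NCT03896152", "LNP023", 0.79),
    ("NCT02855164", "Tropifexor", 0.82),
    ("NCT02991807", "Everolimus", 0.58),
]

predicted_success = [nct for nct, _, p in current_trial_probability if p >= THRESHOLD]
predicted_failure = [nct for nct, _, p in current_trial_probability if p < THRESHOLD]

print(len(predicted_success), predicted_failure)
```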
Disclaimer
This paper is not peer-reviewed. This document is to be posted on a preprint server for the
purposes of date stamping and prospective validation of the clinical trial prediction engine. The
choice of a pharmaceutical company for the case study was based on the number of clinical
studies expected to read out in Q2 2020 that the AI engine may be able to predict in a
fully-automated manner. These predictions are highly speculative and should not be used to
derive conclusions about the clinical trials. No insider information from the pharmaceutical
company was used in this study. While the system was used to predict clinical trials for several
large pharmaceutical companies, Novartis was selected as there is no active collaboration
between the companies and there is absolutely no chance that any internal data was used for
making these predictions. These predictions were not provided to the hedge funds and
investment banks piloting the applications of Insilico’s AI engine in advance of the publication. In
the future we are planning to date stamp several other predictions of clinical trials outcomes.
Conflict of Interest
All of the authors of this study are affiliated with Insilico Medicine (www.insilico.com), a for-profit
company developing critical intellectual property in the field of artificial intelligence for generative
biology, chemistry, and prediction of clinical trials outcomes with the aim of complete integration
of multiple previously disconnected areas of pharmaceutical drug discovery and development.
The company is engaged in aging research to acquire a deep understanding of the longitudinal
changes in human biology and the relationship between multiple data types.
References
[1] Scannell JW, Blanckley A, Boldon H, et al. Diagnosing the decline in pharmaceutical R&D
efficiency. Nat Rev Drug Discov 2012; 11: 191–200.
[2] Mamoshina P, Vieira A, Putin E, et al. Applications of Deep Learning in Biomedicine. Mol
Pharm 2016; 13: 1445–1454.
[3] Vanhaelen Q, Mamoshina P, Aliper AM, et al. Design of efficient computational workflows
for in silico drug repurposing. Drug Discov Today 2017; 22: 210–222.
[4] Aliper A, Plis S, Artemov A, et al. Deep Learning Applications for Predicting
Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data.
Mol Pharm 2016; 13: 2524–2530.
[5] Gayvert KM, Madhukar NS, Elemento O. A Data-Driven Approach to Predicting Successes
and Failures of Clinical Trials. Cell Chem Biol 2016; 23: 1294–1301.
[6] Lo AW, Siah KW, Wong CH. Machine learning with statistical imputation for predicting drug
approvals. Harvard Data Science Review.
[7] Feijoo F, Palopoli M, Bernstein J, et al. Key indicators of phase transition for clinical trials
through machine learning. Drug Discov Today 2020; 25: 414–421.
[8] Qi Y, Tang Q. Predicting Phase 3 Clinical Trial Results by Modeling Phase 2 Clinical Trial
Subject Level Data Using Deep Learning. In: Doshi-Velez F, Fackler J, Jung K, et al. (eds)
Proceedings of the 4th Machine Learning for Healthcare Conference. Ann Arbor, Michigan:
PMLR, 2019, pp. 288–303.
[9] Artemov AV, Putin E, Vanhaelen Q, et al. Integrated deep learned transcriptomic and
structure-based predictor of clinical trials outcomes. BioRxiv,
https://www.biorxiv.org/content/10.1101/095653v2.abstract (2016).
[10] Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with
explainable AI for trees. Nature Machine Intelligence 2020; 2: 2522–5839.
[11] Lundberg SM, Lee S-I. A Unified Approach to Interpreting Model Predictions. In: Guyon I,
Luxburg UV, Bengio S, et al. (eds) Advances in Neural Information Processing Systems 30.
Curran Associates, Inc., 2017, pp. 4765–4774.
[12] Ke G, Meng Q, Finley T, et al. LightGBM: A Highly Efficient Gradient Boosting Decision
Tree. In: Guyon I, Luxburg UV, Bengio S, et al. (eds) Advances in Neural Information
Processing Systems 30. Curran Associates, Inc., 2017, pp. 3146–3154.
[13] Ke G, Meng Q, Finley T, et al. Welcome to LightGBM’s documentation! - LightGBM 2.3.2
documentation. LightGBM.
[14] Moriwaki H, Tian Y-S, Kawashita N, et al. Mordred: a molecular descriptor calculator. J
Cheminform 2018; 10: 4.
[15] Landrum G, Others. RDKit: Open-source cheminformatics, http://www.rdkit.org (2006).
[16] Ivanenkov YA, Zagribelnyy BA, Aladinskiy VA. Are We Opening the Door to a New Era of
Medicinal Chemistry or Being Collapsed to a Chemical Singularity? Perspective. J Med
Chem 2019; 62: 10026–10043.
[17] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases
and their Compositionality. In: Burges CJC, Bottou L, Welling M, et al. (eds) Advances in
Neural Information Processing Systems 26. Curran Associates, Inc., 2013, pp. 3111–3119.
[18] Rehurek R, Sojka P. Software Framework for Topic Modelling with Large Corpora. In:
Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.695.4595 (2010,
accessed 31 March 2020).
[19] Tshitoyan V, Dagdelen J, Weston L, et al. Unsupervised word embeddings capture latent
knowledge from materials science literature. Nature 2019; 571: 95–98.
[20] Madzhidov TI, Tutubalina EV, Miftahutdinov ZS, et al. Using semantic analysis of texts for
the identification of drugs with similar therapeutic effects. Russ Chem Bull 2017; 66:
2180–2189.
[21] Artetxe M, Labaka G, Agirre E. A robust self-learning method for fully unsupervised
cross-lingual mappings of word embeddings. Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics (Volume 1: Long Papers). Epub ahead of print
2018. DOI: 10.18653/v1/p18-1073.
[22] Ozerov IV, Lezhnina KV, Izumchenko E, et al. In silico Pathway Activation Network
Decomposition Analysis (iPANDA) as a method for biomarker development. Nat Commun
2016; 7: 13427.
[23] Ravi R, Noonan KA, Pham V, et al. Bifunctional immune checkpoint-targeted
antibody-ligand traps that simultaneously disable TGFβ enhance the efficacy of cancer
immunotherapy. Nat Commun 2018; 9: 741.