
Potential 2019-nCoV 3C-like protease inhibitors designed using generative deep learning approaches



[We are putting the draft of this manuscript out to get feedback from the medicinal chemistry community on the molecules generated by our generative chemistry pipeline targeting the COVID-19 3C-like protease in a 4-day sprint. Comments on novelty, diversity, activity, similarity to other compounds, synthetic availability, and other properties are very welcome. Several representative compounds are described in the manuscript, but the majority of compounds are available as an SDF file at https://insilico.com/ncov-sprint/ ]

We are planning to synthesize some compounds, but many others may look promising and could be synthesized and tested by groups worldwide or serve as starting scaffolds for further optimization.

Abstract: The emergence of the 2019 novel coronavirus (COVID-19), for which there is no vaccine or any known effective treatment, created a sense of urgency for novel drug discovery approaches. One of the most important COVID-19 protein targets is the 3C-like protease, for which the crystal structure is known. Most of the immediate efforts are focused on drug repurposing of known clinically-approved drugs and virtual screening for the molecules available from chemical libraries that may not work well. For example, the IC50 of lopinavir, an HIV protease inhibitor, against the 3C-like protease is approximately 50 micromolar, which is far from ideal. In an attempt to address this challenge, on January 28th, 2020 Insilico Medicine decided to utilize a part of its generative chemistry pipeline to design novel drug-like inhibitors of COVID-19 and started generation on January 30th. It utilized three of its previously validated generative chemistry approaches: crystal-derived pocket-based generator, homology modelling-based generation, and ligand-based generation. Novel drug-like compounds generated using these approaches were published at www.insilico.com/ncov-sprint/. Several molecules will be synthesized and tested using the internal resources; however, the team is seeking collaborations to synthesize, test, and, if needed, optimize the published molecules.
NOTE: None of the molecules have been synthesized or tested in vitro or in vivo. These are not drugs
for COVID-19 coronavirus. Expert medicinal chemists are encouraged to review and comment on the
molecules in the article and on the website.
Potential COVID-19 3C-like protease inhibitors designed using generative deep
learning approaches
Alex Zhavoronkov, Vladimir Aladinskiy, Alexander Zhebrak, Bogdan Zagribelnyy, Victor
Terentiev, Dmitry S. Bezrukov, Daniil Polykovskiy, Rim Shayakhmetov, Andrey Filimonov,
Philipp Orekhov, Yilin Yan, Olga Popova, Quentin Vanhaelen, Alex Aliper, Yan Ivanenkov
Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong
Corresponding author: Alex Zhavoronkov, email: alex@insilico.com
Introduction
Coronaviruses (CoVs) are a large family of viruses belonging to the family Coronaviridae. The
limited number of coronaviruses known to be circulating in humans cause mild infections and
they were regarded as relatively harmless respiratory human pathogens 1. The emergence of
the severe acute respiratory syndrome coronavirus (SARS-CoV) and the Middle East
Respiratory Syndrome (MERS) virus revealed that coronaviruses can cause severe and
sometimes fatal respiratory tract infections in humans. The first known case of SARS-CoV
occurred in Foshan, China in November 2002 and new cases emerged in mainland China in
February 2003. The first emergence of MERS-CoV occurred in June 2012 in Saudi Arabia 2.
These events demonstrated that the threats of CoVs should not be underestimated and that it is
of paramount importance to advance the knowledge on the replication of these viruses and their
interactions with the hosts to develop treatments and vaccines. These successive outbreaks
also highlight the long-term threat of cross-species transmission events leading to outbreaks in
humans and the possible re-emergence of similar virus infection that should be considered
seriously 3. SARS-CoV and MERS-CoV are two major causes of severe atypical pneumonia in
humans and share important features that contribute to preferential viral replication in the lower
respiratory tract and viral immunopathology. In December 2019, atypical pneumonia cases
emerged in Wuhan, Hubei, China, with clinical presentations consistent with viral pneumonia.
The cause was quickly identified as being a novel CoV, which was named 2019 novel
coronavirus (COVID-19). Investigations of the epidemiological, clinical, laboratory and
radiological characteristics, treatment, and outcomes of patients infected by COVID-19
demonstrated that the infection caused clusters of severe respiratory illness similar to
SARS-CoV 4. Early clinical investigations showed that although the COVID-19 can cause
severe illness in some patients, it initially did not transmit readily between people. However,
more recent epidemiological data suggest the new virus has undergone human host
adaptation/evolution and has become more efficient in human to human transmission. Analysis
of COVID-19 genome sequences obtained from patients during the beginning of the outbreak
demonstrated that they are almost identical to each other and share 79.5% sequence identity to
SARS-CoV 5. COVID-19 is 96% identical at the whole-genome level to a bat coronavirus.
The COVID-19 genomic sequence was used to perform comparative genetic and functional
analysis with the human SARS virus and coronaviruses recovered from other species.
Phylogenetic analysis of CoVs of different species indicated that COVID-19 could have
originated from Chinese horseshoe bats, but the intermediate transmission vehicle has not yet
been identified 6. According to this study, COVID-19 belongs to a novel type of bat coronavirus
owing to a high degree of variation from the human SARS virus. COVID-19 is the seventh
member of the family of CoVs that infect humans. Like SARS-CoV, COVID-19 enters target
cells through an endosomal pathway and also uses the same cell entry receptor,
angiotensin-converting enzyme II (ACE2) 5,7. Detailed analysis of the interaction of receptor
binding domains (RBDs) of COVID-19 with human ACE2 indicated that the affinity of binding to
a human cell is lower than that of human SARS virus from which it was inferred that the
infectivity and pathogenicity of this new virus could be lower than the human SARS virus 6.
Single-cell RNA expression profiling of ACE2 was carried out 8. The analysis of the ACE2 RNA
expression profile in normal human lungs showed that ACE2 virus receptor expression
is concentrated in a small population of type II alveolar cells, which also express many other
genes that positively regulate COVID-19 reproduction and transmission.
CoV structure and main strategies for targeting COVID-19
The Coronaviridae family consists of four genera based on their genetic properties, including
genus Alphacoronavirus, genus Betacoronavirus, genus Gammacoronavirus, and genus
Deltacoronavirus. The coronavirus RNA genome (ranging from 26 to 32 kb) is the largest
among all RNA viruses and the viral particle is about 125 nm in diameter 9. CoVs have a
complex genome expression strategy. In addition to a role in virus replication or virus assembly,
many of the CoV proteins expressed in the infected cell contribute to the coronavirus-host
interactions. This includes interactions with the host cell to create an optimal environment for
CoV replication, alteration of the host gene expression and neutralization of the host’s antiviral
defenses. These coronavirus–host interactions are key to viral pathogenesis 10. The genes for
non-structural proteins constitute two-thirds of the CoV genome. Among the structural proteins,
four are of special interest, namely spike (S), envelope (E), membrane (M), and nucleocapsid (N).
The S, E, and M proteins are contained within the viral membrane. The M and E proteins are
involved in viral assembly, while the N protein is required for RNA genome assembly. The S
protein, a surface-located trimeric glycoprotein of CoVs, plays a functional role in viral entry into
host cells, viral infection, and pathogenesis and was considered as a major therapeutic target
for treatments and vaccines against SARS-CoV and MERS-CoV. Therapeutics investigated at
that time included peptides that block RBD-ACE2-binding and peptides that bind the S protein to
inhibit the production of functional S1 and S2 subunits and the consequent fusion of the viral
envelope with the host cell membrane 1.
Although CoVs share many similarities they also have undergone substantial genetic evolution.
Identification of promising targets for antiviral therapies and vaccines against COVID-19 should
exploit the structural similarities between SARS-CoV and COVID-19 and focus on proteins that
are highly conserved across multiple CoVs. There is an ongoing effort to ensure that all
scientific materials known about COVID-19 such as curated data and updated research reports
are available to the scientific community. For instance, the initiative
https://ghddi-ailab.github.io/Targeting2019-nCoV/ supported by the Global Health Drug
Discovery Institute (GHDDI) contains experimental data of CoV related studies, homology
models for COVID-19 targets as well as for SARS-CoV and MERS-CoV protein targets. Among
the many potential targets against SARS-CoV and several other CoVs, replication-related
enzymes, such as protease, are highly conserved 5. Drugs that inhibit conserved proteases are
capable of preventing replication and proliferation of the virus by interfering with the
post-translational processing of essential viral polypeptides. They can also reduce the risk of
mutation mediated drug-resistance. This was the case for the SARS-CoV 11, as inhibitors
targeting the main protease involved in replication and proliferation were the most effective
means to alleviate the epidemic. Once the target was identified, computational drug repurposing
procedures were launched to identify suitable drugs. Following this approach, Lopinavir and
Ritonavir, two HIV-1 protease inhibitors, were identified as capable of inhibiting the SARS-CoV
main protease 12. The SARS-CoV main protease has 96.1% similarity with the COVID-19
main protease, hence it can be used as a homologous target for screening drugs that inhibit the
replication and proliferation of COVID-19.
In this work the selected target is the C30 Endopeptidase, also referred to as the 3C-like
proteinase or coronavirus 3C-like protease (3CLP) or coronavirus main protease (Mpro). 3CLP is
a homodimeric cysteine protease and a member of a family of enzymes found in the
Coronavirus polyprotein 13. It cleaves the polyproteins into individual polypeptides that are
required for replication and transcription 14 15. Following the translation of the messenger RNA to
yield the polyproteins, the 3CLP is first auto-cleaved from the polyproteins to become a mature
enzyme 16. The 3CLP then cleaves all the 11 remaining downstream non-structural proteins.
3CLP plays a central role in the viral replication cycle and is an attractive target against the
human SARS virus 17.
Computational Approaches for COVID-19
Computational drug repurposing is an effective approach to find new indications for already
known drugs 18 19. A computational drug repurposing approach typically relies on an integrated
pipeline which includes a virtual screening of drug libraries to find suitable drug-target pairs
using methods such as molecular similarity while homology modelling is used to model the
target. Molecular docking and binding free energy calculations are used to predict drug-target
interactions and binding affinity 20. The emergence of resistance to existing antiviral drugs and
re-emerging viral infections are the biggest challenges in antiviral drug discovery. The drug
repurposing approach allows finding new antiviral agents within a short period to overcome the
challenges in antiviral therapy. Computational drug repurposing has been used to identify drug
candidates for viral infectious diseases like Ebola, ZIKA, dengue and influenza infections 21.
These methods were also used to identify potential drugs against SARS-CoV and MERS-CoV
22;23 and following the COVID-19 outbreak, computational repurposing has been applied for
COVID-19. The results of some of those investigations have already been reported. For
instance, by looking for drugs with high binding capacity with the SARS-CoV main protease, 4
small molecule drugs, Prulifloxacin, Bictegravir, Nelfinavir, and Tegobuvir, were identified as
repurposing candidates against COVID-19 24. These 4 molecules were selected by
high-throughput computational screening of a library of 8,000 experimental and approved drugs
and small molecules obtained from Drugbank and using the structures and sequences of
SARS-CoV main protease downloaded from the PDB database. Molecular similarity search was
performed by using a strategy based on the similar sequences of the structure-revealed
molecules. The crystal structure of the main protease monomer was used as a target protein for
molecular docking and a protein-ligand interaction analysis was performed on the resulting 690
candidates. Toxins, neurologic drugs, and antitumor drugs with strong side effects were
discarded from the initial set of 690 candidates leaving 50 molecules with the capability to bind
the SARS-CoV main protease. After filtering for approved drugs and performing further kinetic
and biochemical analysis, the four remaining drugs were Prulifloxacin, Bictegravir, Nelfinavir,
and Tegobuvir. Interestingly, Nelfinavir, an HIV-1 protease inhibitor, was also
predicted to be a potential inhibitor of the COVID-19 main protease by another computational
study combining homology modelling, molecular docking and binding free energy calculation 25.
In that study, the main COVID-19 protease structures were modeled using the SARS homologue
(PDB ID: 2GTB) as a template. Molecular docking was performed and 1903 approved drugs
were tested against the model. Based on the docking score and after further three-dimensional
similarity analysis, 15 drugs were selected. 10 additional new models of the main COVID-19
protease were used for additional docking analysis of these 15 drugs. 6 drugs (Nelfinavir,
Praziquantel, Pitavastatin, Perampanel, Eszopiclone, and Zopiclone) had good binding modes
and were selected for further analysis. Binding free energy calculation was performed for 4 of
the 6 drugs and Nelfinavir was selected as the most promising candidate. In another recent
study 26, the main COVID-19 protease was also used as a target to find repurposing candidates
through computational screening among clinically approved medicines. The study identified a
list of 10 commercial medicines that may form hydrogen bonds to key residues within the
binding pocket of COVID-19 main protease and may also have a higher tolerance to resistance
mutations.
Generative Chemistry Approaches
Considering the virtually unlimited number of chemical structures that can be generated de novo,
conventional computational drug design approaches tend to include limited numbers of
fragments and/or employ sophisticated search strategies to sample hit compounds from a
predefined area of the chemical space. To enable scientists to exploit the whole drug-like
chemical space, a new type of computational methods for drug discovery has been developed
using the recent advances in deep learning (DL) and artificial intelligence (AI). Such techniques
can automatically extract high-dimensional abstract information without the need for manual
feature design and learn nonlinear mappings between molecular structures and their biological
and pharmacological properties. Deep generative models can utilize large datasets for training
and perform in silico design of de novo molecular structures with predefined properties 27. The
first model of this type, a molecular generator using an adversarial auto-encoder (AAE) to
generate molecular fingerprints, was released in early 2017 28. Since then, many architectures
were proposed to generate not just valid chemical structures, but also molecules matching
certain bioactivity and novelty profiles as well as other features of interest. Several milestones
were recently accomplished with the use of generative chemistry in drug discovery,
demonstrating that it is possible to generate molecules that can be synthesized, are active in
vitro, are metabolically stable, and elicit in vivo activity in disease-relevant models. The first
example of an in vitro active molecule obtained through generative chemistry was a JAK3
inhibitor 29. Another generative model, Generative Tensorial Reinforcement Learning (GENTRL),
generated discoidin domain receptor DDR1 and DDR2 inhibitors. DDR1 and DDR2 inhibitors with
different property and selectivity profiles were assayed in vitro, followed by in vivo mouse
experiments that validated the pharmacokinetics of DDR1 inhibitors 30. This experiment demonstrated that
generative chemistry is capable of finding novel molecular structures with optimized properties
which could not be found using repurposing approaches and other standard computational
methods. With a timeframe of fewer than 25 days between the initial target selection and the
generation of the lead compounds, it demonstrates that this method is also time effective.
Insilico Medicine COVID-19 Sprint Timeline and Methods
Insilico Medicine’s drug discovery system consists of three main pipelines: target discovery,
small molecule drug discovery, and predictors of clinical trial outcomes (Figure 1). This system
is designed to achieve maximum automation of drug discovery processes for a broad range of
human diseases. Our small molecule drug discovery pipeline can be used to generate inhibitors
of bacterial and viral protein targets. Multiple publications explaining the basic concepts and
approaches in generative chemistry were published by the team 28–36.
Since there is a known protease target for COVID-19 and its sequence and structure are known,
we decided to apply only the generative chemistry pipeline to generate the possible drug-like
hits.
Figure 1: Insilico Medicine drug discovery pipeline. The generative modules utilizing crystal
structure, homology modelling, and ligand-based generative chemistry pipelines were used to
generate the molecules for the 3C-like protease.
At the end of January, the news of the COVID-19 showed that the virus is substantially more
dangerous than previously thought. While multiple teams already proposed the most likely
repurposing candidates, we decided to support the ongoing efforts with a different strategy and
employed the generative chemistry approach to design novel small molecules aimed
specifically at COVID-19. Using the COVID-19 3C-like protease as a target, we planned
out the generative chemistry timeline (Figure 2) starting with target selection on January 28th
and publication of the molecules from the three generative approaches on February 5th. We
also agreed with the key synthetic chemistry partner to start synthesizing and testing several
generated molecules right after publication. Three parallel approaches were utilized to generate
novel structures (pocket-based, ligand-based and homology model-based generation,
represented in Figure 3).
Figure 2: Insilico Medicine COVID-19 Small Molecule Generation Sprint Timeline
Input data and datasets
Crystal structure of COVID-19 3C-like protease
The crystal structure of COVID-19 3C-like protease was obtained from Dr. Rao’s laboratory. The
structure was solved with a 2.1-angstrom resolution in complex with the covalent inhibitor
named N3. The SARS-CoV main protease has been previously crystallized with the same
inhibitor.17 The ligand was extracted from the crystal and employed in the ligand-based
generation. Then, the binding site was annotated utilizing our proprietary pocket module to
create amino acid residues mapping suitable as input data for target structure-based
generation.
Homology modelling
The homology model of the COVID-19 3C-like protease in complex with non-covalent ligand
was built using the primary sequence corresponding to its crystal structure provided by Dr.
Rao’s laboratory (vide supra). The X-ray structure 4MDS 37 (1.6 Å resolution) of SARS-CoV Mpro
was used as a template which was co-crystallized with a non-covalent inhibitor and had a very
high level of similarity with the COVID-19 3C-like protease (95.25% identity). The homology
modelling was performed using SWISS-MODEL 38 39. Given the almost complete identity of
COVID-19 and SARS-CoV proteases in their ligand binding sites, we further refined the
obtained homology model with the inhibitor bound in ligand pocket using position restrained
minimization with GROMACS 40 with the Cα atoms of protein and all heavy atoms of ligand
restrained by harmonic constraints (k_spring = 100 kJ/mol/nm²). Two protonation states of His41
situated in the binding pocket were considered. The constructed homology model was
preprocessed for generation as described above for crystal structure.
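For reference, a harmonic position restraint of this kind penalizes displacement from a reference position quadratically. The sketch below assumes the standard form V = 0.5 * k * |r - r0|^2, with k in kJ/mol/nm² and coordinates in nm; the function name and the example coordinates are illustrative and are not taken from the authors' setup.

```python
from typing import Sequence


def harmonic_restraint_energy(
    position_nm: Sequence[float],
    reference_nm: Sequence[float],
    k_spring: float = 100.0,  # kJ/mol/nm^2, the force constant quoted in the text
) -> float:
    """Return the restraint energy in kJ/mol for one atom.

    Assumes the usual harmonic form V = 0.5 * k * |r - r0|^2.
    """
    squared_displacement = sum(
        (p - r) ** 2 for p, r in zip(position_nm, reference_nm)
    )
    return 0.5 * k_spring * squared_displacement


# Illustrative only: an atom displaced by 0.1 nm (1 Å) along x.
energy = harmonic_restraint_energy((0.1, 0.0, 0.0), (0.0, 0.0, 0.0))
```

With this force constant, a displacement of 0.1 nm costs about 0.5 kJ/mol per restrained atom, so the restraint is mild compared with typical non-bonded interaction energies.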
Co-crystalized fragment
The 3D structure of the N3 inhibitor was extracted from the solved complex. The propanoate
substructure was replaced by a propenoate, then it was converted to the E-configuration to
restore the compound structure that occurred before covalent addition. The obtained
conformation was used to build the shape of the ligand as well as two pharmacophore
hypotheses using our proprietary modules. For each hypothesis, 7 pharmacophore points were
selected according to the interactions in the initial crystal structure and coverage of the
peptidomimetic scaffold. The constructed ligand shape and hypotheses were exploited for
estimating how generated structures fit the structural features essential for binding.
Protease dataset
The protease dataset was assembled with molecules active against various proteases in
enzymatic assays extracted from the Integrity database 41, Experimental Pharmacology module
and ChEMBL 42,43. The records from the ChEMBL database were downloaded with the following
activity standard types: 'Potency', 'IC50', 'Ki', 'EC50', 'Kd' (assay confidence score ≥ 8,
assay type: B, F). The activities from the Integrity database were downloaded using the following
parameters: 'IC50', 'Ki', 'EC50', 'Kd', and mass concentrations (e.g. mg/l) were converted to M
values by molecular weight. Integrity records were standardized using the pChEMBL value
format (logarithmic scale –log10 of a numeric value in M) and merged with the records from
ChEMBL. The resulting records with pChEMBL values less than 5.0 (10μM in terms of IC50)
were then removed.
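The unit conversion and activity cut-off described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the only values taken from the text are the –log10 definition of the pChEMBL value and the 5.0 threshold.

```python
import math


def mass_to_molar(conc_mg_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass concentration in mg/L to a molar concentration in mol/L."""
    grams_per_litre = conc_mg_per_l / 1000.0
    return grams_per_litre / mol_weight_g_per_mol


def pchembl_value(conc_molar: float) -> float:
    """Return -log10 of an activity value (IC50, Ki, EC50, Kd) given in mol/L."""
    if conc_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_molar)


def passes_activity_cutoff(conc_molar: float, min_pchembl: float = 5.0) -> bool:
    """Keep a record only if its pChEMBL value is at least the cut-off."""
    return pchembl_value(conc_molar) >= min_pchembl


# Illustrative records (hypothetical values): activity in mol/L.
records = {"cmpd_a": 1e-6, "cmpd_b": 1e-4}
kept = [name for name, conc in records.items() if passes_activity_cutoff(conc)]
```

A 1 μM compound has a pChEMBL value of about 6 and is kept, whereas a 100 μM compound has a value of about 4 and is removed.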
The structural duplicates were filtered out after the standardization procedure and the removal
of salt parts from salt compositions. Mild medicinal chemistry filters (MCFs) were applied to filter
out highly non-drug-like molecules (e.g. metals, polycondensed aromatics, chloramines,
radicals, hydrazines, isonitriles, nitroso compounds) as well as structures containing cycles
bigger than 8 atoms and polypeptides (n≥4). The resulting dataset contained 60,293 unique
structures.
To tailor the scoring and the rewarding functions to the given problem, a protease
peptidomimetics dataset was collected from the protease dataset using SMARTS queries for
common peptidomimetic substructures, filtering compounds with pChEMBL value less than 6.0,
and suppressing the overrepresented chemotypes. The resulting protease peptidomimetics
dataset contained 5,891 compounds.
Generative pipeline
We launched Insilico Medicine’s generative chemistry platform for every input data type: crystal
structure, homology model and co-crystalized ligand.
Figure 3: Insilico Medicine COVID-19 Small Molecule Generation Procedure
During the generative phase, a total of 28 machine learning (ML) models generated molecular
structures and optimized them with reinforcement learning (RL) employing the reward function
described below. We used different ML approaches such as generative autoencoders,
generative adversarial networks, genetic algorithms, and language models. The models
exploited various molecular representations, including fingerprints, string representations, and
graphs. Every model was optimizing the reward function to explore the chemical space, exploit
promising clusters, and generate new molecules with high scores. The rewarding function was a
weighted sum of multiple intermediate rewards: medicinal chemistry and drug-likeness scoring,
active chemistry scoring, structural scoring (fitting to ligand features and/or binding pocket),
novelty scoring, and diversity scoring.
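A weighted-sum reward of this form can be sketched as below. The manuscript does not report the weights or the scale of the intermediate rewards, so the component names, the equal weights, and the example scores are all assumptions made purely for illustration.

```python
from typing import Dict

# Hypothetical weights; the manuscript does not state the values used.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "medchem_druglikeness": 1.0,
    "active_chemistry": 1.0,
    "structural": 1.0,
    "novelty": 1.0,
    "diversity": 1.0,
}


def total_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine intermediate rewards into one scalar as a weighted sum."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing intermediate rewards: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative scores for one generated molecule (hypothetical values).
example_scores = {
    "medchem_druglikeness": 0.5,
    "active_chemistry": 0.5,
    "structural": 0.5,
    "novelty": 0.5,
    "diversity": 0.5,
}
reward = total_reward(example_scores, EXAMPLE_WEIGHTS)
```

Keeping the weights in a separate mapping makes it straightforward to rebalance exploration (novelty, diversity) against exploitation (activity, structural fit) without touching the scoring functions themselves.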
Medicinal chemistry scoring assigned a low reward to molecules with structural alerts and a high
reward to molecules with useful substructures. Drug-likeness scoring drove the generation
towards molecules with molecular properties representative of the protease peptidomimetics
dataset: logP: 1.49–6.00; molecular weight (MW): 400–800; number of hydrogen bond donors
(HBD): 1–10; number of hydrogen bond acceptors (HBA): 2–10; topological polar surface area
(TopoPSA): 80–210; MCE-18 44: 40–180; number of stereocenters (nSC): 0–3.
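One simple way to turn these ranges into a score is to count how many descriptors fall inside their window. The ranges below are those listed in the text; scoring by the fraction of satisfied ranges is an assumption for illustration, since the manuscript does not describe how the drug-likeness reward was computed. Descriptor values are assumed to be precomputed elsewhere.

```python
from typing import Dict, Tuple

# Ranges as listed in the text for the protease peptidomimetics dataset.
DRUGLIKENESS_RANGES: Dict[str, Tuple[float, float]] = {
    "logP": (1.49, 6.00),
    "MW": (400, 800),
    "HBD": (1, 10),
    "HBA": (2, 10),
    "TopoPSA": (80, 210),
    "MCE-18": (40, 180),
    "nSC": (0, 3),
}


def druglikeness_score(descriptors: Dict[str, float]) -> float:
    """Return the fraction of descriptors inside their range (assumed scheme)."""
    satisfied = sum(
        1
        for name, (low, high) in DRUGLIKENESS_RANGES.items()
        if low <= descriptors[name] <= high
    )
    return satisfied / len(DRUGLIKENESS_RANGES)


# Hypothetical descriptor values for one generated molecule.
example = {
    "logP": 3.0, "MW": 550, "HBD": 3, "HBA": 6,
    "TopoPSA": 120, "MCE-18": 100, "nSC": 2,
}
score = druglikeness_score(example)
```

A graded score of this kind gives the optimizer a signal even when a molecule violates one or two ranges, which a hard pass/fail filter would not.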
Active chemistry scoring utilized self-organizing maps trained on the protease peptidomimetics
dataset. We used novelty and diversity scoring in the optimization procedure to explore the
chemical space and output a novel and diverse set of molecular structures. Generated
compounds were penalized for the similarity to the existing molecules and previously explored
clusters. We performed structural scoring with the provided crystal structure or homology model
and pharmacophore/shape scoring for structure-based and ligand-based generations,
respectively. We ran the distributed pipeline for 72 hours on the internal computing cluster with
64 NVIDIA Titan V GPUs.
Results
In this study, we used our proprietary generative chemistry pipeline utilizing the knowledge of
the crystal structure and homology model of the target protein. We launched the generative
pipeline three times for every input data type: crystal structure, homology models and
co-crystalized ligand. For each launch, the highest-ranking structures were selected for further
analysis. Figure 4 shows some representative examples from the chemical space produced by
our generative pipeline launch for the crystal structure. More compounds for generations based
on crystal structure, homology models, and co-crystallized ligand are available as described in
the section “Availability of structures”. These virtual structures display high 3D-complexity and
correspondingly high values of MCE-18, and contain stereo- and/or spiro centers (Table 1),
which are common characteristics of peptidomimetics and PPI inhibitors. We assessed the
similarity of the structures with compounds from the ChEMBL database using the search engine
on the ChEMBL website. The analysis revealed that there are no molecules with the same core
structure among the compounds with similarity values more than 0.7 (see Figure 5).
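A similarity check against a 0.7 threshold can be sketched as below, assuming a Tanimoto coefficient on binary fingerprints, which is the usual measure for this kind of search. The manuscript does not specify the fingerprint type, so fingerprints are represented here simply as sets of "on" bit indices, and all identifiers and bit values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def neighbours_above(
    query_fp: Set[int],
    reference_fps: Dict[str, Set[int]],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs for references above the threshold."""
    hits = [
        (ref_id, tanimoto(query_fp, ref_fp))
        for ref_id, ref_fp in reference_fps.items()
    ]
    return sorted(
        [(ref_id, sim) for ref_id, sim in hits if sim > threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical fingerprints: one close analogue and one unrelated compound.
query = {1, 2, 3, 4}
references = {"REF-A": {1, 2, 3, 4, 5}, "REF-B": {9}}
close = neighbours_above(query, references)
```

An empty result corresponds to the "no" entries in Figure 4: no reference compound exceeds the 0.7 threshold for that generated structure.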
Figure 4. Representative examples of the structures generated to target the main protease of
COVID-19. Novelty was assessed using similarity search in ChEMBL Database. ChEMBL ID
numbers and maximal similarity coefficients are listed, “no” means that there are no structures
with similarity >0.7. IMPORTANT NOTE: these are sample representative molecules not
prioritized for synthesis. The list of generated molecules for which human medicinal chemistry
feedback is requested is available at: https://insilico.com/ncov-sprint/ .
Table 1. The physicochemical descriptors for the representative examples of generated
structures. MW—molecular weight, nRot—number of rotatable bonds, nAR—number of
aromatic rings, nSC—number of stereocenters, HBA—number of hydrogen bond acceptors,
HBD—number of hydrogen bond donors, MCE-18—medicinal chemistry evolution 2018
descriptor.
ID  MW  nRot  nAR  nSC  HBA  HBD  MCE-18  TopoPSA
INSCoV-001  9  3  1  4  162  115
INSCoV-002  9  3  2  3  100  79
INSCoV-003  10  3  2  7  105  163
INSCoV-004  3  2  1  5  88  108
INSCoV-005  5  3  1  1  163  108
INSCoV-006  9  3  1  4  104  120
Figure 5. The assessment of similarity between generated structures and compounds from the
ChEMBL database utilizing the tool implemented into ChEMBL search. The closest molecules
from ChEMBL with ID numbers are presented on the right as well as ChEMBL similarity scores.
IMPORTANT NOTE: these are sample representative molecules not prioritized for synthesis.
The list of generated molecules for which human medicinal chemistry feedback is requested is
available at: https://insilico.com/ncov-sprint/ .
Availability of Structures
The most recent data package is available at insilico.com/ncov-sprint. We will continue to
update the data package with new compounds during the following weeks. These data could be
used to perform subsequent computer modelling simulations or to synthesize and test the
compounds in vitro against the COVID-19 main protease.
Conclusion and discussion
Despite the economic and societal impact of CoV infections and the likelihood of future
outbreaks of even more serious pathogenic CoVs in humans, there is still a lack of effective
antiviral strategies to treat CoVs and few options to prevent CoV infections 10. Given the high
prevalence and wide distribution of CoVs, novel viruses could emerge periodically in humans
as a consequence of frequent cross-species infections and occasional spillover events 45. The
development of effective and time-efficient computational methods for designing compounds
that can treat CoV infections is critical. In this study, we have used our integrated AI-based drug
discovery pipeline to generate novel drug compounds against COVID-19. The results
demonstrate the cost-effectiveness and time efficiency of this type of new method for the
development of novel treatments against CoV infections.
Acknowledgments:
The authors would like to thank Dr. Kerry Blanchard for valuable advice, introductions,
encouragement, and edits. Dr. Rao’s team provided the crystal structure for 6LU7. We would
like to thank the WuXi AppTec team, which graciously agreed to provide full support for
synthesis and biological assay development. We would like to thank Dr. Ding Sheng and Dr.
Pan Lirong from GHDDI for providing the compound-related databases. We would like to thank
our board members, and especially Nisa Leung, for supporting and encouraging this emergency
initiative, which falls outside the scope of Insilico’s commercial efforts. We would like to
thank NVIDIA for their generous support and GPU samples.
Conflicts of interest
All authors are affiliated with Insilico Medicine, a company developing an AI-based end-to-end
integrated pipeline for drug discovery and development and engaged in aging and cancer
research.
References
1. Song, Z. et al. From SARS to MERS, Thrusting Coronaviruses into the Spotlight. Viruses 11, (2019).
2. de Wit, E., van Doremalen, N., Falzarano, D. & Munster, V. J. SARS and MERS: recent insights into emerging coronaviruses. Nat. Rev. Microbiol. 14, 523–534 (2016).
3. Menachery, V. D. et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat. Med. 21, 1508–1513 (2015).
4. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (2020) doi:10.1016/S0140-6736(20)30183-5.
5. Zhou, P. et al. Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. Microbiology 104 (2020).
6. Dong, N. et al. Genomic and protein structure modelling analysis depicts the origin and infectivity of COVID-19, a new coronavirus which caused a pneumonia outbreak in Wuhan, China. Microbiology (2020).
7. Letko, M. & Munster, V. Functional assessment of cell entry and receptor usage for lineage B β-coronaviruses, including COVID-19. Microbiology 6117 (2020).
8. Zhao, Y. et al. Single-cell RNA expression profiling of ACE2, the putative receptor of Wuhan COVID-19. Bioinformatics (2020).
9. Ji, W., Wang, W., Zhao, X., Zai, J. & Li, X. Homologous recombination within the spike glycoprotein of the newly identified coronavirus may boost cross-species transmission from snake to human. J. Med. Virol. (2020) doi:10.1002/jmv.25682.
10. de Wilde, A. H., Snijder, E. J., Kikkert, M. & van Hemert, M. J. Host Factors in Coronavirus Replication. Curr. Top. Microbiol. Immunol. 419, 1–42 (2018).
11. Xia, B. & Kang, X. Activation and maturation of SARS-CoV main protease. Protein Cell 2, 282–290 (2011).
12. Nukoolkarn, V., Lee, V. S., Malaisree, M., Aruksakulwong, O. & Hannongbua, S. Molecular dynamic simulations analysis of ritonavir and lopinavir as SARS-CoV 3CL(pro) inhibitors. J. Theor. Biol. 254, 861–867 (2008).
13. Fan, K. et al. Biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus 3C-like proteinase. J. Biol. Chem. 279, 1637–1642 (2004).
14. Thiel, V. et al. Mechanisms and enzymes involved in SARS coronavirus genome expression. J. Gen. Virol. 84, 2305–2315 (2003).
15. Goetz, D. H. et al. Substrate specificity profiling and identification of a new class of inhibitor for the major protease of the SARS coronavirus. Biochemistry 46, 8744–8752 (2007).
16. Adedeji, A. O. & Sarafianos, S. G. Antiviral drugs specific for coronaviruses in preclinical development. Curr. Opin. Virol. 8, 45–53 (2014).
17. Yang, H. et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 3, e324 (2005).
18. Vanhaelen, Q. et al. Design of efficient computational workflows for in silico drug repurposing. Drug Discov. Today 22, 210–222 (2017).
19. Karaman, B. & Sippl, W. Computational Drug Repurposing: Current Trends. Curr. Med. Chem. 26, 5389–5409 (2019).
20. Computational Methods for Drug Repurposing. Methods in Molecular Biology (2019) doi:10.1007/978-1-4939-8955-3.
21. Mani, D., Wadhwani, A. & Krishnamurthy, P. T. Drug Repurposing in Antiviral Research: A Current Scenario. J. Young Pharm. 11, 117–121 (2019).
22. Dyall, J. et al. Repurposing of clinically developed drugs for treatment of Middle East respiratory syndrome coronavirus infection. Antimicrob. Agents Chemother. 58, 4885–4893 (2014).
23. Dyall, J. et al. Middle East Respiratory Syndrome and Severe Acute Respiratory Syndrome: Current Therapeutic Options and Potential Targets for Novel Therapies. Drugs 77, 1935–1966 (2017).
24. Li, Y. et al. Therapeutic Drugs Targeting COVID-19 Main Protease by High-Throughput Screening. Pharmacology and Toxicology (2020).
25. Xu, Z. et al. Nelfinavir was predicted to be a potential inhibitor of COVID-19 main protease by an integrative approach combining homology modelling, molecular docking and binding free energy calculation. Pharmacology and Toxicology 264 (2020).
26. Liu, X. & Wang, X.-J. Potential inhibitors for COVID-19 coronavirus M protease from clinically approved medicines. Bioinformatics (2020).
27. Zhavoronkov, A., Vanhaelen, Q. & Oprea, T. I. Will Artificial Intelligence for Drug Discovery Impact Clinical Pharmacology? Clin. Pharmacol. Ther. (2020) doi:10.1002/cpt.1795.
28. Kadurin, A. et al. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget vol. 8 (2016).
29. Polykovskiy, D. et al. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery. Mol. Pharm. 15, 4398–4405 (2018).
30. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).
31. Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A. & Zhavoronkov, A. druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico. Mol. Pharm. 14, 3098–3104 (2017).
32. Putin, E. et al. Adversarial Threshold Neural Computer for Molecular de Novo Design. Mol. Pharm. 15, 4386–4397 (2018).
33. Putin, E. et al. Reinforced Adversarial Neural Computer for de Novo Molecular Design. J. Chem. Inf. Model. 58, 1194–1204 (2018).
34. Kuzminykh, D. et al. 3D Molecular Representations Based on the Wave Transform for Convolutional Neural Networks. Mol. Pharm. 15, 4378–4385 (2018).
35. Aliper, A. et al. Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data. Mol. Pharm. 13, 2524–2530 (2016).
36. Zhavoronkov, A. Artificial Intelligence for Drug Discovery, Biomarker Development, and Generation of Novel Chemistry. Mol. Pharm. 15, 4311–4313 (2018).
37. Turlington, M. et al. Discovery of N-(benzo[1,2,3]triazol-1-yl)-N-(benzyl)acetamido)phenyl) carboxamides as severe acute respiratory syndrome coronavirus (SARS-CoV) 3CLpro inhibitors: identification of ML300 and noncovalent nanomolar inhibitors with an induced-fit binding. Bioorg. Med. Chem. Lett. 23, 6172–6177 (2013).
38. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
39. Guex, N., Peitsch, M. C. & Schwede, T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis 30 Suppl 1, S162–73 (2009).
40. Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25 (2015).
41. Clarivate Analytics Integrity. https://integrity.clarivate.com/integrity/.
42. Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–7 (2012).
43. CHEMBL database release 25. http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_25 (2019) doi:10.6019/CHEMBL.database.25.
44. Ivanenkov, Y. A., Zagribelnyy, B. A. & Aladinskiy, V. A. Are We Opening the Door to a New Era of Medicinal Chemistry or Being Collapsed to a Chemical Singularity? J. Med. Chem. 62, 10026–10043 (2019).
45. Zhu, N. et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N. Engl. J. Med. (2020) doi:10.1056/NEJMoa2001017.