Deep Learning Based Drug Screening for Novel Coronavirus 2019-nCov

  • Shenzhen Institutes of Advanced Technology
  • Bharath Institute of Higher Education and Research
  • Bangabandhu Sheikh Mujibur Rahman Science and Technology University

Abstract

A novel coronavirus, 2019-nCoV, was recently identified in Wuhan, Hubei Province, China, and is now spreading across China and other parts of the world. Although some drugs are being used to treat 2019-nCoV infection, there is no solid scientific evidence of their activity against the virus. Developing a drug that combats the virus effectively is therefore of great importance for saving lives, but traditional drug development takes a long time. Because 2019-nCoV is highly homologous to SARS-CoV, alternative approaches such as deep learning are a practical way to find candidate drugs quickly. In the present work, we first collected viral RNA sequences of 18 patients reported to have 2019-nCoV from a public database, translated the RNA into protein sequences, and performed multiple sequence alignment. After a careful literature survey and sequence analysis, we considered the 3C-like protease to be a major therapeutic target and built a 3D model of it by homology modeling. Relying on this structural model, we ran a large-scale virtual screening pipeline based on a deep learning method, recently developed in our group, that ranks and identifies protein-ligand interacting pairs. The model identified potential drugs for the 2019-nCoV 3C-like protease by screening four chemical compound databases (ChemDiv, Targetmol-Approved_Drug_Library, Targetmol-Natural_Compound_Library, and Targetmol-Bioactive_Compound_Library) and a database of tripeptides. We provide a list of candidate chemical ligands (Meglumine, Vidarabine, Adenosine, D-Sorbitol, D-Mannitol, Sodium_gluconate, Ganciclovir and Chlorobutanol) and peptide drugs (combinations of isoleucine, lysine and proline) from these databases so that experimental scientists can validate, in a shorter time, which molecules can combat the virus.
Haiping Zhang1, Konda Mani Saravanan1, Yang Yang2, Md. Tofazzal Hossain1,6, Junxin Li3,
Xiaohu Ren4, Yi Pan5, Yanjie Wei1*
1Center for High Performance Computing, Joint Engineering Research Center for Health Big
Data Intelligent Analysis Technology
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen,
Guangdong, PR China 518055
2Shenzhen Key Laboratory of Pathogen and Immunity, Guangdong Key Laboratory for
Diagnosis and Treatment of Emerging Infectious Diseases, State Key Discipline of Infectious
Disease, Second Hospital Affiliated to Southern University of Science and Technology,
Shenzhen Third People's Hospital, Shenzhen, 518112, China
3Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and
Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences,
1068 Xueyuan Boulevard, University City of Shenzhen, Xili Nanshan, Shenzhen, 518055,
4Institute of Toxicology, Shenzhen Center for Disease Control and Prevention, No 8
Longyuan Road, Nanshan District, Shenzhen, 518055, China
5Department of Computer Science, Georgia State University, Atlanta, United States of
America 30302-5060
6University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District,
Beijing, P.R.China 100049
Corresponding Author:
Keywords
Coronavirus; Deep learning; Drug screening; Homology modeling; 3C-like protease
Introduction
In December 2019, a severe respiratory illness similar to that caused by severe acute respiratory syndrome coronavirus emerged in Wuhan, Hubei, China, and has since spread around the world with high mortality. Two earlier betacoronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), also caused high mortality and became a threat to human life [1]. The recent outbreak of viral pneumonia was first disclosed by the Wuhan Municipal Health Commission [2, 3], and Chinese officials alerted the World Health Organization (WHO) to the outbreak [4]. The novel coronavirus (2019-nCoV) was isolated from the 27 patients initially reported; the number of patients was subsequently revised to 31498 as of March 23, 2020, with 3267 deaths [5]. The current 2019-nCoV outbreak shares some features with the SARS outbreak: both began in winter, were linked to live animal markets, and were caused by previously unknown coronaviruses [2, 5].
Fever, cough, and shortness of breath are the symptoms in common cases, whereas pneumonia, severe acute respiratory syndrome and kidney failure have been reported in severe cases [4]. Most of the 2019-nCoV patients are linked to the Huanan Seafood Wholesale Market, where wildlife including bats and snakes, as well as poultry, is sold. So far, no specific wild animal has been identified as the host of the novel coronavirus. Bats are considered the native host of 2019-nCoV, although there are other hosts in the transmission from bats to humans [5]. The Spring Festival travel rush has accelerated the spread, so the top priorities are to prevent further spread, develop a new drug against the virus, and cure patients in time. Much can be learned about 2019-nCoV from the earlier SARS-CoV. For SARS-CoV, a variety of modern machine learning methods, in particular deep neural networks, were used for drug discovery and development. These methods take advantage of larger datasets compiled from high-throughput screening and predict bioactivity against a target with high accuracy [6].
The genetic sequence of 2019-nCoV shows 79.5% similarity to that of SARS-CoV [7, 8]. The S protein and the 3C-like protease are potential drug targets. The S protein is the main target of neutralizing antibodies, and antibodies that bind to it have the potential to stop the virus from entering host cells [9]. The 3C-like protease catalyzes a reaction that is important in processing the SARS coronavirus replicase polyprotein [10, 11]. Neutralizing antibodies against the SARS S protein have been obtained from human patients, and an anti-SARS-CoV S antibody triggered fusogenic conformational changes [9]. This provides an important clue for blocking virus entry into host cells with antibodies or peptides. Inhibitors of the 3C-like protease also have the potential to prevent coronavirus maturation, and a series of unsaturated ester inhibitors of the SARS-CoV 3C-like protease has been deposited in the PDB (Crystal structures of SARS-Cov 3C-like protease complexed with a series of unsaturated esters, Protein Data Bank identifier: 3TIT).
These earlier SARS inhibitors can also be used to design inhibitors against 2019-nCoV. With the growing number of protein-ligand complex structures, deep learning algorithms for identifying or predicting compounds that bind a given target have become feasible [12, 13]. In addition to small-molecule compounds, scientists also rely on peptides and antibodies to combat viruses because of their stronger binding affinity. In the post-genomics era, a Dense Fully Convolutional Neural Network (DFCNN) model is more effective, faster and cheaper for drug discovery, because the deep layers of the model can learn more features from the data and make accurate predictions. Using such techniques, the antimalarial drug “pyrimethamine” was discovered against the enzyme dihydrofolate reductase (DHFR), and another drug, BPM31510, is in a phase II trial in patients with advanced pancreatic cancer [14–16]. Hence we believe that integrating such machine learning models into a drug discovery pipeline has implications for therapeutic drug targeting.
Considering the above, in the present work we take the 2019-nCov_3C-like protease as a potential target and build a structural model of it after systematically analyzing its sequence features. We built a pipeline around a deep learning method developed in our group, which represents molecules as vectors, to identify potential drugs (peptides or small ligands) against this protein target of 2019-nCoV [13]. Our method is extremely fast: it finishes a virtual screen of millions of protein-ligand or protein-peptide predictions in less than a day, whereas traditional docking methods take several weeks even with the help of a supercomputer. Although the 2019-nCoV outbreak is a major challenge for clinicians [17], we believe the proposed list of potential drugs can help them rapidly validate a drug that relieves symptoms or even cures the disease.
Materials & Methods
Dataset and Sequence Alignment
We retrieved the viral RNA sequences from the Global Initiative on Sharing All Influenza Data (GISAID) database [18] and aligned them, focusing on the S protein and the ligand-binding region of the 2019-nCov_3C-like protease. The amino acid sequences were translated from the RNA sequences with the Translate web tool. We used the sequences of 18 patients in this work (EPI_ISL_402119 to EPI_ISL_404228). Details of the sequences, with acknowledgement of the authors who submitted the data to the server, are presented in Supplementary Table S1. Multiple sequence alignment was performed with the Clustal Omega program [19].
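The translation and conservation check described above can be illustrated with a minimal Python sketch. It is not the Translate web tool or Clustal Omega: the function names `translate` and `variable_positions` are hypothetical, the reading frame is assumed to start at the first base, and the input sequences are toy strings rather than GISAID records.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str) -> str:
    """Translate an RNA (or DNA) string in frame 0 until the first stop codon."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def variable_positions(aligned):
    """Return 1-based alignment columns where the sequences are not all identical."""
    return [i + 1 for i, col in enumerate(zip(*aligned)) if len(set(col)) > 1]


# Toy example: three short sequences, one of which differs at a single codon.
toy_rna = ["AUGUUUAUUUAA", "AUGAUUAUUUAA", "AUGUUUAUUUAA"]
toy_proteins = [translate(seq) for seq in toy_rna]
print(toy_proteins, variable_positions(toy_proteins))
```

An empty list from `variable_positions` over the pocket columns would correspond to a fully conserved region in this simplified setting.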
Homology modeling of 2019-nCov_3C-like protease
The structural model of the 2019-nCov_3C-like protease was built with Modeller 9.9 [20]. The SARS coronavirus 3C-like protease (PDB ID: 3TNT), which has about 96.07% amino acid sequence identity with the target, was used as the template. The software outputs multiple predicted structures, which are ranked according to the Discrete Optimized Protein Energy (DOPE) score [21]. The quality of the model was validated by checking its stereochemical quality on the Ramachandran map. The model was further checked with PROCHECK [22], ERRAT [23] and QMEAN [24], and the final structural model was used for further analysis.
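As an illustration of how a template-target identity figure and a substitution list can be derived from a pairwise alignment, here is a small Python sketch. The helper names are hypothetical, and the identity convention (matches divided by the number of ungapped aligned columns) is an assumption; other conventions exist and give slightly different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def substitutions(aligned_a: str, aligned_b: str) -> list:
    """List substitutions such as 'T2V', numbered by residue position in sequence a."""
    found = []
    position = 0
    for a, b in zip(aligned_a, aligned_b):
        if a != "-":
            position += 1
        if a != "-" and b != "-" and a != b:
            found.append(f"{a}{position}{b}")
    return found


# Toy aligned pair (not the real protease sequences).
template = "MTAK-L"
target = "MVAKGL"
print(percent_identity(template, target), substitutions(template, target))
```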
A deep learning model for virtual screening of large databases
In our previous work, we built a Dense Fully Convolutional Neural Network (DFCNN) deep learning model for reverse searching of drug targets. Here we apply this model to large-scale virtual screening. Because the method has been shown to have relatively high accuracy and efficiency, it is well suited to an emerging disease outbreak. The DFCNN is a densely connected network of fully connected layers: it is similar to DenseNet, but with the convolution layers replaced by fully connected layers, which allows a deep network without the vanishing gradient problem. The deeper layers let it learn more abstract features from the data. The training data of DFCNN come from the PDBbind database [25]; we define the crystal protein-ligand PDB complexes as positives and cross-docking complexes as negatives. The detailed process of building the deep learning model is described in our recently published work on screening targets for an input small molecule using a vector representation [13]. The overall workflow of the proposed method is shown in Figure 1. The DFCNN model has two advantages over many other methods: it is independent of docking simulation, and its training dataset includes nonbinding decoys. Independence from docking simulation makes it extremely fast, while the inclusion of nonbinding decoys during training makes the model robust in real application scenarios.
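The dense connectivity idea (each fully connected layer sees the input concatenated with the outputs of all earlier layers) can be sketched in plain Python as follows. This is only an illustration of the connectivity pattern, not the published DFCNN: the layer count, the growth size of 32, the ReLU activation, the sigmoid output and the random, untrained weights are all assumptions made for the example.

```python
import math
import random


def dense_layer(x, weights, bias):
    """One fully connected layer with ReLU: y = max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_dense_block(n_in, growth, n_layers, rng):
    """Random weights for a block in which layer k sees n_in + k * growth inputs."""
    layers = []
    for k in range(n_layers):
        fan_in = n_in + k * growth
        scale = math.sqrt(2.0 / fan_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(growth)]
        layers.append((weights, [0.0] * growth))
    return layers


def dense_block_forward(x, layers):
    """Each layer receives the input concatenated with all earlier layer outputs."""
    features = list(x)
    for weights, bias in layers:
        features = features + dense_layer(features, weights, bias)
    return features


def sigmoid_score(features, w_out, b_out):
    """Map the final feature vector to a score between 0 and 1."""
    z = sum(w * f for w, f in zip(w_out, features)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(600)]            # stand-in pocket+ligand vector
layers = init_dense_block(n_in=600, growth=32, n_layers=4, rng=rng)
features = dense_block_forward(x, layers)
w_out = [rng.gauss(0.0, 0.01) for _ in range(len(features))]
score = sigmoid_score(features, w_out, 0.0)
print(len(features), round(score, 3))
```

Because earlier features are carried forward unchanged, every layer has a short path to the input, which is the property the text credits with avoiding vanishing gradients.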
Virtual screening against ChemDiv database
The structural model of the ligand-binding region of the 2019-nCov_3C-like protease is used as the target protein structure. We define the pocket as the residues within a cutoff distance of 1 nm of the known ligand, where the binding site is taken from the ligand in the template PDB 3TNT. The ligand database is taken from ChemDiv and contains around 1,000,000 compounds. We first used the DFCNN model to perform large-scale virtual screening. The mean and deviation of the training dataset were used during data normalization for more stable performance. In the second stage, the top predictions of the DFCNN model were chosen for AutoDock Vina based docking simulation. The docking results were visualized and examined with Discovery Studio Visualizer [26]. Finally, we provide a list of proposed compounds that have the potential to bind the protein pocket.
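A minimal sketch of the distance-based pocket definition is given below. The text does not say whether the 1 nm cutoff is measured between atoms or residue centers; this example assumes that a residue belongs to the pocket if any of its atoms lies within 10 Å of any ligand atom. The function name, the input layout and the toy coordinates are hypothetical.

```python
import math

CUTOFF_ANGSTROM = 10.0  # 1 nm, the cutoff stated in the text


def pocket_residues(protein_atoms, ligand_coords, cutoff=CUTOFF_ANGSTROM):
    """Return sorted residue ids with at least one atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)), coordinates in angstroms.
    ligand_coords: iterable of (x, y, z) ligand atom coordinates in angstroms.
    """
    ligand_coords = list(ligand_coords)
    pocket = set()
    for res_id, xyz in protein_atoms:
        if res_id in pocket:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            pocket.add(res_id)
    return sorted(pocket)


# Toy coordinates (not taken from 3TNT): residue ids are arbitrary labels.
toy_protein = [
    (41, (0.0, 0.0, 0.0)),
    (41, (1.5, 0.0, 0.0)),
    (145, (9.0, 0.0, 0.0)),
    (200, (25.0, 0.0, 0.0)),
]
toy_ligand = [(2.0, 0.0, 0.0)]
print(pocket_residues(toy_protein, toy_ligand))
```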
Virtual screening against Targetmol-Approved_Drug_Library, Targetmol-Natural_Compound_Library, and Targetmol-Bioactive_Compound_Library
The Targetmol-Approved_Drug_Library, Targetmol-Natural_Compound_Library, and Targetmol-Bioactive_Compound_Library contain about 2040, 1680, and 5370 compounds, respectively. We applied the DFCNN model to screen these three libraries against the 2019-nCov_3C-like protease. The compounds with high DFCNN scores are recommended as potential inhibitors for further experimental validation.
Virtual screening against tripeptide database
We first built a tripeptide database with a total size of 8000. Each amino acid in a tripeptide was converted into a molecule vector by Mol2vec [27], and the sum of the three amino acid vectors was used as the peptide's vector. The protein pocket is defined as the residues within a cutoff distance of 1 nm of the known ligand, and the pocket is likewise converted into a vector. The pocket and peptide vectors are then concatenated into a single row as input, with a maximum dimension of 600. We use the same DFCNN model, a densely fully connected model trained on a protein-ligand dataset from the PDBbind database. Since both ligands and peptides are composed of chemical groups, a model trained on protein-ligand complexes should also be suitable for protein-small peptide interactions.
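The tripeptide representation described above can be sketched as follows. The sketch does not call Mol2vec: `toy_embedding` is a hypothetical stand-in that assigns a fixed pseudo-random vector to each amino acid, the per-molecule vector size of 300 is an assumption, and zero-padding the concatenated row up to 600 is one possible reading of "maximum dimension of 600".

```python
from itertools import product
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
VEC_DIM = 300     # assumed size of each molecule vector
INPUT_DIM = 600   # maximum input dimension stated in the text


def toy_embedding(dim=VEC_DIM, seed=0):
    """Stand-in for Mol2vec: one fixed pseudo-random vector per amino acid."""
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def peptide_vector(peptide, embedding):
    """Sum the residue vectors to represent the whole peptide."""
    return [sum(vals) for vals in zip(*(embedding[aa] for aa in peptide))]


def model_input(pocket_vec, peptide_vec, dim=INPUT_DIM):
    """Concatenate pocket and peptide vectors into one row, zero-padded to `dim`."""
    row = list(pocket_vec) + list(peptide_vec)
    if len(row) > dim:
        raise ValueError("input longer than the model's maximum dimension")
    return row + [0.0] * (dim - len(row))


# All 20 x 20 x 20 ordered tripeptides.
tripeptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
embedding = toy_embedding()
pocket_vec = [0.0] * VEC_DIM  # placeholder for the real pocket vector
row = model_input(pocket_vec, peptide_vector("IKP", embedding))
print(len(tripeptides), len(row))
```

Because a sum ignores residue order, all permutations of the same three residues receive (up to floating-point rounding) the same vector in this sketch, so such a representation cannot distinguish, for example, IKP from PKI.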
Results
Sequence alignment and homology modeling
The RNA sequences of 18 patients, obtained from the public GISAID database, were translated into protein sequences with the Translate tool. The ligand-binding site of the template protein (3TNT) was used as the reference for defining the pocket region of our homology model. We checked for mutations in the pocket region of the 2019-nCov_3C-like protease and found that the pocket sequences are 100% identical across the viruses from the 18 patients. This indicates that the virus is highly conserved in this region, which makes the site suitable for drug design. The alignment of the S-protein epitope regions also shows high conservation among the patients (Supplementary Figure S1). The figure shows that the sequence EPI_ISL_402132 has a point mutation at position 32, where the phenylalanine codon is replaced by an isoleucine codon. The 2019-nCoV_3C-like protease was also aligned to the SARS-CoV protease with Clustal Omega [19]; the alignment is shown in Figure 2. Both proteins have 276 amino acid residues. The figure indicates high similarity between 2019-nCoV and SARS-CoV, which is consistent with the findings of Xu et al (2020) [5]. Using the X-ray crystallographic structure of the SARS coronavirus 3C-like protease solved at 1.59 Å resolution, a theoretical model of the 2019-nCoV_3C-like protease was built with Modeller. Figure 3A shows the crystallographic structure of the SARS_coronavirus_3C-like protease and Figure 3B shows the homology model of the 2019-nCoV_3C-like protease. There are only four mutations (T35V, A46S, S94A and K180N) between the SARS_coronavirus_3C-like protease and the 2019-nCoV_3C-like protease (Figure 3A and B); the mutated residues are marked in blue. Figure 3C shows the model structure with a known SARS_coronavirus_3C-like protease inhibitor. The binding pocket and the two-dimensional ligand interaction pattern of the target protein are shown with reference to the template. In total, 23 protein-ligand interactions are observed, including 15 hydrogen bonds, one disulphide bond and a few pi-stacking interactions (Figure 3D). The pocket extracted from the model was used for the subsequent large-scale virtual screening.
Virtual screening against 4 small molecular compound databases
The ChemDiv dataset, widely used for large-scale virtual screening, contains a large number (~1,000,000) of drug-like compounds or drug leads. The potential drug candidates with the best scores (AutoDock Vina score and our deep learning model score) from the ChemDiv dataset are presented in Table 1. Interestingly, the compound with identifier “C998-0189” has the best Vina score of the six compounds listed. The name of the compound is -2-yl)-N~1~-[3-(trifluoromethyl)phenyl]glycinamide, with molecular formula C22H22F3N3O3S2. The molecular weight of the compound is 497.6 g/mol, and it satisfies most drug-likeness parameters, including Lipinski’s filters. The other five recommended compounds also have reasonable Vina scores, around -7.5, with important stabilizing interactions.
The top 100 predictions of our deep learning model against the database are shown in Supplementary Table S2. The five compounds with the highest DFCNN scores have the ChemDiv identifiers 8017-4328, 8017-4325, 8002-7777, 8004-0123 and 8010-0095. Three other well-known compound libraries were also screened in the present work: Targetmol-Approved_Drug_Library, Targetmol-Natural_Compound_Library and Targetmol-Bioactive_Compound_Library. It is worth testing whether any natural compound can combat the virus by inhibiting the 2019-nCov_3C-like protease. Table 2 shows the screening results for the Targetmol-Natural compound library and lists the compounds with a DFCNN score higher than 0.997; Adenosine, Vidarabine, Mannitol, Dulcitol, D-Sorbitol, D-Mannitol, Allitol and Sodium_gluconate are the top predictions. Natural products are often the active ingredients of known herbal medicines and are relatively safe because of their long history of use. If a compound is proven experimentally to be effective against the target, patients can easily access it by taking the corresponding herbal medicine. About 8 compounds have a score of 0.999 and about 20 compounds have a score of 0.998; they are presented in Table S2. As indicated above, most of the drugs listed by our model are antiviral drugs, and hence they can be tested against 2019-nCoV and validated in the clinical lab within a short time.
The screening results for the Targetmol-Approved Drug library are shown in Table 3, which lists the compounds with a DFCNN score higher than 0.997. We randomly selected drugs from the list of potential drugs and performed a systematic literature search. Meglumine, Vidarabine, Adenosine, D-Sorbitol, D-Mannitol, Sodium_gluconate, Ganciclovir and Chlorobutanol are the top predictions according to the DFCNN score (Table 3). Interestingly, we found that most of the drugs in the list, such as Meglumine, Ganciclovir and Vidarabine, show antiviral activity. The list of all compounds scoring above 0.990 is provided in Table S4. The screening results for the Targetmol-Bioactive_Compound_Library are shown in Table 4, which lists the compounds with a DFCNN score higher than 0.997. Bioactive compounds are chemicals that can be found in plants and some foods and have been studied for the prevention of various diseases. It is worth checking whether any of them can act on the target protein. We found that Vidarabine, Adenosine, Dulcitol, D-Sorbitol, D-Mannitol, Ganciclovir and 5'-Deoxyadenosine are the top predictions among the Targetmol-Bioactive compounds (Table 4). The list of all compounds scoring above 0.99 is provided in Table S5. The list in Table 4 narrows down the hit compounds for later drug development stages, such as molecular dynamics simulation or even direct experimental validation, in the search for bioactive compounds against the 2019-nCov_3C-like protease.
Virtual screening against database of tripeptides
Peptides have the potential to exert higher binding affinity and specificity than small-molecule compounds, and small peptides are easier to synthesize than small molecules and antibodies. Since the known ligands of the SARS_3C-like protease are compounds similar to tripeptides, and the number of tripeptide combinations of the 20 amino acids is manageable for our method, we decided to perform virtual screening on the tripeptides. The tripeptides with a DFCNN score of 0.995 or higher (0.997, 0.996 and 0.995) for the 2019-nCov_3C-like protease are shown in Table 5. A higher value indicates that the peptide is more likely to bind the pocket of the 2019-nCov_3C-like protease. Our method found that peptides formed from the amino acids I, K and P have the highest probability of binding in the pocket. Combinations of G, K, L; G, K, K; or K, P, V are also predicted by DFCNN to be favorable binding partners (Table 5). The list of all tripeptides scoring above 0.99 is provided in Table S6. The combination of short peptides and their composition play a crucial role in the overall conformation of a protein [28, 29]. Tripeptides, pentapeptides and octapeptides are regarded as promising candidates for drug development against infectious diseases [30, 31]. Since these peptides are relatively easy to produce, many of the top predictions can be validated experimentally in a fast and inexpensive manner.
Conclusion
Designing small-compound or peptide drugs to cure 2019-nCoV infection is extremely urgent. Effective and safe drugs are required to treat a deadly viral disease that has caused an epidemic outbreak all over the globe. Researchers use a range of modern technologies to combat such diseases; deep learning is one of them, offering fast prediction with accuracy greater than ~80%. With its extremely high speed and relatively high accuracy, our DFCNN model for 3C-like protease-ligand interaction analysis is suited to the challenge of screening tens of thousands of drugs in a short time in emergency situations such as the 2019-nCoV outbreak. Our DFCNN-based deep learning model is data driven: it learns 3C-like protease-ligand interactions from known binder and non-binder data. The model uses the binding pocket of the 3C-like protease-ligand conformation instead of the whole conformation of the complex, and hence it is fast and accurate compared with molecular docking procedures.
The identified potential 3C-like protease-ligand pairs can be subjected to MD simulation to further check the binding stability and atomic interaction pattern, or even the binding free energy with techniques such as metadynamics, to narrow down the candidate list. A variety of repurposed and investigational drugs have been identified in the past. Screening of National Medical Products Administration (NMPA) approved drug libraries and other chemical libraries has identified novel agents. Hundreds of clinical trials involving remdesivir, chloroquine, favipiravir, convalescent plasma, TCM and other interventions are planned or underway. In this context, we have performed deep learning based drug screening and provided lists of potential compounds and tripeptides for the 2019-nCov_3C-like protease. Since the inhibitor candidates provided are on-market drugs, the list can help facilitate 2019-nCov_3C-like protease drug development and could be used immediately.
References
1. Huang C, Wang Y, Li X, et al (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395(10223):497-506.
2. Lu H, Stratton CW, Tang Y (2020) Outbreak of Pneumonia of Unknown Etiology in Wuhan China: the Mystery and the Miracle. J Med Virol 92(4):401-402.
3. Thompson R (2020) Pandemic potential of 2019-nCoV. Lancet Infect Dis 20(3):P280.
4. Hui DS, I Azhar E, Madani TA, et al (2020) The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - The latest 2019 novel coronavirus outbreak in Wuhan, China. Int J Infect Dis 91:264-266.
5. Xu X, Chen P, Wang J, et al (2020) Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission. Sci China Life Sci 63:457-460.
6. Ekins S, Puhl AC, Zorn KM, et al (2019) Exploiting machine learning for end-to-end drug discovery and development. Nat Mater 18:435-441.
7. Zhou P, Yang X-L, Wang X-G, et al (2020) Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. Nature 579:270-273.
8. Lu R, Zhao X, Li J, et al (2020) Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395(10224):565-574.
9. Walls AC, Xiong X, Park YJ, et al (2019) Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion. Cell 176(5):1026-1039.
10. Goetz DH, Choe Y, Hansell E, et al (2007) Substrate specificity profiling and identification of a new class of inhibitor for the major protease of the SARS Coronavirus. Biochemistry 46(30):8744-8752.
11. Kim Y, Lovell S, Tiew K-C, et al (2012) Broad-Spectrum Antivirals against 3C or 3C-Like Proteases of Picornaviruses, Noroviruses, and Coronaviruses. J Virol 86(21):11754-11762.
12. Zhang H, Liao L, Saravanan KM, et al (2019) DeepBindRG: a deep learning based method for estimating effective protein-ligand affinity. PeerJ 7:e7362.
13. Zhang H, Liao L, Cai Y, et al (2019) IVS2vec: A tool of Inverse Virtual Screening based on word2vec and deep learning techniques. Methods 166:57-65.
14. Fleming N (2018) How artificial intelligence is changing drug discovery. Nature 557:S55-S57.
15. Liu Z, Du J, Fang J, et al (2019) DeepScreening: a deep learning-based screening web server for accelerating drug discovery. Database (Oxford) 2019:1-11.
16. Chen H, Engkvist O, Wang Y, et al (2018) The rise of deep learning in drug discovery. Drug Discov Today 23(6):1241-1250.
17. Russell CD, Millar JE, Baillie JK (2020) Clinical evidence does not support corticosteroid treatment for 2019-nCoV lung injury. Lancet 395:473-475.
18. Shu Y, McCauley J (2017) GISAID: Global initiative on sharing all influenza data from vision to reality. Eurosurveillance 22(13):30494.
19. Sievers F, Higgins DG (2018) Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27(1):135-145.
20. Fiser A, Šali A (2003) MODELLER: Generation and Refinement of Homology-Based Protein Structure Models. Methods Enzymol 374:461-491.
21. Shen M, Sali A (2006) Statistical potential for assessment and prediction of protein structures. Protein Sci 15(11):2507-2524.
22. Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26:283-291.
23. Colovos C, Yeates TO (1993) Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Sci 2(9):1511-1519.
24. Benkert P, Tosatto SCE, Schomburg D (2008) QMEAN: A comprehensive scoring function for model quality assessment. Proteins 71:261-277.
25. Liu Z, Li Y, Han L, et al (2015) PDB-wide collection of binding data: Current status of the PDBbind database. Bioinformatics 31(3):405-412.
26. Accelrys: Materials Studio is a Software Environment for Molecular Modeling (2009) Dassault Systèmes BIOVIA. Discovery.
27. Jaeger S, Fulle S, Turk S (2018) Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition. J Chem Inf Model 58(1):27-35.
28. Santos S, Torcato I, Castanho MARB (2012) Biomedical applications of dipeptides and tripeptides. Biopolymers 98(4):288-293.
29. Saravanan KM, Selvaraj S (2012) Search for identical octapeptides in unrelated proteins: Structural plasticity revisited. Biopolymers 98(1):11-26.
30. Wendler J, Schröder BO, Ehmann D, et al (2018) Tu1860 - A Novel Octapeptide as a Promising Candidate for Antibiotic Drug Development and Host Derived Microbiome Regulation. Gastroenterology 154(6):S-1040.
31. Saravanan KM, Dunker AK, Krishnaswamy S (2017) Sequence Fingerprints Distinguish Erroneous from Correct Predictions of Intrinsically Disordered Protein Regions. J Biomol Struct Dyn 36(16):4338-4351.
Figure 1. The workflow of virtual screening of small chemical compounds and tripeptides against the 2019-nCov_3C-like protease.
Figure 2. The sequence alignment of SARS_coronavirus_3C-like protease and 2019-nCov_3C-like protease.
Figure 3. The structural model of 2019-nCov_3C-like protease and its template. In panels A and B, the modeled 2019-nCov_3C-like protease and SARS_3C-like protease are shown with the four mutated residues marked in blue. The ligand from the PDB 3TNT is transferred to the modeled structure (Panel C) and, based on residue distance from the transferred ligand, we define the pocket (Panel D). The interaction between the ligand and the modeled 2019-nCov_3C-like protease is also shown (Panel D).
Table 1. The selected compounds that may inhibit 2019-nCov_3C-like protease based on the DFCNN score and AutoDock Vina score.
Chemdiv ID   Vina score   DeepBindVec   Recommendation
C998-0189    -8.5         >0.995        Recommended
C998-0197    -7.9         >0.995        Can Try
C998-0090    -7.8         >0.995        Can Try
C998-0948    -7.7         >0.995        Recommended
C998-1046    -7.6         >0.995        Recommended
D076-0195    -7.3         >0.995        Recommended
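The two-score selection behind Table 1 can be illustrated with a minimal sketch. It assumes the rows as transcribed above; the function name `rank_by_vina` and the cutoff of -7.6 are hypothetical and chosen only for illustration. The actual thresholds used by the authors are not stated in this section.

```python
# Rows transcribed from Table 1: (ChemDiv ID, Vina score, recommendation).
# Every row in the table has a DeepBindVec score > 0.995, so that column
# is not repeated here.
TABLE_1 = [
    ("C998-0189", -8.5, "Recommended"),
    ("C998-0197", -7.9, "Can Try"),
    ("C998-0090", -7.8, "Can Try"),
    ("C998-0948", -7.7, "Recommended"),
    ("C998-1046", -7.6, "Recommended"),
    ("D076-0195", -7.3, "Recommended"),
]


def rank_by_vina(rows, vina_cutoff):
    """Keep rows whose Vina score is at or below the cutoff.

    More negative Vina scores indicate stronger predicted binding,
    so the result is sorted from most to least negative.
    The cutoff is a hypothetical parameter, not a value from the paper.
    """
    kept = [row for row in rows if row[1] <= vina_cutoff]
    return sorted(kept, key=lambda row: row[1])


top = rank_by_vina(TABLE_1, vina_cutoff=-7.6)
for chem_id, vina, recommendation in top:
    print(chem_id, vina, recommendation)
```

With this illustrative cutoff, D076-0195 is filtered out even though the table marks it "Recommended", which suggests the recommendation column reflects more than the Vina score alone.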
Table 2. The potential drug candidates selected from the Targetmol-Natural compound library.
Natural Compound                             DFCNN score
llitol;Sodium_gluconate                      score>=0.999
;Phospho(enol)pyruvic_acid_monopotassium     0.999>Score>=0.998
osine_monohydrate                            0.998>Score>=0.997
Table 3. The potential drug candidates selected from the Targetmol-Approved Drug library.
Approved Drug name                           DFCNN score
m_gluconate;Ganciclovir;Chlorobutanol        score>=0.999
Cladribine;Entecavir;Ubenimex                0.999>Score>=0.998
hydrochloride;Imazalil;Atenolol              0.998>Score>=0.997
Table 4. The potential drug candidates selected from the Targetmol-Bioactive compounds.
Bioactive Compound                           DFCNN score
-DEOXYADENOSINE                              score>=0.999
Table 5. The predicted tripeptides that have a high probability (DFCNN score >=0.99) of binding to the pocket of 2019-nCov_3C-like protease.
Peptide sequence                             DFCNN score
KKA;KPV;KVP;PKV;PVK;VKP;VPK                  0.997>Score>=0.996
LK;LKL;KLL                                   0.996>Score>=0.995
... Zhang et al. [1] used an in-depth learning approach for COVID-19 suppository profiling by gathering RNA-Seq disease with GISAID data by screening RNA-Seq records into protein sequences and then building a 3D protein model with homology modelling. COVID-19 primary protease is a critical, valuable goal and is occupied on medication screening grounded on the exhibited COVID-19 protease structure. ...
A leading widespread upsurge was identified as a severe acute respiratory syndrome (SARS) coronavirus “COVID-19.” It has blowout a global threat to human existence, throughout the world with millions of recognized instances and griefs, originating from Wuhan, China, to every other part of the universe by demeaning activities put in place to regulate it. Various actions are ongoing to combat the increase of this dangerous disease, such as health precautions and measures, money, substructures, databases, protective devices, and medications, among other necessities, yet interminable upsurge of the disease post a constant interruption to the universe. Several widespread innovative forecast methods have emerged in predicting COVID-19 globally to obtain keen results and impressive, relevant preemptive procedures. This study aims to apply a machine learning method for prediction of COVID-19 incidence, using KPCA-SVM. Its objectives are to use recent cases and their gene data, by imploring KPCA to fetch relevant latent components. The reduced output is classified and evaluated in terms of the performance metrics. This study is implemented in MATLAB. The algorithms used for the prediction are KPCA and SVM. The results are evaluated using accuracy, sensitivity, specificity, F-score, Matthews correlation coefficient, precision, and negative predictive value. This study uses the KPCA to fetch relevant information for the enormous data and classified using the L-SVM and SVM-RBF; it achieved 93% and 87% accuracy, respectively. The necessity to identify suitable prognostic suggestions for COVID-19 must regulate the difficulties in apprehending the disease’s increase. 
In this investigation, a machine learning prediction approach is projected for COVID-19 to convey the importance of principal frameworks for enhancements and evolving quicker and capable conduct for evaluating, classifying, and predicting health status concerning actions and symptoms observed, to help healthcare persons recognize and record incidence to verify qualified healthcare across nations.
... Patankar et al. [50] used the LSTM deep learning method to calculate medications to explore possible drug candidates for the treatment of COVID-19. Zhang et al. [51] established a deep learning model based on a dense fully convolutional neural network (DFCNN) to implement protein-ligand interaction pairs for further discovery of protease medicines related to COVID-19. ...
The COVID-19 has resulted in catastrophic situation and the deaths of millions of people all over the world. In this paper, the predictions of epidemiological propagation models, such as SIR and SEIR, are introduced to analyze the earlier COVID-19 propagation. The deep learning methods combined with transfer learning are familiar with classification-detection approaches based on chest X-ray and CT images are presented in detail. Besides, deep learning approaches have also been applied to lung ultrasound (LUS), which has been shown to be more sensitive than chest X-ray and CT images in detecting COVID-19. In the absence of a vaccine, the machine learning-related approaches are applied to analyze vaccine candidates in the realm of biology and medicine. The telehealth system played a major role in combating the pandemic from all aspects and reducing contact with patients during this period. Natural language processing-related methods are utilized to analyze tweets related to the COVID-19 epidemic on social media, and further analyze public sentiment and subject modeling, so as to arrange corresponding measures to appease public sentiment. In particular, this survey is to summarize and analyze the contributions made in various fields during the COVID-19 pandemic by considering both the contribution of deep learning in chest X-ray and CT images, as well as the application of the latest LUS during the COVID-19 pandemic. Telehealth and the importance of public sentiment analysis during a pandemic were also described in detail.
The coronavirus disease is caused by infection with the SARS-CoV-2 virus: it represents a complex and new condition, considering that until the end of December 2019 this virus was totally unknown to the international scientific community. The clinical management of patients with the coronavirus disease has undergone an evolution over the months, thanks to the increasing knowledge of the virus, symptoms and efficacy of the various therapies. Currently, however, there is no specific therapy for the disease caused by the SARS-CoV-2 virus, also known as Coronavirus disease 19, and treatment is based on the symptoms of the patient, taking into account the overall clinical picture. Furthermore, the test to identify whether a patient is affected by the virus is generally performed on sputum and the result is generally available within a few hours or days. Previous research found that biomedical imaging analysis is able to show signs of pneumonia. For this reason, in this paper, with the aim of providing a fully automatic and faster diagnosis, we design and implement a method adopting deep learning for the novel coronavirus disease detection, starting from computed tomography medical images. The proposed approach aims to detect whether a computed tomography medical image is related to a healthy patient, to a patient with a pulmonary disease or to a patient affected by Coronavirus disease 19. In case the patient is marked by the proposed method as affected by Coronavirus disease 19, the areas symptomatic of the Coronavirus disease 19 infection are automatically highlighted in the computed tomography medical images. We perform an experimental analysis to empirically demonstrate the effectiveness of the proposed approach, by considering medical images from different institutions, with an average time for Coronavirus disease 19 detection of approximately 8.9 s and an accuracy equal to 0.95.
Pandemic new severe acute respiratory syndrome coronavirus (SARS-CoV-2) virus has increased throughout the world. There is no effective treatment against this virus until now. Since its appearance in Wuhan, China in December 2019, SARS-CoV-2 becomes the largest challenge the world is opposite today, including the discovery of an antiviral drug for this virus. Several viral proteins have been prioritized as SARS-CoV-2 antiviral drug targets, among them the papain-like protease (PLpro) and the main protease (Mpro). Inhibition of these proteases would target viral replication, viral maturation and suppression of host innate immune responses. Potential candidates have been identified to show inhibitory effects against Mpro, both in biochemical assays and viral replication in cells. There are different molecules such as lopinavir and favipiravir considerably inhibit the activity of Mpro in vitro. Different studies have shown that structurally improved favipiravir and other similar compounds can inhibit SARS-CoV-2 main protease. In this work, we study the interactions between favipiravir with Mg12O12 and Zn12O12 nanoclusters by density functional theory (DFT) and quantum mechanics atoms in molecules (QMAIM) methods to summarize the ability to load favipiravir onto Mg12O12 and Zn12O12 nanoclusters. Favipiravir-Mg12O12 and favipiravir-Zn12O12 lowest structures complexes were chosen to dock inside the SARS-CoV-2 main protease by molecular docking study. The molecular docking analysis revealed that the binding affinity of Mg12O12 and Zn12O12 nanoclusters inside the Mpro receptor is larger than that of favipiravir. Also, the loading of favipiravir on the surface of Mg12O12 and Zn12O12 nanoclusters increased the binding affinity against the Mpro receptor. Subsequently, 100 ns molecular dynamics simulation of the favipiravir-Mg12O12, and favipiravir-Zn12O12 docked inside the Mpro complexes established that favipiravir-Mg12O12, forms the most stable complex with the Mpro. 
Further molecular mechanics Poisson Boltzmann surface area (MMPBSA) analyses using the MD trajectories also demonstrated the higher binding affinity of favipiravir-Mg12O12 inside the Mpro. In summary, this study demonstrates a new way to characterize leads for novel anti-viral drugs against SARS-CoV-2, by improving the drug ability of favipiravir via loading it on Mg12O12 and Zn12O12 nanoclusters.
COVID-19 is an evolving respiratory transmittable disease, and as a global pandemic it has disrupted daily activity worldwide. It appeared in the city of Wuhan (China) in November 2019 and slowly started spreading to the rest of the world. The number of cases keeps increasing drastically, leading to a shortage of medical resources and testing kits worldwide. As physicians face this problem, several scientists and specialists in Artificial Intelligence (AI) are rendering their support to healthcare professionals in the early detection of COVID-19 using chest X-ray image samples to determine the level of severity at a low cost. This paper proposes a Genetic Deep Learning Convolutional Neural Network (GDCNN) architecture that includes Huddle Particle Swarm Optimization as an alternative to gradient descent. Huddle PSO performs better when combined with the GDCNN architecture. Based on publicly available datasets, trained chest X-ray images are used to predict and identify various pneumonia diseases. The proposed model performed better, with an accuracy of 97.23%, a sensitivity of 98.62%, a specificity of 97.0%, and a precision of 93.0%. The proposed model acts as a tool for earlier detection of COVID-19. In the future, we plan to apply the proposed model to a larger dataset and to predict various lung diseases.
The hypothesis that we intend to investigate here is that the extent of the impact of Covid-19 on a given country can be explained starting from a set of indicators and by using machine learning methodologies. The purpose of this chapter is not to find a way to solve the problem in an optimal way. Rather, we aim at performing a preliminary study to verify whether the aforementioned hypothesis is viable. Should it turn out so, we wish to get awareness both of which are the problems that must be solved in order to arrive at a (sub-)optimal solution, and of what are the possible limitations of the method. We firstly create a suitable data set of indicators starting from different sources available on the internet. Then, we apply onto it an evolutionary algorithm that is able to extract a set of IF–THEN decision rules allowing us to relate the values of the parameters for the different countries to the different levels of impact of Covid-19 on them.
Purpose: The appearance of the 2019 novel coronavirus (Covid-19), for which there is no treatment or vaccine, created a sense of necessity for new drug discovery advances. The pandemic of NCOV-19 (novel coronavirus-19) has been declared a public health disaster of global concern by the World Health Organization. Different pandemic models for NCOV-19 are being exploited by researchers all over the world to acquire informed assessments and impose major control measures. Among the standard techniques for NCOV-19 global outbreak prediction, epidemiological and simple statistical techniques have attracted the most attention from researchers. Insufficiency and deficiency of health tests for identifying a solution became a major difficulty in controlling the spread of NCOV-19. To solve this problem, deep learning has emerged as a novel solution over a dozen machine learning techniques. Deep learning has attained advanced performance in medical applications. Deep learning has the capacity to recognize patterns in large complex datasets. It has been identified as an appropriate method for analyzing patients affected by NCOV-19. However, these techniques for disease recognition focus entirely on enhancing the accuracy of forecasts or classifications without measuring the uncertainty in a decision. Knowing how much confidence a computer-based health analysis carries is necessary for gaining clinicians’ trust in the technology and improving treatment accordingly. Today, NCOV-19 is the main healthcare challenge throughout the world. Detecting NCOV-19 in X-ray images is vital for diagnosis, treatment, and evaluation.
Still, analytical ambiguity in a report is a difficult yet predictable task for radiologists. Method: In this paper, an in-depth analysis has been performed on the significance of deep learning for Covid-19 and, as per the standard search database, this is the first review research work ever made concentrating particularly on Deep Learning for NCOV-19. Conclusion: The main aim behind this research work is to inspire the research community and to innovate novel research using deep learning. Moreover, the outcome of this detailed structured review on the impact of deep learning in covid-19 analysis will be helpful for further investigations on various modalities of disease detection, prevention and finding novel solutions.
The outbreak of a novel febrile respiratory disease called COVID-19, caused by a newfound coronavirus SARS-CoV-2, has brought a worldwide attention. Prioritizing approved drugs is critical for quick clinical trials against COVID-19. In this study, we first manually curated three Virus-Drug Association (VDA) datasets. By incorporating VDAs with the similarity between drugs and that between viruses, we constructed a heterogeneous Virus-Drug network. A novel Random Walk with Restart method (VDA-RWR) was then developed to identify possible VDAs related to SARS-CoV-2. We compared VDA-RWR with three state-of-the-art association prediction models based on fivefold cross-validations (CVs) on viruses, drugs and virus-drug associations on three datasets. VDA-RWR obtained the best AUCs for the three fivefold CVs, significantly outperforming other methods. We found two small molecules coming together on the three datasets, that is, remdesivir and ribavirin. These two chemical agents have higher molecular binding energies of − 7.0 kcal/mol and − 6.59 kcal/mol with the domain bound structure of the human receptor angiotensin converting enzyme 2 (ACE2) and the SARS-CoV-2 spike protein, respectively. Interestingly, for the first time, experimental results suggested that navitoclax could be potentially applied to stop SARS-CoV-2 and remains to further validation.
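The random walk with restart that this abstract builds on can be sketched in a few lines. This is the generic textbook iteration on a toy four-node path graph, not the VDA-RWR method or its heterogeneous Virus-Drug network; the graph, the restart probability of 0.3, and the function name are assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a small graph.

    adj is a square adjacency matrix (list of lists); seed is the index of
    the start node. Iterates p <- (1 - restart) * W p + restart * p0, where
    W is the column-normalised adjacency matrix, until p stops changing.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalise so each column of W sums to 1 (zero columns stay zero).
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 (hypothetical, for illustration only).
PATH_GRAPH = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(PATH_GRAPH, seed=0)
print([round(s, 3) for s in scores])
```

Nodes closer to the seed receive higher steady-state scores, which is the property association-prediction methods of this kind use to rank candidates.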
Several emerging technologies were introduced to tackle the unprecedented crisis of the new COVID-19. Remarkable emerging technologies are outlined, such as machine and deep learning, Internet of things, cloud and fog computing, and blockchain technology. Those emerging technologies have been explored to support the solution proposed to ensure the integration of these technologies to fight the pandemic. Also, numerous emerging technologies used for the COVID-19 fight have been highlighted. Finally, the impact of COVID-19 is discussed, and applications showing how to mitigate this impact using the emerging technologies are outlined.
Although the COVID-19 pandemic continues to expand, researchers around the world are working to understand, diminish, and curtail its spread. The primary fields of research include investigating transmission of COVID-19, promoting its identification, designing potential vaccines and therapies, and recognizing the pandemic’s socio-economic impacts. Deep Learning (DL), which uses either deep learning architectures or hierarchical approaches to learning, is developed a machine learning class since 2006. The exponential growth and availability of data and groundbreaking developments in hardware technology have led to the rise of new distributed and learning studies. Throughout this chapter, we discuss how deep learning can contribute to these goals by stepping up ongoing research activities, improving the efficiency and speed of existing methods, and proposing original lines of research.
Since December 2019, a total of 41 cases of pneumonia of unknown etiology have been confirmed in Wuhan city, Hubei Province, China. Wuhan city is a major transportation hub with a population of more than 11 million people. Most of the patients visited a local fish and wild animal market last month. At a national press conference held today, Dr. Jianguo Xu, an academician of the Chinese Academy of Engineering, who led a scientific team announced that a new‐type coronavirus, tentatively named by World Health Organization as the 2019‐new coronavirus (2019‐nCoV), had caused this outbreak (1).
Deep learning contributes significantly to researches in biological sciences and drug discovery. Previous studies suggested that deep learning techniques have shown superior performance to other machine learning algorithms in virtual screening, which is a critical step to accelerate the drug discovery. However, the application of deep learning techniques in drug discovery and chemical biology are hindered due to the data availability, data further processing and lacking of the user-friendly deep learning tools and interface. Therefore, we developed a user-friendly web server with integration of the state of art deep learning algorithm, which utilizes either the public or user-provided dataset to help biologists or chemists perform virtual screening either the chemical probes or drugs for a specific target of interest. With DeepScreening, user could conveniently construct a deep learning model and generate the target-focused de novo libraries. The constructed classification and regression models could be subsequently used for virtual screening against the generated de novo libraries, or diverse chemical libraries in stock. From deep models training to virtual screening, and target focused de novo library generation, all those tasks could be finished with DeepScreening. We believe this deep learning-based web server will benefit to both biologists and chemists for probes or drugs discovery.
Proteins interact with small molecules to modulate several important cellular functions. Many acute diseases were cured by small molecule binding in the active site of protein either by inhibition or activation. Currently, there are several docking programs to estimate the binding position and the binding orientation of protein–ligand complex. Many scoring functions were developed to estimate the binding strength and predict the effective protein–ligand binding. While the accuracy of current scoring function is limited by several aspects, the solvent effect, entropy effect, and multibody effect are largely ignored in traditional machine learning methods. In this paper, we proposed a new deep neural network-based model named DeepBindRG to predict the binding affinity of protein–ligand complex, which learns all the effects, binding mode, and specificity implicitly by learning protein–ligand interface contact information from a large protein–ligand dataset. During the initial data processing step, the critical interface information was preserved to make sure the input is suitable for the proposed deep learning model. While validating our model on three independent datasets, DeepBindRG achieves root mean squared error (RMSE) value of pKa (−logKd or −logKi) about 1.6–1.8 and R value around 0.5–0.6, which is better than the autodock vina whose RMSE value is about 2.2–2.4 and R value is 0.42–0.57. We also explored the detailed reasons for the performance of DeepBindRG, especially for several failed cases by vina. Furthermore, DeepBindRG performed better for four challenging datasets from DUD.E database with no experimental protein–ligand complexes. The better performance of DeepBindRG than autodock vina in predicting protein–ligand binding affinity indicates that deep learning approach can greatly help with the drug discovery process. 
We also compare the performance of DeepBindRG with a 4D based deep learning method “pafnucy”, the advantage and limitation of both methods have provided clues for improving the deep learning based protein–ligand prediction model in the future.
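The two metrics quoted in this abstract, RMSE and the Pearson R value, can be computed as in the sketch below. The affinity values are made up for illustration and are not taken from DeepBindRG or any dataset mentioned above.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical binding affinities (-logKd or -logKi), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

print(rmse(predicted, observed))
print(pearson_r(predicted, observed))
```

The toy data shows why both numbers are reported: a constant offset of one unit leaves the correlation perfect while the RMSE is a full unit, so R alone would hide a systematic error.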
Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas. Evolved from the previous research on artificial neural networks, this technology has shown superior performance to other machine learning algorithms in areas such as image and voice recognition, natural language processing, among others. The first wave of applications of deep learning in pharmaceutical research has emerged in recent years, and its utility has gone beyond bioactivity predictions and has shown promise in addressing diverse problems in drug discovery. Examples will be discussed covering bioactivity prediction, de novo molecular design, synthesis prediction and biological image analysis.
Phylogenetic analysis has demonstrated that some positive-sense RNA viruses can be classified into the picornavirus-like supercluster, which includes picornaviruses, caliciviruses, and coronaviruses. These viruses possess 3C or 3C-like proteases (3Cpro or 3CLpro, respectively), which contain a typical chymotrypsin-like fold and a catalytic triad (or dyad) with a Cys residue as a nucleophile. The conserved key sites of 3Cpro or 3CLpro may serve as attractive targets for the design of broad-spectrum antivirals for multiple viruses in the supercluster. We previously reported the structure-based design and synthesis of potent protease inhibitors of Norwalk virus (NV), a member of the Caliciviridae family. We report herein the broad-spectrum antiviral activities of three compounds possessing a common dipeptidyl residue with different warheads, i.e., an aldehyde (GC373), a bisulfite adduct (GC376), and an α-ketoamide (GC375), against viruses that belong to the supercluster. All compounds were highly effective against the majority of tested viruses, with half-maximal inhibitory concentrations in the high nanomolar or low micromolar range in enzyme- and/or cell-based assays and with high therapeutic indices. We also report the high-resolution X-ray cocrystal structures of NV 3CLpro-, poliovirus 3Cpro-, and transmissible gastroenteritis virus 3CLpro- GC376 inhibitor complexes, which show the compound covalently bound to a nucleophilic Cys residue in the catalytic site of the corresponding protease. We conclude that these compounds have the potential to be developed as antiviral therapeutics aimed at a single virus or multiple viruses in the picornavirus-like supercluster by targeting 3Cpro or 3CLpro.
Since the SARS outbreak 18 years ago, a large number of severe acute respiratory syndrome related coronaviruses (SARSr-CoV) have been discovered in their natural reservoir host, bats. Previous studies indicated that some of those bat SARSr-CoVs have the potential to infect humans. Here we report the identification and characterization of a novel coronavirus (nCoV-2019) which caused an epidemic of acute respiratory syndrome in humans, in Wuhan, China. The epidemic, which started on December 12th, 2019, had caused 198 laboratory confirmed infections with three fatal cases by January 20th, 2020. Full-length genome sequences were obtained from five patients at the early stage of the outbreak. They are almost identical to each other and share 79.5% sequence identity with SARS-CoV. Furthermore, it was found that nCoV-2019 is 96% identical at the whole genome level to a bat coronavirus. The pairwise protein sequence analysis of seven conserved non-structural proteins shows that this virus belongs to the species of SARSr-CoV. The nCoV-2019 virus was then isolated from the bronchoalveolar lavage fluid of a critically ill patient, which can be neutralized by sera from several patients. Importantly, we have confirmed that this novel CoV uses the same cell entry receptor, ACE2, as SARS-CoV.
Inverse virtual screening is an important technique in the early stage of drug development. This technique can provide preliminary clues for unknown molecules, which is useful in the following researches. In this work, combining with Word2vec, a natural language processing technique, dense fully connected neural network (DFCNN) algorithm is utilized to build up a prediction model. This model is able to perform a binary classification. Based on the query molecule, the input protein candidates can be classified into two subsets. One set is that potential targets with high possibilities to bind with the query molecule and the other one is that the proteins with low possibilities to bind with the query molecule. This model is named as IVS2vec. IVS2vec also can output a score reflecting binding possibility of the association between a protein and a molecule, which is useful to improve efficiency of research. We applied IVS2vec on several databases related to drug development. The results illustrated that IVS2vec can be used to detect possible therapeutic targets. In addition, it also can find targets related to adverse drug reactions. This is useful to improve medication safety and repurpose drugs. Moreover, IVS2vec can give a very fast speed to perform prediction jobs. It is suitable for processing a large number of compounds in chemical database. We also find that IVS2vec has potential capabilities outperform other state-of-the-art docking tools such as Autodock vina. In this study, IVS2vec brings many convincing results than Autodock vina in the reverse target searching case of Quercetin.
Inspired by natural language processing techniques we here introduce Mol2vec which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly, to the Word2vec models where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that are pointing in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up vectors of the individual substructures and, for instance, feed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can be thus also easily used for proteins with low sequence similarities.
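The encoding step described in this abstract, where a compound vector is obtained by summing the vectors of its substructures, can be sketched as follows. The embedding table, the substructure labels, and the three-dimensional vectors are toy values invented for illustration; a trained Mol2vec model learns its embeddings from a compound corpus. Skipping unknown substructures is an assumption of this sketch, not a statement about how Mol2vec treats them.

```python
# Toy substructure embeddings (hypothetical labels and values).
TOY_EMBEDDINGS = {
    "sub_a": [1.0, 0.0, 2.0],
    "sub_b": [0.5, 1.5, 0.0],
    "sub_c": [0.0, -1.0, 1.0],
}


def compound_vector(substructures, embeddings):
    """Encode a compound as the sum of its substructure vectors.

    substructures is the list of substructure labels found in the compound
    (repeats count once per occurrence). Labels missing from the embedding
    table are skipped, which is an assumption made for this sketch.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for label in substructures:
        vector = embeddings.get(label)
        if vector is None:
            continue
        for i, value in enumerate(vector):
            total[i] += value
    return total


vec = compound_vector(["sub_a", "sub_b", "sub_a"], TOY_EMBEDDINGS)
print(vec)
```

The resulting fixed-length vector is what would be fed into a supervised model, regardless of how many substructures the compound contains.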