Available via license: CC BY-NC-ND 4.0
Content may be subject to copyright.
Note: This paper has been revised after peer review, so that it can be considered technically correct.

Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan, China

Fan Wu¹,⁶, Su Zhao²,⁶, Bin Yu³,⁶, Yan-Mei Chen¹,⁶, Wen Wang⁴,⁶, Yi Hu²,⁶, Zhi-Gang Song¹,⁶, Zhao-Wu Tao², Jun-Hua Tian³, Yuan-Yuan Pei¹, Ming-Li Yuan², Yu-Ling Zhang¹, Fa-Hui Dai¹, Yi Liu¹, Qi-Min Wang¹, Jiao-Jiao Zheng¹, Lin Xu¹, Edward C. Holmes⁵, Yong-Zhen Zhang¹,⁴*

¹Shanghai Public Health Clinical Center & School of Public Health, Fudan University, Shanghai, China.
²Department of Pulmonary and Critical Care Medicine, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430014, China.
³Wuhan Center for Disease Control and Prevention, Wuhan, Hubei, China.
⁴Department of Zoonosis, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, Beijing, China.
⁵Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia.

bioRxiv preprint first posted online Jan. 25, 2020; doi: http://dx.doi.org/10.1101/2020.01.24.919183. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

⁶These authors contributed equally: Fan Wu, Su Zhao, Bin Yu, Yan-Mei Chen, Wen Wang, Yi Hu, Zhi-Gang Song.
*e-mail: zhangyongzhen@shphc.org.cn

Emerging and re-emerging infectious diseases, such as SARS, MERS, Zika and highly pathogenic influenza, present a major threat to public health [1-3]. Despite intense research effort, how, when and where novel diseases appear remains a source of considerable uncertainty. A severe respiratory disease was recently reported in the city of Wuhan, Hubei province, China. At the time of writing, at least 62 suspected cases have been reported since the first patient was hospitalized on December 12, 2019. Epidemiological investigation by the local Center for Disease Control and Prevention (CDC) suggested that the outbreak was associated with a seafood market in Wuhan. We studied seven patients who were workers at the market, and collected bronchoalveolar lavage fluid (BALF) from one patient who exhibited a severe respiratory syndrome including fever, dizziness and cough, and who was admitted to Wuhan Central Hospital on December 26, 2019. Next-generation metagenomic RNA sequencing [4] identified a novel RNA virus from the family Coronaviridae, designated WH-Human 1 coronavirus (WHCV). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that WHCV was most closely related (89.1% nucleotide similarity) to a group of Severe Acute Respiratory Syndrome (SARS)-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) previously sampled from bats in China that have a history of genomic recombination. This outbreak highlights the ongoing capacity of viral spillover from animals to cause severe disease in humans.

Seven patients, comprising five men and two women, were hospitalized at the Central Hospital of Wuhan from December 14 through December 28, 2019. The median age of the
patients was 43 years (range 31-70 years). The clinical characteristics of the patients are shown in Table 1. Fever and cough were the most common symptoms. All patients had fever, with body temperatures ranging from 37.2 °C to 40 °C. Patients 1, 2, 5, 6 and 7 had cough, and patients 1, 2 and 7 presented with severe cough with phlegm at the onset of illness. Patients 4 and 5 also complained of chest tightness and dyspnea. Patients 1, 3, 4 and 6 experienced dizziness, and patient 3 reported weakness. No neurological symptoms were observed in any of the patients. Bacterial culture revealed the presence of Streptococcus bacteria in throat swabs from patients 3, 4 and 7. Combined antibiotic, antiviral and glucocorticoid therapy was administered. Patients 1 and 4 developed respiratory failure: patient 1 received high-flow noninvasive ventilation, while patient 4 received nasal/face mask ventilation (Table 1).

Epidemiological investigation by the Wuhan CDC revealed that all the suspected cases were linked to individuals working in a local indoor seafood market. Notably, in addition to fish and shellfish, a variety of live wild animals, including hedgehogs, badgers, snakes and birds (turtledoves), were available for sale in the market before the outbreak began, as well as animal carcasses and animal meat. No bats were available for sale. While the patients might have had contact with wild animals in the market, none recalled exposure to live poultry.

Patient 1 was a 41-year-old man with no history of hepatitis, tuberculosis or diabetes. He was admitted to Wuhan Central Hospital 6 days after the onset of illness. On presentation, he reported one week of fever, chest tightness, unproductive cough, pain and weakness. Cardiovascular, abdominal and neurological examinations were normal. Mild lymphopenia (fewer than 900 cells per cubic millimetre) was
observed, but white blood cell and platelet counts were normal in a complete blood count (CBC) test. An elevated level of C-reactive protein (CRP; 41.4 mg/L of blood, reference range 0-6 mg/L) was observed, and levels of aspartate aminotransferase, lactate dehydrogenase and creatine kinase were slightly elevated in blood chemistry tests. The patient had mild hypoxemia, with an arterial oxygen partial pressure of 67 mmHg in the arterial blood gas (ABG) test. On the first day of admission (day 6 after the onset of illness), chest radiographs were abnormal, with air-space shadowing such as ground-glass opacities, focal consolidation and patchy consolidation in both lungs (Figure 1). Chest computed tomographic (CT) scans revealed bilateral focal consolidation, lobar consolidation and patchy consolidation, especially in the lower lung. A chest radiograph revealed a bilateral diffuse patchy and fuzzy shadow on day 5 after admission (day 11 after the onset of illness). Preliminary aetiological investigation excluded the presence of influenza virus, Chlamydia pneumoniae and Mycoplasma pneumoniae using commercial pathogen antigen detection kits, and this was confirmed by PCR. Other common respiratory pathogens, including adenovirus, were also negative by qPCR (Figure S1). The condition of the patient did not improve after three days of treatment with combined antiviral and antibiotic therapy. He was admitted to the intensive care unit (ICU) and treatment with a high-flow non-invasive ventilator was initiated. The patient was transferred to another hospital in Wuhan for further treatment 6 days after admission.

To investigate the possible aetiologic agents associated with this disease, we collected bronchoalveolar lavage fluid (BALF) from patient 1 and performed deep meta-transcriptomic sequencing. All clinical specimens were handled in a biosafety level 3 laboratory at the Shanghai Public Health Clinical Center. Total RNA was extracted from 200 μL of BALF and
a meta-transcriptomic library was constructed for paired-end (150 bp) sequencing using an Illumina MiniSeq as previously described [4-7]. In total, we generated 56,565,928 sequence reads that were de novo assembled and screened for potential aetiologic agents. Of the 384,096 contigs assembled by Megahit [8], the longest (30,474 nucleotides [nt]) had high abundance and was closely related to a bat SARS-like coronavirus isolate, bat-SL-CoVZC45 (GenBank accession MG772933), previously sampled in China, with a nt identity of 89.1% (Tables S1 and S2). The genome sequence of this novel virus, as well as its termini, was determined and confirmed by RT-PCR [9] and 5'/3' RACE kits (TaKaRa), respectively. This new virus was designated WH-Human 1 coronavirus (WHCV) (it has also been referred to as '2019-nCoV'), and its whole-genome sequence (29,903 nt) has been assigned GenBank accession number MN908947. Remapping the RNA-seq data against the complete genome of WHCV resulted in an assembly of 123,613 reads, providing 99.99% genome coverage at a mean depth of 6.04X (range: 0.01X-78.84X) (Figure S2). The viral load in the BALF sample was estimated by quantitative PCR (qPCR) to be 3.95 × 10⁸ copies/mL (Figure S3).

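Genome coverage and mean depth, as reported above, are summaries of per-position read depth after remapping. As a minimal sketch only, assuming a per-base depth profile is already available (the toy depth list and the `coverage_summary` helper are hypothetical and not part of the authors' pipeline), breadth of coverage and depth statistics can be derived as follows:

```python
from statistics import mean

def coverage_summary(depths, min_depth=1):
    """Summarise per-position read depth along a reference genome.

    depths: non-negative depths, one value per reference position.
    min_depth: depth a position needs to count as covered (assumed 1 here).
    Returns (breadth_percent, mean_depth, min_depth_seen, max_depth_seen).
    """
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = 100.0 * covered / len(depths)
    return breadth, mean(depths), min(depths), max(depths)

# Toy 10-position depth profile (hypothetical values, not the WHCV data)
toy_depths = [0, 2, 4, 6, 8, 8, 6, 4, 2, 0]
breadth, avg, lo, hi = coverage_summary(toy_depths)
print(f"breadth {breadth:.1f}%, mean {avg:.2f}X, range {lo}X-{hi}X")
# prints: breadth 80.0%, mean 4.00X, range 0X-8X
```

Here a position counts as covered at a depth of at least 1; choosing a stricter threshold would lower the breadth figure without changing the mean depth.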
The genome organization of WHCV was characterized by sequence alignment against two representative members of the genus Betacoronavirus: a human-origin coronavirus (SARS-CoV Tor2, AY274119) and a bat-origin coronavirus (bat-SL-CoVZC45, MG772933) (Figure 2). The untranslated regions (UTRs) and open reading frames (ORFs) of WHCV were mapped based on this sequence alignment and ORF prediction. The WHCV genome was similar to these two coronaviruses (Figure 2 and Table S3), with the gene order 5'-replicase ORF1ab-S-envelope (E)-membrane (M)-N-3'. WHCV has 5' and 3' terminal sequences typical of betacoronaviruses, with 265 nt at the 5' terminal and 229 nt at the 3'
terminal region. The predicted replicase ORF1ab gene of WHCV is 21,291 nt in length and contains 16 predicted non-structural proteins (Table S4), followed by (at least) 13 downstream ORFs. Additionally, WHCV shares a highly conserved domain (LLRKNGNKG: amino acids 122-130) with SARS-CoV in nsp1. The predicted S, ORF3a, E, M and N genes of WHCV are 3,822, 828, 228, 669 and 1,260 nt in length, respectively. In addition to these ORF regions, which are shared by all members of the subgenus Sarbecovirus, WHCV is similar to SARS-CoV in that it carries a predicted ORF8 gene (366 nt in length) located between the M and N ORF genes. The functions of WHCV ORFs were predicted based on those of known coronaviruses and are given in Table S5. As in SARS-CoV Tor2, a leader transcription regulatory sequence (TRS) and nine putative body TRSs could be readily identified upstream of the 5' ends of the ORFs, with the putative conserved TRS core sequence appearing in two forms, ACGAAC or CUAAAC (Table S6).

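The ORF and TRS annotation described above can be illustrated with a simplified forward-strand scan. This is a sketch under stated assumptions, not the ORF-prediction tool used in this study: it treats an ORF as an in-frame ATG-to-stop stretch above a hypothetical minimum length, and it searches for the two TRS core motifs reported above (ACGAAC and CUAAAC, matched after converting U to T). Real coronavirus annotation must also handle the ORF1a/ORF1b ribosomal frameshift, which this sketch ignores; the function names and toy sequence are illustrative only.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=50):
    """Return sorted 0-based, half-open (start, end) ORFs on the forward strand.

    An ORF here is ATG ... first in-frame stop codon, at least `min_codons`
    long including the stop. Only the + strand is scanned, since coronavirus
    genomes are positive-sense RNA.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

def find_trs_cores(seq, cores=("ACGAAC", "CUAAAC")):
    """Return sorted (position, motif) hits for TRS core motifs.

    Motifs are converted to the DNA alphabet so RNA or DNA input both match;
    a zero-width lookahead also reports overlapping hits.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    for core in cores:
        pattern = core.upper().replace("U", "T")
        for m in re.finditer(f"(?={pattern})", dna):
            hits.append((m.start(), core))
    return sorted(hits)

# Toy sequence (hypothetical): a TRS core followed by a short ORF
toy = "ACGAAC" + "ATG" + "GCT" * 4 + "TAA" + "CC"
print(find_orfs(toy, min_codons=3))  # [(6, 24)]
print(find_trs_cores(toy))           # [(0, 'ACGAAC')]
```

In a body TRS the core motif sits a short distance upstream of an ORF start, which is why the two searches are shown together here.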
To determine the evolutionary relationships between WHCV and previously identified coronaviruses, we estimated phylogenetic trees based on the nucleotide sequences of the whole genome, the non-structural protein genes ORF1a and ORF1b, and the main structural proteins encoded by the S, E, M and N genes (Figures 3 and S4). In all phylogenies, WHCV clustered with members of the subgenus Sarbecovirus, including the SARS-CoV responsible for the global SARS pandemic of 2002-2003 [1,2], as well as a number of SARS-like coronaviruses sampled from bats. However, WHCV changed topological position within the subgenus Sarbecovirus depending on which gene was used, suggestive of a past history of recombination in this group of viruses (Figures 3 and S4). Specifically, in the S gene tree (Figure S4), WHCV was most closely related to the bat coronavirus bat-SL-CoVZC45 with
82.3% amino acid (aa) identity (and ~77.2% aa identity to SARS-CoV; Table S3), while in the ORF1b phylogeny WHCV fell in a basal position within the subgenus Sarbecovirus (Figure 3). This topological division was also observed in the phylogenetic trees estimated for conserved domains in the replicase polyprotein pp1ab (Figure S5).

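Identity values such as the 82.3% amino acid identity above are computed from a pairwise alignment, but conventions differ (for example, whether gap columns count in the denominator), and the exact convention is not stated here. The sketch below therefore makes one explicit assumption: identity is the number of identical columns divided by the number of columns in which both sequences carry a residue. The `percent_identity` helper and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: columns where either sequence has a gap are excluded from
    the denominator; other conventions (e.g. counting gaps) give lower values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Toy peptide alignment (hypothetical, not real spike sequences)
print(round(percent_identity("LLRKNGNKG", "LLRKN-NKA"), 1))  # 87.5
```

The same function applies to nucleotide or amino acid alignments; only the input alphabet changes.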
To better understand the potential of WHCV to infect humans, the receptor-binding domain (RBD) of its spike protein was compared to those of SARS-CoVs and bat SARS-like CoVs. The RBD sequences of WHCV were more closely related to those of SARS-CoVs (73.8%-74.9% aa identity) and of SARS-like CoVs, including strains Rs4874, Rs7327 and Rs4231 (75.9%-76.9% aa identity), that are able to use the human ACE2 receptor for cell entry (Table S7) [10]. In addition, the WHCV RBD was only one amino acid longer than the SARS-CoV RBD (Figure 4a). In contrast, other bat SARS-like CoVs, including the Rp3 strain that cannot use human ACE2 [11], have amino acid deletions at positions 473-477 and 460-472 compared to the SARS-CoVs (Figure 4a). The previously determined crystal structure of the SARS-CoV RBD complexed with human ACE2 (PDB 2AJF) [12] revealed that regions 473-477 and 460-472 directly interact with human ACE2 and hence may be important in determining species specificity (Figure 4b). We predicted the three-dimensional structures of the WHCV, Rs4874 and Rp3 RBDs by protein homology modelling using the SWISS-MODEL server and compared them to the crystal structure of the SARS-CoV RBD (PDB 2GHV) (Figure 4c-f). In accord with the sequence alignment, the predicted structures of the WHCV and Rs4874 RBDs were closely related to that of SARS-CoVs and different from the predicted structure of the Rp3 RBD. In addition, the N-terminus of the WHCV S protein is more similar to that of SARS-CoV than to those of other human coronaviruses
(HKU1 and OC43; Figure S6) that can bind to sialic acid [13]. In sum, the high similarity of the amino acid sequences and predicted protein structures between the WHCV and SARS-CoV RBDs suggests that WHCV may efficiently use human ACE2 as a cellular entry receptor, perhaps facilitating human-to-human transmission [10,14,15].

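The deletion comparison above (motifs 460-472 and 473-477 in SARS-CoV numbering) can be checked mechanically once the RBD sequences are aligned. As a sketch under stated assumptions, the hypothetical `motif_deletions` helper below takes a gapped alignment, assigns residue numbers to the reference starting from a user-supplied `ref_start`, and counts reference residues inside a motif that are aligned to gaps in the query. The toy residues and numbering are invented for illustration and are not real RBD sequences.

```python
def motif_deletions(aligned_ref, aligned_query, ref_start, motif, gap="-"):
    """Count reference residues in `motif` that are aligned to gaps in the query.

    aligned_ref, aligned_query: equal-length gapped alignment strings.
    ref_start: residue number assigned to the first non-gap reference residue.
    motif: inclusive (first, last) residue numbers in reference coordinates.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    first, last = motif
    pos = ref_start - 1  # number of the most recent reference residue seen
    deleted = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == gap:
            continue  # query insertion: no reference residue in this column
        pos += 1
        if first <= pos <= last and q == gap:
            deleted += 1
    return deleted

# Toy alignment (hypothetical residues), reference numbered from 458
ref = "YQAGSTPCNG"
query = "YQA---PCNG"
print(motif_deletions(ref, query, 458, (460, 465)))  # 3
print(motif_deletions(ref, query, 458, (466, 467)))  # 0
```

Columns where the reference has a gap (insertions in the query, such as a single extra residue) are skipped, so they do not shift the reference numbering.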
To further characterize putative recombination events in the evolutionary history of the sarbecoviruses, the whole-genome sequences of WHCV and four representative coronaviruses (bat SARS-like CoV Rp3, CoVZC45, CoVZXC21 and SARS-CoV Tor2) were analysed using the Recombination Detection Program v4 (RDP4) [16]. Although the similarity plots suggested possible recombination events between WHCV and SARS-CoVs or SARS-like CoVs (Figure S7), there was no significant evidence for recombination across the genome as a whole. However, some evidence for past recombination was detected in the S gene of WHCV, SARS-CoV and bat SARS-like CoVs (WIV1 and RsSHC014) (p < 3.147 × 10⁻³ to p < 9.198 × 10⁻⁹), with similarity plots suggesting the presence of recombination break points at nucleotides 1,029 and 1,652 that separate the WHCV S gene into three regions (Figure 5). In phylogenies of the fragments nt 1 to 1,029 and nt 1,652 to the end of the sequence, WHCV was most closely related to bat-SL-CoVZC45 and bat-SL-CoVZXC21, whereas in the region nt 1,030 to 1,651 (the RBD region) WHCV grouped with SARS-CoV and bat SARS-like CoVs (WIV1 and RsSHC014) that are capable of direct human transmission [14,17].

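The similarity plots used to locate the putative break points rest on a simple idea: slide a window along the alignment, compute the query's identity to each reference within the window, and look for places where the most similar reference changes. The sketch below illustrates only that idea; it is not RDP4, which additionally applies formal statistical tests for recombination. Window and step sizes are arbitrary, gap columns are skipped, and the function names and toy sequences are hypothetical.

```python
import math

def window_identity(query, ref, start, end, gap="-"):
    """Fraction of identical residues between query and ref in columns [start, end).

    Columns where either sequence has a gap are skipped; returns NaN if no
    column can be compared.
    """
    compared = identical = 0
    for x, y in zip(query[start:end], ref[start:end]):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    return identical / compared if compared else math.nan

def similarity_plot(query, refs, window=200, step=20):
    """Yield (window_midpoint, {reference_name: identity}) along an alignment.

    `refs` maps names to sequences aligned to the same length as `query`.
    """
    n = len(query)
    if any(len(s) != n for s in refs.values()):
        raise ValueError("all sequences must be aligned to the same length")
    for start in range(0, max(n - window, 0) + 1, step):
        end = min(start + window, n)
        mid = (start + end) // 2
        yield mid, {name: window_identity(query, s, start, end)
                    for name, s in refs.items()}

def closest_reference(profile):
    """For each window, name the most similar reference (None if undefined)."""
    result = []
    for mid, scores in profile:
        valid = {k: v for k, v in scores.items() if not math.isnan(v)}
        result.append((mid, max(valid, key=valid.get) if valid else None))
    return result

# Toy alignment (hypothetical): the query copies ref_A, then ref_B
ref_a = "ACGT" * 10
ref_b = "TGCA" * 10
query = ref_a[:20] + ref_b[20:]
profile = similarity_plot(query, {"ref_A": ref_a, "ref_B": ref_b}, window=10, step=10)
print(closest_reference(profile))
# [(5, 'ref_A'), (15, 'ref_A'), (25, 'ref_B'), (35, 'ref_B')]
```

In the toy output the closest reference switches between the windows centred on positions 15 and 25, which is the kind of signal a similarity plot shows at a candidate break point.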
Coronaviruses are associated with a number of infectious disease outbreaks in humans, including SARS in 2002/3 and MERS in 2012 [1,18]. Four other human coronaviruses (HKU1, OC43, NL63 and 229E) are also associated with respiratory disease [19-22]. Although SARS-like coronaviruses have been widely identified in mammals including bats
since 2005 in China [9,23-25], the exact origin of the coronaviruses that infect humans remains unclear. Herein, we describe a novel coronavirus, WHCV, in BALF from a patient experiencing severe respiratory disease in Wuhan, China. Phylogenetic analysis suggested that WHCV represents a novel virus within the genus Betacoronavirus (subgenus Sarbecovirus) that exhibits some genomic and phylogenetic similarity to SARS-CoV [1], particularly in the RBD. These genomic and clinical similarities to SARS, as well as its high abundance in clinical samples, provide evidence for an association between WHCV and the ongoing outbreak of respiratory disease in Wuhan.

The identification of multiple SARS-like CoVs in bats led to the idea that these animals act as the natural reservoir hosts of these viruses [19,20]. Although SARS-like viruses have been identified widely in bats in China, viruses identical to SARS-CoV have not yet been documented. Notably, WHCV is most closely related to bat coronaviruses, even exhibiting 100% aa similarity to bat-SL-CoVZC45 in the nsp7 and E proteins. Hence, these data suggest that bats are a possible reservoir host of WHCV. However, as a variety of animal species were for sale in the market when the disease was first reported, more work is needed to determine the natural reservoir and any intermediate hosts of WHCV.

Acknowledgements This study was supported by the Special National Project on investigation of basic resources of China (Grant SQ2019FY010009) and the National Natural Science Foundation of China (Grants 81861138003 and 31930001). ECH is supported by an ARC Australian Laureate Fellowship (FL170100022).

Author Contributions Y.-Z.Z. conceived and designed the study. S.Z., Y.H., Z.-W.T. and M.-L.Y. performed the clinical work and sample collection. B.Y. and J.-H.T. performed epidemiological investigation and sample collection. F.W., Z.-G.S., L.X., Y.-Y.P., Y.-L.Z., F.-H.D., Y.L., J.-J.Z. and Q.-M.W. performed the experiments. Y.-M.C., W.W., F.W., E.C.H. and Y.-Z.Z. analysed the data. Y.-Z.Z., E.C.H. and F.W. wrote the paper with input from all authors. Y.-Z.Z. led the study.

Correspondence and requests for materials should be addressed to Y.-Z.Z. (zhangyongzhen@shphc.org.cn or zhangyongzhen@icdc.cn).

Competing interests The authors declare no competing interests.

References
1. Drosten, C. et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N. Engl. J. Med. 348, 1967-1976 (2003).
2. Wolfe, N. D., Dunavan, C. P. & Diamond, J. Origins of major human infectious diseases. Nature 447, 279-283 (2007).
3. Ventura, C. V., Maia, M., Bravo-Filho, V., Góis, A. L. & Belfort, R. Jr. Zika virus in Brazil and macular atrophy in a child with microcephaly. Lancet 387, 228 (2016).
4. Shi, M. et al. Redefining the invertebrate RNA virosphere. Nature 540, 539-543 (2016).
5. Shi, M. et al. The evolutionary history of vertebrate RNA viruses. Nature 556, 197-202 (2018).
6. Yadav, P. D. et al. Nipah virus sequences from humans and bats during Nipah outbreak, Kerala, India, 2018. Emerg. Infect. Dis. 25, 1003-1006 (2019).
7. McMullan, L. K. et al. Characterisation of infectious Ebola virus from the ongoing outbreak to guide response activities in the Democratic Republic of the Congo: a phylogenetic and in vitro analysis. Lancet Infect. Dis. 19, 1023-1032 (2019).
8. Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674-1676 (2015).
9. Wang, W. et al. Discovery, diversity and evolution of novel coronaviruses sampled from rodents in China. Virology 474, 19-27 (2015).
10. Hu, B. et al. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog. 13, e1006698 (2017).
11. Ren, W. et al. Difference in receptor usage between severe acute respiratory syndrome (SARS) coronavirus and SARS-like coronavirus of bat origin. J. Virol. 82, 1899-1907 (2008).
12. Li, F., Li, W., Farzan, M. & Harrison, S. C. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 309, 1864-1868 (2005).
13. Hulswit, R. J. G. et al. Human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acids via a conserved receptor-binding site in spike protein domain A. Proc. Natl Acad. Sci. USA 116, 2681-2690 (2019).
14. Ge, X. Y. et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535-538 (2013).
15. Yang, X. L. et al. Isolation and characterization of a novel bat coronavirus closely related to the direct progenitor of severe acute respiratory syndrome coronavirus. J. Virol. 90, 3253-3256 (2016).
16. Martin, D. P., Lemey, P., Lott, M., Moulton, V., Posada, D. & Lefeuvre, P. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462-2463 (2010).
17. Menachery, V. D. et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat. Med. 21, 1508-1513 (2015).
18. Bermingham, A. et al. Severe respiratory illness caused by a novel coronavirus, in a patient transferred to the United Kingdom from the Middle East, September 2012. Euro Surveill. 17, 20290 (2012).
19. Hamre, D. & Procknow, J. J. A new virus isolated from the human respiratory tract. Proc.
Soc. Exp. Biol. Med. 121, 190-193 (1966).
20. McIntosh, K., Becker, W. B. & Chanock, R. M. Growth in suckling-mouse brain of "IBV-like" viruses from patients with upper respiratory tract disease. Proc. Natl Acad. Sci. USA 58, 2268-2273 (1967).
21. van der Hoek, L. et al. Identification of a new human coronavirus. Nat. Med. 10, 368-373 (2004).
22. Woo, P. C. et al. Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia. J. Virol. 79, 884-895 (2005).
23. Li, W. et al. Bats are natural reservoirs of SARS-like coronaviruses. Science 310, 676-679 (2005).
24. Lau, S. K. et al. Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats. Proc. Natl Acad. Sci. USA 102, 14040-14045 (2005).
25. Wang, W. et al. Discovery of a highly divergent coronavirus in the Asian house shrew from China illuminates the origin of the Alphacoronaviruses. J. Virol. 91, e00764-17 (2017).

Figure legends

Figure 1. Chest radiographs of patient 1. (a-d) Chest computed tomographic scans of patient 1 obtained on the day of admission (day 6 after the onset of illness). Bilateral focal consolidation, lobar consolidation and patchy consolidation were clearly observed, especially in the lower lung. (e) Chest radiograph of patient 1 obtained on day 5 after admission (day 11 after the onset of illness). Bilateral diffuse patchy and fuzzy shadows were observed.

Figure 2. Genome organization of SARS and SARS-like CoVs, including Tor2, CoVZC45 and WHCV (determined here).

Figure 3. Maximum likelihood phylogenetic trees of nucleotide sequences of the ORF1a, ORF1b, E and M genes of WHCV and related coronaviruses. Numbers (>70) above or below branches indicate percentage bootstrap values for the associated nodes. The trees were mid-point rooted for clarity only. The scale bar represents the number of substitutions per site.

Figure 4. Analysis of the receptor-binding domain (RBD) of the spike (S) protein of WHCV. (a) Amino acid sequence alignment of SARS-like CoV RBD sequences. Three bat SARS-like CoVs that can efficiently use human ACE2 as a receptor had RBD sequences of similar size to SARS-CoV, and WHCV contains a single Val 470 insertion. The key amino acid residues involved in the interaction with human ACE2 are marked with a
brown box. In contrast, five bat SARS-like CoVs had amino acid deletions at two motifs (amino acids 473-477 and 460-472) compared with those of SARS-CoV, and Rp3 has been reported not to use ACE2 [11]. (b) The two motifs (aa 473-477 and aa 460-472) are shown in red on the crystal structure of the SARS-CoV spike RBD complexed with its receptor, human ACE2 (PDB 2AJF). Human ACE2 is shown in blue and the SARS-CoV spike RBD in green. Important residues in human ACE2 that interact with the SARS-CoV spike RBD are marked. (c) Predicted structure of the RBD of the WHCV spike protein based on target-template alignment using ProMod3 on the SWISS-MODEL server. The most reliable models were selected based on GMQE and QMEAN scores. Template: 2ghw.1.A; GMQE: 0.83; QMEAN: -2.67. Motifs resembling amino acids 473-477 and 460-472 of the SARS-CoV spike protein are shown in red. (d) Predicted structure of the RBD of SARS-like CoV Rs4874. Template: 2ghw.1.A; GMQE: 0.99; QMEAN: -0.72. Motifs resembling amino acids 473-477 and 460-472 of the SARS-CoV spike protein are shown in red. (e) Predicted structure of the RBD of SARS-like CoV Rp3. Template: 2ghw.1.A; GMQE: 0.81; QMEAN: -1.50. (f) Crystal structure of the RBD of the SARS-CoV spike protein (green) (PDB 2GHV). Motifs of amino acids 473-477 and 460-472 are shown in red.

Figure 5. Possible recombination events in the S gene of sarbecoviruses. A sequence similarity plot (upper panel) reveals two putative recombination break points, shown by black dashed lines, with their locations indicated at the bottom. The plot shows S gene similarity comparisons of WHCV (query) against SARS-CoV Tor2 and bat SARS-like CoVs WIV1, Rf1 and CoVZC45. Phylogenies of the major parental region (1-1028 and 1653-3804) and minor
parental region (1029-1652) are shown below the similarity plot. Phylogenies were estimated using a maximum likelihood (ML) method and mid-point rooted for clarity only. Numbers above or below branches indicate percentage bootstrap values.

Methods
Cases and collection of clinical data and samples
Patients presenting with acute onset of fever (>37.5 °C), cough and chest tightness, and who were admitted to Wuhan Central Hospital in Wuhan city, China, were considered suspected cases. During admission, bronchoalveolar lavage fluid (BALF) was collected and stored at -80 °C until further processing. Demographic, clinical and laboratory data were retrieved from the clinical records of the confirmed patients. The study was reviewed and approved by the ethics committee of the National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention (CDC).

RNA library construction and sequencing
Total RNA was extracted from the BALF sample of patient 1 using the RNeasy Plus Universal Mini Kit (Qiagen) following the manufacturer's instructions. The quantity and quality of the RNA were assessed using a Qubit fluorometer and an Agilent 2100 Bioanalyzer (Agilent Technologies) before library construction and sequencing. An RNA library was then constructed using the SMARTer Stranded Total RNA-Seq Kit v2 (TaKaRa, Dalian, China). Ribosomal RNA (rRNA) depletion was performed during library construction following the manufacturer's instructions. Paired-end (150 bp) sequencing of the RNA library
.CC-BY-NC-ND 4.0 International licenseIt is made available under a perpetuity.
this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for. http://dx.doi.org/10.1101/2020.01.24.919183doi: bioRxiv preprint first posted online Jan. 25, 2020;
18
was performed on the MiniSeq platform (Illumina). Library preparation and sequencing were
340
carried out at the Shanghai Public Health Clinical Center, Fudan University, Shanghai, China.
341
Data processing and viral agent identification
Sequencing reads were first adaptor- and quality-trimmed using the Trimmomatic program26. The remaining 56,565,928 reads were assembled de novo using both Megahit (version 1.1.3)8 and Trinity (version 2.5.1)27 with default parameter settings. Megahit generated a total of 384,096 assembled contigs (size range: 200-30,474 nt), while Trinity generated 1,329,960 contigs (size range: 201-11,760 nt). All of these assembled contigs were compared against the entire non-redundant nucleotide (Nt) and protein (Nr) databases using blastn and Diamond blastx, with e-value cut-offs of 1×10⁻¹⁰ and 1×10⁻⁵, respectively. To identify possible aetiologic agents present in the sequence data, the abundance of the assembled contigs was first evaluated as expected counts using the RSEM program28 implemented in Trinity. Non-human reads (23,712,657 reads), obtained by using Bowtie229 to filter out reads mapping to the human genome (human release 32, GRCh38.p13, downloaded from GENCODE), were used for the RSEM abundance assessment.
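The e-value filtering of the contig searches can be sketched as below. This is a minimal illustration, assuming the blastn and Diamond blastx results were written in the default 12-column tabular format; the function name `best_hits` and the rule of keeping the highest-bitscore hit per contig are illustrative choices, not details reported in the Methods.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical helper; the e-value cut-offs are the ones quoted in the Methods.
EVALUE_CUTOFF = {"blastn": 1e-10, "blastx": 1e-5}


def best_hits(lines: Iterable[str], program: str) -> Dict[str, Tuple[str, float, float]]:
    """Return {contig: (subject, evalue, bitscore)} for the highest-bitscore hit
    per contig that passes the program-specific e-value cut-off.

    Expects the default 12-column tabular layout (blastn -outfmt 6, DIAMOND --outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    cutoff = EVALUE_CUTOFF[program]
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > cutoff:
            continue  # not significant at this cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Toy rows made up for illustration; not data from the study.
    rows = [
        "contig_1\tsubjA\t89.1\t29000\t3100\t20\t1\t29000\t1\t29000\t0.0\t40000",
        "contig_1\tsubjB\t80.0\t5000\t900\t10\t1\t5000\t1\t5000\t1e-50\t3000",
        "contig_2\tsubjC\t70.0\t200\t60\t0\t1\t200\t1\t200\t1e-3\t50",
    ]
    print(best_hits(rows, "blastn"))  # contig_2 is dropped (1e-3 > 1e-10)
```

Applying different cut-offs per program mirrors the text, where the nucleotide search used the stricter threshold.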
As the longest contigs generated by Megahit (30,474 nt) and Trinity (11,760 nt) both had high similarity to the bat SARS-like coronavirus isolate bat-SL-CoVZC45 and were at high abundance (Tables S1 and S2), the longer of the two (30,474 nt), which covered almost the whole viral genome, was used to design primers for PCR confirmation and determination of the genome termini. Primers used in the PCR, qPCR and RACE experiments are listed in Table S8. The PCR assay was conducted as described previously9, and the genome termini were determined using the Takara SMARTer RACE 5'/3' kit (TaKaRa) following the manufacturer's instructions. Subsequently, the genome coverage and sequencing depth were determined by remapping all of the adaptor- and quality-trimmed reads to the whole genome of WHCV using Bowtie229 and Samtools30.
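Coverage breadth and mean depth can be summarised from per-base depths after remapping. A minimal sketch, assuming the depths were exported with `samtools depth -a` for a single BAM file (one line per position: reference name, 1-based position, depth); `coverage_summary` and its `min_depth` parameter are hypothetical.

```python
from typing import Iterable, Tuple


def coverage_summary(lines: Iterable[str], genome_length: int,
                     min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth, mean_depth) for one reference genome.

    breadth    = fraction of genome positions with depth >= min_depth
    mean_depth = total aligned bases / genome_length
    Positions missing from the input are treated as depth 0.
    """
    covered = 0
    total = 0
    for line in lines:
        if not line.strip():
            continue
        _ref, _pos, depth = line.split("\t")  # assumes exactly three columns
        d = int(depth)  # int() tolerates a trailing newline
        total += d
        if d >= min_depth:
            covered += 1
    return covered / genome_length, total / genome_length
```

Dividing by the genome length rather than by the number of input lines keeps the result correct even if zero-depth positions were omitted from the depth file.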
The viral load of WHCV in the BALF of patient 1 was determined by quantitative real-time RT-PCR using the Takara One Step PrimeScript™ RT-PCR Kit (Takara RR064A) following the manufacturer's instructions. Real-time RT-PCR was performed using 2.5 μl of RNA with 8 pmol of each primer and 4 pmol of probe under the following conditions: reverse transcription at 42℃ for 10 minutes, then 95℃ for 1 minute, followed by 40 cycles of 95℃ for 15 seconds and 60℃ for 1 minute. The reactions were run and detected on an ABI 7500 Real-Time PCR System. A PCR product covering the TaqMan primer and probe region was cloned into the pLB vector using the Lethal Based Simple Fast Cloning Kit (TIANGEN) to serve as the standard for quantitative viral load testing.
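Quantification against a cloned plasmid standard usually relies on a standard curve relating Ct to log10 copy number (compare the standard curve in Figure S3b-c). The sketch below assumes a dilution series of the standard with known copy numbers; the function names are hypothetical, and the plain least-squares fit may differ from the analysis performed by the ABI 7500 software.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copies: Sequence[float], ct: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mx, my = sum(x) / n, sum(ct) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to about 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct_value: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate copies in an unknown sample."""
    return 10 ** ((ct_value - intercept) / slope)
```

In practice the estimated copies per reaction would still need scaling by the RNA input volume (2.5 μl here) and extraction volumes to express a viral load per ml of BALF.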
Virus genome characterization and phylogenetic analysis
For the newly identified virus genome, potential open reading frames (ORFs) were predicted and annotated based on the conserved signatures of the cleavage sites recognized by coronavirus proteinases, using the Lasergene software package (version 7.1, DNAstar). The viral genes were aligned using the L-INS-i algorithm implemented in MAFFT (version 7.407)31.
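For readers unfamiliar with ORF scanning, a naive forward-strand search is sketched below. It is not the annotation procedure used in the study, which relied on proteinase cleavage-site signatures in Lasergene; it only finds ATG-initiated ORFs in the three forward frames of the positive-sense genome and does not model the ORF1a/ORF1b -1 ribosomal frameshift. `find_orfs` and the default `min_aa` are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Only ORFs encoding at least `min_aa` residues (stop excluded) are kept.
    """
    seq = seq.upper().replace("U", "T")
    orfs: List[Tuple[int, int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

A real annotation would also need to handle nested ORFs and the frameshift joining ORF1a and ORF1b into the pp1ab polyprotein.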
Phylogenetic analyses were then performed using the nucleotide sequences of various CoV gene data sets: (i) whole genome, (ii) ORF1a, (iii) ORF1b, (iv) nsp5 (3CLpro), (v) RdRp (nsp12), (vi) nsp13 (Hel), (vii) nsp14 (ExoN), (viii) nsp15 (NendoU), (ix) nsp16 (O-MT), (x) spike (S), and (xi) nucleocapsid (N). Phylogenetic trees were inferred using the maximum likelihood (ML) method implemented in PhyML (version 3.0)32, with the general time-reversible (GTR) substitution model and subtree pruning and regrafting (SPR) branch-swapping. Bootstrap support values were calculated from 1,000 pseudo-replicate trees. The best-fit model of nucleotide substitution was determined using MEGA (version 5)33. Amino acid identities among sequences were calculated using the MegAlign program implemented in the Lasergene software package (version 7.1, DNAstar).
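Two small building blocks behind these analyses are sketched below, assuming aligned sequences stored as equal-length strings with '-' for gaps. `bootstrap_replicate` shows how a non-parametric bootstrap pseudo-replicate is built by resampling alignment columns with replacement (tree inference on each replicate, done with PhyML in the study, is not shown). `percent_identity` counts only columns where neither sequence has a gap, which is one common convention; MegAlign's exact convention may differ. Both names are hypothetical.

```python
import random
from typing import Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * same / len(pairs)
```

Repeating `bootstrap_replicate` 1,000 times and counting how often each clade recurs in the resulting trees gives the percentage bootstrap values shown on the phylogenies.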
Genome recombination analysis
Potential recombination events in the history of the sarbecoviruses were assessed using both the Recombination Detection Program v4 (RDP4)16 and Simplot (version 3.5.1)34. The RDP4 analysis was conducted on complete genome nucleotide sequences, employing the RDP, GENECONV, BootScan, maximum chi-square, Chimera, SISCAN and 3SEQ methods. Putative recombination events were identified using a Bonferroni-corrected p-value cut-off of 0.01. Similarity plots were generated using Simplot to further characterize potential recombination events, including the location of breakpoints.
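A Simplot-style similarity plot can be sketched as a sliding-window distance between a query and each reference. The sketch below uses the Kimura 2-parameter (K2P) distance with the 200 bp window and 10 bp step printed on Figure 5; it approximates "gap strip" by discarding columns with a gap or ambiguous base in either sequence, and reports similarity as 1 - d, which is an assumption about how the plotted value is defined. It is not Simplot's own code, and all function names are hypothetical. A helper for the Bonferroni cut-off is included.

```python
import math
from typing import List, Optional, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> Optional[float]:
    """Kimura 2-parameter distance over columns where both bases are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    if n == 0:
        return None
    transitions = sum(1 for x, y in pairs if x != y and
                      ({x, y} <= PURINES or {x, y} <= PYRIMIDINES))
    transversions = sum(1 for x, y in pairs if x != y) - transitions
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return None  # too divergent for K2P to be defined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def similarity_plot(query: str, ref: str, window: int = 200,
                    step: int = 10) -> List[Tuple[int, Optional[float]]]:
    """Return (window midpoint, 1 - K2P distance) along two aligned sequences."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to equal length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window], ref[start:start + window])
        points.append((start + window // 2, None if d is None else 1 - d))
    return points


def bonferroni_significant(p_value: float, n_tests: int, alpha: float = 0.01) -> bool:
    """Bonferroni correction: scale p by the number of tests, then compare to alpha."""
    return min(1.0, p_value * n_tests) <= alpha
```

A putative breakpoint shows up where the reference with the highest similarity to the query switches, as at positions 1029 and 1652 in the S gene plot of Figure 5.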
Analysis of the RBD of the WHCV spike protein
An amino acid sequence alignment of the RBD sequences of WHCV, SARS-CoVs and bat SARS-like CoVs was performed using MUSCLE35. The structures of the spike protein RBDs were predicted by homology modelling on the SWISS-MODEL server (https://swissmodel.expasy.org/). The spike RBD sequences of WHCV, Rs4874 and Rp3 were searched by BLAST against the primary amino acid sequences contained in the SWISS-MODEL template library (SMTL, last update: 2020-01-09, last included PDB release: 2020-01-03). Models were built based on the target-template alignment using ProMod3. The global and per-residue model quality was assessed using the QMEAN scoring function36. The PDB files of the predicted protein structures were displayed and compared with the crystal structure of the SARS-CoV spike RBD (PDB 2GHV)37 and the crystal structure of the SARS-CoV spike RBD complexed with human ACE2 (PDB 2AJF)12.
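Comparing motifs such as SARS-CoV spike residues 460-472 and 473-477 across viruses (Figure 4) requires mapping reference residue numbers onto alignment columns. A minimal sketch, assuming both sequences come from the same multiple sequence alignment (for example MUSCLE output) with '-' as the gap character, and that `ref_offset` gives the reference residue number of the first residue in the aligned fragment; the function names are hypothetical.

```python
from typing import Tuple


def ref_range_to_columns(ref_aln: str, first: int, last: int,
                         ref_offset: int = 1) -> Tuple[int, int]:
    """Return 0-based inclusive alignment columns spanning reference residues first..last."""
    pos = ref_offset - 1
    start_col = end_col = None
    for col, aa in enumerate(ref_aln):
        if aa == "-":
            continue  # gap in the reference: no residue number consumed
        pos += 1
        if pos == first:
            start_col = col
        if pos == last:
            end_col = col
            break
    if start_col is None or end_col is None:
        raise ValueError("residue range not covered by the aligned reference")
    return start_col, end_col


def extract_motif(other_aln: str, ref_aln: str, first: int, last: int,
                  ref_offset: int = 1) -> str:
    """Residues of `other_aln` aligned to reference residues first..last (gaps kept)."""
    s, e = ref_range_to_columns(ref_aln, first, last, ref_offset)
    return other_aln[s:e + 1]
```

Keeping gaps in the extracted motif makes deletions relative to SARS-CoV (as in the five bat SARS-like CoVs) and insertions (as in WHCV) directly visible.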
References
26. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
27. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
28. Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A. & Dewey, C. N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).
29. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
30. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
31. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
32. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
33. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
34. Lole, K. S. et al. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152–160 (1999).
35. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
36. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
37. Hwang, W. C. et al. Structural basis of neutralization by a human anti-severe acute respiratory syndrome spike protein antibody, 80R. J. Biol. Chem. 281, 34610–34616 (2006).
Supplementary legends

Supplementary Tables
Table S1. The 50 most abundant assembled contigs generated using the Megahit program.
Table S2. The 80 most abundant assembled contigs generated using the Trinity program.
Table S3. Amino acid identities of the selected predicted gene products between the novel coronavirus (WHCV) and known betacoronaviruses.
Table S4. Cleavage products of the replicase polyproteins of WHCV.
Table S5. Predicted gene functions of WHCV ORFs.
Table S6. Coding of potential and putative transcription regulatory sequences of the genome sequence of WHCV.
Table S7. Amino acid identities of the RBD sequence between SARS- and bat SARS-like CoVs.
Table S8. PCR primers used in this study.
Supplementary Figures
Figure S1. Detection of other respiratory pathogens by qPCR.
Figure S2. Mapped read count plot showing the coverage depth per base of the WHCV genome.
Figure S3. Detection of WHCV in clinical samples by RT-qPCR. (a) Specificity of the WHCV primers used in RT-qPCR. Test samples comprised clinical samples that were positive for at least one of the following viruses: Influenza A virus (09H1N1 and H3N2), Influenza B virus, Human adenovirus, Respiratory syncytial virus, Rhinovirus, Parainfluenza virus types 1-4, Human bocavirus, Human metapneumovirus, Coronavirus OC43, Coronavirus NL63, Coronavirus 229E and Coronavirus HKU1. (b-c) Standard curve. (d) Amplification curve of WHCV.
Figure S4. Maximum likelihood phylogenetic trees of the nucleotide sequences of the whole genome, S and N genes of WHCV and related coronaviruses. Numbers (>70) above or below branches indicate percentage bootstrap values. The trees were mid-point rooted for clarity only. The scale bar represents the number of substitutions per site.
Figure S5. Maximum likelihood phylogenetic trees of the nucleotide sequences of the 3CL, RdRp, Hel, ExoN, NendoU, and O-MT genes of WHCV and related coronaviruses. Numbers (>70) above or below branches indicate percentage bootstrap values. The trees were mid-point rooted for clarity only. The scale bar represents the number of substitutions per site.
Figure S6. Amino acid sequence comparison of the N-terminal domain (NTD) of the spike protein of WHCV; of bovine coronavirus (BCoV), mouse hepatitis virus (MHV) and human coronaviruses (HCoV OC43 and HKU1), which can bind to sialic acid; and of the SARS-CoVs, which cannot. The key residues13 for sialic acid binding on BCoV, MHV, HCoV OC43 and HKU1 are marked with a brown box.
Figure S7. A sequence similarity plot of WHCV, SARS- and bat SARS-like CoVs revealing putative recombination events.
Table 1. Clinical symptoms and patient data

| Characteristic | Patient 1 | Patient 2 | Patient 3 | Patient 4 | Patient 5 | Patient 6 | Patient 7 |
|---|---|---|---|---|---|---|---|
| Age (years) | 41 | 44 | 42 | 70 | 31 | 51 | 43 |
| Sex | M | F | M | F | M | M | M |
| Date of illness onset | Dec 20, 2019 | Dec 22, 2019 | Dec 24, 2019 | Dec 24, 2019 | Dec 21, 2019 | Dec 16, 2019 | Dec 14, 2019 |
| Date of admission | Dec 26, 2019 | Dec 22, 2019 | Dec 28, 2019 | Dec 28, 2019 | Dec 28, 2019 | Dec 27, 2019 | Dec 14, 2019 |
| Fever | + | + | + | + | + | + | + |
| Body temperature (°C) | 38.4 | 37.3 | 39 | 37.9 | 38.7 | 37.2 | 38 |
| Cough | + | + | + | + | + | + | + |
| Sputum production | + | + | - | - | - | - | + |
| Dizziness | + | - | + | + | - | + | - |
| Weakness | + | - | + | - | - | - | - |
| Chest tightness | + | - | - | + | + | - | - |
| Dyspnea | + | - | - | + | + | + | - |
| Bacterial culture | - | - | Streptococcus pneumoniae | Streptococcus pneumoniae | - | - | Streptococcus pneumoniae |
| Glucocorticoid therapy | No | No | Yes | Yes | Yes | Yes | No |
| Antibiotic therapy | Cefoselis | Ceftazidime, levofloxacin | Cefminox | Cefminox, moxifloxacin | Cefminox | No | No |
| Antiviral therapy | Oseltamivir | No | Oseltamivir, ganciclovir | Oseltamivir, ganciclovir | Oseltamivir, ganciclovir | Oseltamivir, ganciclovir | No |
| Oxygen therapy | Mechanical ventilation | No | No | Mask | No | No | No |
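The interval from illness onset to admission for each patient follows from simple date arithmetic on Table 1. A minimal sketch, with dates transcribed from the table and the patient 6 onset date taken as Dec 16, 2019; `days_to_admission` is a hypothetical helper. For patient 1 the result (6 days) agrees with the "day 6 after the onset of illness" stated in the Figure 1 legend.

```python
from datetime import date

# Dates transcribed from Table 1 (patient number -> date).
ONSET = {1: date(2019, 12, 20), 2: date(2019, 12, 22), 3: date(2019, 12, 24),
         4: date(2019, 12, 24), 5: date(2019, 12, 21), 6: date(2019, 12, 16),
         7: date(2019, 12, 14)}
ADMISSION = {1: date(2019, 12, 26), 2: date(2019, 12, 22), 3: date(2019, 12, 28),
             4: date(2019, 12, 28), 5: date(2019, 12, 28), 6: date(2019, 12, 27),
             7: date(2019, 12, 14)}


def days_to_admission(patient: int) -> int:
    """Calendar days between illness onset and hospital admission."""
    return (ADMISSION[patient] - ONSET[patient]).days


if __name__ == "__main__":
    for p in sorted(ONSET):
        print(f"Patient {p}: {days_to_admission(p)} days")
```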
Figure 1. Chest radiographs of patient 1. (a-d) Chest computed tomographic scans of patient 1 were obtained on the day of admission (day 6 after the onset of illness). Bilateral focal consolidation, lobar consolidation, and patchy consolidation were clearly observed, especially in the lower lung. (e) A chest radiograph of patient 1 was obtained on day 5 after admission (day 11 after the onset of illness). Bilateral diffuse patchy and fuzzy shadows were observed.
Figure 2. Genome organization of SARS and SARS-like CoVs including Tor2, CoVZC45 and WHCV determined here.

[Figure: genome maps of bat-SL-CoVZC45 (29,802 bp), SARS-CoV Tor2 (29,751 bp) and WH-Human 1 (29,903 bp), with ORF labels including 1a, 1b, S, 3a, 3b, E, M, 6, 7a, 7b, 8 (8a and 8b), 9a, 9b, N and 10; x-axis: genome position (nt).]
Figure 3. Maximum likelihood phylogenetic trees of nucleotide sequences of the ORF1a, ORF1b, E and M genes of WHCV and related coronaviruses. Numbers (>70) above or below branches indicate percentage bootstrap values for the associated nodes. The trees were mid-point rooted for clarity only. The scale bar represents the number of substitutions per site.

[Figure panels: ORF1a, ORF1b, M and E trees; labelled clades: Embecovirus, Merbecovirus, Nobecovirus, Hibecovirus and Sarbecovirus; WHCV is labelled WH-Human 1; scale bars: 0.5 substitutions per site.]
Figure 4. Analysis of the receptor-binding domain (RBD) of the spike (S) protein of WHCV. (a) Amino acid sequence alignment of SARS-like CoV RBD sequences. Three bat SARS-like CoVs, which could efficiently utilize human ACE2 as a receptor, had an RBD sequence of similar size to that of SARS-CoV, and WHCV contains a single Val 470 insertion. The key amino acid residues involved in the interaction with human ACE2 are marked with a brown box. In contrast, five bat SARS-like CoVs had amino acid deletions at two motifs (amino acids 473-477 and 460-472) compared with those of SARS-CoV, and Rp3 has been reported not to use ACE2.11 (b) The two motifs (aa 473-477 and aa 460-472) are shown in red on the crystal structure of the SARS-CoV spike RBD complexed with its receptor human ACE2 (PDB 2AJF). Human ACE2 is shown in blue and the SARS-CoV spike RBD is shown in green. Important residues in human ACE2 that interact with the SARS-CoV spike RBD are marked. (c) Predicted protein structure of the RBD of the WHCV spike protein based on target-template alignment using ProMod3 on the SWISS-MODEL server. The most reliable models were selected based on GMQE and QMEAN scores. Template: 2ghw.1.A, GMQE: 0.83; QMEAN: -2.67. Motifs resembling amino acids 473-477 and 460-472 of the SARS-CoV spike protein are shown in red. (d) Predicted structure of the RBD of SARS-like CoV Rs4874. Template: 2ghw.1.A, GMQE: 0.99; QMEAN: -0.72. Motifs resembling amino acids 473-477 and 460-472 of the SARS-CoV spike protein are shown in red. (e) Predicted structure of the RBD of SARS-like CoV Rp3. Template: 2ghw.1.A, GMQE: 0.81; QMEAN: -1.50. (f) Crystal structure of the RBD of the SARS-CoV spike protein (green) (PDB 2GHV). Motifs of amino acids 473-477 and 460-472 are shown in red.
Figure 5. Possible recombination events in the S gene of sarbecoviruses. A sequence similarity plot (upper panel) reveals two putative recombination break-points shown by black dashed lines, with their locations indicated at the bottom. The plot shows S gene similarity comparisons of the WHCV (query) against SARS-CoV Tor2 and bat SARS-like CoVs WIV1, Rf1 and CoVZC45. Phylogenies of the major parental regions (1-1028 and 1653-3804) and the minor parental region (1029-1652) are shown below the similarity plot. Phylogenies were estimated using an ML method and mid-point rooted for clarity only. Numbers above or below branches indicate percentage bootstrap values.

[Figure panels: similarity plot (query: WH-Human-1; references: Bat-SL-CoVZC45, Rf1, SARS CoV Tor2 and WIV1; window 200 bp, step 10 bp, GapStrip on, Kimura 2-parameter, T/t 2.0; x-axis: position, 0-3,800; y-axis: similarity, 0.0-1.0; breakpoints at 1029 and 1652), followed by ML trees of regions 1-1028, 1029-1652 and 1653-3804.]