
Emergence of SARS-CoV-2 through recombination and strong purifying selection

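Authors: Xiaojun Li, Elena E. Giorgi, Manukumar Honnayakanahalli Marichannegowda, Brian Foley, Chuan Xiao, Xiang-Peng Kong, Yue Chen, S. Gnanakaran, Bette Korber, Feng Gao

Abstract
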
COVID-19 has become a global pandemic caused by the novel coronavirus SARS-CoV-2. Understanding the origins of SARS-CoV-2 is critical for deterring future zoonosis, discovering new drugs, and developing a vaccine. We show evidence of strong purifying selection around the receptor binding motif (RBM) in the spike and other genes among bat, pangolin, and human coronaviruses, suggesting similar evolutionary constraints in different host species. We also demonstrate that SARS-CoV-2’s entire RBM was introduced through recombination with coronaviruses from pangolins, possibly a critical step in the evolution of SARS-CoV-2’s ability to infect humans. Similar purifying selection in different host species, together with frequent recombination among coronaviruses, suggests a common evolutionary mechanism that could lead to new emerging human coronaviruses.
Cite as: X. Li et al., Sci. Adv. 10.1126/sciadv.abb9153 (2020).
RESEARCH ARTICLES
First release: 29 May 2020 www.advances.sciencemag.org (Page numbers not final at time of first release)
Introduction
The severe respiratory disease COVID-19 was first noticed in
late December 2019 (1). It rapidly became epidemic in China,
devastating public health and the economy. By the beginning of
May 2020, COVID-19 had spread to ~150 countries and infected
over 3.3 million people (2). On March 11, 2020, the World
Health Organization (WHO) officially declared it a pandemic.
The etiological agent of COVID-19 (3), severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2) (4), was identified as a new
member of the genus Betacoronavirus, which includes a diverse
reservoir of coronaviruses (CoVs) isolated from bats (5–7). While
genetically distinct from the betacoronaviruses that cause SARS
and MERS in humans (8, 9), SARS-CoV-2 shares the highest level of
genetic similarity (96.3%) with CoV RaTG13, sampled from a bat in
Yunnan in 2013 (8). Recently, CoV sequences closely related to
SARS-CoV-2 were obtained from confiscated Malayan pangolins in two
separate studies (10, 11). These pangolin SARS-like CoVs
(Pan_SL-CoV) form two distinct clades corresponding to their
locations of origin: the first clade, Pan_SL-CoV_GD, sampled from
Guangdong (GD) province in China, is genetically more similar to
SARS-CoV-2 (91.2%) than the second clade, Pan_SL-CoV_GX, sampled
from Guangxi (GX) province (85.4%).
Understanding the origin of SARS-CoV-2 may help de-
velop strategies to deter future cross-species transmissions
and to establish appropriate animal models. Recombination
plays an important role in the evolution of coronaviruses (12,
13). Viral sequences nearly identical to SARS and MERS vi-
ruses were found in civets and domestic camels, respectively
(14, 15), demonstrating that they originated from zoonotic
transmissions with intermediate host species between the bat
reservoirs and humans, a common pattern leading to CoV
zoonosis (5–7). However, non-human viruses nearly identical
to SARS-CoV-2 have not yet been found. In this paper we
demonstrate, through localized genomic analysis, a complex
pattern of evolutionary recombination and strong purifying
selection among CoVs from distinct host species, together with
cross-species infections, that likely gave rise to SARS-CoV-2.
Results
Acquisition of receptor binding motif through recombination
Phylogenetic analysis of 43 complete genome sequences from
three clades (SARS-CoVs and bat_SL-CoVs in clade 3; SARS-
CoV-2, bat_SL-CoVs and pan_SL-CoVs in clade 2; and two di-
vergent bat_SL-CoVs in clade 1) within the Sarbecovirus
group (9) confirms that RaTG13 is overall the closest se-
quence to SARS-CoV-2 (Fig. S1). Pan_SL-CoV_GD are the next
closest viruses, followed by Pan_SL-CoV_GX. Among the bat-
CoV sequences in clade 2 (Fig. S1), ZXC21 and ZC45, sampled
from bats in 2005 in Zhoushan, Zhejiang, China, are the most
divergent, with the exception of the beginning of the ORF1a
gene (region 1, Fig. 1A). All other Bat_SL-CoV and SARS-CoV
sequences form a separate clade 3, while clade 1 comprises
BtKY72 and BM48-31, the two most divergent Bat_SL-CoV sequences
in the Sarbecovirus group (Fig. S1).
Xiaojun Li1,†, Elena E. Giorgi2,†, Manukumar Honnayakanahalli Marichannegowda1, Brian Foley2, Chuan
Xiao3, Xiang-Peng Kong4, Yue Chen1, S. Gnanakaran2, Bette Korber2,5, Feng Gao1,6,*
1Department of Medicine, Duke University Medical Center, Durham, NC 27710, USA. 2Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM
87544, USA. 3Department of Chemistry and Biochemistry, The University of Texas at El Paso, El Paso, TX 79968, USA. 4Department of Biochemistry and Molecular
Pharmacology, Grossman School of Medicine, New York University, New York, NY 10016. 5New Mexico Consortium, Los Alamos, New Mexico 87545, USA. 6National
Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Changchun 130012, China.
†These authors contributed equally.
*Corresponding author. Email: fgao@duke.edu
Science Advances Publish Ahead of Print, published on May 29, 2020 as doi:10.1126/sciadv.abb9153
Copyright 2020 by American Association for the Advancement of Science.
Recombination
in the first SARS-CoV-2 sequence (Wuhan-Hu-1) with other
divergent CoVs has been previously noted (3). Here, to better
understand the role of recombination in the origin of SARS-
CoV-2 among these genetically similar CoVs, we compare Wu-
han-Hu-1 to six representative Bat_SL-CoVs, one SARS-CoV,
and the two Pan_SL-CoV_GD sequences using SimPlot anal-
ysis (16). RaTG13 has the highest similarity across the genome
(8), with two notable exceptions where a switch occurs (Fig.
1A). In phylogenetic reconstructions, SARS-CoV-2 clusters
closer to ZXC21 and ZC45 than to RaTG13 at the beginning of
the ORF1a gene (region 1, Fig. 1B), and, as previously reported
(10, 17), to a Pan_SL-CoV_GD in region 2 (Figs. 1C and S2),
which spans the receptor angiotensin-converting enzyme 2
(ACE2) binding site in the spike (S) glycoprotein gene. When
comparing Wuhan-Hu-1 to Pan_SL-CoV_GD and RaTG13, as
representatives of distinct host-species branches in the
evolutionary history of SARS-CoV-2, using the recombination
detection tool RIP (18), we find significant recombination
breakpoints before and after the ACE2 receptor binding motif
(RBM) (19, 20) (Fig. S2A). This suggests that SARS-CoV-2
carries a history of cross-species recombination between the
bat and the pangolin CoVs.
Pan_SL-CoV sequences are generally more similar to
SARS-CoV-2 than other CoV sequences, with the exception of
RaTG13 and ZXC21, but are more divergent from SARS-CoV-
2 at two regions in particular: the beginning of the ORF1b
gene and the highly divergent N terminus of the S gene (re-
gions 3 and 4, respectively, Fig. 1A). Within-region phyloge-
netic reconstructions show that Pan_SL-CoV sequences
become as divergent as BtKY72 and BM48-31 in region 3 (Fig.
1D), while less divergent in region 4, where Pan_SL-CoV_GD
clusters with ZXC21 and ZC45 (Fig. 1E). Together, these ob-
servations suggest ancestral cross-species recombination be-
tween pangolin and bat CoVs in the evolution of SARS-CoV-
2 at the ORF1a and S genes. Furthermore, the discordant phy-
logenetic clustering at various regions of the genome among
clade 2 CoVs also supports extensive recombination among
these viruses isolated from bats and pangolins.
The SARS-CoV-2 S glycoprotein mediates viral entry into
host cells and therefore represents a prime target for drug
and vaccine development (12, 19). While SARS-CoV-2 se-
quences share the greatest overall genetic similarity with
RaTG13, this is no longer the case in parts of the S gene.
Specifically, amino acid sequences of the RBM in the S1 subunit are
nearly identical to those in two Pan_SL-CoV_GD viruses, with
only one amino acid difference (Q498H), although the RBM
region has not been fully sequenced in one of the Guangdong
pangolin viruses (Pan_SL-CoV_GD/P2S) (Fig. 2A). Pangolin
CoVs from Guangxi are much more divergent. Phylogenetic
analysis based on the amino acid sequences of this region
shows three distinct clusters of SARS-CoV, SARS-CoV-2 and
bat-CoV only viruses, respectively (Fig. 2B). Interestingly,
while SARS-CoV and SARS-CoV-2 viruses use ACE2 for viral
entry, all CoVs in the third cluster have a 5-aa deletion and a
13-14-aa deletion in RBM (Fig. 2A) and do not infect human
target cells (5, 21, 22).
Although both SARS-CoV and SARS-CoV-2 use human ACE2 as
their receptor (8, 23), they show a high level of genetic
divergence (Figs. 1 and S1). However, structures of the
S1 unit of the S protein from both viruses are highly similar
(20, 24–26), with the exception of a loop that bends differently
(Fig. 3A). The root-mean-square deviation (RMSD) between
the two S proteins is 1.2 Å over 174 Cα atoms (24).
This suggests that conformational similarity of the binding
motif enables viral entry through molecular recognition of
ACE2. These structural studies also thoroughly analyzed the
contact residues between the S protein and human ACE2 (20,
24). Previous structural and mutagenesis studies have identified
two hot spots, K31 and K353, at the S/ACE2 interface in
SARS-CoV. In SARS-CoV-2, these two hot spots are slightly
weakened due to different residues on its S protein, but the
loop that adopts a different conformation from that in SARS-CoV
provides additional contacts that strengthen the interaction
(26). Among the 17 amino acid differences between SARS-CoV-2 and
RaTG13 in the RBM region (Fig. 2A), five are at contact sites
identified in the structural studies (24), likely impacting
RaTG13’s binding to ACE2 (Fig. 3B and Table S1). The single
amino acid difference at position 498 (Q or H) between
SARS-CoV-2 and Pan_SL-CoV_GD is at the edge of the ACE2 contact
interface; neither Q nor H at this position forms hydrogen
bonds with ACE2 residues (Fig. 3C). Thus, a functional RBM
nearly identical to the one in SARS-CoV-2 is naturally present
in Pan_SL-CoV_GD viruses. The very distinctive RaTG13
RBM suggests that this virus is unlikely to infect human cells
efficiently. Indeed, a recent study showed that the RaTG13
pseudovirus is much less efficient than SARS-CoV-2 pseudoviruses
in using ACE2 to infect cells, and this is most likely
due to the L486F and Y493Q substitutions, which result in
lower ACE2 binding in RaTG13 (26). Therefore, it is likely that
the acquisition of a complete functional RBM by a RaTG13-like
CoV through a recombination event with a Pan_SL-CoV_GD-like
virus enabled it to more efficiently use ACE2 for
human infection.
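As a rough illustration of what the RMSD figure quoted in this section measures, the following minimal sketch computes the root-mean-square deviation between two equally sized sets of 3D coordinates. It assumes the two structures have already been superimposed and that atoms are paired in the same order; the function name `rmsd` and the toy coordinates are hypothetical and are not taken from the structural studies cited here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two lists of (x, y, z) tuples, paired by index.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom displaced by 1 unit along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))
```

In practice the coordinates would be the paired Cα positions of the two S proteins after structural alignment, a step this sketch leaves out.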
Three small insertions are identical in SARS-CoV-2 and
RaTG13 but not found in other CoVs in the Sarbecovirus
group (27, 28). The RaTG13 sequence was sampled in 2013,
years before SARS-CoV-2 was first identified. It is unlikely
that both SARS-CoV-2 and RaTG13 independently acquired
identical insertions at three different locations in the S gene.
Thus, it is plausible that a RaTG13-like virus served as a pro-
genitor to generate SARS-CoV-2 by gaining a complete hu-
man ACE2 binding RBM from Pan_SL-CoV_GD-like viruses
through recombination. Genetic divergence at the nucleic
acid level between Wuhan-Hu-1 and Pan_SL-CoV_GD viruses
is significantly reduced from 13.9% (Fig. 1E) to 1.4% at the
amino acid level (Fig. 2B) in the RBM region, indicating re-
combination between RaTG13-like CoVs and Pan_SL-
CoV_GD-like CoVs. Furthermore, SARS-CoV-2 has a unique
furin cleavage site insertion (PRRA) not found in any other
CoVs in the Sarbecovirus group (Fig. S3) (27), although simi-
lar motifs are also found in MERS and more divergent bat
CoVs (29). This PRRA motif makes the S1/S2 cleavage in
SARS-CoV-2 much more efficient than in SARS-CoV and may
expand its tropism and/or enhance its transmissibility (20).
A recent study of bat CoVs in Yunnan, China, identified a
three-amino acid insertion (PAA) at the same site (30).
Although it is not known whether this PAA motif can function like
the PRRA motif, the presence of a similar insertion at the same
site indicates that such insertions may already be present in
wild bat CoVs. The more efficient cleavage of the S1 and S2
subunits of the spike glycoprotein (29) and efficient binding
to ACE2 by SARS-CoV-2 (20, 25) may have allowed SARS-CoV-2
to jump to humans, leading to its rapid spread in China and the
rest of the world.
Strong purifying selection among SARS-CoV-2 and
closely related viruses
Recombination from Pan_SL-CoV_GD at the RBM and at the
unique furin cleavage site insertion prompted us to examine
the SARS-CoV-2 sequences within these regions. Amino acid
sequences from SARS-CoV-2, RaTG13, and all Pan_SL-CoV viruses
(group A) are identical or nearly identical in the region
before and after the RBM and at the region after the furin
cleavage site (S2 subunit), while all other CoVs (group B) are
very distinctive (Figs. 4A and S4). The average of all pairwise
dN/dS ratios, defined as ω, among SARS-CoV-2, RaTG13, and
Pan_SL-CoV viruses at the S2 subunit is ω = 0.013, compared
to the much higher values of ω = 0.053 in the S1 region preceding
the furin cleavage site and ω = 0.042 at the S2 subunit for all
other CoVs (Fig. 4B). The much lower ω value at the S2 subunit
among the SARS-CoV-2, RaTG13, and Pan_SL-CoV viruses
indicates that this region is under strong purifying selection
within these sequences. A plot of synonymous and nonsynonymous
substitutions relative to Wuhan-Hu-1 highlights the
regional differences before and after the
furin cleavage site (Fig. 4A): the S2 subunit is highly conserved
among the SARS-CoV-2, RaTG13, and Pan_SL-CoV viruses
(group A), while far more nonsynonymous mutations
are observed in the rest of the CoV sequences (group B). The
shift in selective pressure at the S1/S2 cleavage site among
these related viruses versus other CoVs begins near codon
368 (Fig. 4B): the two graphs show cumulative plots of
the average behavior of each codon for all pairwise comparisons
in the input data, for synonymous mutations, nonsynonymous
mutations and indels, of group A sequences and
group B sequences. The nonsynonymous plot shows a
marked change in slope (vertical step) in the group A sequences
at codon 368, but not in group B sequences. Similarly,
when looking at all the dN/dS ratios (ω) for each group
A sequence compared to the Wuhan-Hu-1 sequence, we see
that these ratios are much lower in the 5′ end of the region,
before codon 368 (nucleic acid position 1104), compared to
the 3′ end, and no such difference is observed in the group
B sequences (Fig. 4C).
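To make the idea of a cumulative codon plot concrete, here is a minimal sketch for a single pair of aligned coding sequences. It is not the SNAP implementation: it simply labels each codon as unchanged, synonymous, nonsynonymous, or indel by translating with the standard genetic code, then accumulates the counts along the sequence. The function names `classify_codons` and `cumulative` are hypothetical, and the sequences are toy data.

```python
from itertools import accumulate, product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codons(ref, other):
    """Label each aligned codon: 'same', 'syn', 'nonsyn', 'indel', or 'skip'."""
    labels = []
    for i in range(0, min(len(ref), len(other)) - 2, 3):
        a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
        if "-" in a or "-" in b:
            labels.append("indel")
        elif a not in CODON_TABLE or b not in CODON_TABLE:
            labels.append("skip")  # ambiguous bases are ignored in this sketch
        elif a == b:
            labels.append("same")
        elif CODON_TABLE[a] == CODON_TABLE[b]:
            labels.append("syn")
        else:
            labels.append("nonsyn")
    return labels

def cumulative(labels, kind):
    """Running count of codons carrying the given label."""
    return list(accumulate(1 if label == kind else 0 for label in labels))

# Toy pair: codon 2 differs synonymously (GCT/GCC), codon 3 nonsynonymously (AAA/AGA).
labels = classify_codons("ATGGCTAAA", "ATGGCCAGA")
print(labels, cumulative(labels, "syn"), cumulative(labels, "nonsyn"))
```

A sudden flattening or steepening of the nonsynonymous curve, such as the change near codon 368 described in the text, would show up as a change in slope of the `cumulative(labels, "nonsyn")` series once it is averaged over all sequence pairs.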
This strong purifying selection observed in the S2 subunit
of the S gene is not surprising given its role in cell entry by
fusing the viral and host cell membranes (5, 19). Following
the binding of RBD to the ACE2 receptor, heptad repeat re-
gions 1 (HR1) and 2 (HR2) within the S2 subunit rearrange to
form the fusion core, bringing together the viral and cell
membranes for fusion and infection (Fig. S5A). Due to the
mechanistic constraints for this assembly for fusion, the pro-
tein segments that take part in this assembly are well pre-
served (20, 31). Furthermore, some regions of the S2 subunit
are covered by S1 in the trimer conformation of the spike pro-
tein (Fig. S5B). Based on the currently available, but incom-
plete, cryo-EM structure of the spike trimer, we estimate that
60%-65% of S2 amino acids are buried. This adds further
structural constraints on changing amino acids in S2.
While hundreds of new SARS-CoV-2 sequences are added
to the GISAID repertoire every day (32), we note that the
RBM region currently remains highly conserved. No amino
acid within 6 Angstroms of the ACE2 binding site has re-
peated variations, with the exception of G476S, a very rare
mutation found in 8 sequences from a local cluster in Wash-
ington state, out of 6,400 total sequences from GISAID (April
13, 2020). In addition, we observe similar patterns of purify-
ing selection pressure in other parts of the genome, including
the E and M genes, as well as the partial ORF1a and ORF1b
genes (Figs. S6 and S7). Interestingly, the set of viruses under
purifying selection pressure varies depending on which genes
are analyzed. SARS-CoV-2, RaTG13, all Pan_SL-CoV and the
two bat CoVs (ZXC21 and ZC45) are under similar purifying
selection in both the E and M genes (Figs. 5A and S6). In
the S2 subunit, similar purifying selection is observed only
for SARS-CoV-2, RaTG13, and all Pan_SL-CoV (Fig. 5B). Only
SARS-CoV-2, RaTG13, and the pangolin CoVs from Guangdong are
under similar purifying selection in the partial regions of
ORF1a and ORF1b (Figs. 5C and S7).
Strong purifying selection pressure on SARS-CoV-2, RaTG13
and Pan_SL-CoV_GD viruses, as indicated by consistently
low ω values, suggests that these complete and partial genes
are under similar functional/structural constraints among
the different host species. In two extreme cases, amino acid
sequences of the E gene and the 3′ end of ORF1a are identical
among the compared CoV sequences, although genetic
distances are quite large among these viruses at the nucleic
acid level (Fig. 5A and 5C). Such evolutionary constraints in
many parts of the viral genome, especially at functional do-
mains in the S gene which plays an important role in cross-
species transmission (5, 12), coupled with frequent recombi-
nation, may facilitate cross-species transmissions between
RaTG13-like bat and/or Pan_SL-CoV_GD-like viruses.
Frequent recombination between SARS-CoVs and
bat_SL-CoVs
Previous studies using more limited sequence sets found that
SARS-CoVs originated through multiple recombination
events between different bat-CoVs (10, 12, 21, 33, 34). Our
phylogenetic analyses of individual genes confirm this and
show that SARS-CoV sequences tend to cluster with
YN2018B, Rs9401, Rs7327, WIV16 and Rs4231 (group A) for
some genes and Rf4092, YN2013, Anlong-112 and GX2013
(group B) for others (Fig. S8). SimPlot analysis using both
groups of bat_SL-CoVs and the closely related bat CoV YNLF-
34C (34) shows that SARS-CoV GZ02 shifts in similarity
among different bat SL-CoVs at various regions of the genome
(Fig. 6A). In particular, phylogenetic reconstruction of
the beginning of ORF1a (region 1) confirms that SARS-CoVs
cluster with YNLF-34C (34), and this cluster is distinct
from all other CoVs (Fig. 6B). YNLF-34C is more divergent
from SARS-CoV than other bat CoVs before
and after this region, confirming the previously reported
complex recombinant nature of YNLF-34C (34) (Fig. 5A). At
the end of the S gene (region 2), SARS-CoVs cluster with
group A CoVs, forming a highly divergent clade (Fig. 6C). In
region 3 (ORF8), SARS-CoVs and group B CoVs, together with
YNLF-34C, form a very divergent and distinctive cluster (Fig.
6D). To further explore the recombinant nature of SARS-
CoVs, we compared GZ02 to representative bat CoV se-
quences using the RIP recombination detection tool (18). We
identified four significant breakpoints (at 99% confidence)
between the two parental lineages (Fig. S9A), further sup-
ported by phylogenetic analysis (Fig. S9B-S9D). In addition,
the two aforementioned groups of bat CoVs (shown in light
brown and light blue in the trees) show similar cluster
changes across the five recombinant regions, suggesting mul-
tiple events of historic recombination among bat SL-CoVs.
These results demonstrate that SARS-CoV shares a recombinant
history with at least three different groups of bat CoVs
and confirm the major role of recombination in the evolution
of these viruses.
Of the bat SL-CoVs that contributed to the recombinant
origin of SARS-CoV, only group A viruses bind to ACE2.
Group B bat SL-CoVs do not infect human cells (5, 21, 22) and
have two deletions in the RBM (Figs. 1E and 2A). The short
deletion between residues 445 and 449, and in particular the
loss of Y449, which forms three hydrogen bonds with ACE2,
will significantly affect the overall structure of the RBM (Figs.
3C and 3D). The region encompassing the large deletion between
residues 473 and 486 contains the loop structure that
accounts for the major differences between the S proteins of
SARS-CoV and SARS-CoV-2 (Fig. 3A) and strengthens the
interaction of the latter with ACE2 (26). This deletion causes the
loss of contact site F486 and affects the conserved residue
F498’s hydrophobic interaction with residue M82 on ACE2
(Fig. 3D). These two deletions will render RBM in those CoVs
incapable of binding human ACE2. Therefore, recombination
may play a role in enabling cross-species transmission in
SARS-CoVs through the acquisition of an S gene type that can
efficiently bind to the human ACE2 receptor.
ORF8 is one of the highly variable genes in coronaviruses
and its function has not yet been well elucidated (5, 12, 35).
Recombination breakpoints within this region show that re-
combination occurred at the beginning and the end of ORF8
(Fig. S10), where nucleic acid sequences are nearly identical
among both SARS-CoVs and group B bat CoVs. Moreover, all
compared viruses form three highly distinct clusters (Fig.
6D), suggesting that the ORF8 gene may be biologically con-
strained and evolves through modular recombination. The
third recombination region at the beginning of ORF1a is near
where SARS-CoV-2 also recombined with other bat CoVs (re-
gion 1 in Fig. 1A). This region is highly variable (5, 12) and
recombination within this part of the genome was also found
in other CoVs, suggesting that it may be a recombination
hotspot and may factor into cross-species transmission.
Discussion
There are three important aspects to betacoronavirus evolu-
tion that should be carefully considered in phylogenetic re-
constructions among more distant coronaviruses. First, there
is extensive recombination among all of these viruses (10, 12,
21, 33, 34) (Figs. 1 and 5), making standard phylogenetic re-
constructions based on full genomes problematic, as different
regions of the genome have distinct ancestral relationships.
Second, between more distant sequences, synonymous substitutions
are often fully saturated, which can confound analyses
of selective pressure and add noise to phylogenetic
analysis. Finally, there are different selective pressures at
work in different lineages, which is worth consideration
when interpreting trees.
The currently sampled pangolin CoVs are too divergent
from SARS-CoV-2 to be its recent progenitors, but it is note-
worthy that these sequences contain an RBM that can most
likely bind to human ACE2. While RaTG13 is the most closely
related CoV sequence to SARS-CoV-2, it has a distinctive
RBM. In addition, a recent study showed that the RaTG13
pseudovirus is much less efficient than the SARS-CoV-2 pseu-
dovirus in using ACE2 to infect cells (26). SARS-CoV-2 has a
nearly identical RBM to the one found in the pangolin CoVs
from Guangdong. Thus, it is plausible that RaTG13-like bat
CoVs may have obtained an RBM sequence that binds
human ACE2 through recombination with Pan_SL-CoV_GD-like
viruses. We hypothesize that this, and/or other ancestral
recombination events between viruses infecting bats and
pangolins, may have played a key role in the evolution of the
strain that led to the introduction of SARS-CoV-2 into humans.
It is also possible that other, not yet identified hosts are
infected with CoVs that can jump to human populations
through cross-species transmission if they can successfully
infect human cells through ACE2 or other receptors. Interestingly,
an analysis of 6,400 SARS-CoV-2 sequences from
GISAID (Global Initiative on Sharing All Influenza Data) (36,
37) identifies only one very rare mutation, G476S, that is
directly in an ACE2 contact residue. It was found in a local
cluster of sequences from Washington state. However, it is at the
periphery of the receptor contact surface, and so may not
significantly impact the virus’s receptor binding affinity.
All three human CoVs (SARS, MERS and SARS-2) are the
result of recombination among CoVs. Recombination in all
three viruses involved the S gene, likely a precondition to zo-
onosis that enabled efficient binding to human receptors (5,
12). Extensive recombination among bat coronaviruses and
strong purifying selection pressure among viruses from hu-
mans, bats and pangolins may allow such closely related vi-
ruses to readily jump between species and adapt to the new
hosts. Many bat CoVs have been found to bind human
ACE2 and replicate in human cells (10, 21, 22, 38–40).
Serological evidence has revealed that additional, otherwise
undetected spillovers have occurred in people in China living in
proximity to wild bat populations (41). Continuous surveillance
of coronaviruses in their natural hosts and in humans
will be key to rapid control of new coronavirus outbreaks.
While the SARS and MERS originating strains have been
found in civets and dromedary camels respectively (14, 15), so
far, efforts to identify a similarly close link in the original
pathway of SARS-CoV-2 into humans have failed. If the new
SARS-CoV-2 strain did not cause widespread infections in its
natural or intermediate hosts, such a strain may never be
identified. The close proximity of animals of different species
in a wet market setting may increase the potential for cross-
species spillover infections, by enabling recombination be-
tween more distant coronaviruses and the emergence of re-
combinants with novel phenotypes. While the direct reservoir
of SARS-CoV-2 is still being sought, one thing is clear: reduc-
ing or eliminating direct human contact with wild animals is
critical to preventing new coronavirus zoonosis in the future.
Materials and Methods
Sequence analysis
All 43 CoV complete genome sequences were obtained from
GenBank and GISAID (Global Initiative on Sharing All
Influenza Data) (36, 37), and were selected to be representa-
tive of the diversity (Tables S2 and S3). Pan_SL-
CoV_GD/P1La sequence was generated by combining
Pan_SL-CoV_GD/P1L (10) with some additional sequences
from the NCBI BioProject database PRJNA5732983 (11, 42) to
have a maximal coverage of the complete genome sequence
for analysis. A new CoV sequence from pangolin
(EPI_ISL_410721) (43) was not included here as it became
available after we had already completed the analyses in this
study. Once it became available, we observed that it was as
close to SARS-CoV-2 as the sequences we had already used
and hence did not change the interpretation of our results.
Whole genome sequences were first aligned using Clustal X2
(44). The alignments for all coding regions were manually op-
timized based on the amino acid sequence alignment using
SeaView 5.0.1.
Recombination Analyses
SimPlot 3.5.15 (16) was used to determine the percent identity
of the query sequence to reference sequences. Potential re-
combinant regions among analyzed sequences were identi-
fied by sliding a 400bp-window at a 50bp-step across the
alignment using the Kimura 2-parameter model. Phyloge-
netic trees were constructed by the maximum likelihood
method using the GTR model (45), and their reliability was
estimated from 1,000 bootstrap replicates. The positions of
analyzed sequence regions were based on those in the refer-
ence SARS-CoV-2 Wuhan-Hu-1 (MN908947). Recombination
regions and breakpoints were also analyzed using the LANL
database (46) tool RIP (18) with a 400bp window. Regions
between breakpoints were identified using a 99% confidence
threshold.
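The sliding-window scan described above can be sketched as follows. This is only an illustration of the general technique, not SimPlot's or RIP's actual code: it walks a 400 bp window in 50 bp steps along a pairwise alignment and reports, for each window, the percent identity and the Kimura 2-parameter distance. The function names (`window_identity`, `k2p_distance`, `sliding_scan`) and the toy sequences are hypothetical.

```python
import math

def window_identity(query: str, ref: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    compared = matches = 0
    for x, y in zip(query.upper(), ref.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else float("nan")

def k2p_distance(query: str, ref: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    purines = {"A", "G"}
    n = transitions = transversions = 0
    for x, y in zip(query.upper(), ref.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x != y:
            if (x in purines) == (y in purines):
                transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
            else:
                transversions += 1
    if n == 0:
        return float("nan")
    p, q = transitions / n, transversions / n
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        return float("inf")  # too divergent for the K2P correction
    return -0.5 * math.log(a) - 0.25 * math.log(b)

def sliding_scan(query: str, ref: str, window: int = 400, step: int = 50):
    """Return a list of (window midpoint, percent identity, K2P distance)."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        q, r = query[start:start + window], ref[start:start + window]
        results.append((start + window // 2, window_identity(q, r), k2p_distance(q, r)))
    return results

# Toy alignment: the reference diverges from the query in positions 500-599.
query = "ACGT" * 250
ref = query[:500] + "T" * 100 + query[600:]
for midpoint, identity, distance in sliding_scan(query, ref):
    print(midpoint, round(identity, 1), round(distance, 3))
```

Plotting identity against the window midpoint for several reference sequences gives the kind of curve in which a switch of the closest reference suggests a candidate recombination breakpoint.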
Selection Analyses
Cumulative plots of the average behavior of each codon for
all pairwise comparisons in the input data, for insertions and
deletions (indels), synonymous (syn), and nonsynonymous
(nonsyn) mutations, and values of the ratio of the rate of
nonsynonymous substitutions per nonsynonymous site (dN) to the
rate of synonymous substitutions per synonymous site (dS)
(dN/dS, or ω), were obtained using the LANL database tool
SNAP (47). To avoid counting instances where synonymous
mutations were saturated, averages of all pairwise
dN/dS ratios were calculated excluding pairs that yielded dS
values greater than 1.
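The averaging rule above can be illustrated with a short sketch. It assumes that per-pair dN and dS estimates are already available (for example, from a tool such as SNAP) as a list of `(dN, dS)` tuples; the function name `mean_omega` and the numbers are hypothetical. Skipping pairs with dS equal to zero is an extra assumption made here to avoid division by zero and is not stated in the text.

```python
from statistics import mean

def mean_omega(pairs, ds_cutoff=1.0):
    """Average dN/dS over sequence pairs, excluding saturated pairs (dS > cutoff).

    Pairs with dS == 0 are also skipped (an assumption of this sketch).
    """
    ratios = [dn / ds for dn, ds in pairs if 0 < ds <= ds_cutoff]
    return mean(ratios) if ratios else float("nan")

# Toy values: the third pair is saturated (dS = 1.5) and is excluded.
pairs = [(0.01, 0.5), (0.02, 0.4), (0.30, 1.5)]
print(mean_omega(pairs))
```

Note that this averages the per-pair ratios rather than dividing the mean dN by the mean dS, matching the "averages of all pairwise dN/dS ratios" wording.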
Structure modeling of receptor binding
To investigate the single mutation Q498H in RBM between
SARS-CoV-2 and Pan_SL-CoV_GD, Q498 in the crystal struc-
ture of S/ACE2 complex was mutated to H498 using Chimera
(48). Local energy minimization (only H498 was allowed to
move) was computed using Chimera’s built-in functions. To
investigate the impact of the deletion between residues 473 and
486 on the binding interface between SARS-CoV-2 and human
ACE2, a homology model with the deletion was generated using
I-TASSER (49). The top five models provided by the
server have Confidence Scores (C-scores) of 0.86, -2.33, -4.01,
-4.17, and -4.49. The C-score, which typically ranges from -5 to 2,
was used to estimate the quality of the models; the higher
the value, the higher the confidence in the model (49). Based
on the C-score, model 1 was used in Fig. 3D. The interaction
of the RBD of RaTG13 and ACE2 was modeled on PDB 6M0J,
a structure of RBD of SARS-CoV-2 in complex with human
ACE2 (24) using ICM software package (50), and the muta-
tional differences of the Gibbs free energy (Table S1) were cal-
culated with the built-in algorithm.
REFERENCES AND NOTES
1. N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, X. Zhao, B. Huang, W. Shi, R. Lu,
P. Niu, F. Zhan, X. Ma, D. Wang, W. Xu, G. Wu, G. F. Gao, W. Tan; China Novel
Coronavirus Investigating and Research Team, A Novel Coronavirus from Patients
with Pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733 (2020).
doi:10.1056/NEJMoa2001017
2. World Health Organization, Novel Coronavirus (COVID-19) Situation.
https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd (2020).
3. F. Wu, S. Zhao, B. Yu, Y.-M. Chen, W. Wang, Z.-G. Song, Y. Hu, Z.-W. Tao, J.-H. Tian,
Y.-Y. Pei, M.-L. Yuan, Y.-L. Zhang, F.-H. Dai, Y. Liu, Q.-M. Wang, J.-J. Zheng, L. Xu,
E. C. Holmes, Y.-Z. Zhang, A new coronavirus associated with human respiratory
disease in China. Nature 579, 265–269 (2020). doi:10.1038/s41586-020-2008-3
4. A. E. Gorbalenya, S. C. Baker, R. S. Baric, R. J. de Groot, C. Drosten,
A. A. Gulyaeva, B. L. Haagmans, C. Lauber, A. M. Leontovich, B. W. Neuman, D.
Penzar, S. Perlman, L. L. M. Poon, D. V. Samborskiy, I. A. Sidorov, I. Sola, J.
Ziebuhr; Coronaviridae Study Group of the International Committee on Taxonomy
of Viruses, The species Severe acute respiratory syndrome-related coronavirus:
Classifying 2019-nCoV and naming it SARS-CoV-2. Nat. Microbiol. 5, 536–544
(2020). doi:10.1038/s41564-020-0695-z
5. J. Cui, F. Li, Z. L. Shi, Origin and evolution of pathogenic coronaviruses. Nat. Rev.
Microbiol. 17, 181–192 (2019). doi:10.1038/s41579-018-0118-9
6. X.-D. Lin, W. Wang, Z.-Y. Hao, Z.-X. Wang, W.-P. Guo, X.-Q. Guan, M.-R. Wang, H.-W.
Wang, R.-H. Zhou, M.-H. Li, G.-P. Tang, J. Wu, E. C. Holmes, Y.-Z. Zhang, Extensive
diversity of coronaviruses in bats from China. Virology 507, 1–10 (2017).
doi:10.1016/j.virol.2017.03.019
7. A. Banerjee, K. Kulcsar, V. Misra, M. Frieman, K. Mossman, Bats and Coronaviruses.
Viruses 11, 41 (2019). doi:10.3390/v11010041
8. P. Zhou, X.-L. Yang, X.-G. Wang, B. Hu, L. Zhang, W. Zhang, H.-R. Si, Y. Zhu, B. Li, C.-
L. Huang, H.-D. Chen, J. Chen, Y. Luo, H. Guo, R.-D. Jiang, M.-Q. Liu, Y. Chen, X.-R.
Shen, X. Wang, X.-S. Zheng, K. Zhao, Q.-J. Chen, F. Deng, L.-L. Liu, B. Yan, F.-X.
Zhan, Y.-Y. Wang, G.-F. Xiao, Z.-L. Shi, A pneumonia outbreak associated with a
new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
doi:10.1038/s41586-020-2012-7
9. R. Lu, X. Zhao, J. Li, P. Niu, B. Yang, H. Wu, W. Wang, H. Song, B. Huang, N. Zhu, Y.
Bi, X. Ma, F. Zhan, L. Wang, T. Hu, H. Zhou, Z. Hu, W. Zhou, L. Zhao, J. Chen, Y.
Meng, J. Wang, Y. Lin, J. Yuan, Z. Xie, J. Ma, W. J. Liu, D. Wang, W. Xu, E. C. Holmes,
G. F. Gao, G. Wu, W. Chen, W. Shi, W. Tan, Genomic characterisation and
epidemiology of 2019 novel coronavirus: Implications for virus origins and
receptor binding. Lancet 395, 565–574 (2020).
doi:10.1016/S0140-6736(20)30251-8
10. T. T.-Y. Lam, M. H.-H. Shum, H.-C. Zhu, Y.-G. Tong, X.-B. Ni, Y.-S. Liao, W. Wei, W.
Y.-M. Cheung, W.-J. Li, L.-F. Li, G. M. Leung, E. C. Holmes, Y.-L. Hu, Y. Guan,
Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins. Nature
(2020). doi:10.1038/s41586-020-2169-0
11. P. Liu, W. Chen, J.-P. Chen, Viral Metagenomics Revealed Sendai Virus and
Coronavirus Infection of Malayan Pangolins (Manis javanica). Viruses 11, 979
(2019). doi:10.3390/v11110979
12. R. L. Graham, R. S. Baric, Recombination, reservoirs, and the modular spike:
Mechanisms of coronavirus cross-species transmission. J. Virol. 84, 3134–3146
(2010). doi:10.1128/JVI.01394-09
13. S. U. Rehman, L. Shafique, A. Ihsan, Q. Liu, Evolutionary Trajectory for the
Emergence of Novel Coronavirus SARS-CoV-2. Pathogens 9, 240 (2020).
doi:10.3390/pathogens9030240
14. Y. Guan, B. J. Zheng, Y. Q. He, X. L. Liu, Z. X. Zhuang, C. L. Cheung, S. W. Luo, P. H.
Li, L. J. Zhang, Y. J. Guan, K. M. Butt, K. L. Wong, K. W. Chan, W. Lim, K. F.
Shortridge, K. Y. Yuen, J. S. Peiris, L. L. Poon, Isolation and characterization of
viruses related to the SARS coronavirus from animals in southern China. Science
302, 276–278 (2003). doi:10.1126/science.1087139
15. E. I. Azhar, S. A. El-Kafrawy, S. A. Farraj, A. M. Hassan, M. S. Al-Saeed, A. M.
Hashem, T. A. Madani, Evidence for camel-to-human transmission of MERS
coronavirus. N. Engl. J. Med. 370, 2499–2505 (2014).
doi:10.1056/NEJMoa1401505
16. K. S. Lole, R. C. Bollinger, R. S. Paranjape, D. Gadkari, S. S. Kulkarni, N. G. Novak,
R. Ingersoll, H. W. Sheppard, S. C. Ray, Full-length human immunodeficiency virus
type 1 genomes from subtype C-infected seroconverters in India, with evidence of
intersubtype recombination. J. Virol. 73, 152–160 (1999).
doi:10.1128/JVI.73.1.152-160.1999
17. M. C. Wong, S. J. Javornik Cregeen, N. J. Ajami, J. F. Petrosino, Evidence of
recombination in coronaviruses implicating pangolin origins of nCoV-2019.
bioRxiv, 2020.2002.2007.939207 (2020).
18. A. C. Siepel, A. L. Halpern, C. Macken, B. T. Korber, A computer program designed
to screen rapidly for HIV type 1 intersubtype recombinant sequences. AIDS Res.
Hum. Retroviruses 11, 1413–1416 (1995). doi:10.1089/aid.1995.11.1413
19. F. Li, Structure, Function, and Evolution of Coronavirus Spike Proteins. Annu. Rev.
Virol. 3, 237–261 (2016). doi:10.1146/annurev-virology-110615-042301
20. A. C. Walls, Y.-J. Park, M. A. Tortorici, A. Wall, A. T. McGuire, D. Veesler, Structure,
Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 180,
281–292 (2020). doi:10.1016/j.cell.2020.02.058
21. B. Hu, L.-P. Zeng, X.-L. Yang, X.-Y. Ge, W. Zhang, B. Li, J.-Z. Xie, X.-R. Shen, Y.-Z.
Zhang, N. Wang, D.-S. Luo, X.-S. Zheng, M.-N. Wang, P. Daszak, L.-F. Wang, J. Cui,
Z.-L. Shi, Discovery of a rich gene pool of bat SARS-related coronaviruses provides
new insights into the origin of SARS coronavirus. PLOS Pathog. 13, e1006698
(2017). doi:10.1371/journal.ppat.1006698
22. M. Letko, A. Marzi, V. Munster, Functional assessment of cell entry and receptor
usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat. Microbiol. 5,
562–569 (2020). doi:10.1038/s41564-020-0688-y
23. W. Li, M. J. Moore, N. Vasilieva, J. Sui, S. K. Wong, M. A. Berne, M. Somasundaran,
J. L. Sullivan, K. Luzuriaga, T. C. Greenough, H. Choe, M. Farzan, Angiotensin-
converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature
426, 450–454 (2003). doi:10.1038/nature02145
24. J. Lan, J. Ge, J. Yu, S. Shan, H. Zhou, S. Fan, Q. Zhang, X. Shi, Q. Wang, L. Zhang,
X. Wang, Structure of the SARS-CoV-2 spike receptor-binding domain bound to
the ACE2 receptor. Nature 581, 215–220 (2020).
doi:10.1038/s41586-020-2180-5
25. D. Wrapp, N. Wang, K. S. Corbett, J. A. Goldsmith, C.-L. Hsieh, O. Abiona, B. S.
Graham, J. S. McLellan, Cryo-EM structure of the 2019-nCoV spike in the
prefusion conformation. Science 367, 1260–1263 (2020).
doi:10.1126/science.abb2507
26. J. Shang, G. Ye, K. Shi, Y. Wan, C. Luo, H. Aihara, Q. Geng, A. Auerbach, F. Li,
Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221–224
(2020). doi:10.1038/s41586-020-2179-y
27. C. Xiao, X. Li, S. Liu, Y. Sang, S.-J. Gao, F. Gao, HIV-1 did not contribute to the
2019-nCoV genome. Emerg. Microbes Infect. 9, 378–381 (2020).
doi:10.1080/22221751.2020.1727299
28. K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, R. F. Garry, The proximal
origin of SARS-CoV-2. Nat. Med. 26, 450–452 (2020).
doi:10.1038/s41591-020-0820-9
29. B. Coutard, C. Valle, X. de Lamballerie, B. Canard, N. G. Seidah, E. Decroly, The
spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like
cleavage site absent in CoV of the same clade. Antiviral Res. 176, 104742 (2020).
doi:10.1016/j.antiviral.2020.104742
30. H. Zhou, X. Chen, T. Hu, J. Li, H. Song, Y. Liu, P. Wang, D. Liu, J. Yang, E.
C. Holmes, A. C. Hughes, Y. Bi, W. Shi, A novel bat coronavirus reveals natural
insertions at the S1/S2 cleavage site of the Spike protein and a possible
recombinant origin of HCoV-19. bioRxiv, 2020.2003.2002.974139 (2020).
31. S. Xia, M. Liu, C. Wang, W. Xu, Q. Lan, S. Feng, F. Qi, L. Bao, L. Du, S. Liu, C. Qin, F.
Sun, Z. Shi, Y. Zhu, S. Jiang, L. Lu, Inhibition of SARS-CoV-2 (previously 2019-
nCoV) infection by a highly potent pan-coronavirus fusion inhibitor targeting its
spike protein that harbors a high capacity to mediate membrane fusion. Cell Res.
30, 343–355 (2020). doi:10.1038/s41422-020-0305-x
32. A. Brufsky, Distinct Viral Clades of SARS-CoV-2: Implications for Modeling of Viral
Spread. J. Med. Virol. jmv.25902 (2020). doi:10.1002/jmv.25902
33. C. C. Hon, T.-Y. Lam, Z.-L. Shi, A. J. Drummond, C.-W. Yip, F. Zeng, P.-Y. Lam, F.
C.-C. Leung, Evidence of the recombinant origin of a bat severe acute respiratory
syndrome (SARS)-like coronavirus and its implications on the direct ancestor of
SARS coronavirus. J. Virol. 82, 1819–1826 (2008). doi:10.1128/JVI.01926-07
34. S. K. Lau, Y. Feng, H. Chen, H. K. H. Luk, W.-H. Yang, K. S. M. Li, Y.-Z. Zhang, Y.
Huang, Z.-Z. Song, W.-N. Chow, R. Y. Y. Fan, S. S. Ahmed, H. C. Yeung, C. S. F. Lam,
J.-P. Cai, S. S. Y. Wong, J. F. W. Chan, K.-Y. Yuen, H.-L. Zhang, P. C. Y. Woo, Severe
Acute Respiratory Syndrome (SARS) Coronavirus ORF8 Protein Is Acquired from
SARS-Related Coronavirus from Greater Horseshoe Bats through Recombination.
J. Virol. 89, 10532–10547 (2015). doi:10.1128/JVI.01048-15
35. J. F. Chan, K.-H. Kok, Z. Zhu, H. Chu, K. K.-W. To, S. Yuan, K.-Y. Yuen, Genomic
characterization of the 2019 novel human-pathogenic coronavirus isolated from
a patient with atypical pneumonia after visiting Wuhan. Emerg. Microbes Infect. 9,
221–236 (2020). doi:10.1080/22221751.2020.1719902
36. Y. Shu, J. McCauley, GISAID: Global initiative on sharing all influenza data - from
vision to reality. Euro Surveill. 22, 30494 (2017).
doi:10.2807/1560-7917.ES.2017.22.13.30494
37. S. Elbe, G. Buckland-Merrett, Data, disease and diplomacy: GISAID’s innovative
contribution to global health. Global Challenges 1, 33–46 (2017).
doi:10.1002/gch2.1018
38. V. D. Menachery, B. L. Yount Jr., K. Debbink, S. Agnihothram, L. E. Gralinski, J. A.
Plante, R. L. Graham, T. Scobey, X.-Y. Ge, E. F. Donaldson, S. H. Randell, A.
Lanzavecchia, W. A. Marasco, Z.-L. Shi, R. S. Baric, A SARS-like cluster of
circulating bat coronaviruses shows potential for human emergence. Nat. Med.
21, 1508–1513 (2015). doi:10.1038/nm.3985
39. V. D. Menachery, B. L. Yount Jr., A. C. Sims, K. Debbink, S. S. Agnihothram, L. E.
Gralinski, R. L. Graham, T. Scobey, J. A. Plante, S. R. Royal, J. Swanstrom, T. P.
Sheahan, R. J. Pickles, D. Corti, S. H. Randell, A. Lanzavecchia, W. A. Marasco, R.
S. Baric, SARS-like WIV1-CoV poised for human emergence. Proc. Natl. Acad. Sci.
U.S.A. 113, 3048–3053 (2016). doi:10.1073/pnas.1517719113
40. X.-Y. Ge, J.-L. Li, X.-L. Yang, A. A. Chmura, G. Zhu, J. H. Epstein, J. K. Mazet, B. Hu,
W. Zhang, C. Peng, Y.-J. Zhang, C.-M. Luo, B. Tan, N. Wang, Y. Zhu, G. Crameri, S.-
Y. Zhang, L.-F. Wang, P. Daszak, Z.-L. Shi, Isolation and characterization of a bat
SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538 (2013).
doi:10.1038/nature12711
41. N. Wang, S.-Y. Li, X.-L. Yang, H.-M. Huang, Y.-J. Zhang, H. Guo, C.-M. Luo, M. Miller,
G. Zhu, A. A. Chmura, E. Hagan, J.-H. Zhou, Y.-Z. Zhang, L.-F. Wang, P. Daszak, Z.-
L. Shi, Serological Evidence of Bat SARS-Related Coronavirus Infection in
Humans, China. Virol. Sin. 33, 104–107 (2018). doi:10.1007/s12250-018-0012-7
42. P. Liu, J.-Z. Jiang, X.-F. Wan, Y. Hua, L. Li, J. Zhou, X. Wang, F. Hou, J. Chen, J.
Zou, J. Chen, Are pangolins the intermediate host of the 2019 novel coronavirus
(SARS-CoV-2)? PLOS Pathog. 16, e1008421 (2020).
43. K. Xiao, J. Zhai, Y. Feng, N. Zhou, X. Zhang, J.-J. Zou, N. Li, Y. Guo, X. Li, X. Shen,
Z. Zhang, F. Shu, W. Huang, Y. Li, Z. Zhang, R.-A. Chen, Y.-J. Wu, S.-M. Peng, M.
Huang, W.-J. Xie, Q.-H. Cai, F.-H. Hou, Y. Liu, W. Chen, L. Xiao, Y. Shen, Isolation
and Characterization of 2019-nCoV-like Coronavirus from Malayan Pangolins.
bioRxiv, 2020.2002.2017.951335 (2020).
44. M. A. Larkin, G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H.
McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J.
Gibson, D. G. Higgins, Clustal W and Clustal X version 2.0. Bioinformatics 23,
2947–2948 (2007). doi:10.1093/bioinformatics/btm404
45. S. Guindon, O. Gascuel, A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).
doi:10.1080/10635150390235520
46. B. Foley, T. Leitner, C. Apetrei, B. Hahn, I. Mizrachi, J. Mullins, A. Rambaut, S.
Wolinsky, B. Korber, HIV sequence compendium 2018. (Theoretical Biology and
Biophysics Group, Los Alamos National Laboratory, NM, LA-UR 18-25673, Los
Alamos, New Mexico, 2018).
47. B. B. T. Korber, in Computational Analysis of HIV Molecular Sequences, A. G.
Rodrigo, G. H. Learn, Eds. (Kluwer Academic Publishers, Dordrecht, Netherlands,
2000), chap. 4, pp. 55-72.
48. E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C.
Meng, T. E. Ferrin, UCSF Chimera – A visualization system for exploratory
research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
doi:10.1002/jcc.20084
49. J. Yang, Y. Zhang, Protein Structure and Function Prediction Using I-TASSER.
Curr. Protoc. Bioinformatics 52, 5.8.1–5.8.15 (2015).
50. R. Abagyan, M. Totrov, D. Kuznetsov, ICM – A new method for protein modeling
and design: Applications to docking and structure prediction from the distorted
native conformation. J. Comput. Chem. 15, 488–506 (1994).
doi:10.1002/jcc.540150503
ACKNOWLEDGMENTS
We thank all those who have contributed SARS-CoV-2 genome sequences to the
GISAID database (https://www.gisaid.org). We also thank Dr. Xinquan Wang
from Tsinghua University for sharing the PDB 6M0J structure with us before its
official release date. Funding: EEG, BK, SG and BF acknowledge support by the
Laboratory Directed Research and Development program of Los Alamos
National Laboratory under project number 20200554ECR. Author
contributions: Project conceptualization: F.G., B.K., E.E.G; Structure analysis:
C.X., X-P.K., S.G.; Sequence analysis: F.G., B.K., X.L., E.E.G., M.H.M., Y.C., B.F;
Phylogenetic analysis: F.G., B.K., X.L., E.E.G., M.H.M., Y.C.; Recombination
analysis: F.G., E.E.G., B.K., X.L., M.H.M., B.F.; Manuscript writing: F.G., B.K.,
E.E.G. Manuscript editing: F.G., B.K., E.E.G., X.L., C.X., X-P.K.; F.G. and B.F.
supervised the project. Competing interests: All authors declare no competing
interests. All data are available in the main text or the supplementary materials.
Data and materials availability: All data needed to evaluate the conclusions in
the paper are present in the paper and/or the Supplementary Materials.
Additional data related to this paper may be requested from the authors.
SUPPLEMENTARY MATERIALS
advances.sciencemag.org/cgi/content/full/sciadv.abb9153/DC1
Submitted 26 March 2020
Accepted 19 May 2020
Published First Release 29 May 2020
10.1126/sciadv.abb9153
Fig. 1. SARS-CoV-2 recombination with Pan_SL-CoV and Bat_SL-CoV. (A) SimPlot
genetic similarity plot between SARS-CoV-2 Wuhan-Hu-1 and representative CoV
sequences, using a 400-bp window at a 50-bp step and the Kimura 2-parameter model.
Phylogenetic trees of regions of disproportional similarities, showing high similarities
between SARS-CoV-2 and ZXC21 (B) or GD/P1La (C), high genetic divergences of all
Pan_SL-CoV sequences (D), and high similarities between GD/P1La and divergent
bat_SL-CoV sequences (E). All positions are relative to Wuhan-Hu-1. In Fig. 1A we use the
ORF1a and ORF1b nomenclature consistent with the original publication of the Wuhan
virus (3); however, the NCBI betacoronavirus reference sequences (see SARS-CoV-2,
NC_045512.2, for an example) designate a single longer stretch called ORF1ab (from 266 to
21,555) that spans both 1a and 1b.
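The sliding-window comparison named in the caption of Fig. 1A (400-bp window, 50-bp step, Kimura 2-parameter model) can be sketched in Python as follows. This is not SimPlot's own code: the function names, the handling of gaps and ambiguous bases, and the use of 1 minus the distance as the plotted similarity are assumptions made for illustration, and the inputs are assumed to be two pre-aligned sequences of equal length.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned nucleotide strings.

    Columns containing gaps or ambiguous bases are skipped (an assumption).
    """
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        return float("nan")
    p, q = transitions / n, transversions / n
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return float("inf")  # sequences too divergent for the model
    return -0.5 * math.log(term1 * math.sqrt(term2))


def sliding_similarity(query, other, window=400, step=50):
    """Return (window midpoint, 1 - K2P distance) for each window."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window], other[start:start + window])
        points.append((start + window // 2, 1.0 - d))
    return points
```

Plotting the returned midpoints against the similarity values for each comparison sequence would give curves of the kind shown in Fig. 1A; abrupt crossovers between curves are what suggest candidate recombination breakpoints.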
Fig. 2. Impact of SARS-CoV-2 recombination on coreceptor binding. (A) Amino acid sequences of the
receptor binding motif (RBM) in the spike (S) gene among Sarbecovirus CoVs compared to Wuhan-Hu-1
(top). Dashes indicate identical amino acids, dots indicate deletions. ACE2 critical contact sites are highlighted
in blue, two large deletions in green. (B) Phylogenetic tree analysis of amino acid sequences of the RBM. Viruses
with the ability to bind ACE2 form two distinct clusters (one including SARS_CoVs and the other including
SARS_CoV-2s). Bat-SL-CoVs with large deletions form another distinct cluster.
Fig. 3. Structure analysis of the RBM and ACE2 interface. (A) SARS-CoV and
SARS-CoV-2 receptor binding domains (RBD). Human ACE2 in green (PDB
6M0J) at the top and the RBD of the S-protein at the bottom; SARS-CoV S-protein
(PDB 2AJF) in red, and SARS-CoV-2 S-protein (PDB 6M0J) in magenta with RBM
in blue. All structure backbones are shown as ribbons with key residues at the
interface shown as stick models, labeled using the same color scheme. (B)
Impact of different RBM amino acids between SARS-CoV-2 and RaTG13 on ACE2
binding. (C) Impact of the amino acid at position 498 (Q in SARS-CoV-2, top, and
H in RaTG13, bottom) on ACE2 binding. Same color-coding as in (A), with
additional hydrogen bonding as light blue lines. (D) Impact of two deletions on the
ACE2 binding interface in some bat-SL-CoVs, positions indicated in yellow, and
modeled structure with the long deletion between residues 473 and 486 in light blue.
Fig. 4. Strong purifying selection after furin cleavage in S gene among SARS-CoV-2 and
closely related viruses. (A) Phylogenetic tree (left) and Highlighter plot (right) of sequences
around the RBM and furin cleavage site compared to SARS-CoV-2 Wuhan-Hu-1 (na positions
22541-24391). ACE2 receptor binding motif (RBM) and furin cleavage site are highlighted in
light-gray boxes. Mutations compared to Wuhan-Hu-1 are light blue for synonymous, red for
nonsynonymous. Dominance of synonymous mutations within group A compared to group
B is highlighted on the right. (B) Cumulative plots of the average behavior of each codon for all
pairwise comparisons for indels and synonymous (light blue) and nonsynonymous (red)
mutations, by group. The abrupt slope change of the nonsynonymous curve in group A at
around codon 368 (na 1104) is indicative of a shift in localized accumulations of
nonsynonymous mutations after the furin cleavage site. Group B instead lacks this abrupt
change in slope at the same position. Values of ω denote the average ratio of the rate of
nonsynonymous substitutions per nonsynonymous site (dN) to the rate of synonymous
substitutions per synonymous site (dS), i.e., dN/dS, for each group and region.
(C) Sequence dS/dN ratios compared to Wuhan-Hu-1 within codons 1-368 (na 1-1104,
green) and codons 369-620 (na 1105-1893, dark blue) in group A and group B sequences.
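The cumulative synonymous and nonsynonymous curves described for Fig. 4B can be approximated with a short Python sketch. It is a deliberate simplification and not the authors' pipeline: each differing codon is counted once as either synonymous or nonsynonymous by comparing the translated amino acids, codons containing gaps or ambiguous bases are ignored, and no per-site normalization (as required for a proper dN/dS estimate) is attempted. The names `classify_codons` and `cumulative` are hypothetical.

```python
from itertools import accumulate, product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codons(ref, seq):
    """Flag each codon of `seq` as synonymous or nonsynonymous relative to `ref`.

    Both inputs are assumed to be in-frame, codon-aligned coding sequences.
    Returns two 0/1 lists of equal length: (synonymous, nonsynonymous).
    """
    syn, nonsyn = [], []
    for i in range(0, min(len(ref), len(seq)) - 2, 3):
        a, b = ref[i:i + 3].upper(), seq[i:i + 3].upper()
        changed = a in CODON_TABLE and b in CODON_TABLE and a != b
        same_aa = changed and CODON_TABLE[a] == CODON_TABLE[b]
        syn.append(1 if same_aa else 0)
        nonsyn.append(1 if changed and not same_aa else 0)
    return syn, nonsyn


def cumulative(flags):
    """Running total along the codon axis, as plotted in a cumulative curve."""
    return list(accumulate(flags))


syn, nonsyn = classify_codons("ATGGCTAAA", "ATGGCCAGA")
print(cumulative(syn), cumulative(nonsyn))
```

In such a plot, a region under strong purifying selection shows a synonymous curve that keeps rising while the nonsynonymous curve stays nearly flat; a change in slope marks the codon at which the pattern shifts.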
Fig. 5. Strong purifying selection on complete and partial gene regions among SARS-CoV-2,
RaTG13 and Pan_SL-CoV viruses. Purifying selection pressure on complete and partial genes among
different viruses (red boxes) is evident from shorter branches in amino acid trees compared to nucleic
acid trees. Distinct purifying selection patterns are observed among different viruses: (A) SARS-CoV-2,
RaTG13, all Pan_SL-CoV and bat CoV ZXC21 and ZC45; (B) SARS-CoV-2, RaTG13, all Pan_SL-CoV
sequences; (C) SARS-CoV-2, RaTG13 and Pan_SL-CoV_GD. Cumulative plots show the average behavior
of each codon for all pairwise comparisons for synonymous mutations, nonsynonymous mutations
and indels within each gene region. ω denotes the average ratio of the rate of nonsynonymous
substitutions per nonsynonymous site (dN) to the rate of synonymous substitutions per
synonymous site (dS), i.e., dN/dS, for each group.
Fig. 6. Multiple recombination of SARS-CoVs with different bat_SL-CoVs. (A) SimPlot
genetic similarity plot between SARS-CoV GZ02 and SARS_SL-CoVs, using a 400-bp
window at a 50-bp step and the Kimura 2-parameter model. Group A CoVs (YN2018B,
Rs9401, Rs7327, WIV16 and Rs4231) are shown in blue, group B CoVs (Rf4092, YN2013,
Anlong-112 and GX2013) in orange, YNLF-34C in green, and outlier control HKU3-12 in
red. Phylogenetic trees for high similarity regions between GZ02 and YNLF-34C (B),
group A (C), and group B (D). All positions are relative to Wuhan-Hu-1.
Emergence of SARS-CoV-2 through recombination and strong purifying selection
Xiaojun Li, Elena E. Giorgi, Manukumar Honnayakanahalli Marichannegowda, Brian Foley, Chuan Xiao, Xiang-Peng Kong, Yue Chen, S. Gnanakaran, Bette Korber and Feng Gao
Published online May 29, 2020
Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science.
No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).
on May 29, 2020http://advances.sciencemag.org/Downloaded from
... Positive sense ssRNA viruses, like coronaviruses, are known to experience higher levels of recombination than negative sense ssRNA viruses [24]. It has long been known that coronaviruses experience frequent recombination events within the family Coronaviridae [26][27][28] leading to the genesis of new viruses and it has also been proposed that the ability to bind to human ACE2 receptors, which appears to be key to pathogenicity with regards to SARS-like coronaviruses, is the result of a heretofore unknown recombination event [25,29,30]. Additionally, interfamily recombination among viruses has been shown to occur frequently among some plant viruses [31,32] and several recombination events have been proposed and/or are now accepted to have occurred in mammalian coronaviruses [16,17] sometimes resulting in increased host tropism [32]. ...
... In data that correlates well with functional/structural evolution, such as MWHP PCDTW distances [18], it may be impossible to rule out convergence when sequences with low amino acid identity also appear to be very similar based on MWHP PCDTW distance assessment. While recombination is not specifically addressed, evidence supports the supposition that recombination could have played a role in the acquisition of ACE2 binding within betacoronaviruses [7,25,29]. We have shown in most of the virus families analyzed that the betacoronavirus clade includes some non-coronavirus sequences, exposing the possibility that while convergence cannot be ruled out, those sequences likely could have been mobilized from betacoronaviruses to other viral lineages through horizontal transfer, potentially via recombination. ...
Article
Full-text available
Recently, we proposed a new method, based on protein profiles derived from physicochemical dynamic time warping (PCDTW), to functionally/structurally classify coronavirus spike protein receptor binding domains (RBD). Our method, as used herein, uses waveforms derived from two physicochemical properties of amino acids (molecular weight and hydrophobicity (MWHP)) and is designed to reach into the twilight zone of homology, and therefore, has the potential to reveal structural/functional relationships and potentially homologous relationships over greater evolutionary time spans than standard primary sequence alignment-based techniques. One potential application of our method is inferring deep evolutionary relationships such as those between the RBD of the spike protein of betacoronaviruses and functionally similar proteins found in other families of viruses, a task that is extremely difficult, if not impossible, using standard multiple alignment-based techniques. Here, we applied PCDTW to compare members of four divergent families of viruses to betacoronaviruses in terms of MWHP physicochemical similarity of their RBDs. We hypothesized that some members of the families Arteriviridae, Astroviridae, Reoviridae (both from the genera rotavirus and orthoreovirus considered separately), and Toroviridae would show greater physicochemical similarity to betacoronaviruses in protein regions similar to the RBD of the betacoronavirus spike protein than they do to other members of their respective taxonomic groups. This was confirmed to varying degrees in each of our analyses. Three arteriviruses (the glycoprotein-2 sequences) clustered more closely with ACE2-binding betacoronaviruses than to other arteriviruses, and a clade of 33 toroviruses was found embedded within a clade of non-ACE2-binding betacoronaviruses, indicating potentially shared structure/function of RBDs between betacoronaviruses and members of other virus clades.
... This virus has caused a pandemic of a proportion not seen since the Spanish Flu pandemic of 1918, and, unfortunately, humanity was not prepared to deal with this new pandemic. [1][2][3] In Brazil, as of October 24 th , 2023, more than 37.9 million cases have been confirmed and reported, with the live loss of around 706,000 lives. 4 The SARS-CoV-2 virus encodes a total of four structural proteins, among which the most important in cell invasion are the spike glycoprotein (S) and the nucleocapsid protein (N). ...
... 9,10 Despite all the emergent variants of concern, such as P.1 / gamma that emerged in the city of Manaus (State of Amazonas, Brazil), 11 or the most recent Omicron variant, the critical importance of cellular immunity, where CD4+ and CD8+ T cells normally recognized the protein sequence spike, even with all it changes. 1,[12][13][14] In the humoral response to SARS-CoV-2 infection, the literature has already established the importance of neutralizing antibodies, such as IgG, and this response is also closely correlated with the severity of the disease in some patients. 15 Among the most important antibodies, there is the IgA class, which prevails in the human body initial response to the SARS-CoV-2 infection, compared to IgG and IgM concentrations. ...
Article
Full-text available
The emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) stands for being the most serious epidemic (so far) of the 21 st century. However, only a few computational studies have investigated the molecular mechanisms underlying the neutralization of the spike protein by antibodies of different classes. Hence, bioinformatic methods were employed to unravel the factors contributing to the remarkable neutralization capacity exhibited by specific antibodies. Initially, crystallographic structures of IgA monomeric / dimeric, IgG, and IgM antibodies binding with the receptor-binding domain region of the SARS-CoV-2 spike protein were retrieved. Subsequently, rigid molecular docking and molecular dynamic simulations were performed over 100 ns with explicit water solvation. Lastly, an energy decomposition was conducted to estimate the binding affinity using the last frames from molecular dynamics. The results revealed a higher binding affinity for both monomeric and dimeric forms of IgA antibodies against the spike protein. Additionally, a greater number of hydrogen bonds were observed during their interaction with the spike protein, as well as greater structural instability along the time and especially a more thermodynamically favorable interaction affinity. In this way, the research contributes a small piece to the complex puzzle of understanding the humoral immune response induced by the SARS-CoV-2 virus.
... Recombination has been linked to the emergence of new coronavirids, such as SARS-CoV (25), and the evolution of new variants of SARS-CoV-2 (26). Initial hypotheses suggested that SARS-CoV-2 acquired its RBD through recombination with a sarbecovirus found in pangolins (27). However, subsequent research disputed this claim, proposing alternative scenarios involving more ancestral recombination events with other closely related sarbecoviruses, such as SARS-CoV, or sarbecovirus strains, such as RaTG13 (17,18,23,24). ...
Article
Full-text available
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; family Coronaviridae, genus Betacoronavirus, subgenus Sarbecovirus) has caused millions of deaths, prompting a need for better understanding of coronavirid emergence and spillover to humans. As an evaluation of how some features of SARS-CoV-2, unique among sarbecoviruses, may have been acquired from related viruses, we conducted phylogenetic and recombination analyses to compare the frequency of recombination among coronavirids across vs within genera, subgenera, and species. Among known betacoronaviruses, we identified 199 (183 intraspecies, 16 interspecies, but no inter subgenera) recombination events. Phylogenetic analyses revealed that the ancestry of interspecies events was limited and less prone to affect 5′ regions of coronavirid genome open reading frame 1 (ORF1) than intraspecies events. On the contrary, interspecies events were significantly more prone to impact the 3′ end (ORF6-ORF8 and the nucleocapsid protein [N] ORF), suggesting the existence of region-specific constraints on recombination. This work substantiated that recombination among betacoronaviruses is limited by the genome similarity between their parental viruses. We conclude that SARS-CoV-2 likely acquired unique features through recombination with closely related circulating sarbecoviruses (most likely from the same species) that co-existed geographi cally. IMPORTANCE Understanding the evolutionary events that led to SARS-CoV-2 emer gence, spillover, and spread is crucial to prevent, or at least be prepared for, the same type of occurrence in the future. Given that SARS-CoV-2 has some characteristics not found in other closely related viruses, we aimed to systematically assess how likely these unique features may have been acquired through recombination. We found that, although recombination is a frequent phenomenon among betacoronaviruses, it is mostly limited to closely related members of the same species. 
Therefore, we conclude that the most likely scenario involved feature acquisition from recombination with a closely related virus that was circulating in a geographically overlapping area or through a different biological process, but not recombination from a virus of a different species, genus, or subgenus.
... Altogether, the alignment of coronaviral S glycoproteins with the reference sequence revealed a high evolutionary relationship between SARS-CoV (Urbani), bat CoVs, and pangolin CoVs. It is suggested that the emergence of the highly contagious and pandemic-causing SARS-CoV-2 is highly attributable to genome recombination or mutations of the coronavirus in animal hosts such as bats [37][38][39]. The high evolutionary relationship among coronaviruses sheds light on the development of universal vaccines using conserved epitopes. ...
Article
Full-text available
Background/Objectives: The COVID-19 pandemic caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus has exposed the vulnerabilities and unpreparedness of the global healthcare system in dealing with emerging zoonoses. In the past two decades, coronaviruses (CoV) have been responsible for three major viral outbreaks, and the likelihood of future outbreaks caused by these viruses is high and nearly inevitable. Therefore, effective prophylactic universal vaccines targeting multiple circulating and emerging coronavirus strains are warranted. Methods: This study utilized an immunoinformatic approach to identify evolutionarily conserved CD4+ (HTL) and CD8+ (CTL) T cells, and B-cell epitopes in the coronaviral spike (S) glycoprotein. Results: A total of 132 epitopes were identified, with the majority of them found to be conserved across the bat CoVs, pangolin CoVs, endemic coronaviruses, SARS-CoV-2, and Middle East respiratory syndrome coronavirus (MERS-CoV). Their peptide sequences were then aligned and assembled to identify the overlapping regions. Eventually, two major peptide assemblies were derived based on their promising immune-stimulating properties. Conclusions: In this light, they can serve as lead candidates for universal coronavirus vaccine development, particularly in the search for pan-coronavirus multi-epitope universal vaccines that can confer protection against current and novel coronaviruses.
... The rate of zoonotic pathogen emergence reveals that human-induced changes have brought wildlife, livestock, and humans into closer and more frequent contact (Morse et al., 2012). The proximity of different wild and domestic animal species in a wildlife market setting may enable recombination between more distant coronaviruses and the emergence of recombinants with novel phenotypes (Li et al., 2020). This is particularly relevant ...
Article
Livestock (farmed domestic animals) play crucial roles in the attainment of several Sustainable Development Goals (SDGs) of the United Nations. There is also an intricate link between one health (human, animal, and environmental) that is advocated by the World Health Organisation and the Sustainable Development Goals which encompasses environmental, economic, and social issues. Many infectious diseases and new or emerging infectious diseases are zoonotic in origin; this includes the current pandemic known as COVID-19. Animal-source foods will increasingly play a huge role in ensuring basic nutrition and health for humans in the coming years, especially in developing countries where the human population will increase rapidly. Three SDGs (Zero hunger, Good health and well-being, and Responsible consumption and production) will thus be addressed by livestock development. Livestock holds the key to sustainable economic growth, addressing two SDGs (Decent work and economic growth and Industry, Innovation, and Infrastructure). The livestock sector contributes 40% of the Agricultural GDP in developing 116 countries and the percentage is growing (FAO, 2021). Equitable livelihoods can be achieved by livestock development, covering four SDGs (No poverty, Quality education, Gender equality, and Peace, Justice, and Strong Institutions). Lastly, livestock can help ensure sustainable ecosystems. Six SDGs can be covered (Clean water, Affordable and Clean Energy, Sustainable cities and communities, Climate Action, Life below water, and Life on land). Global livestock development should therefore be given a pride of place, especially considering their envisaged importance in developing countries.
... Theories regarding the transmission of this virus from bats to humans are divided; some suggest the involvement of intermediate hosts, while others suggest direct transmission [29][30][31]. Approximately one year after its initial identification, SARS-CoV-2 began to be detected in various variants and sub-variants, which have raised significant concerns among health specialists [32]. ...
Article
Obesity, the current pandemic, is associated with alarming rises among children and adolescents, and the forecasts for the near future are worrying. The present paper aims to draw attention to the short-term effects of the excess adipose tissue in the presence of a viral infection, which can be life-threatening for pediatric patients, given that the course of viral infections is often severe, if not critical. The COVID-19 pandemic has been the basis of these statements, which opened the door to the study of the repercussions of obesity in the presence of a viral infection. Since 2003, with the discovery of SARS-CoV-1, interest in the study of coronaviruses has steadily increased, with a peak during the pandemic. Thus, obesity has been identified as an independent risk factor for COVID-19 infection and is correlated with a heightened risk of severe outcomes in pediatric patients. We sought to determine the main mechanisms through which obesity is responsible for the unfavorable evolution in the presence of a viral infection, with emphasis on the disease caused by SARS-CoV-2, in the hope that future studies will further elucidate this aspect, enabling prompt and effective intervention in obese patients with viral infections, whose clinical progression is likely to be favorable.
... It has been estimated that the mutation rate of CoVs is 10⁻⁶ per base per infection cycle [56] and 10⁻³ per site per year [57]. Recombination events in conjunction with purifying selection result in the generation of new CoV variants [58,59]. ...
Article
Coronaviruses (CoVs) have caused three global outbreaks: severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) in 2003, Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012, and SARS-CoV-2 in 2019, with significant mortality and morbidity. The impact of coronavirus disease 2019 (COVID-19) raised serious concerns about the global preparedness for a pandemic. Furthermore, the changing antigenic landscape of SARS-CoV-2 led to new variants with increased transmissibility and immune evasion. Thus, the development of broad-spectrum vaccines against current and future emerging variants of CoVs will be an essential tool in pandemic preparedness. Distinct phylogenetic features within CoVs complicate and limit the process of generating a pan-CoV vaccine capable of targeting the entire Coronaviridae family. In this review, we aim to provide a detailed overview of the features of CoVs, their phylogeny, current vaccines against various CoVs, the efforts in developing broad-spectrum coronavirus vaccines, and the future.
Article
Background Most viral genome sequences generated during the latest pandemic have presented new challenges for computational analysis. Analyzing millions of viral genomes in multi-FASTA format is computationally demanding, especially when using alignment-based methods. Most existing methods are not designed to handle such large datasets, often requiring the analysis to be divided into smaller parts to obtain results using available computational resources. Findings We introduce AltaiR, a toolkit for analyzing multiple sequences in multi-FASTA format using exclusively alignment-free methodologies. AltaiR enables the identification of singularity and similarity patterns within sequences and computes static and temporal dynamics without restrictions on the number or size of input sequences. It automatically filters low-quality, biased, or deviant data. We demonstrate AltaiR’s capabilities by analyzing more than 1.5 million full severe acute respiratory virus coronavirus 2 sequences, revealing interesting observations regarding viral genome characteristics over time, such as shifts in nucleotide composition, decreases in average Kolmogorov sequence complexity, and the evolution of the smallest sequences not found in the human host. Conclusions AltaiR can identify temporal characteristics and trends in large numbers of sequences, making it ideal for scenarios involving endemic or epidemic outbreaks with vast amounts of available sequence data. Implemented in C with multithreading and methodological optimizations, AltaiR is computationally efficient, flexible, and dependency-free. It accepts any sequence in FASTA format, including amino acid sequences. The complete toolkit is freely available at https://github.com/cobilab/altair.
Article
Throughout the course of the SARS-CoV-2 pandemic, genetic variation has contributed to the spread and persistence of the virus. For example, various mutations have allowed SARS-CoV-2 to escape antibody neutralization or to bind more strongly to the receptors that it uses to enter human cells. Here, we compared two methods that estimate the fitness effects of viral mutations using the abundant sequence data gathered over the course of the pandemic. Both approaches are grounded in population genetics theory but with different assumptions. One approach, tQLE, features an epistatic fitness landscape and assumes that alleles are nearly in linkage equilibrium. Another approach, MPL, assumes a simple, additive fitness landscape, but allows for any level of correlation between alleles. We characterized differences in the distributions of fitness values inferred by each approach and in the ranks of fitness values that they assign to sequences across time. We find that in a large fraction of weeks the two methods are in good agreement as to their top-ranked sequences, i.e., as to which sequences observed that week are most fit. We also find that agreement between the ranking of sequences varies with genetic unimodality in the population in a given week.
Article
The outbreak of novel coronavirus disease 2019 (COVID-19) in the city of Wuhan, China has resulted in more than 1.7 million laboratory-confirmed cases all over the world. Recent studies showed that SARS-CoV-2 likely originated from bats, but its intermediate hosts are still largely unknown. In this study, we assembled the complete genome of a coronavirus identified in 3 sick Malayan pangolins. The molecular and phylogenetic analyses showed that this pangolin coronavirus (pangolin-CoV-2020) is genetically related to SARS-CoV-2 as well as a group of bat coronaviruses, but they do not support the hypothesis that SARS-CoV-2 emerged directly from pangolin-CoV-2020. Our study suggests that pangolins are natural hosts of Betacoronaviruses. Large-scale surveillance of coronaviruses in pangolins could improve our understanding of the spectrum of coronaviruses in pangolins. In addition to conservation of wildlife, minimizing the exposure of humans to wildlife will be important to reduce the spillover risks of coronaviruses from wild animals to humans.
Article
Distinct viral clades have a likely impact on COVID‐19 pathogenesis and spread. Sequence analysis of 2310 viral isolates from Nextstrain reveals that the residue at position 614 of the viral spike protein is changed from a putative ancestral aspartic acid (D) to a glycine (G) between two viral clades. The G strain is predominantly on the East Coast of the United States, and the D strain is predominantly on the West Coast. This mutation of the SARS‐CoV‐2 S protein spike is conserved in coronaviruses. Point mutations in a murine coronavirus spike protein can result in increased virulence through instability of the viral machinery and altered viral to cell membrane fusion. This observation may partially explain the discrepancy in predicted deaths from COVID‐19 between the East Coast and West Coast, and possibly explain that other factors aside from social distance, such as competition between two strains of differing virulence, may be at play.
Article
The recent outbreak of coronavirus disease (COVID-19) caused by SARS-CoV-2 infection in Wuhan, China has posed a serious threat to global public health. To develop specific anti-coronavirus therapeutics and prophylactics, the molecular mechanism that underlies viral infection must first be defined. Therefore, we herein established a SARS-CoV-2 spike (S) protein-mediated cell–cell fusion assay and found that SARS-CoV-2 showed a superior plasma membrane fusion capacity compared to that of SARS-CoV. We solved the X-ray crystal structure of six-helical bundle (6-HB) core of the HR1 and HR2 domains in the SARS-CoV-2 S protein S2 subunit, revealing that several mutated amino acid residues in the HR1 domain may be associated with enhanced interactions with the HR2 domain. We previously developed a pan-coronavirus fusion inhibitor, EK1, which targeted the HR1 domain and could inhibit infection by divergent human coronaviruses tested, including SARS-CoV and MERS-CoV. Here we generated a series of lipopeptides derived from EK1 and found that EK1C4 was the most potent fusion inhibitor against SARS-CoV-2 S protein-mediated membrane fusion and pseudovirus infection with IC50s of 1.3 and 15.8 nM, about 241- and 149-fold more potent than the original EK1 peptide, respectively. EK1C4 was also highly effective against membrane fusion and infection of other human coronavirus pseudoviruses tested, including SARS-CoV and MERS-CoV, as well as SARSr-CoVs, and potently inhibited the replication of 5 live human coronaviruses examined, including SARS-CoV-2. Intranasal application of EK1C4 before or after challenge with HCoV-OC43 protected mice from infection, suggesting that EK1C4 could be used for prevention and treatment of infection by the currently circulating SARS-CoV-2 and other emerging SARSr-CoVs.
Article
A novel and highly pathogenic coronavirus (SARS-CoV-2) has caused an outbreak in Wuhan city, Hubei province of China since December 2019, and soon spread nationwide and spilled over to other countries around the world [1–3]. To better understand the initial step of infection at an atomic level, we determined the crystal structure of the SARS-CoV-2 spike receptor-binding domain (RBD) bound to the cell receptor ACE2 at 2.45 Å resolution. The overall ACE2-binding mode of the SARS-CoV-2 RBD is nearly identical to that of the SARS-CoV RBD, which also utilizes ACE2 as the cell receptor [4]. Structural analysis identified residues in the SARS-CoV-2 RBD that are critical for ACE2 binding, the majority of which either are highly conserved or share similar side chain properties with those in the SARS-CoV RBD. Such similarity in structure and sequence strongly argues for convergent evolution between the SARS-CoV-2 and SARS-CoV RBDs for improved binding to ACE2, although SARS-CoV-2 does not cluster within SARS and SARS-related coronaviruses [1–3,5]. The epitopes of two SARS-CoV antibodies targeting the RBD are also analysed with the SARS-CoV-2 RBD, providing insights into the future identification of cross-reactive antibodies.
Article
A novel SARS-like coronavirus (SARS-CoV-2) recently emerged and is rapidly spreading in humans [1,2]. A key to tackling this epidemic is to understand the virus’s receptor recognition mechanism, which regulates its infectivity, pathogenesis and host range. SARS-CoV-2 and SARS-CoV recognize the same receptor - human ACE2 (hACE2) [3,4]. Here we determined the crystal structure of the SARS-CoV-2 receptor-binding domain (RBD) (engineered to facilitate crystallization) in complex with hACE2. Compared with the SARS-CoV RBD, a hACE2-binding ridge in SARS-CoV-2 RBD takes a more compact conformation; moreover, several residue changes in SARS-CoV-2 RBD stabilize two virus-binding hotspots at the RBD/hACE2 interface. These structural features of SARS-CoV-2 RBD enhance its hACE2-binding affinity. Additionally, we show that RaTG13, a bat coronavirus closely related to SARS-CoV-2, also uses hACE2 as its receptor. The differences among SARS-CoV-2, SARS-CoV and RaTG13 in hACE2 recognition shed light on potential animal-to-human transmission of SARS-CoV-2. This study provides guidance for intervention strategies targeting receptor recognition by SARS-CoV-2.
Article
The ongoing outbreak of viral pneumonia in China and beyond is associated with a novel coronavirus, SARS-CoV-2 [1]. This outbreak has been tentatively associated with a seafood market in Wuhan, China, where the sale of wild animals may be the source of zoonotic infection [2]. Although bats are likely reservoir hosts for SARS-CoV-2, the identity of any intermediate host that might have facilitated transfer to humans is unknown. Here, we report the identification of SARS-CoV-2-related coronaviruses in Malayan pangolins (Manis javanica) seized in anti-smuggling operations in southern China. Metagenomic sequencing identified pangolin-associated coronaviruses that belong to two sub-lineages of SARS-CoV-2-related coronaviruses, including one that exhibits strong similarity to SARS-CoV-2 in the receptor-binding domain. The discovery of multiple lineages of pangolin coronavirus and their similarity to SARS-CoV-2 suggests that pangolins should be considered as possible hosts in the emergence of novel coronaviruses and should be removed from wet markets to prevent zoonotic transmission.
Article
Over the last two decades, the world experienced three outbreaks of coronaviruses with elevated morbidity rates. Currently, the global community is facing the emerging virus SARS-CoV-2, belonging to Betacoronavirus, which appears to be more transmissible but less deadly than SARS-CoV. The current study aimed to track the evolutionary ancestors and the different evolutionary strategies that were genetically adopted by SARS-CoV-2. Our whole-genome analysis revealed that SARS-CoV-2 was the descendant of bat SARS/SARS-like CoVs and that bats served as a natural reservoir. SARS-CoV-2 used mutations and recombination as crucial strategies in different genomic regions, including the envelope, membrane, nucleocapsid, and spike glycoproteins, to become a novel infectious agent. We confirmed that mutations in different genomic regions of SARS-CoV-2 have specific influence on virus reproductive adaptability, allowing for genotype adjustment and adaptations in rapidly changing environments. Moreover, for the first time we identified nine putative recombination patterns in SARS-CoV-2, which encompass spike glycoprotein, RdRp, helicase and ORF3a. Six recombination regions were spotted in the S gene and are undoubtedly important for evolutionary survival; they permitted the virus to modify its surface antigenicity to escape immune recognition in animals and adapt to a human host. With these combined naturally selected strategies, SARS-CoV-2 emerged as a novel virus in human society.
Preprint
The unprecedented epidemic of pneumonia caused by a novel coronavirus, HCoV-19, in China and beyond has caused public health concern at a global scale. Although bats are regarded as the most likely natural hosts for HCoV-19 [1,2], the origins of the virus remain unclear. Here, we report a novel bat-derived coronavirus, denoted RmYN02, identified from a metagenomics analysis of samples from 227 bats collected from Yunnan Province in China between May and October, 2019. RmYN02 shared 93.3% nucleotide identity with HCoV-19 at the scale of the complete virus genome and 97.2% identity in the 1ab gene in which it was the closest relative of HCoV-19. In contrast, RmYN02 showed low sequence identity (61.3%) to HCoV-19 in the receptor binding domain (RBD) and might not bind to angiotensin-converting enzyme 2 (ACE2). Critically, however, and in a similar manner to HCoV-19, RmYN02 was characterized by the insertion of multiple amino acids at the junction site of the S1 and S2 subunits of the Spike (S) protein. This provides strong evidence that such insertion events can occur in nature. Together, these data suggest that HCoV-19 originated from multiple naturally occurring recombination events among those viruses present in bats and other wildlife species.
Article
The emergence of SARS-CoV-2 has resulted in >90,000 infections and >3,000 deaths. Coronavirus spike (S) glycoproteins promote entry into cells and are the main target of antibodies. We show that SARS-CoV-2 S uses ACE2 to enter cells and that the receptor-binding domains of SARS-CoV-2 S and SARS-CoV S bind with similar affinities to human ACE2, correlating with the efficient spread of SARS-CoV-2 among humans. We found that the SARS-CoV-2 S glycoprotein harbors a furin cleavage site at the boundary between the S1/S2 subunits, which is processed during biogenesis and sets this virus apart from SARS-CoV and SARS-related CoVs. We determined cryo-EM structures of the SARS-CoV-2 S ectodomain trimer, providing a blueprint for the design of vaccines and inhibitors of viral entry. Finally, we demonstrate that SARS-CoV S murine polyclonal antibodies potently inhibited SARS-CoV-2 S mediated entry into cells, indicating that cross-neutralizing antibodies targeting conserved S epitopes can be elicited upon vaccination.