Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human.
ABSTRACT The genomic sequences of severe acute respiratory syndrome coronaviruses from human and palm civet of the 2003/2004 outbreak in the city of Guangzhou, China, were nearly identical. Phylogenetic analysis suggested an independent viral invasion from animal to human in this new episode. Combining all existing data but excluding singletons, we identified 202 single-nucleotide variations. Among them, 17 are polymorphic in palm civets only. The ratio of nonsynonymous/synonymous nucleotide substitution in palm civets collected 1 yr apart from different geographic locations is very high, suggesting that viral proteins are evolving rapidly in civets as well, much like their adaptation to the human host in the early 2002–2003 epidemic. Major genetic variations in some critical genes, particularly the Spike gene, seemed essential for the transition from animal-to-human transmission to human-to-human transmission, which eventually caused the first severe acute respiratory syndrome outbreak of 2002/2003.
Huai-Dong Song (a,b), Chang-Chun Tu (b,c), Guo-Wei Zhang (a,b), Sheng-Yue Wang (b,d), Kui Zheng (e), Lian-Cheng Lei (c),
Qiu-Xia Chen (e), Yu-Wei Gao (c), Hui-Qiong Zhou (e), Hua Xiang (c), Hua-Jun Zheng (d), Shur-Wern Wang Chern (f), Feng Cheng (a),
Chun-Ming Pan (a), Hua Xuan (c,g), Sai-Juan Chen (a,g), Hui-Ming Luo (b,e), Duan-Hua Zhou (b,h), Yu-Fei Liu (h), Jian-Feng He (e),
Peng-Zhe Qin (h), Ling-Hui Li (e), Yu-Qi Ren (i), Wen-Jia Liang (e), Ye-Dong Yu (i), Larry Anderson (f), Ming Wang (g,h), Rui-Heng Xu (e,g),
Xin-Wei Wu (b,h), Huan-Ying Zheng (b,e), Jin-Ding Chen (b,j), Guodong Liang (k), Yang Gao (h), Ming Liao (j), Ling Fang (e), Li-Yun Jiang (h),
Hui Li (e), Fang Chen (h), Biao Di (h), Li-Juan He (h), Jin-Yan Lin (e,g), Suxiang Tong (f,g), Xiangang Kong (g,l), Lin Du (g,h), Pei Hao (b,m,n),
Hua Tang (b,o), Andrea Bernini (b,p), Xiao-Jing Yu (m), Ottavia Spiga (p), Zong-Ming Guo (n), Hai-Yan Pan (n), Wei-Zhong He (n),
Jean-Claude Manuguerra (q), Arnaud Fontanet (q), Antoine Danchin (q), Neri Niccolai (g,p), Yi-Xue Li (g,m,n), Chung-I Wu (g,o),
and Guo-Ping Zhao (d,m,r,s)
(a) State Key Laboratory for Medical Genomics/Pôle Sino-Français de Recherche en Sciences du Vivant et Génomique, Ruijin Hospital Affiliated to Shanghai Second Medical University, 197 Rui Jin Road II, Shanghai 200025, China; (c) Changchun University of Agriculture and Animal Sciences, Changchun 130062, China; (d) Chinese National Human Genome Center, 250 Bi Bo Road, Zhang Jiang High Tech Park, Shanghai 201203, China; (e) Guangdong Center for Disease Control and Prevention, 176 Xingangxi Road, Guangzhou 510300, Guangdong, China; (f) Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA 30333; (h) Guangzhou Center for Disease Control and Prevention, 23 Third Zhongshan Road, Guangzhou 510080, Guangdong, China; (i) Guangdong Provincial Veterinary Station of Epidemic Prevention and Supervision, Guangzhou 510230, China; (j) College of Veterinary Medicine, South China Agriculture University, Guangzhou 510246, China; (k) National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 100052, China; (l) National Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agriculture Sciences, Harbin 150001, China; (m) Bioinformation Center/Institute of Plant Physiology and Ecology/Health Science Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China; (n) Shanghai Center for Bioinformation Technology, 100 Qinzhou Road, Shanghai 200235, China; (o) Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637; (p) Biomolecular Structure Research Center and Department of Molecular Biology, University of Siena, Via A. Fiorentina 1, I-53100 Siena, Italy; (q) Institut Pasteur, 25, Rue du Docteur Roux, 75724 Paris Cedex 15, France; and (r) State Key Laboratory of Genetic Engineering/Department of Microbiology, School of Life Science, Fudan University, 220 Handan Road, Shanghai 200433, China
Communicated by Zhu Chen, Shanghai Institute of Hematology, Shanghai, People’s Republic of China, December 22, 2004 (received for review
November 20, 2004)
The prompt identification of a novel human coronavirus (CoV) as the etiologic agent of severe acute respiratory syndrome (SARS) demonstrated the power of coordinated integration of clinical investigation and molecular virology (1–4). SARS-CoV-like virus was isolated from a few Himalayan palm civets (Paguma larvata) and a raccoon dog (Nyctereutes procyonoides) at a Shenzhen food market during the SARS epidemic of 2002–2003 (May 7 and 8, 2003). Their genomic sequences displayed 99.8% identity with that of the human SARS-CoV (5). Together with the evidence of a significantly high ratio of positive cases bearing the anti-SARS-CoV […] was first proposed. Meanwhile, molecular epidemiological approaches were effectively conducted for better understanding […] 7). Characteristic genotypes were identified for viruses of different transmitting lineages, and the disease episodes were categorized into different epidemiological phases based on the combination of classical epidemiological analysis and molecular phylogenetic analysis using well-represented viral genomic sequences. It was particularly interesting that critical intermediate single-nucleotide variations (SNVs) were found among isolates collected between consecutive phases along their transmission paths. This finding also strongly suggested an animal origin of the human SARS-CoV and its adaptation to human hosts (7). However, direct evidence of animal-to-human infection has yet to be provided, and the molecular mechanism that enabled the virus to switch hosts has not been investigated.
After the first SARS epidemic ended in July 2003, as announced by the World Health Organization (WHO) (www.who.int/csr/don/2003_07_05/en), scattered new cases were reported. Unlike the laboratory-acquired infections reported from Singapore (www.who.int/csr/don/2003_09_24/en), Taiwan (www.who.int/csr/don/2003_12_17/en), and Beijing (www.who.int/csr/don/2004_05_18a/en), the four confirmed SARS patients of the 2003–2004 episode in the city of Guangzhou, China, were all community-acquired cases with no obvious history of human-to-human contact related to SARS (see Materials and Methods). In this report, using the sequence data of viruses obtained from these human patients as well as from palm civets collected during the same period in the same region, we were able to […]
Abbreviations: SARS, severe acute respiratory syndrome; CoV, coronavirus; SNV, single-nucleotide variation; WHO, World Health Organization; CDSs, coding DNA sequences; S, Spike; MRCA, most recent common ancestor; Ks, number of synonymous substitutions per synonymous site; A/S, ratio of nonsynonymous/synonymous substitution numbers.
database (accession numbers are listed in Table 1, which is published as supporting
information on the PNAS web site).
(b) H.-D.S., C.-C.T., G.-W.Z., S.-Y.W., H.-M.L., D.-H.Z., X.-W.W., H.-Y.Z., J.-D.C., P.H., H.T., and A.B. contributed equally to this work in performing the research.
(g) H.X., S.-J.C., M.W., R.-H.X., J.-Y.L., S.T., X.K., L.D., N.N., Y.-X.L., and C.-I.W. contributed equally to this work in organizing the research.
(s) To whom correspondence should be addressed. E-mail: email@example.com.
© 2005 by The National Academy of Sciences of the USA
February 15, 2005 | vol. 102 | no. 7 | www.pnas.org/cgi/doi/10.1073/pnas.0409608102
CoV over a short period. This is an essential step for understanding the genetic process of the adaptation of an animal virus to a human host.
Materials and Methods
Epidemiological Investigation and Sample Collection. Official epidemiological records of SARS cases occurring during the 2003–2004 period from both the Guangdong Center for Disease Control and Prevention and the Guangzhou Center for Disease Control and Prevention were reviewed. These records were matched with the information released by WHO (www.who.int/
Human patient samples were collected by virologists of the Guangzhou Center for Disease Control and Prevention. Palm civet samples from the animal cage of the restaurant TDLR were collected by WHO experts, whereas those from the Guangzhou food market were collected by virologists of the SARS Consortium of the Ministry of Agriculture of the Chinese Central
Sequencing Strategy and Procedures. The sequencing strategy was essentially the same as described previously (7). However, all new sequences obtained during this study were derived directly from RT-PCR products of specimens from individual human patients or animals (or their cages, as indicated). A second set of nested PCR primers, amplifying shorter genomic fragments, was used when the regular primer set failed to amplify the corresponding genomic regions. This strategy succeeded in amplifying more genomic fragments for sequencing. All International Nucleotide Sequence Database Collaboration/GenBank accession numbers for the SARS-CoV sequences analyzed in the text are listed in Table 1, which is published as supporting information on the PNAS web site.
In Silico Sequence Analysis. Whole-genome sequence alignments were generated by using CLUSTALW, version 1.83 (www.ebi.ac.uk/clustalw), with the default DNA weight matrix for the 96 SARS-CoV genomic sequences analyzed in this study (91 from human […]
alignment analysis of Spike (S) genes from 14 animal samples (in addition to the five sequences from palm civet hosts used in the whole-genome sequence alignment, seven sequences from other palm civet samples of the Guangzhou food market and two sequences, SZ1 and SZ13, from palm civet samples of the 2002–2003 epidemic were added) and 92 human SARS-CoV sequences. Compared with the 91 sequences used in the whole genome […] patient (GZ03-01) (7) and the newly sequenced S gene from the third patient (GZ03-03) of the 2003–2004 epidemic were added, whereas GZ-D of the 2002–2003 epidemic was excluded because its sequence was incomplete. The scoring algorithm used to
determine the variant loci characteristic of the SARS-CoV geno-
major groups was previously described (7), and the outcome of this
analysis is listed in Table 2, which is published as supporting
information on the PNAS web site.
For purposes of illustration, we adopted the following nomenclature, as shown in Fig. 1: PC for palm civet and HP for human patient. Each is suffixed with 03 or 04 to specify the 2002–2003 or 2003–2004 epidemic, respectively. Furthermore, HP03 is followed by E, M, or L, representing the early, middle, or late phase of the 2002–2003 epidemic.
Analysis of the Phylogenetic Relationship Among Different Transmission Lineages of the Early Samples of SARS-CoV Sequences. The consensus genomic nucleotide sequences for groups PC04, HP04, PC03 and the individual transmission lineages of HP03E (GZ, HSZ, and ZS) were used to construct the neighbor-joining tree (8). Tajima's relative rate test (9) was then performed to test whether there is a significant difference between the distance from PC03 to PC04 and that from PC04 to HP04.
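The consensus-then-distance step that feeds the neighbor-joining tree can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used MEGA2): the majority-rule consensus with alphabetical tie-breaking, the skipping of gaps and ambiguous bases, and the function names are our choices; the Kimura 2-parameter distance is used because it is the model named in the Fig. 1 legend. The toy sequences are hypothetical.

```python
import math
from collections import Counter

def consensus(seqs):
    """Majority-rule consensus of aligned, equal-length sequences.
    Assumption: ties are broken alphabetically."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        # Highest count first; alphabetical order breaks ties.
        out.append(min(counts, key=lambda base: (-counts[base], base)))
    return "".join(out)

def k2p_distance(a, b):
    """Kimura 2-parameter distance; sites with gaps or ambiguous bases are skipped."""
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    valid = transitions = transversions = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue
        valid += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= purines or pair <= pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p = transitions / valid    # proportion of transition differences
    q = transversions / valid  # proportion of transversion differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Hypothetical toy groups standing in for aligned genomes.
groups = {
    "PC_toy": ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"],
    "HP_toy": ["ACGTACATAC", "ACGTACATAC", "ACGAACATAC"],
}
consensi = {name: consensus(seqs) for name, seqs in groups.items()}
print(consensi)
print(k2p_distance(consensi["PC_toy"], consensi["HP_toy"]))
```

The resulting pairwise distances among consensus sequences are what a neighbor-joining routine would take as input; the tree-building step itself is left to standard software.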
Calculating the Average Number of Nucleotide Differences D Between Two Sample Groups. We used $n_1$ and $n_2$ to denote the sample sizes for groups 1 and 2. All singleton sites were excluded from the sequences of the two groups. The total number of nucleotide differences $D_{i,j}$ ($i = 1, \ldots, n_1$; $j = 1, \ldots, n_2$) was then calculated for two genome sequences, $i$ and $j$, one from each group.
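The group comparison can be sketched as follows, assuming that D is the mean of $D_{i,j}$ over all $n_1 n_2$ cross-group pairs and that a singleton site is one at which some base occurs in only one sequence of the pooled groups (for a biallelic site this is the usual definition). Both readings are ours; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product

def non_singleton_sites(seqs):
    """Indices of polymorphic columns in which every observed base occurs in >= 2 sequences."""
    keep = []
    for index, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1 and min(counts.values()) >= 2:
            keep.append(index)
    return keep

def average_difference(group1, group2):
    """Mean number of differences D_ij over all cross-group pairs, singleton sites excluded."""
    sites = non_singleton_sites(list(group1) + list(group2))
    total = 0
    for seq_i, seq_j in product(group1, group2):
        total += sum(seq_i[k] != seq_j[k] for k in sites)
    return total / (len(group1) * len(group2))

# Toy example: column 1 carries a singleton G and is therefore ignored.
print(average_difference(["AGAA", "AAAT"], ["CAAT", "CAAA"]))  # 1.5
```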
Analysis of the Three Most Significantly Variable Protein Coding DNA Sequences (CDSs), S, sars3a, and nsp3, Among Palm Civets and Human Patients of the Two Epidemics. The phylogenetic tree of sequences in the four groups (PC03, PC04, HP03E, and HP04) of each gene was first constructed by the neighbor-joining method (8). Given the tree, we used maximum-likelihood analysis (10) of codon substitutions to estimate the numbers of nonsynonymous and synonymous changes in each branch as well as their rate ratio $\omega$ ($= d_N/d_S$) (10). The codon-substitution model (11) accounts for the genetic code structure, transition/transversion rate bias, and different base frequencies at each codon position. In the likelihood analysis, we applied the most general model, which assumes an independent $d_N/d_S$ ratio for each branch in the phylogeny (10). An $\omega$ value $> 1$ is usually taken as evidence for the signature of positive selection.
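The maximum-likelihood free-ratio analysis estimates $\omega$ jointly with a codon model, which the sketch below does not attempt. It only shows the bookkeeping that links branch counts reported as A/S to $\omega = d_N/d_S$: each count is divided by its number of sites (N nonsynonymous, S synonymous). The site numbers are inputs the reader must supply, and the helper names are hypothetical. As a consistency check, the S-gene branch reported in Results as A/S = 18.2/2.1 with $\omega = 2.68$ implies roughly 3.2 nonsynonymous sites per synonymous site.

```python
import math

def omega(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """omega = dN / dS, with dN = A / N and dS = S / S_sites (changes per site)."""
    d_n = nonsyn_changes / nonsyn_sites
    d_s = syn_changes / syn_sites
    if d_s == 0:
        # No synonymous change on the branch: omega is infinite (undefined if dN is also 0).
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s

def implied_site_ratio(nonsyn_changes, syn_changes, reported_omega):
    """N / S_sites implied by a branch's A/S counts and its reported omega."""
    return (nonsyn_changes / syn_changes) / reported_omega

print(round(implied_site_ratio(18.2, 2.1, 2.68), 2))  # 3.23
print(omega(11.8, 0, 3.0, 1.0))                      # inf (site numbers hypothetical)
```

A branch with synonymous count 0, such as the 11.8/0 branch discussed in Results, yields $\omega = \infty$ whatever site numbers are used.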
Statistical Analysis for Estimation of the Neutral Mutation Rate and the Date for the Most Recent Common Ancestor (MRCA) and Construction of Rooted Phylogenetic Tree. The Pamilo–Bianchi–Li model was used to calculate the number of synonymous substitutions per synonymous site, Ks, for the concatenated five known major coding sequences (orf1ab, S, E, M, and N) of SARS-CoV, as we did previously (7). Taking GZ02 (7), the reference sequence of the HP03 epidemic, as the outgroup, Ks values were calculated for two PC03 SARS-CoV (SZ16 and SZ3), three PC04 SARS-CoV (PC4-136, PC4-227, and PC4-13), and two HP04 sequences (GZ03-01 and GZ03-02), to estimate the neutral mutation rate.
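Based on the plot of Fig. 2, the intercept ($\beta_0$) of the fitted line, i.e., its value at sampling date 0 (the end of 2002), is 0.0007806. Let $T$ denote the number of days before January 1, 2003, of the MRCA of the PC03 and HP03 groups. GZ02, used as the outgroup, was sampled on February 11, 2003 (day 42 on the sampling-date axis of Fig. 2). A sequence sampled on day $t$ therefore differs from GZ02 by substitutions accumulated on two branches of lengths $t + T$ and $42 + T$, so that $\mathrm{Ks}(t) = \beta_1 t + \beta_1(42 + 2T)$ and $\beta_0 = \beta_1(42 + 2T)$. The estimated $T$ is

$$T = \frac{\hat\beta_0/\hat\beta_1 - 42}{2} \approx 28 \text{ days},$$

which is equivalent to early December 2002.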
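The dating arithmetic can be reproduced with a short sketch. It assumes an ordinary least-squares line (the text says only "linear regression model") and the relation $\beta_0 = \beta_1(42 + 2T)$, which follows from counting substitutions along both branches from the MRCA to the sample and to the GZ02 outgroup. Because the individual Ks values are not listed here, the regression is exercised on made-up points, and only the reported intercept and slope are plugged into the dating step; function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def mrca_days_before_day0(b0, b1, outgroup_day=42):
    """T such that b0 = b1 * (outgroup_day + 2T).

    A sample taken on day t differs from the outgroup by substitutions on two
    branches, (t + T) and (outgroup_day + T) days long, so
    Ks(t) = b1 * t + b1 * (outgroup_day + 2T).
    """
    return (b0 / b1 - outgroup_day) / 2

# Hypothetical (day, Ks) points lying exactly on a line, to exercise the fit.
b0_toy, b1_toy = fit_line([0, 100, 200], [0.001, 0.0018, 0.0026])
# Reported values: intercept 0.0007806, slope 8.00e-6 per site per day.
print(round(mrca_days_before_day0(0.0007806, 8.00e-6)))  # 28
```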
The Ks between SZ16 and SZ3 is 0.001585. Because both sequences were sampled on May 7, 2003, and each lineage has accumulated substitutions since their MRCA, the MRCA lies $0.001585/(2 \times 0.000008) \approx 99$ days before May 7, 2003. Therefore, the estimated date of the MRCA for the PC03 group is around the end of January 2003.
The Ks values between SZ16 and PC4-136, PC4-227, and PC4-13 are 0.003785, 0.003752, and 0.003782, respectively. Therefore, after subtracting the 244-day interval between the PC03 and PC04 sampling dates, the estimated date of the MRCA for PC03 and PC04 is $\approx (0.00378/0.000008 - 244)/2 \approx 114$ days before May 7, 2003, which corresponds to the middle of January 2003.
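The two pairwise MRCA dates can be checked with the sketch below. It assumes, as the formulas imply, that Ks accumulates along both lineages at the Fig. 2 rate of $8.00 \times 10^{-6}$ per site per day, and it reads the 244 in the second formula as the number of days separating the PC03 and PC04 sampling dates; the function name is hypothetical.

```python
from datetime import date, timedelta

NEUTRAL_RATE = 8.00e-6  # synonymous substitutions per site per day (Fig. 2 slope)

def days_to_mrca(ks, rate=NEUTRAL_RATE, sampling_gap_days=0):
    """Days from the MRCA to the earlier of two sampling dates.

    ks = rate * (t1 - t_mrca) + rate * (t2 - t_mrca), with t2 - t1 = sampling_gap_days,
    so t1 - t_mrca = (ks / rate - sampling_gap_days) / 2.
    """
    return (ks / rate - sampling_gap_days) / 2

may7 = date(2003, 5, 7)  # sampling date of SZ16 and SZ3
for label, ks, gap in [("PC03", 0.001585, 0), ("PC03-PC04", 0.00378, 244)]:
    days = round(days_to_mrca(ks, sampling_gap_days=gap))
    print(label, days, may7 - timedelta(days=days))
# PC03 99 2003-01-28
# PC03-PC04 114 2003-01-13
```

The calendar dates (January 28 and January 13, 2003) match the "end of January" and "middle of January" estimates in the text.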
Based on these estimates, a rooted phylogenetic tree for
SARS-CoV isolates from palm civet (PC03 and PC04) and early human patients (HP03E and HP04) was constructed (Fig. 3).
Estimating the Coevolution Coefficients Among SARS-CoV Proteins (Identified and Hypothetical) Based on Amino Acid Substitution Rates. The linear correlation coefficient (r) of the amino acid substitution rates between two SARS-CoV proteins indicates their level of coevolution (13). We first constructed a multiple sequence alignment for each of the SARS-CoV proteins (among 72 samples with 21 assigned or predicted protein CDSs without gaps in the coding regions) and then used the alignments to build matrices containing the distances between all possible protein pairs. Distances were calculated with the McLachlan amino acid homology matrix (14). The outcome of this study is listed in Table 3, which is published as supporting information on the PNAS web site.
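Our reading of this procedure is the "mirror-tree" idea: for each protein, compute distances between every pair of samples, then correlate the two proteins' distance matrices. The sketch below swaps in a plain p-distance for the McLachlan-matrix scoring to stay self-contained, so its r values would not match Table 3. The sequences and function names are hypothetical, and samples must be listed in the same order for both proteins.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned residues that differ (stand-in for McLachlan-based scoring)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_vector(seqs):
    """Upper triangle of the inter-sample distance matrix, in a fixed pair order."""
    return [p_distance(seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)]

def pearson(xs, ys):
    """Pearson linear correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a distance matrix has zero variance")
    return sxy / math.sqrt(sxx * syy)

def coevolution_r(protein1_seqs, protein2_seqs):
    """r between the distance matrices of two proteins from the same samples."""
    return pearson(distance_vector(protein1_seqs), distance_vector(protein2_seqs))

# Hypothetical alignments of two proteins across the same three samples.
print(round(coevolution_r(["MKTA", "MKTV", "MRTV"], ["GSLE", "GSLD", "GALD"]), 3))  # 1.0
```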
Results and Discussion
Contact History and Clinical Symptoms of the Four Confirmed SARS Patients (2003–2004) Provide Direct Evidence of Animal-to-Human Infection. The epidemiological information collected by the Guangdong Center for Disease Control and Prevention and the Guangzhou Center for Disease Control and Prevention indicated that between December 16, 2003, and January 8, 2004, a total of four patients were independently hospitalized in the city of Guangzhou, Guangdong Province, China, with flu-like symptoms later diagnosed as confirmed SARS cases (see Materials and Methods). Although none of these patients had any history of contact with previously documented SARS cases, all of them had direct or indirect contact with wild animals in geographically restricted areas. The second patient worked in a local restaurant, TDLR, and the fourth patient dined in the same restaurant, where palm civet and other exotic dishes were served, whereas the third patient dined in a neighboring restaurant, SJR. These restaurants are located near two major hospitals in Guangzhou where many SARS patients were treated in the previous epidemic, and the first patient, the only patient with no contact with […] index patient also contacted house rats in his apartment a few days before disease onset. It is important to emphasize that, unlike most SARS patients during the 2002–2003 epidemic, these four new patients presented with very mild clinical symptoms, and none of their close contacts was infected (15).

Fig. 1. SNVs and deletions of 91 sequences from the human patient-derived viruses (HP) and five sequences from the palm civet-derived viruses (PC) (A) and a neighbor-joining (N-J) tree for the consensus nucleotide sequences of PC and early individual transmission lineages of HP (B). In A, the division of the clusters and the corresponding nomenclature was based on both the hosts of the viruses and the phases of the epidemic (7) (Table 2). The map distance between individual sequences represents the extent of genotypic difference. To highlight the variations between two neighboring clusters, the number of SNVs [total (synonymous, nonsynonymous causing drastic amino acid changes)] occurring among the genomic sequences of both groups and the average number of nucleotide differences D between the two sample groups (see Materials and Methods) are shown in the boxes. Besides the SNVs of the whole genome (Total), those occurring in ORF1AB (particularly in ORF1A, which is part of ORF1AB), S, and sars3a are listed in the same manner as the total SNVs. These SNVs were present in at least two independent samples of all the sequences used for this analysis. In B, consensus nucleotide sequences were derived from each PC and HP data set. For HP03E, consensus nucleotide sequences were individually derived from three primary transmission lineages, based on their direct epidemiological connections and high genomic sequence similarities, and were represented as HP03EGZ (Guangzhou), HP03EHSZ (Shenzhen), and HP03EZS (Zhongshan). These six consensus nucleotide sequences were used to construct the N-J tree (8) in MEGA2 (23), and the Kimura 2-parameter model was assumed. The branch lengths are the estimates of genetic distances.
Genomic Sequences of SARS-CoV from both the Human Patients and the Market Palm Civet of the 2003–2004 Outbreak Are Almost Identical. Among the specimens collected during the 2003–2004 outbreak in Guangzhou (see Materials and Methods), we were able to sequence nearly complete SARS-CoV genomes from the first two of the four human patients, the two palm civets of the Guangzhou food market, and one sample from the palm civet cage at the restaurant TDLR. These genomic sequences were characterized and phylogenetically analyzed by comparison with 89 human SARS-CoV and two SARS-like-CoV sequences from Himalayan palm civets available in GenBank as of the end of September 2004, using the in silico analysis methodology adopted previously. A total of 202 SNVs with multiple occurrences were identified, 200 of which were in the CDSs. Of the 128 nonsynonymous mutations, 89 were predicted to cause radical amino acid changes (Table 2 and Fig. 1A).
Besides the individual sequence-based analysis, we further
analyzed the data based on comparisons between groups of
the analytical methods are described in Materials and Methods,
an abbreviated nomenclature will be redefined in the text on first
All of the HP04 and PC04 (human patient and palm civet, 2003–2004) SARS-CoV isolates retained the 29-nt segment marker in orf8a, as did the viruses of PC03 (palm civet 2002–2003) and the Guangzhou primary transmission lineages of HP03E (human patient 2002–2003, early phase). The genomes of the SARS-CoV from HP04 were almost identical to those of the SARS-CoV-like viruses from PC04 (Fig. 1A). There were 33 SNVs detected among the viral genomic sequences from PC04 and HP04, accounting for 0.11% of the viral genome. The average total number of nucleotide differences in the whole genome between the two groups is 20.33. In contrast, between the genomic sequences of HP03E and PC03, the average number of nucleotide differences is 39.5, and a total of 77 SNVs was detected, accounting for nearly 0.26% of the viral genome (Fig. 1A). Although 17 of the 202 SNVs were polymorphic in the palm civets only, no signature SNV is shared by all palm civet isolates that distinguishes them from all human isolates (Table 2).
The phylogenetic relationship among different transmission lineages of the early SARS-CoV sequences was also analyzed on the basis of the consensus sequence of each epidemiological phase/primary transmission lineage (7) (Fig. 1B). Tajima's relative rate test was performed on the consensus nucleotide sequences with PC04 as the root: $\chi^2 = 39.72$ with one degree of freedom ($P < 0.001$), i.e., the distance between PC04 and PC03 is significantly larger than that between PC04 and HP04. Thus, structurally, the genomic sequences of SARS-CoV and the SARS-CoV-like viruses are barely distinguishable, and functionally, given the animal contact history of the current patients, it is likely that the same virus can infect both palm civets and humans.
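For reference, Tajima's 1D relative rate statistic, as usually defined, counts sites where only one of two sequences departs from a third reference sequence and compares the two counts with a 1-df $\chi^2$. The sketch below follows that textbook definition; which sequence served as the reference in each comparison and how sites were filtered in this study are assumptions here, and the toy sequences and function names are hypothetical. The helper also confirms that $\chi^2 = 39.72$ on 1 df corresponds to a P value far below 0.001.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def tajima_1d(seq1, seq2, reference):
    """Tajima's 1D relative rate test on three aligned sequences; returns (chi2, p)."""
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, reference):
        if b == c != a:    # only seq1 departs from the reference
            m1 += 1
        elif a == c != b:  # only seq2 departs from the reference
            m2 += 1
    if m1 + m2 == 0:
        return 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    return chi2, chi2_sf_1df(chi2)

print(tajima_1d("CCCCAAAAAA", "AAAAAAAAAA", "AAAAAAAAAA"))  # chi2 = 4.0, p about 0.0455
print(chi2_sf_1df(39.72) < 0.001)                          # True
```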
The Estimation of the Neutral Mutation Rate and the Date for the MRCA Illustrated the Evolving SARS-CoV in both Palm Civet and Human. We used the concatenated five major CDSs (orf1ab, S, E, M, and N) of SARS-CoV from PC03, PC04, and HP04 to estimate the neutral mutation rate during SARS-CoV transmission in palm civets and HP04 (Fig. 2). The concatenated sequence accounts for 91.25% of the whole genome. The estimate turned out to be $\approx 8.00 \times 10^{-6}$ nt$^{-1}$ day$^{-1}$, which is almost the same […] group ($8.26 \times 10^{-6}$ nt$^{-1}$ day$^{-1}$) (7). These two independent estimates are almost identical and thus strongly support the previous […]
this new estimate should be more accurate. This relatively long-term evolutionary analysis once again strongly suggested that SARS-CoV evolves at a relatively constant neutral rate in both human and palm civet hosts. Furthermore, the date estimates of the MRCAs for PC03, HP03E, PC04, and HP04 were obtained (see Materials and Methods), which enabled us to derive a rooted phylogenetic tree (Fig. 3). It clearly indicated that PC03 and PC04 are not in the same primary transmission lineage. The viral trans- […] instances. PC03 and PC04 further diverged around January 2003, […] long divergence time since their MRCA, it is no surprise to observe an average of 70.83 total nucleotide differences between the viral genomes of PC03 and PC04 (Fig. 1A), higher than that observed between PC03 and HP03E (see above). A higher viral load of PC04 in palm civets from the Guangzhou food market during the 2003–2004 outbreak was suggested by the fact that SARS-CoV samples for genomic sequencing were much easier to obtain than during the 2002–2003 epidemic (laboratory experience; C.-C.T., H.X., and J.-D.C.); PC04 might therefore have evolved to be more virulent in, or better adapted to, palm civets. This further demonstrated that SARS is a zoonotic disease of still-unknown origin that has been evolving not only in human but also in palm civet hosts.

Fig. 2. A plot of the number of synonymous substitutions per synonymous site, Ks, for the concatenated coding sequences vs. the sampling dates. The Ks calculation and samples used are described in Materials and Methods. The sampling dates are measured as the number of days away from Jan. 1, 2003. The slope ($\beta_1$) of the fitted line from the linear regression model gives the estimate of the neutral rate, $8.00 \times 10^{-6}$ per site per day.

Fig. 3. A rooted phylogenetic tree for SARS-CoV isolates from palm civet (PC03 and PC04) and early human patients (HP03E and HP04) based on MRCA estimations. All data are described in Materials and Methods except those for HP03E, which were from previous work (7). The branch length is proportional to the time interval.
The Three Most Significantly Variable Protein CDSs, S, sars3a, and nsp3, Evolved Differently Among Palm Civets and Human Patients of the Two Epidemics. The phylogenetic relationship among palm civets and human patients of the two epidemics was further analyzed by the maximum-likelihood method (10) based on the three most significantly variable CDSs, S, sars3a, and nsp3 (Fig. 4). In the S gene (Fig. 4A), from the ancestor node of PC03 to the node of PC04, the ratio of nonsynonymous/synonymous substitution numbers (A/S) is 18.2/2.1, i.e., $\omega = 2.68$ ($\omega = d_N/d_S$, the ratio of nonsynonymous and synonymous rates), indicating positive selection pressure during animal-to-animal transmission. Furthermore, PC04 and HP04 share the same ancestor node in the S gene tree, indicating that, unlike during the 2002–2003 epidemic, HP04 viruses had not had enough time to diverge, although in patient GZ03-02 they had already accumulated some amino acid changes (A/S = 6/1). In contrast, the A/S from the ancestor node of PC03 to the node of HP03E in the S gene is 11.8/0, which corresponds to $\omega = \infty$ (no synonymous variations). This is consistent with our previous conclusion that, during virus transmission from palm civet to human, the S gene experienced strong positive selection and improved its adaptation to the human host. Within HP03E, we observed a very high A/S in most branches, again suggesting that the S gene was still evolving and had not yet reached its maximum adaptation to humans.
It has been shown that the sars3a CDS encodes a minor structural protein associated with the S protein on the surface of the SARS-CoV viral envelope (16). Interestingly, the sars3a CDS coevolved with the S protein (Table 3). It is therefore no surprise that it evolved adaptively, as did the S gene, as […] a trifurcating tree for the four epidemic groups (Fig. 4B). The A/S is 4/0 between PC03 and HP03E, 4/0 between PC03 and PC04 (HP04), and 6/0 between HP03E and HP04. In contrast, there is not a single variation among palm civets and humans of the current epidemic. Although the coevolution of S and sars3a is likely driven by the need to maintain their interaction, amino acid changes in the sars3a protein, like those in the S protein, might also be critical for modulating the host switch of SARS-CoV.
The phylogenetic tree of nsp3 differs considerably from those of S and sars3a (Fig. 4C). PC03 is very close to HP03E but relatively more divergent from the new cases. This suggests […] for the S and sars3a genes. In the lineage connecting the ancestor node of HP03E and HP04 (or PC04), the A/S is only 4.1/6.2 ($\omega = 0.227$), which shows no signature of positive selection. It is worth pointing out that in the new cases, one mutation at nucleotide 6295 creates a stop codon in the nsp3 CDS of orf1a. Considering the unique alterations of the nsp3 CDS structure in SARS-CoV compared with other CoVs (17), we propose that this mutation might account for the mild clinical symptoms and apparently weak infectivity of this episode.
Major Genetic Variations in the S Gene Seem Essential for the Transition from Animal-to-Human Transmission to Human-to-Human Transmission. The S protein is responsible for binding to the angiotensin-converting enzyme 2 (ACE2) receptor (18) and thus is the fastest-evolving protein of SARS-CoV in the epidemic from animal to human. Besides the S gene sequences available from whole-genome data (Table 2, except for GZ-D of the 2002–2003 epidemic, which was excluded because its sequence was incomplete), we were able to add more S gene sequences for […] from palm civet samples of the Guangzhou food market, all from the 2003–2004 epidemic. Two publicly available sequences, SZ1 and SZ13, from palm civet samples of the 2002–2003 epidemic were also included in the analysis. Because the 3D structure of the S protein had been successfully simulated (Protein Data Bank ID code 1T7G) (19), it was used to better understand the molecular mechanism driving the mutations of the S gene over the course of […] on the PNAS web site, lists all 49 SNVs observed in >1 of the 103 S CDS sequences, i.e., two more SNVs than were observed with the whole-genome sequences (Table 2), because more sequences were added to the analysis. One of them (nucleotide 22220) causes a synonymous variation at amino acid residue 243D, which is predicted to be partially exposed at the top of the S1 domain,
Fig. 4. Phylogeny of the most variable genes, S (A), sars3a (B), and nsp3 (C), in the SARS-CoV samples from the early cases of the 2002–2003 epidemic and the new cases of the 2003–2004 outbreak. Samples of the early cases of the 2002–2003 epidemic were selected based on two criteria: the completeness of the sequences and their representativeness in each of the epidemiological lineages. The two numbers shown along each branch are the maximum-likelihood estimates […] ratio is assumed for each branch. The branch length is proportional to the total number of estimated synonymous and nonsynonymous substitutions occurring in that branch.