Proteomic fingerprints for potential application to early diagnosis of severe acute respiratory syndrome.
ABSTRACT Definitive early-stage diagnosis of severe acute respiratory syndrome (SARS) is important despite the number of laboratory tests that have been developed to complement clinical features and epidemiologic data in case definition. Pathologic changes in response to viral infection might be reflected in proteomic patterns in sera of SARS patients.
We developed a mass spectrometric decision tree classification algorithm using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Serum samples were grouped into acute SARS (n = 74; <7 days after onset of fever) and non-SARS [n = 1067; fever and influenza A (n = 203), pneumonia (n = 176); lung cancer (n = 29); and healthy controls (n = 659)] cohorts. Diluted samples were applied to WCX-2 ProteinChip arrays (Ciphergen), and the bound proteins were assessed on a ProteinChip Reader (Model PBS II). Bioinformatic calculations were performed with Biomarker Wizard software 3.1.1 (Ciphergen).
The discriminatory classifier with a panel of four biomarkers determined in the training set could precisely detect 36 of 37 (sensitivity, 97.3%) acute SARS and 987 of 993 (specificity, 99.4%) non-SARS samples. More importantly, this classifier accurately distinguished acute SARS from fever and influenza with 100% specificity (187 of 187).
This method is suitable for preliminary assessment of SARS and could potentially serve as a useful tool for early diagnosis.
Proteomic Fingerprints for Potential Application to Early Diagnosis of Severe Acute Respiratory Syndrome
Xixiong Kang,1 Yang Xu,2 Xiaoyi Wu,3 Yong Liang,4 Chen Wang,5 Junhua Guo,2
Yajie Wang,1 Maohua Chen,13 Da Wu,3 Youchun Wang,7 Shengli Bi,8 Yan Qiu,9
Peng Lu,10 Jing Cheng,11 Bai Xiao,6 Liangping Hu,15 Xing Gao,12 Jingzhong Liu,6
Yiping Wang,3 Yingzhao Song,3 Liqun Zhang,3 Fengshuang Suo,1 Tongyan Chen,1
Zeyu Huang,1 Yunzhuan Zhao,1 Hong Lu,1 Chunqin Pan,4 and Hong Tang14*
© 2005 American Association for Clinical Chemistry
1 Center for Laboratory Diagnosis, Beijing Tiantan Hospital and Capital University of Medical Sciences, Beijing, China.
2 Ciphergen Biosystems, Inc., Beijing, China.
3 Deyi Diagnosis Institute, Beijing, China.
4 Taizhou Municipal Hospital, Taizhou, Zhejiang Province, China.
5 Institute of Respiratory Medicine and 6 Basic Medical Research Center, Chaoyang Hospital and Capital University of Medical Science, Beijing, China.
7 Department of Cell Biology, National Institute for the Control of Pharmaceutical and Biological Products (NICPBP), Beijing, China.
8 Institute of Virology, Chinese Academy of Preventive Medicine, Beijing, China.
9 Department of Quality Control, Beijing Red Cross Blood Center, Beijing, China.
10 Society of Blood Transfusion, Beijing, China.
11 National Engineering Research Center for Beijing Biochip Technology, Tsinghua University, Beijing, China.
12 Beijing Center for Disease Control and Prevention, Beijing Bureau of Public Health, Beijing, China.
13 Department of Neurosurgery, The Affiliated Hospital of Xuzhou Medical College, Jiangsu Province, China.
14 Center for Molecular Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.
15 Consulting Center of Biomedical Statistics, Academy of Military Medical Sciences, Beijing, China.
* Address correspondence to this author at: Center for Molecular Immunology, Chinese Academy of Sciences, 13 Zhongguancun Bei Yi Tiao, PO Box 2714, Beijing, China 100080. Fax 86-10-62638849; e-mail email@example.com.
Received February 9, 2004; accepted October 21, 2004.
Previously published online at DOI: 10.1373/clinchem.2004.032458
16 Nonstandard abbreviations: SARS, severe acute respiratory syndrome; CoV, coronavirus; SELDI-TOF MS, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry; and PBS, Protein Biological System.
Clinical Chemistry 51:1
Since November 1, 2002, severe acute respiratory syndrome (SARS)16 has affected 32 countries and regions, with 8422 reported probable cases, 916 deaths, and local transmission in at least 6 countries (1). Collective efforts have been made to identify its epidemiologic determinant as a novel member of Coronaviridae, SARS-associated coronavirus (SARS-CoV) (2–6), and etiologic experiments in cynomolgus macaques have confirmed the virus as the causative agent for SARS (7, 8). Rapid progress has also been made in the determination of its genome sequences (9–11) and the molecular evolution of the coronavirus (12). Identification of angiotensin-converting enzyme 2 as the viral receptor provided further information toward deciphering its molecular mechanisms of infection (13).
Despite such advances in virologic studies, early diagnosis of SARS has been based primarily on the clinical definitions released by WHO and CDC (14, 15), which can be confusing or contradictory (16). Available serologic tests cannot guarantee an early diagnosis (17), and PCR-based molecular detection of the viral RNA suffers from unsatisfactory sensitivity and specificity (3, 17–19). In the last year, failure to develop diagnostic tests for SARS, especially in the acute phase, severely impacted specific prevention and treatment measures for SARS. There is a need to establish a reliable diagnostic methodology for SARS-CoV, in particular, to distinguish the similar clinical manifestations of SARS and other respiratory tract infections. This urgency is reinforced by the first SARS case not linked to laboratory contamination, which occurred in Guangdong, China this year (20).
Proteomic analysis has provided a unique tool for the identification of diagnostic biomarkers, evaluation of disease progression, and drug development (21, 22). Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) enables rapid, reproducible protein/peptide profiling of multiple disease-specific biomarkers directly from crude samples (e.g., tissue cell lysates or body fluids) (23, 24). Small amounts of sample can be applied directly to a biochip coated with specific chemical matrices (e.g., hydrophobic, cationic, or anionic) or specific biochemical materials such as DNA fragments or purified proteins. The bound proteins/peptides can then be analyzed by MS to obtain the protein fingerprints, or even amino acid sequence determinants, when interfaced to a mass spectrometric microsequencing device.
Analogous to the proteomic detection of various cancers (25, 26), we used a weakly cationic ProteinChip (WCX2 chip surface) to retrospectively analyze SARS sera to determine whether there are distinct and reproducible protein fingerprints potentially applicable to the diagnosis of SARS. We established a decision tree algorithm consisting of four unique biomarkers for acute SARS in the training set and subsequently validated the accuracy of this classifier by use of a completely blinded test set.
Materials and Methods
patients and samples
More than 2000 serum specimens from suspected/probable SARS patients admitted to 38 major hospitals in the Beijing area between April 14 and June 5, 2003, were eligible for inclusion. The serum procurement, data management, and blood collection protocols were approved by the Beijing SARS-Control Working Group and were in accordance with WHO biosafety guidelines (27). Among the retrospective samples, only 74 were selected, all from probable SARS patients whose blood was collected at admission within 7 days of the onset of fever (acute SARS patients; Table 1). Probable cases were defined by the eligibility criteria set forth by WHO (15). These cases also had radiographic evidence of infiltrates consistent with pneumonia or respiratory distress syndrome on chest x-ray. The paired convalescent serum samples from the SARS cohort tested positive for IgM seroconversion by the IFA method (Beijing Genomics Institute), and four patients also tested positive in a DNA array test of nasopharyngeal samples. The 1067 non-SARS control serum samples (Table 2) were obtained from recruited healthy donors (n = 659) or from patients with respiratory infections [pneumonia (n = 176) or high fever (n = 203; 66 with influenza A)] or lung cancer (n = 29). The control samples were all negative for SARS-CoV seroconversion.
The patients and serum samples were then divided into two groups: one for the “training” set and the other for the blinded “test” set (Tables 1 and 2). SARS and non-SARS control sera were all stored at −80 °C in 30-μL aliquots. Before each round of mass spectrometric assays, we routinely performed quality control of serum samples by checking the appearance and intensity of the peak at m/z 6635.09 (Fig. 3A). Because the peak intensity of m/z 6635.09 remained relatively constant among spectra from different assays and different instruments, it was also used for normalization between rounds of analysis.
Table 1. Patients with acute SARS who fit the WHO SARS case definition.
A, C, J, K
A, B, H, N, S, Y
D, E, H, J, K, S, T, X, Y
A, D, E, J, K, M
A, D, H, J, S, T, X, Y
C, E, J, M, O, P, S, T
a Cases from April 15 to June 5, 2003, with retrospective serum samples collected within 7 days after self-described onset of symptoms. The ages of these cohorts varied from 6 to 74 years. Each group of samples was divided into two parts for training and blinded tests.
b Abbreviations for hospitals in Beijing area: A, Civil Aviation Hospital; B, Beijing Center for Disease Control and Prevention; C, Concord Hospital; D, Dongzhimen Hospital; E, Earth Temple Hospital; H, Chaoyang Hospital; J, Jishuitan Hospital; K, Peking University Medical School 3rd Affiliate Hospital; M, Martial Police General Hospital; N, North Suburban Hospital; O, Osier Hospital; P, State Power Hospital; S, Shijingshan Hospital; T, Tongren Hospital; X, Jiuxianqiao Hospital; Y, Youan Hospital.
c IFA, immunofluorescence assays; NA, not available.
d Included patients were positive for IgM seroconversion in immunofluorescence assays with the paired convalescent sera. Other information on microbiological tests, clinical records, or treatment was not accessible because of the classified nature of the work performed by the Beijing SARS-Control Working Group.
e Four included patients tested positive in a DNA chip array method (Xiao et al, manuscript in preparation) with four sets of DNA probes derived from SARS-CoV genome coding replicase 1A (2 independent probes), spike, and nucleocapsid genes. Other patients were negative by real-time fluorescent RT-PCR of nasopharyngeal
Three different chip chemistries (hydrophobic, anionic, and cationic) were first evaluated to determine which affinity chemistry gave the best serum profiles in terms of the number and resolution of proteins. The weak cation-exchange (WCX) chip gave the best results, with mass spectra from 0 to 200 kDa. The WCX chips in an 8-well bioprocessor format (Ciphergen) were chosen to allow a larger volume of serum for the chip array. The bioprocessor was pretreated with 150 μL of 100 mmol/L sodium acetate (pH 4) on a platform shaker at 250 rpm for 5 min. The excess sodium acetate was removed by inverting the bioprocessor on a paper towel. This process was repeated twice. The serum samples were thawed on ice in a Biosafety Level II cabinet, and 20 μL of each sample was mixed with 30 μL of U9 buffer (9 mol/L urea, 10 g/L CHAPS in phosphate-buffered saline) in a 1.5-mL Eppendorf tube and vortex-mixed at 4 °C for 20 min. We then added 100 μL of U1 buffer [U9 buffer diluted ninefold with 50 mmol/L Tris-HCl, pH 7 (100 mL of U9 buffer plus 800 mL of Tris-HCl)] to the serum/urea mixture, vortex-mixed it for 10 min, and stopped the reaction by addition of 600 μL of sodium acetate on ice. We applied 50 μL of the serum/urea sample to each well, and the bioprocessor was sealed and shaken on a platform shaker at 250 rpm for 30 min. The excess serum/urea solution was discarded, and the bioprocessor was washed three times with 100 mmol/L sodium acetate as described above. The chips were removed from the bioprocessor, washed twice with deionized water, and air-dried. Subsequently, 0.5 μL of the energy-absorbing molecule (EAM) sinapinic acid, saturated in 500 mL/L acetonitrile–5 g/L trifluoroacetic acid, was added to each well. After air-drying, the sinapinic acid application
Chips were then placed in the Protein Biological System II (PBS II) mass spectrometer reader (Ciphergen), and TOF spectra were generated by an average of 104 laser shots collected in the positive mode. Low-energy readings were set with a high mass of 50 kDa and were optimized from 3 to 15 kDa at a laser intensity of 200, detector sensitivity of 8, and a focus by optimization center. High-energy readings were set with a high mass of 200 kDa and were optimized from 10 to 50 kDa at a laser intensity of 230 and a detector sensitivity of 9. Mass accuracy was calibrated externally by use of the All-in-One peptide molecular mass calibrator (Ciphergen).
Sera from a healthy control were individually applied to seven bait surfaces of eight WCX2 chips and run during 3-day intervals for analysis of within-run reproducibility. In parallel, 40 samples (10 from SARS patients, 10 from patients with fever, 10 from patients with pneumonia, and 10 from healthy controls) were applied in duplicate to a single chip and run on two different instruments (PBS II and PBS IIc; Ciphergen) for between-run analysis of instrument drift. To avoid the possibility that placement or run order of samples would affect assay accuracy, samples were loaded on chips in a rotational fashion. In brief, sample 1 was spotted on the 8-well directional chip (wells A to H) in duplicate in wells A and B and then in wells G and H of the second chip. Samples 2, 3, and 4 were loaded on chips in the same rotation order. We also randomized the order of chip placement in the spectrometer to minimize bias from run order. Spectra were collected for each sample and analyzed independently using the classification algorithm established in the training set.
Table 2. Control cohorts with various respiratory inflammations and carcinomas.
Patients, n (M/F)
38.7–40.1 °C; Flu c (n = 66)
CXR, P (n = 75); MP (n = 57); P+TB (n = 44)
CXR + pathology (n = 3); CT (n = 16)
a Sera from healthy persons attending Anzhen Hospital (n = 14) were collected in 2001, sera from 307 Hospital (n = 10) were collected before November 2002, and sera from Deyi Diagnostic Institute (n = 21; Beijing; epidemic region) and Taizhou Hospital (n = 34; Zhejiang Province; nonepidemic region) were collected on June 3, 2003. The rest of the healthy control sera, from Beijing Red Cross Blood Center, were collected between July and December 2003.
b Serum samples from patients with high fevers were collected from Taizhou Hospital, Zhejiang Province (nonepidemic region), on June 3, 2003; from Chaoyang Hospital on November 15, 2003; and from Ditan Hospital on November 22 and December 3, 2003. Among them, 66 were positive in the influenza A IgM ELISA.
c Flu, influenza; CXR, chest x-ray; MP, mycoplasma; P, pneumonia; TB, Mycobacterium tuberculosis; CT, computed tomography.
d Serum samples were collected from Tiantan Hospital (n = 12), Beijing, on May 3, 2003; from Taizhou Hospital (n = 54), Zhejiang Province, on June 3, 2003; from Chaoyang Hospital (n = 38) on November 25, 2003; and from Ditan Hospital (n = 72) on December 3, 2003. All patients had positive chest x-rays and manifested with pneumonia or atypical pneumonia; 57 tested positive in the mycoplasma IgM ELISA, and 44 were positive in both the pneumonia and tuberculosis PCR assays.
e Diagnosis was based on the criteria in Surgery, 5th edition (Zaide Wu. Beijing, China: Public Health Press). Clinical features included various forms of metastasis in the pericardium (n = 1), upper right clavicle (n = 1), lymph nodes (n = 1), liver (n = 1), and brain (n = 1); accompanying hydrothorax was also observed in nine
Kang et al.: Early Diagnosis of SARS Using Proteomic Fingerprints
The peak at m/z 6635.09 in the quality-control serum was adjusted to have an intensity of 40–60 on both the PBS II and PBS IIc, and its intensity was used to normalize instrument resolution between the two instruments. We normalized spectra using total ion current with an identical normalization coefficient and a low mass cutoff of 2000 Da. If the factor for the peak at m/z 3939 was <0.3 or >2.9 after normalization to total ion current, the run was repeated. No outlier was rejected in the test. The “root” biomarker, m/z 3939, yielded the lowest P value, which was similar on the PBS II and PBS IIc.
bioinformatics and biostatistics
Peak detection was performed with Biomarker Wizard software 3.1.1 (Ciphergen). The m/z values between 2000 and 20 000 were selected for analysis because this range contained the majority of the resolved proteins and peptides. The m/z range between 0 and 2000 was eliminated from analysis to avoid interference from adducts, artifacts of the energy-absorbing molecules, and other possible chemical contaminants. Peak detection involved baseline subtraction, mass normalization using a common calibrant peak (m/z 6635.09), and normalization to the total ion current intensity with a minimum m/z of 2000, using an external normalization coefficient of 0.2 (normalization factor for individual spectrum = 0.2/average ion current for each spectrum) for spectra obtained at different times or locations. The settings used for autodetect peaks to cluster in the first pass were a signal-to-noise ratio of 5 and a minimum peak threshold of 5% of all spectra. The peak clusters were completed by second-pass peak detection using a signal-to-noise ratio of 2 and 0.3% of mass for the cluster window. An average of 99 peaks was detected in each spectrum. The mass range from 20 to 200 kDa was analyzed in parallel.
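As a rough illustration of the total-ion-current normalization described above, the following sketch computes a per-spectrum factor as 0.2 divided by the average intensity at or above the low-mass cutoff and rescales the spectrum with it. The function name, the list-of-pairs spectrum layout, and the toy numbers are assumptions made for illustration; this is not the Biomarker Wizard implementation.

```python
def normalize_to_tic(spectrum, coefficient=0.2, min_mz=2000.0):
    """Rescale a spectrum so its mean intensity at m/z >= min_mz equals `coefficient`.

    `spectrum` is a list of (m/z, intensity) pairs (hypothetical layout).
    Returns the normalization factor and the rescaled points above the cutoff.
    """
    kept = [(mz, inten) for mz, inten in spectrum if mz >= min_mz]
    if not kept:
        raise ValueError("no data points at or above the low-mass cutoff")
    average_ion_current = sum(inten for _, inten in kept) / len(kept)
    if average_ion_current <= 0:
        raise ValueError("average ion current must be positive")
    # normalization factor = coefficient / average ion current of this spectrum
    factor = coefficient / average_ion_current
    return factor, [(mz, inten * factor) for mz, inten in kept]


# Toy spectrum (made-up intensities); the 1500 point falls below the cutoff.
toy = [(1500.0, 9.0), (3939.08, 4.0), (6635.09, 50.0), (8136.64, 6.0)]
factor, normalized = normalize_to_tic(toy)
```

After this step every spectrum has the same average intensity above the cutoff, which is what allows spectra acquired at different times or locations to be compared on one scale.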
Data analysis. The data analysis process used in this study involved three stages: (a) peak detection and alignment; (b) selection of peaks with the highest discriminatory power; and (c) data analysis using a decision tree algorithm. Random sampling of the five groups (acute SARS, fever, pneumonia, lung cancer, and healthy) with two strata (acute SARS and non-SARS) was used to separate the entire data set into training and test data sets. The training data set consisted of SELDI spectra from 37 acute SARS and 74 non-SARS serum samples. The validity and accuracy of the classification algorithm were then challenged with a blinded test data set consisting of 37 acute SARS and 993 non-SARS serum samples.
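The split into training and test sets can be sketched as a stratified random draw. The cohort sizes (74 acute SARS, 1067 non-SARS) and training-set sizes (37 and 74) come from the text; the function name, the fixed seed, and the use of sample indices are assumptions for illustration, and the authors' actual sampling procedure is not specified beyond the description above.

```python
import random


def stratified_split(labels, n_train_per_stratum, seed=0):
    """Draw a fixed number of training samples at random from each stratum.

    Returns (train_indices, test_indices); everything not drawn goes to the test set.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for idx, label in enumerate(labels):
        by_stratum.setdefault(label, []).append(idx)
    train, test = [], []
    for label, ids in by_stratum.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        k = n_train_per_stratum[label]
        train.extend(shuffled[:k])
        test.extend(shuffled[k:])
    return sorted(train), sorted(test)


labels = ["SARS"] * 74 + ["non-SARS"] * 1067
train, test = stratified_split(labels, {"SARS": 37, "non-SARS": 74})
test_sars = sum(1 for i in test if labels[i] == "SARS")
test_non_sars = len(test) - test_sars
```

With these sizes the remaining samples form a test set of 37 acute SARS and 993 non-SARS sera.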
Decision tree classification. Construction of the decision tree classification algorithm was performed as described previously (26) with modifications based on the Biomarker Patterns Software (Ciphergen). Classification trees were split into two branches or nodes, using one rule at a time. We set the target variable level at 2 and the minimum value at 0, and each decision was based on the presence or absence and the intensity of one peak. Trees were grown using the Gini or Twoing method, favoring even splits from 0.00 to 2.00 in steps of 0.2 and with V-fold cross-validation from 6 to 12 in steps of 2, for a total of 88 trees. The lowest cost tree (value = 0.068; Gini = 2.0; V-fold = 10) was selected for the final test.
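Two points in this paragraph lend themselves to a small sketch. First, one reading of the parameter grid that is consistent with the stated count is 2 splitting methods × 11 even-split settings × 4 cross-validation folds = 88 trees; the text does not spell out this enumeration, so it is an assumption. Second, a single-peak split chosen by Gini impurity can be illustrated on made-up intensities. All names and toy values below are hypothetical, and the code is not the Biomarker Patterns Software.

```python
from itertools import product

# Assumed enumeration of the tree-growing settings described in the text.
methods = ["Gini", "Twoing"]
even_split_settings = [round(0.2 * i, 1) for i in range(11)]  # 0.0, 0.2, ..., 2.0
v_folds = [6, 8, 10, 12]
grid = list(product(methods, even_split_settings, v_folds))
n_trees = len(grid)


def gini(labels):
    """Gini impurity of a node with binary labels (1 = SARS, 0 = non-SARS)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(intensities, labels):
    """Return (weighted impurity, cut) for the best 'intensity <= cut' rule on one peak."""
    pairs = sorted(zip(intensities, labels))
    best_score, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical intensities
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for x, lab in pairs if x <= cut]
        right = [lab for x, lab in pairs if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_cut = score, cut
    return best_score, best_cut


# Made-up normalized intensities of one peak in 4 non-SARS and 4 SARS spectra.
toy_intensity = [0.2, 0.5, 0.9, 1.1, 2.4, 3.0, 3.8, 5.1]
toy_label = [0, 0, 0, 0, 1, 1, 1, 1]
score, cut = best_threshold(toy_intensity, toy_label)
```

A full tree repeats this search at each node over all candidate peaks; cross-validation then estimates the cost of each grown tree so that the lowest-cost one can be selected.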
tree classification and pattern discovery
To identify the serum biomarkers that could distinguish SARS from non-SARS samples, we used a training set of specimens (37 acute SARS and 74 controls; Tables 1 and 2) and constructed the decision tree classification algorithm using 10 989 peaks [99 peaks × (37 + 74) spectra] of statistical significance identified in the low-energy readings (see Materials and Methods). The classification algorithm used four peaks between 3 and 12 kDa (m/z 3939.08, 4137.71, 8136.64, and 11 514.2) and generated five terminal nodes (Fig. 1). These discriminatory peaks efficiently split SARS specimens into terminal nodes 3 and 5 and non-SARS samples into terminal nodes 1, 2, and 4. Each mass peak showed a mean intensity ratio of SARS vs non-SARS ?3 and a P value close to 0 (Table 3). Notably, the proteins or peptides with masses of 3939.08, 8136.64, and 11 514.2 Da were up-regulated in patients with acute SARS, whereas that with a mass of 4137.71 Da was down-regulated compared with healthy controls or patients with respiratory tract infections. A representative spectrum of a SARS specimen aligned with that of a healthy control (Fig. 2A) showed the four fingerprints in node 3 required for pattern recognition in the classifier. The unique presence of the root biomarker, m/z 3939.08, is demonstrated in the alignment of representative spectra of samples from patients with acute SARS (1, 3, 5, and 7 days after the onset of fever; from terminal node 5) and those from healthy controls and patients with fever and influenza or pneumonia (Fig. 2B). This decision algorithm correctly classified 37 of 37 (100%) of the acute SARS samples and 72 of 74 (97.3%) of the non-SARS controls in the training set.
The above classifier used only those masses in the low-energy readings (m/z ≤50 000). To capture all meaningful serum biomarkers, we expanded the analysis of the same training samples to the high-energy setting (m/z ≤200 kDa; see Materials and Methods) and pooled the low- and high-energy readings [161 × (37 + 74) = 17 871 peaks]. The classification algorithm then used five peaks between 4 and 16 kDa (m/z 4824.28, 8136.64, 11 505.30, 14 023.00, and 15 369.20; peaks at m/z 8136.64 and 11 505.30 overlapped with those in Fig. 1) in six terminal nodes and yielded a sensitivity and specificity of 94.6% (35 of 37) and 95.9% (71 of 74), respectively (data not shown). The peaks at m/z 3939.08 and 4137.71 did not appear in this new classifier because their peak intensities were beyond the limits after normalization to the intensity of the peak at m/z 6635.09 (see the section on patients and samples in the Materials and Methods). However, because most of the SARS cases in this alternative classifier (34 of 37) fell into the terminal node where the proteins/peptides were down-regulated (m/z 14 023.0 ≤0.611087, m/z 4824.28 ≤0.746989, and m/z 15 369.2 ≤3.27656), and because this algorithm had to combine two energy settings for analysis, we reasoned that the decision tree generated with only low-energy readings (Fig. 1) would be more sensitive (100%) and more convenient for a clinical application.
Fig. 1. Diagram of the decision tree classification in the training data set.
The numbers in the root node (top), the descendant nodes (ovals), and the terminal nodes 1–5 (rectangles) represent the classes. S, SARS; NS, non-SARS; N, sum of S and NS. The numbers below the root and descendant nodes are the mass values followed by the peak intensity values. For example, the mass value under the root node is 3939.08 Da, and the intensity is ≤1.7107.
Fig. 2. Representative SELDI spectra.
(A), combination of four peak masses required to correctly classify the sample (S4d-B, patient B 4 days after the onset of illness) as SARS in terminal node 3. The arrows in the magnified panels indicate the differentially expressed protein peaks compared with the healthy control (C6-B) used in the classifier. The mass and peak intensity are displayed as in Fig. 1. (B), alignment of representative SARS and non-SARS controls [healthy, pneumonia, and influenza and fever (Flu/Fever)] spectra with the mass range (boxed) for the root biomarker m/z 3939.08 (arrow) highlighted. Shown are examples of SARS spectra from days 1, 3, 5, and 7 after the onset of symptoms.
Table 3. Biomarker statistics for SARS vs non-SARS spectra and decision tree classification.a
Acute SARS Non-SARS
3 × 10⁻¹⁰
a,b The 95% confidence intervals were estimated using the principle of binomial distribution: a for sensitivity, the 95% confidence interval was 90.5–100.0% for the training set and 85.8–99.9% for the test set; b for specificity, the 95% confidence interval was 90.6–99.7% for the training set and 91.9–96.9% for the test set.
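The footnote to Table 3 states that the 95% confidence intervals were estimated from the binomial distribution. A minimal sketch of one such estimate, the exact (Clopper-Pearson) interval, is given below; the choice of this particular method is an assumption, made because it reproduces the sensitivity intervals reported for 37 of 37 and 36 of 37 samples. The function names are hypothetical, and the bounds are found by simple bisection rather than with a statistics library.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def solve(f, target, increasing):
    """Bisection on (0, 1) for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion x / n."""
    # Lower bound: smallest p for which observing >= x successes is not too unlikely.
    lower = 0.0 if x == 0 else solve(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: largest p for which observing <= x successes is not too unlikely.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


train_sensitivity_ci = exact_binomial_ci(37, 37)  # roughly (0.905, 1.0)
test_sensitivity_ci = exact_binomial_ci(36, 37)   # roughly (0.858, 0.999)
```

Other binomial interval methods (for example, normal-approximation or score intervals) would give somewhat different bounds, particularly when the observed proportion is at or near 100%.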
To determine the reproducibility of SELDI spectra,
mass location, and intensity from array to array on a
single chip (intraassay) and between instruments (inter-
assay), we first spotted the serum from a healthy control
on seven baits in a single chip and collected seven
independent spectra over a time span of 21 days (Fig. 3A).
We then selected seven proteins in the range of 3–10 kDa
(m/z 4089.59, 5334.17, 5631.18, 5901.49, 6625.63, 7762.24,
and 7966.63; black arrows in Fig. 3A) to calculate the
intraassay CV. These peaks were selected because they
were in the proximity of the four biomarkers with com-
parable current intensities. The interassay experiments
were similar except that sera from healthy controls and
from patients with high fever, pneumonia, and SARS
were applied to a single chip, and the independent spectra
were collected from two different instruments (PBS II
and PBS IIc; Fig. 3, B and C). The mean intra- and
interassay CVs for peak location were 0.02% and 0.03%,
respectively. We considered masses with accuracies
within 0.1% between spectra to be the same. The mean
intra- and interassay CVs for the normalized intensity
were 15% and 20%, respectively. CV calculations using
lower intensity peaks (Fig. 3A, gray arrowheads), on the
other hand, yielded results similar to those obtained with
the seven high-intensity peaks (peak location, intra- and
interassay CVs both 0.03%; peak intensity, intraassay
CV = 17% and interassay CV = 18%).
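The reproducibility figures above rest on two simple calculations: a coefficient of variation across replicate spectra, and a 0.1% tolerance for treating two peaks as the same mass. The following is a minimal sketch of both, not the authors' software. The function names, the use of the sample standard deviation, the choice of the mean of the two masses as the reference for the 0.1% tolerance, and the replicate readings are all assumptions made for illustration.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean.

    Assumption: the sample (n - 1) standard deviation is used; the text
    does not say which estimator was applied.
    """
    return 100 * stdev(values) / mean(values)


def same_mass(mz_a, mz_b, tolerance=0.001):
    """Return True if two peak masses agree within a relative tolerance.

    The default of 0.001 corresponds to the 0.1% criterion in the text.
    Assumption: the difference is taken relative to the mean of the two masses.
    """
    reference = (mz_a + mz_b) / 2
    return abs(mz_a - mz_b) / reference <= tolerance


# Hypothetical replicate readings of one peak (m/z and normalized intensity).
# These numbers are invented for illustration and are not data from the study.
replicate_mz = [4089.59, 4090.41, 4088.77, 4089.10, 4090.08, 4089.95, 4088.90]
replicate_intensity = [12.1, 14.8, 10.9, 13.5, 15.2, 11.7, 12.9]

print(f"peak location CV:  {cv_percent(replicate_mz):.3f}%")
print(f"peak intensity CV: {cv_percent(replicate_intensity):.1f}%")
print("same peak as the first replicate:",
      all(same_mass(replicate_mz[0], mz) for mz in replicate_mz[1:]))
```

Computing the CV separately for mass location and for normalized intensity mirrors the way the two quantities are reported in the text, where location is far more reproducible than intensity.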
detection of SARS
Analysis of spectra from the completely blinded test set
(37 acute SARS and 993 controls; Tables 1 and 2) accu-
rately classified 36 of 37 (97.3%) SARS specimens and
accurately classified 987 of 993 (99.4%) of the controls as
non-SARS (Table 3). More important was that the classi-
fication algorithm successfully distinguished acute SARS
from fever and influenza, with a sensitivity and specificity
reaching 97.3% (36 of 37) and 100% (187 of 187; 60 of 60
with influenza), respectively. Interestingly, when we
tested the classifier using an additional control population
of 40 samples from patients in the Beijing area with
measles after July 16, 2003, who had no history of close
contact with SARS patients and had not visited those
hospitals treating SARS patients, the classifier had a
specificity of 100% (95% confidence interval, 89–100%;
data not shown).
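The percentages in this paragraph follow directly from the reported counts. The sketch below recomputes them and adds an exact binomial (Clopper-Pearson) confidence interval. The interval method is an assumption, since the text does not state which method the authors used, and the helper names are hypothetical. The bisection approach relies only on the Python standard library and is intended for small sample sizes such as the 37 SARS specimens.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, target):
    """Find p in [0, 1] with f(p) = target, for f increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k / n.

    Assumption: Clopper-Pearson is used here for illustration only.
    """
    # Lower bound: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: p at which P(X <= k) equals alpha / 2,
    # equivalently P(X > k) equals 1 - alpha / 2.
    upper = 1.0 if k == n else _root_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Counts reported for the blinded test set.
sensitivity = 36 / 37               # SARS specimens classified as SARS
specificity = 987 / 993             # controls classified as non-SARS
specificity_fever_flu = 187 / 187   # fever and influenza controls

print(f"sensitivity: {100 * sensitivity:.1f}%")
print(f"specificity: {100 * specificity:.1f}%")
print(f"specificity (fever/influenza): {100 * specificity_fever_flu:.0f}%")

low, high = clopper_pearson(36, 37)
print(f"95% CI for sensitivity: {100 * low:.1f}% to {100 * high:.1f}%")
```

Other interval methods (for example Wilson or normal approximation) give somewhat different bounds, so the output of this sketch should not be expected to match every interval quoted in the paper.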
Several laboratory tests, based on either viral RNA
(3, 17, 19) or serology (6, 17), have been developed to
complement clinical characteristics and epidemiologic
data in the identification of SARS, but early detection of
SARS with sufficiently high sensitivity and specificity has
not been achieved.
Fig. 3. Intra- and interassay reproducibility.
(A), example of intraassay reproducibility of mass spectra and tree decision
classification. Serum from an unaffected healthy control was individually applied
to seven bait surfaces on eight chips, and seven randomly selected peaks
(arrows) in each spectrum over a course of 27 days were used as surrogate
markers for calculation of CV. The reproducibility of SELDI spectra, mass
location, and intensity from spectrum to spectrum was determined accordingly.
(B and C), examples of interassay reproducibility evaluation of the same chip
loaded with duplicate serum samples from a healthy control (C1-A and -B), a
SARS patient (S4-A and -B), and patients with pneumonia (P10-A and -B) or fever
(F7-A and -B). Spectra from a PBS II (B) and PBS IIc (C) are aligned for
The identification of proteins/peptides of pathophysiologic
significance (phenomic fingerprints) in crude biological and
clinical samples by SELDI-TOF MS has been
demonstrated in various cancer studies (28). Using a
similar profiling strategy, we have established a classifi-
cation algorithm that delineates probable SARS patients
as early as day 1 after self-described onset of symptoms
from healthy individuals and from patients with respiratory
tract infections in the training set (sensitivity = 100%;
specificity = 97.3%). When applied to the blinded test set,
this discriminatory profiling method precisely classified
97.3% of patients with acute SARS and 99.4% of non-SARS
patients. More strikingly, our classifier was able to dis-
criminate SARS-CoV infection from bacterial (myco-
plasma, tuberculosis) and other local (influenza) or sys-
temic (measles) viral infections of the respiratory tract
with a specificity reaching 100%. This was attributable to
the inclusion of corresponding inflammatory control sam-
ples in the training set and optimization of the classifica-
tion algorithm. The biomarkers identified in the acute
phase of SARS seemed to remain throughout the conva-
lescent phase of the disease because when we applied the
identical tree classification to samples from patients in
whom onset of fever had been ≤2, 3, 4, and ≥5 weeks
previously, we could detect SARS with sensitivities and
specificities reaching 89.2% and 91.8%, 86.0% and 91.8%,
93.1% and 91.8%, and 79.5% and 91.8%, respectively (data
not shown). One intriguing observation was that SARS
patients clustering in terminal node 3 all demonstrated
moderate clinical features, whereas those in node 5 were
severe cases. We are investigating the correlation between
this proteomic pattern and the pathology of SARS.
These results represent, to the best of our knowledge,
the most accurate laboratory technique for early detection
of SARS: PCR-based assays have a maximum sensitivity
of 80% when used to test nasopharyngeal aspirates or
plasma specimens (29, 30). The proteomic method de-
scribed here also has advantages over PCR-based assays
in that it does not require BSL-3 containment and it can
detect SARS in serum samples. This is a critical alternative
to PCR-based tests, which are challenged by low viral
loads in nasopharyngeal aspirates and throat swab spec-
imens in the acute phase of SARS.
Instead of traditional chromatographic fractionation of
samples, we directly spotted the crude serum on the WCX
chips. By doing this we avoided the unnecessarily biased
depletion of thousands of proteins and/or peptides asso-
ciated with human serum albumin before MS analysis.
Processing of samples and generation of the diagnostic
mass spectra by our method required only a small amount
of serum (20 µL vs several milliliters needed for PCR
methods) and took ?3 h. High-throughput proteomic
screening for SARS in a 96-well format is also feasible.
We adhered to the WHO case definition and eligibility
criteria for SARS and avoided using samples from non-
SARS controls from hospitals where SARS patients had
been admitted because these persons might have had a history
of close contact with SARS patients or might have been inside
those SARS hospitals. We further emphasized this point
by sampling control sera from a nonepidemic region of
the country. Although the possibility might exist that the
difference in serum fingerprints would reflect differences
among SARS and non-SARS hospitals, the fact that all
SARS cases from 38 different hospitals fit into the single
classification algorithm would likely rule out such a
concern. More importantly, severe and mild cases of
SARS from different hospitals, which had been com-
pletely randomized in the experimental analysis, fell into
distinct nodes of the tree classification, strongly indicating
that the biomarkers we have identified were specific to
SARS and not the sites at which blood samples were
collected. We further minimized the potential sampling
bias by simultaneously using four biomarkers instead of
one (e.g., m/z 3939.08), which nevertheless could sufficiently
delineate SARS from non-SARS (sensitivity = 93.7%;
specificity = 91.8%; data not shown). All SARS and
non-SARS samples were from patients with the same
ethnic background. SARS and non-SARS control sera
collected at different times were all freshly aliquoted and
properly stored at −80 °C.
The differential protein pattern as the discriminator
between SARS and non-SARS is independent of protein
identities. The origins and full identities of the discriminating
biomarkers are under investigation. Knowing their identities is not
strictly required for differential diagnosis, as demonstrated by
numerous studies of cancer diagnosis by SELDI methods. However, to
characterize these peaks would certainly help in under-
standing the biological roles of these peptide/proteins
and could potentially lead to the discovery of more direct
diagnostic tools and novel therapeutic targets for SARS-
We thank Drs. C. Stohr and F.C. He for helpful discus-
sions and comments and Drs. T. Yip, E. Fung, L. Ma, W.
Zhang, and F. Zhang for their communication of results
and assistance in statistical analyses. We thank the SARS
Control Scientific Committee of the Ministry of Science
and Technology of China (MOST), the National Institute
for the Control of Pharmaceutical and Biological Products
(NICPBP), and the Beijing SARS-Control Working Group
for their encouragement and technical assistance. This
work was supported by an Outstanding Young Investi-
gators Fellowship of National Natural Science Foundation
of China (NSFC30025010) and the “973 Plan” of MOST
(2002CB513000 and 2003CB514116) to H.T.
1. WHO. Cumulative number of reported probable cases of SARS.
08_15.pdf (accessed August 2003).
2. Donnelly CA, Ghani AC, Leung GM, Hedley AJ, Fraser C, Riley S, et
al. Epidemiological determinants of spread of causal agent of
severe acute respiratory syndrome in Hong Kong. Lancet 2003;
3. Drosten C, Gunther S, Preiser W, van der Werf S, Brodt HR, Becker
S, et al. Identification of a novel coronavirus in patients with
severe acute respiratory syndrome. N Engl J Med 2003;348:
4. Ksiazek TG, Erdman D, Goldsmith CS, Zaki SR, Peret T, Emery S,
et al. A novel coronavirus associated with severe acute respiratory
syndrome. N Engl J Med 2003;348:1953–66.
5. Poutanen SM, Low DE, Henry B, Finkelstein S, Rose D, Green K, et
al. Identification of severe acute respiratory syndrome in Canada.
N Engl J Med 2003;348:1995–2005.
6. Peiris JS, Lai ST, Poon LL, Guan Y, Yam LY, Lim W, et al.
Coronavirus as a possible cause of severe acute respiratory
syndrome. Lancet 2003;361:1319–25.
7. Kuiken T, Fouchier RA, Schutten M, Rimmelzwaan GF, van Amer-
ongen G, van Riel D, et al. Newly discovered coronavirus as the
primary cause of severe acute respiratory syndrome. Lancet
8. Fouchier RA, Kuiken T, Schutten M, van Amerongen G, van
Doornum GJ, van den Hoogen BG, et al. Aetiology: Koch’s postu-
lates fulfilled for SARS virus. Nature 2003;423:240.
9. Rota PA, Oberste MS, Monroe SS, Nix WA, Campagnoli R, Icenogle
JP, et al. Characterization of a novel coronavirus associated with
severe acute respiratory syndrome. Science 2003;300:1394–9.
10. Marra MA, Jones SJ, Astell CR, Holt RA, Brooks-Wilson A, Butter-
field YS, et al. The genome sequence of the SARS-associated
coronavirus. Science 2003;300:1399–404.
11. Ruan YJ, Wei CL, Ee AL, Vega VB, Thoreau H, Su ST, et al.
Comparative full-length genome sequence analysis of 14 SARS
coronavirus isolates and common mutations associated with
putative origins of infection. Lancet 2003;361:1779–85.
12. Stadler K, Masignani V, Eickmann M, Becker S, Abrignani S, Klenk
HD, et al. SARS—beginning to understand a new virus. Nat Rev
13. Li W, Moore MJ, Vasilieva N, Sui J, Wong SK, Berne MA, et al.
Angiotensin-converting enzyme 2 is a functional receptor for the
SARS coronavirus. Nature 2003;426:450–4.
14. CDC. Updated interim U.S. case definition of severe acute respi-
ratory syndrome (SARS). 2003. http://www.cdc.gov/ncidod/
sars/casedefinition.htm (accessed August 2003).
15. WHO. Case definitions for surveillance of severe acute respira-
tory syndrome (SARS). 2003. http://www.who.int/csr/sars/
casedefinition/en/ (accessed August 2003).
16. Hon KL, Leung CW, Cheng WT, Chan PK, Chu WC, Kwan YW, et al.
Clinical presentations and outcome of severe acute respiratory
syndrome in children. Lancet 2003;361:1701–3.
17. Peiris JS, Chu CM, Cheng VC, Chan KS, Hung IF, Poon LL, et al.
Clinical progression and viral load in a community outbreak of
coronavirus-associated SARS pneumonia: a prospective study.
18. Ng EKO, Hui DS, Chan KC, Hung E, Chiu R, Lee N, et al.
Quantitative analysis and prognostic implication of SARS corona-
virus RNA in the plasma and serum of patients with severe acute
respiratory syndrome. Clin Chem 2003;49:1976–80.
19. Poon LL, Wong OK, Luk W, Yuen KY, Peiris JS, Guan Y. Rapid
diagnosis of a coronavirus associated with severe acute respira-
tory syndrome (SARS). Clin Chem 2003;49:953–5.
20. WHO. Laboratory confirmation of a SARS case in southern China.
http://www.who.int/csr/don/2004_01_05/en/ (accessed Janu-
21. Hanash S. Disease proteomics. Nature 2003;422:226–32.
22. Boguski MS, McIntosh MW. Biomedical informatics for proteom-
ics. Nature 2003;422:233–7.
23. Petricoin EF, Zoon KC, Kohn EC, Barrett JC, Liotta LA. Clinical
proteomics translating benchside promise to bedside reality. Nat
Rev Drug Discov 2002;1:683–95.
24. Wright GL Jr. SELDI proteinchip MS: a platform for biomarker
discovery and cancer diagnosis. Expert Rev Mol Diagn 2002;2:
25. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg
SM, et al. Use of proteomic patterns in serum to identify ovarian
cancer. Lancet 2002;359:572–7.
26. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, et
al. Serum protein fingerprinting coupled with a pattern-matching
algorithm distinguishes prostate cancer from benign prostate
hyperplasia and healthy men. Cancer Res 2002;62:3609–14.
27. WHO. WHO biosafety guidelines for handling of SARS specimens.
(accessed August 2003).
28. Wulfkuhle JD, Liotta LA, Petricoin EF. Proteomic applications for
the early detection of cancer. Nat Rev Cancer 2003;3:267–75.
29. Grant PR, Garson JA, Tedder RS, Chan PK, Tam JS, Sung JJ.
Detection of SARS coronavirus in plasma by real-time RT-PCR.
N Engl J Med 2003;349:2468–9.
30. Poon LL, Chan KH, Peiris JS. Crouching tiger, hidden dragon: the
laboratory diagnosis of severe acute respiratory syndrome. Clin
Infect Dis 2004;38:297–9.