A practical platform for blood biomarker study by using global gene expression profiling of peripheral whole blood.
ABSTRACT Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to the variability of genome gene expression profiling using peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data to meet clinical requirements for biomarker study.
We applied peripheral whole blood samples with globin reduction and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data and elucidate the mode in which hemoglobin interferes in gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of beadarray data. When compared to their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples is the result of the high concentration of hemoglobin derived from red blood cells either interfering with target mRNA binding or giving the pseudo binding background signal.
We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study.
A Practical Platform for Blood Biomarker Study by Using
Global Gene Expression Profiling of Peripheral Whole Blood
Ze Tian1,2., Nathan Palmer3., Patrick Schmid3, Hui Yao5, Michal Galdzicki1, Bonnie Berger3,4*,
Erxi Wu1,2,6*, Isaac S. Kohane1,2
1Informatics Program, Children’s Hospital Boston, Harvard Medical School, Boston, Massachusetts, United States of America, 2Division of Health Sciences and
Technology, Harvard University and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, 3Computer Science and Artificial
Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, 4Department of Mathematics, Massachusetts
Institute of Technology, Cambridge, Massachusetts, United States of America, 5Genomics Program, Children’s Hospital Boston, Harvard Medical School, Boston,
Massachusetts, United States of America, 6Department of Pharmaceutical Sciences, North Dakota State University, Fargo, North Dakota, United States of America
Background: Although microarray technology has become the most common method for studying global gene expression,
a plethora of technical factors across the experiment contribute to the variability of genome gene expression profiling using
peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data to
meet clinical requirements for biomarker study.
Methods and Findings: We applied peripheral whole blood samples with globin reduction and performed genome-wide
transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data
and elucidate the mode in which hemoglobin interferes in gene expression profiling. We demonstrated that, when
applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant
increase in the quality of beadarray data. When compared to their pre-globin reduction counterparts, post-globin reduction
samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene
separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests
that the poor data obtained from pre-globin reduction samples is the result of the high concentration of hemoglobin
derived from red blood cells either interfering with target mRNA binding or giving the pseudo binding background signal.
Conclusion: We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood
samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study.
Citation: Tian Z, Palmer N, Schmid P, Yao H, Galdzicki M, et al. (2009) A Practical Platform for Blood Biomarker Study by Using Global Gene Expression Profiling of
Peripheral Whole Blood. PLoS ONE 4(4): e5157. doi:10.1371/journal.pone.0005157
Editor: Aimee K. Zaas, Duke University, United States of America
Received November 4, 2008; Accepted March 8, 2009; Published April 17, 2009
Copyright: © 2009 Tian et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The research was supported by National Institute of Health Training Grant, grant 2 T15 LM007092-16. The funder had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: email@example.com (BB); firstname.lastname@example.org (EW)
. These authors contributed equally to this work.
Peripheral blood has recently become an attractive prime tissue
for biomarker detection because of its critical role in immune
response, metabolism, and communication with cells, and
extracellular matrices in almost all tissues and organs in the
human body, as well as its being less invasive and the simplicity of
Many different techniques are used to handle peripheral blood
samples prior to RNA isolation based on the experimental
design: PAXgene (whole blood), QIAamp (Platelets and White
Blood Cells), and Ficoll and BD-CPT (Mononuclear cells).
Several studies have been conducted to make a comparison
between methods by examining their reproducibility, variance,
and signal-to-noise ratios. Each method has its unique
advantages and disadvantages. The PAXgene Blood RNA
system provides a way to stabilize RNA immediately after
sample collection and makes it possible to store the samples for a
relatively long time without compromising the RNA’s integrity
[4,5,6,7]. This is very important for multiple-center clinical
practice. However, a high degree of variability and low present
call rates have been observed in data obtained from Affymetrix
microarrays when the samples are prepared using this whole-
blood RNA system. These poor results are thought to be the
effects of an overabundance of hemoglobin, and, consequently,
several globin reduction methods have been developed to solve
this problem[8,9,10,11]. Although globin reduction significantly
improved whole genome gene expression profiles in Affymetrix
arrays, and post-globin reduction samples could be successfully
applied to Illumina bead arrays, less is known about the
PLoS ONE | www.plosone.org1April 2009 | Volume 4 | Issue 4 | e5157
impact of globin on Illumina bead array, since no comparison
has been made between pre-globin reduction and post-globin
reduction gene expression profiles in this array.
The Illumina Sentrix human-6 v2 array uses gene-specific 50-mer
probes attached to 3 µm beads with an average of 30
redundant features for each transcript, allowing six samples to be
profiled per BeadChip simultaneously. This multi-sample
approach provides the possibility of higher throughput for large-
scale clinical research. In order to realize the potential impact of
over-abundant globin on Illumina high-throughput expression
array in medical practice, a practical framework for clinical
microarray-based studies must be established. To this end, we
compared the data obtained by microbead array profiling of pre-
globin reduction and post-globin reduction peripheral whole blood
samples. Our data demonstrated the combination of performing
globin reduction in peripheral whole-blood samples and hybrid-
izing on Illumina BeadChips to be the practical approach for
large-scale multi-center biomarker research.
Globin mRNA reduction improves cRNA signal
No difference was seen between pre- and post-globin reduction
samples when comparing RNA quality using a bioanalyzer. Pre-
globin reduction samples looked to be of similar quality to post-
globin reduction samples, with two sharp peaks of 18S and 28S
RNA around 2000 bp and 4000 bp. However, the cRNA from
pre-globin reduction samples showed different signals from
standard cRNA. Hemoglobin showed an additional sharp peak
on the top of cRNA Fluorescence absorbance curve in pre-globin
reduction samples and the sharp peak disappeared in post-globin
reduction samples. In addition, most of the pre-globin reduction
samples only showed a band at 800 bp in cRNA electrophoresis
figure and post-globin reduction samples exhibited smeared bands
(200 bp–6000 bp) as normal standard cRNA (Data not shown).
Globin reduction improves microarray probe detection
and decreases variance
A great deal of evidence shows that, when compared to other
techniques, whole blood samples prepared by the PAXgene
method typically result in Affymetrix microarray data with a low
rate of genes detected as ‘‘present’’ with respect to the background
noise on the chip, as well as large intra-group variance. The great
abundance of hemoglobin mRNA is thought to account for this
poor performance. In order to quantify these observations, we
analyzed the performance of the Illumina Sentrix human-6
BeadChip microarray platform on pre- and post-globin reduction
peripheral whole-blood samples of 11 adults, 8 females and 3 males.
The BeadStudio software used to process Illumina’s BeadChip
data provides a ‘‘detection p-value’’ that can be used to determine
whether a particular probe was detected against background noise.
After correcting for multiple rounds of hypothesis testing, we
considered adjusted p-values below 0.05 to be ‘‘detected’’ or ‘‘present’’.
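The correction and thresholding step can be sketched as follows. The text does not name the multiple-testing method, so the Benjamini-Hochberg procedure used here is an assumption, and the function names are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the article does not state which correction was applied;
    Benjamini-Hochberg is used here purely as an illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_present_calls(detection_p, alpha=0.05):
    """Count probes whose adjusted detection p-value falls below alpha."""
    return sum(1 for p in benjamini_hochberg(detection_p) if p < alpha)
```

Applied to one array's detection p-values, `count_present_calls` gives the per-array number of present calls that the following paragraph summarizes.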
Given the above definition, the average number of present calls
per array in the post-globin reduction group was
11921.73 ± 296.38, whereas the average number of present calls
per array in the pre-globin reduction group was significantly lower
and more variable, at 8987.75 ± 1264.94. Hence, samples in the
post-globin reduction group showed improved probe detection
and reduced intra-group detection call variance with respect to
those in the pre-globin reduction group (Figure 1A). In addition,
intra-sample intensity variance was reduced in the post-globin
reduction group (Figure 1B).
In order to determine whether the improvement in detection
was a consistent phenomenon (i.e., possibly probe- or target-
dependent), we sought to identify probes that were consistently
detected in the post-globin reduction group, and consistently not
called in the pre-globin reduction group. Such probes might
represent nucleotide sequence-specific susceptibility to globin-
induced noise, or some other systemic effect.
1876 probes showed a consistent pattern of improved detection,
where at least 75% of the adjusted detection p-values were < 0.05
for the post-globin reduction group and at least 75% were > 0.05
in the pre-globin reduction group with a statistically significant
difference between the distributions of pre- and post-globin
reduction adjusted p-values (Table S1). Here, statistical signifi-
cance of the separation of sampling distributions was determined
by computing the p-value of a Wilcoxon Mann-Whitney test
between the pre- and post-globin reduction distributions, and
thresholding at 0.05.
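The per-probe selection rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the Mann-Whitney p-value uses the normal approximation without tie correction (an assumption chosen to keep the example dependency-free), and the function names are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs where a value from x exceeds one from y; ties count one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


def consistently_improved(pre_p, post_p, alpha=0.05, fraction=0.75):
    """Apply the three criteria to one probe's adjusted detection p-values."""
    frac_post_detected = sum(p < alpha for p in post_p) / len(post_p)
    frac_pre_undetected = sum(p > alpha for p in pre_p) / len(pre_p)
    return (frac_post_detected >= fraction
            and frac_pre_undetected >= fraction
            and mann_whitney_p(pre_p, post_p) < alpha)
```

Here `pre_p` and `post_p` hold one probe's adjusted p-values across the pre- and post-globin reduction samples, respectively.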
In addition, no genes were found to pass the detection criteria
described above (75% < 0.05 adjusted p-value) in the pre-globin
reduction group but fail to pass the criteria in the post-globin
reduction group.
In order to show that the 1876 probes with consistently
improved p-values are not an artifact of random fluctuation of the
p-values reported by the BeadStudio software, a randomized
simulation was run to estimate the number of probes that would
pass our selection criteria under random re-association of
improved detection calls to probe identifiers. After 100 random
trials, none of the simulations produced more than one probe that
passed our criteria for being significantly improved by globin
reduction. These results indicate that the 1876 probes identified
here were not an artifact of the stochastic nature of microarray
hybridization and instead represent a set of probes whose
improved detection was strongly dependent on the removal of
globin RNA from the whole blood samples.
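A randomized control of this kind can be sketched as below. The exact shuffling scheme is not specified in the text, so this example assumes one plausible reading: within each sample, detection calls are randomly re-assigned to probe identifiers, and the 75% criteria are then re-applied. The Mann-Whitney step is omitted for brevity, and all names are illustrative.

```python
import random


def count_passing(pre_calls, post_calls, fraction=0.75):
    """Count probes undetected in >= fraction of pre samples and detected in
    >= fraction of post samples. Each argument is a list of per-sample lists
    of booleans (True = detected), indexed by probe."""
    n_probes = len(pre_calls[0])
    passing = 0
    for j in range(n_probes):
        pre_undetected = sum(not s[j] for s in pre_calls) / len(pre_calls)
        post_detected = sum(s[j] for s in post_calls) / len(post_calls)
        if pre_undetected >= fraction and post_detected >= fraction:
            passing += 1
    return passing


def permutation_null(pre_calls, post_calls, trials=100, fraction=0.75, seed=0):
    """Return the number of passing probes for each random trial."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        # Shuffle each sample's calls independently across probe positions.
        shuffled_pre = [rng.sample(s, len(s)) for s in pre_calls]
        shuffled_post = [rng.sample(s, len(s)) for s in post_calls]
        counts.append(count_passing(shuffled_pre, shuffled_post, fraction))
    return counts
```

Comparing the observed count with `max(permutation_null(...))` mirrors the argument made above: if no random trial approaches the observed number, chance re-association is an unlikely explanation.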
Globin reduction improves the sensitivity of microarray
The sensitivity threshold of microarray measurements defines
the concentration range in which accurate measurements can be
achieved. One of the advantages of the BeadArray platform is that
it requires less mRNA for hybridization. This makes BeadArray
more sensitive than other platforms.
All of the 1876 genes with consistently improved detection p-
values exhibited higher expression intensity in the post-globin
reduction group than in the pre-globin reduction group (Figure 2).
The BeadStudio software computes signal intensity by subtracting
away background; this may be an indication that the improvement
in detection is mainly due to reduced background noise caused by
the over-abundance of globin mRNA. Moreover, in all of the 11
post-globin reduction samples, at least 90% of the probes with
consistently improved detection values had expression intensities
that were among the lowest 1/3 of all detected probes, i.e. low
abundance genes (Figure 3). Thus, globin reduction improved
detection sensitivity most dramatically for low abundance genes.
This again supports the hypothesis that the improvement in
detection is predominantly due to lower background noise induced
by the overabundance of globin.
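The "lowest third" check for a single sample can be expressed as a short rank computation. This is a sketch under assumed inputs: a mapping from probe identifier to intensity, the list of detected probes, and the list of improved probes (all hypothetical names).

```python
def fraction_in_lowest_third(intensity_by_probe, detected, improved):
    """Fraction of improved probes whose intensity ranks in the lowest
    third of all detected probes in one sample."""
    # Rank detected probes from lowest to highest intensity.
    ranked = sorted(detected, key=lambda p: intensity_by_probe[p])
    cutoff = len(ranked) // 3
    lowest = set(ranked[:cutoff])
    hits = sum(1 for p in improved if p in lowest)
    return hits / len(improved)
```

A value of at least 0.9 for every post-globin reduction sample would correspond to the observation reported above.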
Globin reduction improves useful biological signals
In order to show that the post-globin reduction improves the
practical usability of whole blood mRNA samples, the pre- and
post-globin reduction samples were used to select gender marker
genes. A similar approach was used by Debey et al. After
performing the normalization and gene selection process described
in the Materials and Methods section, we found that a significant
fraction of the marker genes were in fact genes that are located on
either the X or Y chromosome.
The result of the selection process shows a clear performance
improvement when finding gender marker genes using post-globin
reduction samples instead of pre-globin reduction samples. For
example, when the top ten probes are selected in each of the ten
iterations using the pre-globin reduction data, only one was
selected in at least half of the iterations. In contrast, when using the
post-globin reduction experiments four probes, representing three
distinct genes on the Y chromosome, were selected. Figure 4 shows
heatmaps of the intensity values of these four selected probes in the
pre- and post-globin reduction groups. The separation between
the male and female samples was much more clearly defined in the
post-globin reduction samples. Similarly, when selecting the top 40
probes from the untreated experiments only nine were found. Of
those, only five were on the X or Y chromosome. When using the
post-globin data, however, 12 probes were found, 10 of which
were on either the X or Y chromosome. With only one exception,
the p-values from a Mann-Whitney U test for the male-to-female
Figure 1. Globin reduction Increased present calls and decreased variance. Box plots showing the distribution of number of present calls
per array in both the pre- and post-globin reduction data. Whiskers on the plots extend from the minimum and maximum values to the lower and
upper quartiles, respectively. The box extends from the lower quartile through the upper quartile, with a bold line marking the median. B. Decreased
intra-sample variance in intensity by globin reduction. Each column shows the distribution of log expression intensities for one sample as reported in
the microarray data. Pre-globin reduction samples are labeled 1b–11b, and post-globin reduction samples are labeled as 1a–11a.
comparison in the post-globin samples were more significant than
those from the pre-globin samples. The one exception is likely the
result of small sample size. Thus, in the context of biomarker
selection, the post-globin reduction samples yield not only more
statistically significant results than the pre-globin reduction
samples, but also pick out more biomarkers with higher accuracy.
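The stability criterion used above (a probe counts only if it appears among the top-ranked probes in at least half of the iterations) can be sketched as follows. The ranking itself is not reproduced here; the per-iteration top-k lists are assumed to be given, and the function name is illustrative.

```python
from collections import Counter


def stable_markers(top_lists, min_fraction=0.5):
    """Return probes that appear in at least min_fraction of the
    per-iteration top-k lists, sorted by identifier."""
    # Count each probe at most once per iteration.
    counts = Counter(p for top in top_lists for p in set(top))
    needed = min_fraction * len(top_lists)
    return sorted(p for p, c in counts.items() if c >= needed)
```

With ten iterations and top-ten lists, the length of the returned list corresponds to the counts quoted above for the pre- and post-globin reduction data.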
Evaluation of the differentially expressed genes by real-time PCR
As well as improving the detection of low intensity and
biologically informative transcripts, the globin reduction treatment
also resulted in 298 genes (all with strong detection calls in both
treatment groups) exhibiting apparent differential expression
between the pre- and post-globin reduction groups. Among these,
75 genes were up-regulated with fold changes ≥2; 223 genes were
found down-regulated with fold changes ≤0.5. We postulate first
that hemoglobin may compete with the target mRNA in binding
to the probe, thus contributing to a higher-than-normal pseudo
binding background signal. Second, the overabundance of
hemoglobin may interfere with target mRNA binding through
an unknown mechanism.
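The fold-change thresholds used above can be illustrated with a small classifier. This sketch assumes that fold change is the ratio of post- to pre-globin reduction mean intensity on a linear scale, which matches the "up-regulated by globin reduction" wording but is not stated explicitly; the function name is hypothetical.

```python
def classify_fold_change(pre_mean, post_mean, up=2.0, down=0.5):
    """Label a gene by its post/pre intensity ratio (linear scale assumed)."""
    ratio = post_mean / pre_mean
    if ratio >= up:
        return "up"
    if ratio <= down:
        return "down"
    return "unchanged"
```

For example, a gene with mean intensity 100 before and 250 after globin reduction has a ratio of 2.5 and is labeled "up".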
We used real-time PCR to show that the RNA levels of these
genes were, in fact, no different in the pre- and post-globin
reduction samples, and that the differential expression observed in
the array data was an artifact of the noise induced by the
overabundance of globin RNA in the pre-globin reduction
samples. Four hemoglobin genes, HBA1, HBB, HBD, and
HBE1, together with another down-regulated gene, as well as 2
up-regulated genes in bead array were randomly chosen to verify
by real-time PCR. The two most abundant hemoglobin genes,
HBA1 and HBB, were observed at lower levels in the post-globin
reduction group with respect to the pre-globin reduction group in
both the array data and in the real time PCR, indicating that the
globin reduction protocol was effective. Two other hemoglobin
genes, HBD and HBE1, were also significantly down-regulated by
globin reduction in both assays. However, AYTL2, the gene
observed to be down-regulated in the array data, and the 2 genes
that showed higher expression levels in the array data, CCL5 and
DYRK2, all showed no significant change in expression level in
the real-time PCR assay (Figure 5).
Expression profiling using peripheral whole blood samples is an
attractive method for biomarker detection. However, hemoglobin,
which represents as much as 70% of the total mRNA population
in peripheral whole blood samples isolated by the PAXgene tube,
effectively dilutes the mRNA population and interrupts the gene
expression profiles in the Affymetrix array. It is of interest to
investigate whether and how hemoglobin influences the gene
expression profiles acquired from Illumina bead arrays, which
constitute a high throughput platform. In this study, we compared
the gene expression profile of 11 pre- and post-globin reduction
peripheral whole blood samples hybridized on Illumina bead
arrays. We demonstrated that hemoglobin influenced the gene
expression profiles from these arrays in a clear and consistent
manner. Globin reduction efficiently improved the probe
detection by increasing present calls and decreasing variance, as
well as improving sensitivity of lower abundance genes. More
importantly, the more consistent expression signature of 4 sex
genes in the post-globin reduction group with respect to the pre-
globin reduction group indicates that class prediction was
markedly improved with globin reduction. We reasoned that the
high abundance of hemoglobin might interrupt the target mRNA
binding, or contribute to pseudo-binding (a nonspecific, back-
ground signal that is present in the absence of any significant
sequence similarity) to the probe, and, therefore, distort the true
Real-time PCR is commonly used to validate the mRNA
expressions acquired from microarray experiments due to the
greater specificity of the primer vs. microarray probes.
Therefore, real-time PCR was used as a ‘‘truth’’ measurement
to evaluate the reliability of the pre- and post-globin reduction
bead array data. Four hemoglobin genes, HBA1, HBB, HBD and
HBE1, along with 3 randomly selected differentially expressed
genes (one down-regulated and two up-regulated by globin
reduction), were chosen for mRNA level measurement by real
time PCR. We found that, although the GLOBINclear kit is
claimed to only reduce HBA1 and HBB, the other hemoglobin
genes HBD and HBE1 also showed significantly lower mRNA
levels in the post-globin reduction samples when compared to the
pre-globin reduction samples. This might be due to the fact that
HBD and HBE1 have 93% and 79% sequence identity
with HBB (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi), respec-
tively. However, the non-hemoglobin gene that was observed to be
down-regulated after globin reduction in the microarray data,
AYTL2, showed no significant change in the real-time PCR data.
This suggests that globin reduction did not actually decrease the
level of other genes’ mRNA, but rather allowed for more accurate
measurement of these levels when the samples were hybridized to
microarrays. The decreased intensity observed for non-hemoglo-
bin genes on the post-globin reduction microarrays might be due
to high abundance hemoglobin providing a non-specific pseudo-
binding signal. On further analysis, no significant sequence
similarity was found between these down regulated genes and
HBA1 and HBB. This again supports our hypothesis that the
higher expression in pre-globin reduction samples of some genes
Figure 2. Probes with improved detection have higher
expression intensity in post-globin reduction data compared
to pre-globin reduction. The histogram shows the distribution of the
difference in average log-reduced intensities between the post-globin
reduction and pre-globin reduction data for the probes with improved
detection p-values. Most of the improved probes show at least two-fold
increase in average expression intensity in the post-globin reduction
might be due to a non-specific pseudo-binding signal from
hemoglobin. In addition, the two up-regulated genes (CCL5 and
DYRK2) exhibited no significant changes in expression level in the
real time PCR data. This indicates that hemoglobin may interfere with
target mRNA binding through an unknown mechanism and result in lower
expression signals from the pre-globin reduction microarrays.
In summary, this study demonstrates that the combination of
performing globin mRNA reduction in peripheral whole blood
samples and hybridizing on Illumina BeadChips is a practical
approach for biomarker research. Our future study will focus on
the cancer biomarker detection by using this established platform.
Materials and Methods
Sample collection and RNA preparation
This study was conducted under protocols approved by the
Children’s Hospital Boston Institutional Review Board. Blood
samples were obtained from 11 subjects who voluntarily agreed to
participate and gave written informed consent. Peripheral blood
was drawn with a BD Safety-Lok™ blood collection set (BD,
Franklin Lakes, NJ) into PAXgene RNA collection tube (Qiagen,
Valencia, CA) according to the standard procedure. Total RNA
was prepared with the PAXgene Blood RNA Kit (Qiagen)
according to the manufacturer’s instructions with an on-column
DNase digestion step. RNA quantity and quality were determined
by a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies,
Wilmington, DE) and an Experion™ (Bio-RAD, Hercules, CA).
Since a large amount of hemoglobin exists in erythrocytes,
several studies have shown decreased present calls, reduced
accuracy, and increased variability among replicates in an
Affymetrix GeneChip array when using PAXgene RNA collection
technology[5,13]. To overcome this obstacle, the GLOBINclear™
Kit (Ambion, Austin, Texas) was employed to remove
the highly abundant hemoglobin mRNA according to the
manufacturer’s instructions. In short, 4 µg total RNA from each
sample was hybridized with a biotinylated Capture OLIGO Mix
that is specific for human hemoglobin α and β mRNA.
Streptavidin Magnetic Beads were added to bind the biotinylated
oligonucleotides that hybridized with globin mRNA and then were
Figure 3. Globin reduction improved sensitivity. Probes with Improved Detection Values have Low Intensity. Each histogram shows the
distribution of the ranks of the expression levels of the probes with improved detection p-values in the post-globin reduction data. The X-axis
represents the ranks of expression level of the genes and Y-axis represents the frequency with which a given intensity was observed. All 11 samples
show a clear tendency for the improved probes to have low expression level.
pulled down by magnet. The globin mRNA depleted RNA was
transferred to a fresh tube and further purified with a rapid
magnetic bead-based purification process.
RNA amplification and hybridization on Illumina Sentrix
100 ng of total RNA was applied to generate cRNA by using an
Illumina TotalPrep RNA Amplification Kit (Ambion). Reverse
transcription with the T7 oligo (dT) primer was used to produce
first strand cDNA. The cDNA then underwent second strand
synthesis and RNA degradation by DNA Polymerase and RNase
H, followed by clean up. In vitro transcription (IVT) technology,
along with biotin UTP, was employed to generate multiple copies
of biotinylated cRNA. The labeled cRNA was purified via Filter
Cartridge and quantified by NanoDrop and RiboGreen®
(Molecular Probes Inc., Eugene, OR). The integrity of cRNA
was evaluated using an Experion™ (Bio-RAD).
The labeled cRNA target (1.5 mg) was used for hybridization to
an array according to the Illumina Sentrix humanref-6 beadchip
protocol. A maximum of 10 ml cRNA was mixed with a 20 mL
GEX-HYB hybridization solution. The preheated 30 ml assay
sample was dispensed onto the large sample port of each array and
incubated for 18 hours at 58uC at a rocker speed of 5. Following
hybridization, the samples were washed according to the protocol
and scanned with a BeadArray Reader (Illumina, San Diego, CA).
Hemoglobin α (HBA1), β (HBB), δ (HBD), and ε (HBE1) genes,
together with 3 other randomly chosen genes from a differential
expression list, were selected to evaluate the array data by
real-time PCR. Primers were designed using Primer3 and are shown
in Table 1. Briefly, 1 µg of RNA from each sample was used for cDNA
synthesis following the protocol described in the iScript cDNA
synthesis kit (Bio-Rad). Real-time PCR was performed on the
iQ5 Real-Time PCR detection system with the iQ SYBR Green
Supermix (Bio-Rad), and GAPDH was used as an internal
control. The relative quantification of mRNA expression was
calculated as described previously[15,16,17].
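
As an illustration of the relative quantification step, the following is a minimal sketch of the efficiency-corrected ratio from the model cited above[15]. The function name, the argument names, and the Ct values are hypothetical, and the default amplification efficiency of 2.0 (perfect doubling per cycle) is an assumption, since the efficiencies used in this study are not stated here.

```python
def pfaffl_ratio(ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample,
                 eff_target=2.0, eff_ref=2.0):
    """Expression of a target gene in `sample` relative to `control`,
    normalized to a reference gene (GAPDH in this study), using the
    efficiency-corrected ratio of Pfaffl (2001)."""
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (eff_target ** delta_target) / (eff_ref ** delta_ref)


# Hypothetical Ct values for illustration only (not data from this study).
# "sample" = pre-globin reduction RNA, "control" = post-globin reduction RNA,
# so the result is a pre/post expression ratio.
hbb_ratio = pfaffl_ratio(ct_target_control=17.0, ct_target_sample=12.0,
                         ct_ref_control=20.0, ct_ref_sample=20.0)
print(hbb_ratio)  # 32.0
```

With an unchanged reference gene, a target that crosses threshold five cycles later after globin reduction corresponds to a 32-fold pre/post ratio under the perfect-doubling assumption.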
Data extraction and statistics
Figure 4. Globin reduction improves class separation. The panel on the top shows a heatmap of the marker genes in the pre-globin reduction
data, with gender labels for the experiments on the columns and gene identifiers on the rows. Student’s t-test p-values are also shown on the rows,
describing the statistical separation between the male and female intensity distributions. The panel on the bottom shows a heatmap of the same
marker genes in the post-globin reduction data. The post-globin reduction data shows a clear improvement in biological signal. These genes were
identified, as described in Materials and Methods, by repeated ranking for discriminatory power based on the ReliefF algorithm.
Detection p-values produced by the BeadStudio software were
corrected for multiple hypothesis testing. The R software
package was used for statistical analysis, as were several
components of the BioConductor libraries for R. The
Wilcoxon Mann-Whitney test was used to identify probes with a
statistically significant separation of adjusted p-values between the
pre- and post-globin reduction groups. Gene selection was
performed independently on the pre- and post-globin reduction
groups using the ReliefF algorithm[21,22,23] as implemented in
the WEKA machine learning package.
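
The multiple-testing correction of detection p-values mentioned above can be sketched as follows. The text does not name the correction method; this sketch assumes the Benjamini-Hochberg false discovery rate procedure, which appears in the reference list[18]. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Probe indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical detection p-values for four probes on one array.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 6) for p in adj])  # [0.02, 0.04, 0.04, 0.02]
```

The adjusted values from each pre- and post-globin reduction array would then be the inputs compared per probe by the Wilcoxon Mann-Whitney test.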
In order to show that the probes with consistently improved
detection p-values were not an artifact of random chance, a
randomized simulation was run. Each trial in the simulation
consisted of identifying, for each blood sample (a pre- and post-
globin reduction pair of microarray experiments), those probes
with p-values greater than 0.05 in the pre-globin reduction data
and less than 0.05 in the post-globin reduction data. Those p-
values were then randomly reassigned among the probes whose p-
values were not below 0.05 in both the pre- and post-globin
reduction data. After performing this random re-association for
each blood sample, we applied the same criteria for selecting
significantly improved probes as were applied to the observed data,
and recorded the number of probes that passed the selection
criteria. Thus, our simulation generates a distribution according to
the null hypothesis that an improved p-value pair is equally likely
to be associated with any probe whose p-values are not already
below 0.05 in both the pre- and post-globin reduction samples.
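
The randomized simulation described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, and the selection criterion (a probe counts as consistently improved when it is improved in at least `min_samples` blood samples) is an assumption standing in for the criteria applied to the observed data.

```python
import random


def improved_probes(pre, post, alpha=0.05):
    """Probes undetected before (p > alpha) and detected after (p < alpha)."""
    return {i for i, (a, b) in enumerate(zip(pre, post))
            if a > alpha and b < alpha}


def eligible_probes(pre, post, alpha=0.05):
    """Probes whose p-values are not already below alpha in both arrays."""
    return [i for i, (a, b) in enumerate(zip(pre, post))
            if not (a < alpha and b < alpha)]


def count_consistent(improved_sets, min_samples):
    """Number of probes improved in at least `min_samples` blood samples."""
    counts = {}
    for probes in improved_sets:
        for probe in probes:
            counts[probe] = counts.get(probe, 0) + 1
    return sum(1 for c in counts.values() if c >= min_samples)


def null_distribution(samples, min_samples, n_trials=1000, alpha=0.05, seed=0):
    """Counts of consistent probes when improved p-value pairs are randomly
    reassigned among the eligible probes of each blood sample."""
    rng = random.Random(seed)
    per_sample = [(len(improved_probes(pre, post, alpha)),
                   eligible_probes(pre, post, alpha))
                  for pre, post in samples]
    results = []
    for _ in range(n_trials):
        shuffled = [set(rng.sample(pool, k)) for k, pool in per_sample]
        results.append(count_consistent(shuffled, min_samples))
    return results


# Hypothetical data: 3 blood samples, 20 probes. Probes 0 and 1 improve in
# every sample; probe 2 is detected both before and after globin reduction.
pre = [0.20, 0.30, 0.01] + [0.5] * 17
post = [0.01, 0.02, 0.01] + [0.5] * 17
samples = [(pre, post)] * 3

observed = count_consistent([improved_probes(a, b) for a, b in samples], 3)
null = null_distribution(samples, min_samples=3, n_trials=200)
empirical_p = sum(1 for c in null if c >= observed) / len(null)
```

Comparing `observed` with the simulated counts gives an empirical estimate of how often random reassignment alone would yield as many consistently improved probes.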
Figure 5. Real-time PCR evaluations of genes identified as differentially expressed between pre- and post-globin reduction
samples in the microarray data. A. A heatmap showing seven of the genes in 11 samples differentially expressed between the pre- and post-
globin reduction samples in the BeadArray data. B. The four hemoglobin genes were significantly reduced and no other gene was markedly changed
by globin reduction in real-time PCR data. The X-axis represents 11 samples and the Y-axis represents the gene expression ratio of pre-globin reduction
to post-globin reduction.
Table 1. Primers of selected genes for real-time PCR.
Gene     Forward                 Reverse
DYRK2    CTCACGGACAGATCCAGGTT    TGCTTCATTGCTTGTTCAGG
To verify the efficacy of the globin reduction treatment in the
context of gene selection, the following normalization and
selection process was run. Analysis of the raw intensity data
revealed that, between pairs of arrays, a non-linear relationship
existed between corresponding pairs of probes. To correct this, we
used a Loess adjustment as implemented in the BioConductor
package for R. Loess normalization is a standard microarray
normalization method that removes non-linear intensity-dependent
artifacts from the data by iteratively fitting a series of local
piecewise curves to the log-mean-difference plots of each pair of
arrays, and effectively subtracting the curve from the data.
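
To make the Loess adjustment concrete, the following is a minimal sketch for a single pair of arrays. It is not the BioConductor implementation used in this study: it applies one pass of local linear regression with tricube weights to the log-mean-difference plot, and the function names, the `span` value, and the synthetic intensities are all assumptions for illustration.

```python
import math


def loess_fit(x, y, span=0.3):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        sw = swd = swy = swdd = swdy = 0.0
        for xi, yi in zip(x, y):
            d = xi - x0
            u = abs(d) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3
            sw += w
            swd += w * d
            swy += w * yi
            swdd += w * d * d
            swdy += w * d * yi
        denom = sw * swdd - swd * swd
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # fall back to a weighted mean
        else:
            # Intercept of the weighted line, i.e. the fitted value at x0.
            fitted.append((swdd * swy - swd * swdy) / denom)
    return fitted


def loess_normalize_pair(log_a, log_b, span=0.3):
    """Subtract the fitted curve from the log difference (M) as a function
    of the log mean (A), and return the two adjusted arrays."""
    mean = [(a + b) / 2.0 for a, b in zip(log_a, log_b)]
    diff = [a - b for a, b in zip(log_a, log_b)]
    curve = loess_fit(mean, diff, span)
    adj = [m - c for m, c in zip(diff, curve)]
    new_a = [A + m / 2.0 for A, m in zip(mean, adj)]
    new_b = [A - m / 2.0 for A, m in zip(mean, adj)]
    return new_a, new_b


# Synthetic log2 intensities with an intensity-dependent distortion.
true_log = [6.0 + 8.0 * i / 99 for i in range(100)]
arr_a = [t + 0.025 * (t - 10.0) ** 2 for t in true_log]
arr_b = [t - 0.025 * (t - 10.0) ** 2 for t in true_log]

norm_a, norm_b = loess_normalize_pair(arr_a, arr_b)
before = max(abs(a - b) for a, b in zip(arr_a, arr_b))
after = max(abs(a - b) for a, b in zip(norm_a, norm_b))
```

For more than two arrays, a cyclic scheme would repeat this adjustment over all pairs until the changes become small, which corresponds to the iterative fitting described in the text.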
After normalization, the data was split into pre- and post-globin
reduction groups, each containing eight female and three male
samples. We then performed gene selection independently on
these two groups of 11 experiments using the ReliefF
algorithm[21,22,23,24]. Our goal when identifying marker genes was
to find genes that are best able to separate male samples from
female samples, and to determine whether the results were more
reproducible in the post-globin reduction data than in the pre-
globin reduction data. Due to the small sample size and unequal
number of male and female subjects, the gene selection process
was repeated ten times on a subset of the data. Each repeat used
three random female samples and compared them against all three
male samples. To perform the gene selection in each repeat, we
used the ReliefF algorithm. ReliefF ranks the individual genes by
their ability to distinguish gender based on intensity values. Briefly,
for each experiment e, the ReliefF algorithm finds e’s nearest
neighbor with the same class (gender) using Euclidean distance
over all genes. The nearest neighbor of the other class (opposite
gender) is also found in the same manner. These selections are
called the hit and miss, respectively. The importance of each gene
is then computed by taking a normalized sum of differences
between the distance from e to the hit and the distance from e to
the miss. Thus, the greater the difference between the hit and miss,
the greater the importance of the gene in distinguishing class. This
method has the advantage that it makes no assumptions about the
distribution of expression intensities. Marker genes were identified
by picking out genes that were consistently ranked highly across at
least half of the repeats. Heatmaps, as well as Student’s t-test
p-values describing the ability of the genes identified by the above
method to distinguish gender, were generated in R.
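
The gene selection procedure described above can be sketched as follows. This is a simplified single-neighbor Relief, not the WEKA ReliefF implementation used in this study. The function names and the toy intensities are hypothetical, and `top_n` is an assumed cutoff for "ranked highly", which the text does not define.

```python
import math
import random


def nearest(index, samples, labels, same_class):
    """Nearest other sample by Euclidean distance over all genes, restricted
    to the same class (hit) or the other class (miss)."""
    best, best_d = None, float("inf")
    for j, other in enumerate(samples):
        if j == index or (labels[j] == labels[index]) != same_class:
            continue
        d = math.dist(samples[index], other)
        if d < best_d:
            best, best_d = other, d
    return best


def relief_weights(samples, labels):
    """Per-gene importance: normalized sum of |e - miss| - |e - hit|."""
    n_genes = len(samples[0])
    ranges = []
    for g in range(n_genes):
        column = [s[g] for s in samples]
        ranges.append((max(column) - min(column)) or 1.0)
    weights = [0.0] * n_genes
    for i, e in enumerate(samples):
        hit = nearest(i, samples, labels, same_class=True)
        miss = nearest(i, samples, labels, same_class=False)
        for g in range(n_genes):
            delta = abs(e[g] - miss[g]) - abs(e[g] - hit[g])
            weights[g] += delta / (ranges[g] * len(samples))
    return weights


def select_markers(samples, labels, n_repeats=10, top_n=1, seed=0):
    """Genes ranked in the top `top_n` in at least half of the repeats, each
    repeat using three random female samples and all male samples."""
    rng = random.Random(seed)
    female = [i for i, lab in enumerate(labels) if lab == "F"]
    male = [i for i, lab in enumerate(labels) if lab == "M"]
    n_genes = len(samples[0])
    hits = [0] * n_genes
    for _ in range(n_repeats):
        subset = rng.sample(female, len(male)) + male
        w = relief_weights([samples[i] for i in subset],
                           [labels[i] for i in subset])
        ranked = sorted(range(n_genes), key=lambda g: w[g], reverse=True)
        for g in ranked[:top_n]:
            hits[g] += 1
    return [g for g in range(n_genes) if hits[g] * 2 >= n_repeats]


# Toy intensities (hypothetical): gene 0 separates the classes, genes 1-3 do not.
labels = ["F"] * 8 + ["M"] * 3
data = [[10, 1, 5, 3], [10, 2, 4, 3], [10, 3, 6, 2], [10, 1, 5, 4],
        [10, 2, 6, 2], [10, 3, 4, 4], [10, 1, 6, 3], [10, 2, 5, 2],
        [0, 1, 5, 3], [0, 2, 4, 2], [0, 3, 6, 4]]

markers = select_markers(data, labels)
print(markers)  # [0]
```

Subsampling three female samples per repeat keeps the two classes balanced, so the ranking is not dominated by the larger female group.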
Found at: doi:10.1371/journal.pone.0005157.s001 (0.52 MB
We wish to extend our thanks to Vishal Saxena (Brigham and Women’s
Hospital), Ronald E. Vincent (BIOCON SCIENTIFIC), and Enrico Sassi
(North Dakota State University) for thoughtful reading of the manuscript.
Conceived and designed the experiments: ZT EW. Performed the
experiments: ZT MG EW. Analyzed the data: ZT NP PS HY BB EW
ISK. Contributed reagents/materials/analysis tools: ZT NP PS HY BB
EW ISK. Wrote the paper: ZT NP PS BB EW ISK.
1. Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, et al. (2005) Genome-wide
expression profiling of human blood reveals biomarkers for Huntington’s
disease. Proc Natl Acad Sci U S A 102: 11023–11028.
2. Osman I, Bajorin DF, Sun TT, Zhong H, Douglas D, et al. (2006) Novel blood
biomarkers of human urinary bladder cancer. Clin Cancer Res 12: 3374–3380.
3. Martin KJ, Graner E, Li Y, Price LM, Kritzman BM, et al. (2001) High-
sensitivity array analysis of gene expression for the early detection of
disseminated breast tumor cells in peripheral blood. Proc Natl Acad Sci U S A
4. Feezor RJ, Baker HV, Mindrinos M, Hayden D, Tannahill CL, et al. (2004)
Whole blood and leukocyte RNA isolation for gene expression analyses. Physiol
Genomics 19: 247–254.
5. Rainen L, Oelmueller U, Jurgensen S, Wyrich R, Ballas C, et al. (2002)
Stabilization of mRNA expression in whole blood samples. Clin Chem 48:
6. Thach DC, Lin B, Walter E, Kruzelock R, Rowley RK, et al. (2003) Assessment
of two methods for handling blood in collection tubes with RNA stabilizing
agent for surveillance of gene expression profiles with high density microarrays.
J Immunol Methods 283: 269–279.
7. Chai V, Vassilakos A, Lee Y, Wright JA, Young AH (2005) Optimization of the
PAXgene blood RNA extraction system for gene expression analysis of clinical
samples. J Clin Lab Anal 19: 182–188.
8. Field LA, Jordan RM, Hadix JA, Dunn MA, Shriver CD, et al. (2007)
Functional identity of genes detectable in expression profiling assays following
globin mRNA reduction of peripheral blood samples. Clin Biochem 40:
9. Debey S, Zander T, Brors B, Popov A, Eils R, et al. (2006) A highly
standardized, robust, and cost-effective method for genome-wide transcriptome
analysis of peripheral blood applicable to large-scale clinical trials. Genomics 87:
10. Liu J, Walter E, Stenger D, Thach D (2006) Effects of globin mRNA reduction
methods on gene expression profiles from whole blood. J Mol Diagn 8: 551–558.
11. Wright C, Bergstrom D, Dai H, Marton M, Morris M, et al. (2008)
Characterization of globin RNA interference in gene expression profiling of
whole-blood samples. Clin Chem 54: 396–405.
12. Kuhn K, Baker SC, Chudin E, Lieu MH, Oeser S, et al. (2004) A novel, high-
performance random array platform for quantitative gene expression profiling.
Genome Res 14: 2347–2356.
13. Debey S, Schoenbeck U, Hellmich M, Gathof BS, Pillai R, et al. (2004)
Comparison of different isolation techniques prior gene expression profiling of
blood derived cells: impact on physiological responses, on overall expression and
the role of different cell types. Pharmacogenomics J 4: 193–207.
14. Qin LX, Beyer RP, Hudson FN, Linford NJ, Morris DE, et al. (2006) Evaluation
of methods for oligonucleotide array data via quantitative real-time PCR. BMC
Bioinformatics 7: 23.
15. Pfaffl MW (2001) A new mathematical model for relative quantification in real-
time RT-PCR. Nucleic Acids Res 29: e45.
16. Tian Z, An N, Zhou B, Xiao P, Kohane IS, et al. (2008) Cytotoxic
diarylheptanoid induces cell cycle arrest and apoptosis via increasing ATF3
and stabilizing p53 in SH-SY5Y cells. Cancer Chemother Pharmacol.
17. Wu E, Palmer N, Tian Z, Moseman AP, Galdzicki M, et al. (2008)
Comprehensive dissection of PDGF-PDGFR signaling pathways in PDGFR
genetically defined cells. PLoS ONE 3: e3794.
18. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical
and powerful approach to multiple testing. Journal of the Royal Statistical
Society Series B 57: 289–300.
19. R Development Core Team (2007) R: A Language and Environment for Statistical Computing.
20. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, et al. (2004)
Bioconductor: Open software development for computational biology and
bioinformatics. Genome Biology 5: R80.
21. Robnik-Sikonja M, Kononenko I (1997) An adaptation of Relief for attribute
estimation in regression. In: Fisher D, ed. Nashville, TN, USA: Morgan
Kaufmann Publishers Inc. pp 296–304.
22. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF.
In: De Raedt L, Bergadano F, eds. Catania, Italy: Springer-Verlag New York,
Inc. pp 171–182.
23. Kira K, Rendell L (1992) A practical approach to feature selection. In:
Sleeman D, Edwards P, eds. San Francisco, CA, USA: Morgan Kaufmann
Publishers Inc. pp 249–256.
24. Witten IH, Frank E (2005) Data Mining: Practical machine learning tools and
techniques. San Francisco: Morgan Kaufmann Publishers Inc.
25. Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) A comparison of
normalization methods for high density oligonucleotide array data based on
variance and bias. Bioinformatics 19: 185–193.