Methods 38 (2006) 227–234
1046-2023/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
Application of microarray technology in primate
behavioral neuroscience research
Adriaan M. Karssena,1, Jun Z. Lib,1, Song Hera, Paresh D. Patelc, Fan Mengc,
Simon J. Evansc, Marquis P. Vawterd, Hiroaki Tomitad, Prabhakara V. Choudarye,
William E. Bunney Jr.d, Edward G. Jonese, Stanley J. Watsonc, Huda Akilc,
Richard M. Myersb, Alan F. Schatzberga, David M. Lyonsa,*
a Department of Psychiatry and Behavioral Sciences, Stanford University, USA
b Stanford Human Genome Center, Department of Genetics, Stanford University, USA
c Molecular and Behavioral Neuroscience Institute, Department of Psychiatry, University of Michigan, USA
d Department of Psychiatry and Human Behavior, University of California, Irvine, USA
e Center for Neuroscience, University of California, Davis, USA
Accepted 15 September 2005
Gene expression profiling of brain tissue samples applied to DNA microarrays promises to provide novel insights into the neurobiological
bases of primate behavior. The strength of the microarray technology lies in the ability to simultaneously measure the expression
levels of all genes in defined brain regions that are known to mediate behavior. The application of microarrays presents, however, various
limitations and challenges for primate neuroscience research. Low RNA abundance, modest changes in gene expression, heterogeneous
distribution of mRNA among cell subpopulations, and individual differences in behavior all mandate great care in the collection, processing,
and analysis of brain tissue. A unique problem for nonhuman primate research is the limited availability of species-specific arrays.
Arrays designed for humans are often used, but expression level differences are inevitably confounded by gene sequence differences in all
cross-species array applications. Tools to deal with this problem are currently being developed. Here we review these methodological
issues, and provide examples from our experiences using human arrays to examine brain tissue samples from squirrel monkeys. Until species-specific
microarrays become more widely available, great caution must be taken in the assessment and interpretation of microarray
data from nonhuman primates. Nevertheless, the application of human microarrays in nonhuman primate neuroscience research recovers
useful information from thousands of genes, and represents an important new strategy for understanding the molecular complexity of
behavior and mental health.
© 2006 Elsevier Inc. All rights reserved.
Keywords: Primate; Brain; Behavior; Microarray; Gene expression; Prefrontal cortex; Hippocampus
Studies of the neural basis of behavior in nonhuman pri-
mates continue to play a vital role in understanding the
structure and function of brain circuits that mediate emo-
tional, cognitive, and social aspects of behavior in humans.
In part, this is due to practical limitations and ethical con-
cerns that restrict opportunities to conduct controlled
experiments in healthy humans or patients with behavioral
disorders. Rodent models are of limited value because brain
circuits that mediate emotion, cognition, and social behavior
differ significantly between primates and rodents [1,2].
Despite the fact that certain key features of behavior in
humans cannot be adequately modeled in nonhuman
*Corresponding author. Fax: +1 650 498 7761.
E-mail address: firstname.lastname@example.org (D.M. Lyons).
1These authors contributed equally to this work.
A.M. Karssen et al. / Methods 38 (2006) 227–234
primates , comparative studies of homologous brain
regions in humans and nonhuman primates are required to
advance our understanding of behavior and mental health.
Over the years, neuroscientists have addressed funda-
mental questions regarding brain structure and function by
adopting an impressive assortment of techniques and meth-
odologies. In this regard, microarray technology stands as a
uniquely powerful new tool for the global characterization
of novel molecular pathways and mechanisms underlying
behavior. Microarray studies aim to measure the steady-
state levels of all actively transcribed genes, with the
implicit premise that observable features of behavior are
often associated with recognizable patterns of gene expression
that reflect structural, functional, and metabolic adaptations
in relevant brain regions [4–6]. Since the
introduction of microarray technology a decade ago [7,8],
large-scale gene expression studies have grown exponen-
tially in neuroscience research. Microarrays dramatically
increase the capacity and efficiency of data collection, and
facilitate a systems level approach [9,10]. At the same time,
however, microarrays create unprecedented challenges in
statistical analysis and biological interpretation in neurosci-
In this report, we broadly examine technical and analyti-
cal aspects of gene expression microarray methods relevant
to the study of primate behavior. Specifically, we discuss the
different platforms that are now available, the impact of
various analysis decisions on the final results, and the challenges
that arise when working with complex tissues in the
brain. We also consider how gene sequence differences complicate
the application of microarrays designed for humans
to measure samples collected from monkeys or apes. To
illustrate aspects of the analysis of microarray data from
nonhuman primates, we present examples from our use of
human arrays to examine the differences between hippocampus
and dorsolateral prefrontal cortex in adult squirrel
2. Microarray platforms
Microarrays measure gene expression by quantifying the
amount of hybridization between the RNA (or cDNA)
under study and DNA probes that are immobilized on a
solid surface. The DNA probes on many recent, compre-
hensive platforms are designed to cover the entire transcrip-
tome, i.e., the steady-state levels of all known transcribed
genes. Typically, RNA is extracted from cell or tissue samples,
labeled with a marker (usually a fluorescent dye), and
hybridized to the arrays. The fluorescent intensity values at
each probe location are then determined from the scanned
optical image of the array, and these intensities reflect the
abundance of the targeted RNA in the sample of interest.
When each sample is analyzed on a different array, the relative
expression levels of every mRNA can be compared
Currently, two kinds of microarray probes are prevalent:
complementary DNA (cDNA) and oligonucleotides.
Probes can be pre-synthesized and then robotically printed
in a predefined matrix on microscope slides (i.e., spotted
arrays), or lithographically synthesized directly on silicon
chips (i.e., oligonucleotide arrays). The cDNA probes are
generally spotted on standard glass slides. Oligonucleotides
can be either printed (e.g., Agilent arrays) or directly synthesized
(e.g., Affymetrix Genechips). Multiple probes may
be designed to target the same mRNA. For example,
Affymetrix Genechips use both perfectly matched (PM) oligonucleotide
probes that correspond to a segment of the
targeted transcript, and mismatched (MM) probes that are
identical in sequence to the corresponding PM probe except
for a single MM base at the central position . MM
probes are intended to provide a reference signal to control
for nonspecific binding. Each PM probe paired with its corresponding
MM probe forms a probe pair. Generally, 11–20
probe pairs form a probe set designed to measure a transcript
of interest. Probes of different lengths have been used.
Affymetrix uses 25-base PM and MM probes, whereas
other manufacturers of microarrays use 50-, 60-, or 70-base
PM probes without corresponding MM probes.
Researchers can obtain microarrays by purchasing
them from commercial sources, or by printing them on
their own [12,13]. Commercial arrays have the advantage
of stability, quality control, standardized protocols, soft-
ware for data processing, and a broad base of users for
sharing expertise and for multi-study data integration.
Self-printed arrays that are made at individual laborato-
ries or academic facilities are usually less expensive and
more flexible with regard to probe content, but require
careful control of printing quality, and are generally less
amenable to meta-analysis.
The various platforms are constantly improving in terms
of sensitivity, specificity, ease of use, and coverage of all
known genes. Recent studies indicate that when investiga-
tors carefully design and execute their experiments with
standardized protocols and appropriate analyses, the most
common microarray platforms are comparable in perfor-
mance and consistent across platforms [14,15]. One of the
major sources of variation between platforms is in probe
design, as probes that interrogate different segments of a
given gene transcript may produce different results due to
alternative splicing of mRNA, or cross-hybridization with
other transcripts. It is therefore important to apply multiple
methods, including quantitative PCR and in situ hybridiza-
tion histochemistry, to validate important conclusions.
3. Microarray data analysis
Like all other studies, a microarray-based study consists
of an experimental design, data collection, statistical analy-
sis, and interpretation of the results. Issues related to sam-
ple size, technical replication, and assignment of samples to
arrays differ according to the type of study that the
researcher plans to conduct. Generally, microarray experiments
are designed to provide comparisons between groups
of samples to generate lists of differentially expressed genes;
clustering and classification of genes or tissue samples; or
class prediction of unknown samples [16,17].
After the collection of scanned optical images of each array
and before the statistical analysis, a series of decisions must be
made with respect to data preprocessing. This includes image
analysis to extract the raw signal intensity values, quality control,
background subtraction and normalization, gene filtering,
and, in cases where several probes are designed to target
each transcript (e.g., Affymetrix Genechips), summarization of
the intensities of all individual probes in a given probe set.
Due to space limitations, in the following sections we briefly
review background correction, data normalization, probe set
summary methods, probe annotation, statistical criteria, gene
filtering, and aspects of gene ontology that are particularly relevant
to primate research. The analysis of two-color cDNA
microarray data must also deal with an additional factor
known as “dye bias,” i.e., some fluorescent molecules are
incorporated more efficiently than others. We refer readers
interested in this issue to several relevant reviews [18,19], and
focus instead on single-color microarrays, particularly the
3.1. Processing methods for Affymetrix data
For Affymetrix data, various software tools have been
developed to perform background correction, normalization,
and probe summary. These include Microarray Suite 5
(MAS5), the default software provided by Affymetrix, and
third-party tools such as the Robust Multi-chip Average
(RMA) method , DNA Chip Analyzer (dCHIP) , the
Positional-Dependent-Nearest-Neighbor (PDNN) method
, and RMA corrected for GC-content (GCRMA) .
All of these tools use the hybridization signals from all individual
probes within a probe set to generate a single expression
value for each probe set. They differ, however, in the
algorithms applied, and this leads to significant differences in
the outcome. One of the major differences is the way in which
MM probe information is included in the analysis. MAS5
generates expression summary values based on PM–MM
differences. In contrast, the default version of RMA uses only
PM probes because it was reported that incorporating MM
probe intensities appears to add noise with no obvious gain
in sensitivity [24,25]. For this reason, many users of dCHIP
also prefer the PM-only version to the PM–MM version.
GCRMA is an extension of the RMA algorithm that incorporates
some MM probe information by using physical models
of nonspecific hybridization based on the GC-content of
the probes .
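To make the distinction between PM–MM and PM-only summaries concrete, the following Python sketch summarizes one hypothetical probe set in two deliberately simplified ways. It is not an implementation of MAS5, RMA, dCHIP, PDNN, or GCRMA: the intensity values are made up, and the function names, the plain median, and the clamping of non-positive differences are assumptions chosen only for illustration.

```python
import math
from statistics import median

def summarize_pm_only(pm):
    """Median of log2 PM intensities (a simplified PM-only summary)."""
    return median(math.log2(x) for x in pm)

def summarize_pm_minus_mm(pm, mm, floor=1.0):
    """Median of log2(PM - MM), with non-positive differences clamped to `floor`.

    The clamp is an assumption of this sketch: log2 is undefined when MM >= PM.
    """
    return median(math.log2(max(p - m, floor)) for p, m in zip(pm, mm))

# Hypothetical probe set with 11 probe pairs (made-up intensities).
pm = [820, 640, 910, 300, 750, 690, 120, 880, 560, 705, 430]
mm = [210, 180, 260, 340, 200, 190, 150, 230, 170, 205, 160]

# Probe pairs in which the mismatch probe is at least as bright as the perfect match.
mm_exceeds_pm = sum(m >= p for p, m in zip(pm, mm))

pm_only = summarize_pm_only(pm)
pm_mm = summarize_pm_minus_mm(pm, mm)
print(f"MM >= PM in {mm_exceeds_pm} of {len(pm)} probe pairs")
print(f"PM-only summary: {pm_only:.2f}; PM-MM summary: {pm_mm:.2f}")
```

Probe pairs in which MM exceeds PM carry no usable difference signal in this toy example, which is one way to picture how mismatch information can add noise, particularly when cross-species sequence differences weaken PM hybridization.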
The choice of which method to use affects the outcome of
the downstream analysis, especially when the samples of interest
do not have large gene expression differences [26–28]. In
comparative studies of human samples, MAS5 often differs
from the other methods, but none of the others has emerged
as consistently better [20,27,29,30]. We have observed a similar
pattern when the different methods are used to process human
Affymetrix Genechips applied to samples of squirrel monkey
hippocampus and dorsolateral prefrontal cortex. Of the 12,666
probe sets examined, MAS5 identified many more genes as
being differentially expressed between brain regions compared
to all other methods at a given fold-difference criterion (Table
1). However, a much smaller percentage of the differences
identified by MAS5 were statistically significant, i.e., 26–33%
for MAS5 compared to 71–98% for the other methods. Moreover,
the correlations between MAS5 and each of the other
methods were consistently less than the correlations among
the other methods for the entire set of 12,666 between region
fold-difference scores (Table 2). MAS5 appears to generate
data for squirrel monkey brain tissue samples that differ from
the other methods. This may reflect the effect of human–monkey
sequence differences in PM probes compounded by additional
background noise from sequence differences in the
corresponding MM probes. It may be best to avoid the use of
mismatch probe information on gene expression microarrays
used in cross-species applications.
3.2. A bioinformatic challenge: probe annotation
Appropriate interpretation of microarray results
depends on the correct annotation of individual probes.
The assignment of gene identity to probes can be problematic
because the definitions of transcribed genes continue to
evolve along with the ongoing progress in genome sequencing
and annotation . Currently, many models of known
transcripts are available to serve as the basis for probe
annotation, e.g., Unigene, Refseq, and ENSEMBL genes.
Occasionally, different models may assign different gene
identities to a given probe. This may result in different interpretations
of the same array data. Therefore, it is important
Table 1
Comparison of methods used to process human Affymetrix Genechips for
a study of differences between brain regions in adult squirrel monkeys
(table body not recovered; surviving column headings: Method; 1.5-fold difference b)
a,b Numbers of genes differentially expressed in dorsolateral prefrontal
cortex compared to hippocampus at ±1.2- and ±1.5-fold difference
c Also provided are the numbers of genes identified as differentially
expressed at a false discovery rate (FDR) of 5%.
d Results from the PM-only version of dCHIP are presented.
Table 2
Correlations between fold-difference scores computed using four different
methods to process human Affymetrix Genechips for a study of differences
between squirrel monkey hippocampus and dorsolateral prefrontal
(table body not recovered)
a Each Pearson correlation coefficient is based on the between region
fold-difference scores for 12,666 probe sets.
b Results from the PM-only version of dCHIP are presented.
to state in each published study the specific version of the
gene model that is used for probe annotation.
Every few months, Affymetrix updates the mapping of
the target GenBank sequences of each probe set to the most
recent gene identity, but does not update the mapping of
individual probes. When a probe from a given probe set
matches the sequence of another unintended gene, the
probe is neither marked as unreliable nor reassigned to a new
probe set that corresponds to the other gene. There is also
much redundancy in the Affymetrix annotation system, as
certain genes are targeted by more than one probe set.
Microarray data are difficult to interpret when different
probe sets for the same gene yield discordant results.
One way to improve annotation is to remap all individual
probes to the most recent gene definitions for a variety
of common gene models, and then reassemble the annotated
probes into new probe sets. We have employed this
approach for different versions of Affymetrix Genechips in
our collaborative research effort to study gene expression
changes in the human brain associated with psychiatric disorders.
Every probe that can be uniquely assigned is periodically
remapped to Refseq, DoTS transcripts, Unigene,
ENSEMBL gene, ENSEMBL transcripts, and ENSEMBL
exons. The annotated probes are then reassembled, and the
resulting probe set definition files (i.e., CDF files) are made
freely available for public use at http://brainar-
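The reassembly step can be pictured with a short Python sketch. It assumes that each probe has already been aligned to a gene model and that the result is available as a mapping from probe ID to the set of matching gene IDs. The probe and gene identifiers, the function name `reassemble_probe_sets`, and the `min_probes` threshold are hypothetical and are not taken from the custom CDF files described here.

```python
from collections import defaultdict

def reassemble_probe_sets(probe_hits, min_probes=3):
    """Group uniquely assigned probes into gene-level probe sets.

    probe_hits maps a probe ID to the set of gene IDs its sequence matches.
    Probes that match zero genes or more than one gene are discarded.
    min_probes is an assumed minimum size for a usable probe set.
    """
    probe_sets = defaultdict(list)
    for probe_id, genes in probe_hits.items():
        if len(genes) == 1:                 # keep only unambiguous probes
            (gene,) = genes
            probe_sets[gene].append(probe_id)
    return {g: sorted(p) for g, p in probe_sets.items() if len(p) >= min_probes}

# Hypothetical alignment results: probe ID -> gene IDs matched by the probe.
probe_hits = {
    "p01": {"GENE_A"}, "p02": {"GENE_A"}, "p03": {"GENE_A"},
    "p04": {"GENE_A", "GENE_B"},   # ambiguous, discarded
    "p05": {"GENE_B"}, "p06": {"GENE_B"},
    "p07": set(),                  # matches no current gene definition
    "p08": {"GENE_C"}, "p09": {"GENE_C"}, "p10": {"GENE_C"}, "p11": {"GENE_C"},
}

new_sets = reassemble_probe_sets(probe_hits, min_probes=3)
for gene, probes in sorted(new_sets.items()):
    print(gene, probes)
```

In this toy input, GENE_B is dropped because only two unambiguous probes remain, which illustrates how remapping trades coverage for cleaner probe set definitions.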
3.3. Statistical criteria
It is widely recognized that microarray experiments
should include replicated samples . When groups of replicated
samples are compared, appropriate statistical methods
are needed to assess the significance of observed
differences in gene expression levels. Although fold-difference
thresholds are often reported as the primary statistic
in microarray studies, these fail to account for technical or
biological variations, which usually differ from gene to gene
. To report the reliability of microarray results, statistics
such as the Student t test and its associated P values
are most often presented. As tens of thousands of genes are
analyzed in a typical study, many false positives
are expected by chance. For example, when testing 12,000
independent hypotheses (i.e., one for each probe set),
approximately 600 genes (12,000 × 0.05) will be misidentified as being statistically
significant at a significance level of P<0.05 due to
chance alone. The overall level of statistical significance
must therefore be adjusted to account for multiple tests.
The Bonferroni correction is commonly regarded as too conservative
for microarray data, in part because many genes are in fact regulated
in a correlated fashion. Permutation-based
correction methods have been developed to determine the
adjusted family-wise levels of significance for individual
genes . The calculation of false discovery rates (FDRs)
[35–37] is a different method used to adjust for multiple
tests. Instead of specifying the adjusted P value for each
gene, FDR provides the estimated proportion of false positives
among the entire set of significant genes at a certain cutoff.
With an FDR cutoff of 5% applied to our squirrel monkey
data, dCHIP and to a lesser extent RMA identify more
genes compared to either MAS5 or GCRMA (Table 1).
Species differences in GC-content may adversely influence
the accuracy of data from GCRMA. The specific cutoffs
applied to obtain a list of differentially expressed genes
depend on the particular circumstances of a given study,
and can strongly impact the downstream analysis . In
many situations cutoffs may be less important, however,
than the ranking of genes by their strength of evidence.
Rankings provide a prioritized list of candidate genes for
follow-up validation, hypothesis formulation, or marker
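The arithmetic behind the multiple-testing problem, and the contrast between a family-wise correction and an FDR cutoff, can be sketched in a few lines of Python. The Benjamini-Hochberg step-up procedure is used here as one widely used way to control the FDR; it is an assumption of this sketch and may differ from the specific FDR methods cited in this section. The toy P values and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha):
    """Per-test P value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking tests declared significant at the given FDR."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k with p_(k) <= (k / n) * fdr.
        if pvalues[idx] <= rank / n * fdr:
            cutoff_rank = rank
    significant = [False] * n
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

n_false = expected_false_positives(12_000, 0.05)
per_test = bonferroni_threshold(12_000, 0.05)

# Hypothetical P values for ten probe sets.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_mask = benjamini_hochberg(toy_p, fdr=0.05)
n_uncorrected = sum(p < 0.05 for p in toy_p)

print(f"expected false positives: {n_false:.0f}")
print(f"Bonferroni per-test threshold: {per_test:.2e}")
print(f"P<0.05 uncorrected: {n_uncorrected}; significant at 5% FDR: {sum(bh_mask)}")
```

In the toy list, five probe sets pass an uncorrected P<0.05 threshold but only two survive the 5% FDR cutoff, which shows how strongly the choice of criterion can change the resulting gene list.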
3.4. Higher-level data interpretation using gene ontology
When a list of genes is obtained, the functions of a few
key genes on the list may lead to valuable insights and
important discoveries. Often, however, it is also informative
to look for general biological “themes” represented by the
list as a whole. One may, for example, examine the biologi-
cal functions of all of the genes to determine whether par-
ticular themes are overrepresented by the list [13,26]. To
accomplish this type of higher-level analysis, a bioinfor-
matic infrastructure is required in which the functions of
known gene products are systematically annotated to
microarray probe sets in a form that is amenable to
Gene ontology , KEGG , and Genmapp  are
examples of gene ontology resources for managing the
wealth of biological knowledge in a controlled and scalable
fashion. Once each gene is linked to one or several different
functional categories, researchers can begin to ask whether
specific functional categories are enriched or depleted on a
list of genes identified by applying certain cutoffs and filtering
criteria to microarray data. Recently, numerous tools
have been developed to identify patterns of annotation
terms on lists of ranked or unranked genes
(http://www.geneontology.org/GO.tools.shtml), and to evaluate
the statistical significance of specific distribution patterns
of annotation terms [42,43]. These tools are used to advance
our understanding of the neurobiology of behavior beyond
single gene-by-gene conclusions, and represent an important
step toward fulfilling a central promise of the systems
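One simple way to ask whether a functional category is enriched on a gene list is a one-sided hypergeometric test. The Python sketch below is an illustration under stated assumptions rather than a description of any particular tool mentioned above: the gene counts are hypothetical, the function name `enrichment_p_value` is invented for this example, and no correction for testing many categories is applied.

```python
from math import comb

def enrichment_p_value(n_universe, n_category, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn at random from n_universe
    genes, of which n_category belong to the functional category."""
    upper = min(n_category, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 12,000 annotated genes on the array, 300 of them in the
# category of interest, and 15 category members on a list of 200 selected genes.
n_universe, n_category, n_list, n_overlap = 12_000, 300, 200, 15
expected_overlap = n_list * n_category / n_universe
p_enriched = enrichment_p_value(n_universe, n_category, n_list, n_overlap)

print(f"expected overlap by chance: {expected_overlap:.1f}")
print(f"P(overlap >= {n_overlap}) = {p_enriched:.3g}")
```

Because many categories are usually tested at once, the resulting P values would themselves need adjustment for multiple tests.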
4. Challenges in studying brain tissue
Lifelong plasticity, individual variability, complexity,
and heterogeneity of the brain, and the need for high quality
tissue present problems for microarray studies in behavioral
neuroscience research. Gene expression profiles of
brain tissue samples likely reflect the cumulative effect of
experiences gained over the entire life span. Differences in
early maternal availability in squirrel monkeys, for example,
are known to alter numerous neurobiological outcomes
measured much later in life . Gene expression profiles in
animals are often highly variable even in carefully controlled
experiments . Moreover, post-mortem brain tissue
samples provide only a single snapshot of the
transcriptome at one point in time. Gene expression profiles
of brain tissue samples from humans are affected by conditions
at death [47,48] that are generally unrelated to prior
experiences or behavioral traits.
In terms of specific practical matters, the power to detect
differential expression in brain tissue samples is influenced
by sample size, effect size, variance, and the false positive
rates that investigators are willing to accept. Sample size is
limited by the availability of suitable subjects: some primate
populations are larger than others, while human disorders
may be common or rare. Because the apes (i.e.,
chimpanzees, gorillas, orangutans, and gibbons) are extensively
protected in the wild, and tissue samples from
apes in captivity are generally limited, Old-World and New-World
monkeys have been most extensively studied. Of
these primates, high quality brain tissue from Old-World
monkeys (e.g., macaques) is most easily obtained because
these animals are maintained in large federally funded
breeding facilities for biomedical research.
Effect size varies from gene to gene, but gene expression
differences in case-control comparisons rarely exceed a
2-fold threshold in microarray studies of the brain [49–51].
The brain is a uniquely complex organ in which many gene
transcripts are expressed at low levels [52–54], or are
restricted to specific subpopulations of cells [53,55].
Although new technologies will certainly increase the sensitivity
of arrays, the current generation of microarrays most
readily detects higher abundance gene transcripts [49,56].
The heterogeneous anatomical distribution creates problems
because the levels of expression for certain transcripts
and fold-difference scores in case-control comparisons may
be “diluted” in microarray studies that rely on bulk brain
tissue with limited anatomical resolution. Laser capture
microdissection in combination with linear amplification
may be used to isolate homogeneous subsets of cells
in situations where these can be readily discerned [50,57,58].
Small but real differences in gene expression can be
obscured by extraneous sources of variation in microarray
studies of the brain. Variations in gene expression levels
may arise from numerous technical and biological factors,
which, if not carefully controlled, will compromise the
power and accuracy of microarray results. A sound experimental
design must take all of these factors into consideration,
and, whenever possible, adopt extensive technical and
biological replications. We have found, for example, that
reagents from different lots, RNA labeling, hybridization,
and optical scanning may all introduce systematic variation
between different batches of hybridized arrays within a single
experiment. Such technical variation is superimposed on
naturally occurring interindividual differences between the
biological samples of interest, and adversely affects the outcome
of experiments when gene expression differences
between cases and controls are small.
To assess all sources of variation, we routinely create
color-coded correlation matrices to illustrate similarities
and differences between all possible pairs of arrays within a
particular study. These pairwise correlation “heatmaps”
help to identify technical and biological outliers within a set
of arrays (Fig. 1A). Poor-performing outlier arrays must be
excluded from further analysis, or the affected samples need
to be rerun. After the exclusion of outlier arrays, the pairwise
color-coded correlation matrix is examined for
Fig. 1. Grayscale-coded correlation matrices illustrate similarities and differences between all possible pairs of arrays within a study of squirrel monkey
hippocampus and dorsolateral prefrontal cortex (dlPFC). Within each matrix, individual arrays are listed in the same order from left to right and bottom
to top. Lightly shaded cells indicate high Pearson correlations calculated from all probe set signal intensities for a given pair of arrays, and dark cells indicate
low correlations. (A) Correlations between 12 dlPFC and 12 hippocampal samples collected from the same 12 monkeys show modest differences
between the two regions: higher correlations are observed between pairs of arrays from the same region (lightly shaded cells) compared to arrays from
different brain regions (darker cells). The hippocampal sample depicted by the vertical and horizontal stripe of dark cells is an aberrant outlier array. (B)
After exclusion of the outlier array and the corresponding dlPFC array from the same monkey, the correlation matrix now reveals a strong batch effect, as
shown by two distinct blocks of highly correlated arrays within each brain region. The batch effect is also evident in (A), but is less pronounced due to the
presence of the outlier array, which alters the overall grayscale-coding. (C) Median centering of the arrays within blocks adjusts for the batch effect, and
highlights the brain region-related pattern with highly homogeneous samples in each region.
evidence of sample batch effects (Fig. 1B). Technical factors
related to processing subsets of samples together may alter
an entire batch of microarrays with a constant effect. Once
identified, batch differences can be adjusted by using fixed-effect
models, or by subtracting from each probe set on
each array the median intensity value of all samples in the
batch. Median centering does not disturb the relative ranking
of samples in the batch, and often provides a more
homogeneous data set from which previously obscured patterns
may emerge (Fig. 1C).
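The pairwise correlation matrix and the batch median centering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used for Fig. 1: the four toy arrays, the batch labels, and the function names `pearson`, `correlation_matrix`, and `median_center_by_batch` are hypothetical, and intensities are assumed to be on a log2 scale so that subtraction is meaningful.

```python
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(arrays):
    """All pairwise correlations; `arrays` holds one intensity list per array."""
    return [[pearson(a, b) for b in arrays] for a in arrays]

def median_center_by_batch(arrays, batches):
    """For each probe set, subtract the median across the arrays of the same batch."""
    centered = [list(a) for a in arrays]
    for batch in set(batches):
        members = [i for i, b in enumerate(batches) if b == batch]
        for probe in range(len(arrays[0])):
            m = median(arrays[i][probe] for i in members)
            for i in members:
                centered[i][probe] = arrays[i][probe] - m
    return centered

# Toy log2 intensities for four arrays and four probe sets. Batch "B" repeats
# the pattern of batch "A" plus a probe-specific batch offset.
arrays = [
    [8.0, 10.0, 6.0, 7.0],   # batch A
    [8.2, 9.8, 6.1, 7.1],    # batch A
    [9.0, 9.5, 6.8, 7.0],    # batch B
    [9.2, 9.3, 6.9, 7.1],    # batch B
]
batches = ["A", "A", "B", "B"]

corr = correlation_matrix(arrays)
centered = median_center_by_batch(arrays, batches)

for row in corr:
    print(" ".join(f"{r:.3f}" for r in row))
print([round(v, 2) for v in centered[0]], [round(v, 2) for v in centered[2]])
```

After centering, the first array of each batch carries the same values in this toy data set, because the only difference between the batches was the batch offset. A fixed-effect model, the alternative mentioned in the text, would estimate the batch term jointly with the effects of interest.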
Because many sources of variation are impossible to
control after the fact, microarray studies of primate brain
tissue should adopt sound design features such as randomization,
double-blind analysis, and/or balanced assignment
of samples to different experimental conditions. After completion
of the data analysis, other methods such as quantitative
PCR, in situ hybridization histochemistry, and
immunohistochemistry are often applied to the same set of
samples to validate the microarray results. This type of validation
is important, of course, but generally no amount of
reanalysis of the same samples will be as convincing as confirmation
with independent methods applied to new sample
5. Species differences: a unique challenge
Currently, the application of microarray technology in
primate research is hampered by the limited availability of
species-specific arrays. This is primarily due to the lack of
sufficient gene sequence information for most nonhuman
primates. Even though the genomes of the chimpanzee
&orgDChimp) and rhesus macaque monkey
(http://www.hgsc.bcm.tmc.edu/projects/rmacaque/) have been recently
sequenced, human arrays provide the most broadly accessible
option for the majority of primatologists. New microarray
platforms designed for primates will gradually
become available in the future. Affymetrix has announced
that a macaque monkey Genechip will be released in 2005,
and a program funded by the European Consortium
(http://www.eupeah.org/) is now generating marmoset-specific
cDNA arrays. Such efforts, however, are costly and
time-consuming. To develop spotted cDNA microarrays for a
previously uncharacterized transcriptome, one needs to first
obtain a large number of unique cDNA or expressed
sequence tag (EST) sequences . Known cDNA
sequences are also required to design oligonucleotide
probes. Without known sequences, primatologists will
likely continue to rely on the use of microarrays designed
for human transcripts [28,60,61]. This approach is based on
the assumption that similar gene sequences in closely
related species allow reasonably reliable detection of many
Sequence divergence is, however, an important problem
in all cross-species applications of microarrays. Even a 5%
sequence difference means that, on average, each 25-base
oligonucleotide probe will contain about one mismatch (25 × 0.05 = 1.25). Because a
contiguous stretch of 16 or more base pair matches has
been considered sufficient for stable hybridization ,
longer probes like those on cDNA microarrays are thought
to be less susceptible to sequence divergence-related problems.
Recent evidence suggests, however, that even these
probes can severely distort results obtained from between-species
comparisons .
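The numbers in this paragraph can be explored with a small Python sketch. It rests on a strong simplifying assumption that is not made in the text: each base of a probe is treated as mismatched independently with probability equal to the overall sequence divergence. The function names are hypothetical, and the 16-base run length is taken from the hybridization criterion quoted above.

```python
def expected_mismatches(probe_length, divergence):
    """Expected number of mismatched bases in a probe."""
    return probe_length * divergence

def prob_perfect_match(probe_length, divergence):
    """Probability that a probe contains no mismatch at all."""
    return (1.0 - divergence) ** probe_length

def prob_run_of_matches(probe_length, run_length, divergence):
    """Probability of at least one stretch of run_length consecutive matches."""
    p = 1.0 - divergence
    # state[j]: probability that no qualifying run has occurred yet and the
    # probe currently ends in exactly j consecutive matching bases.
    state = [0.0] * run_length
    state[0] = 1.0
    reached = 0.0
    for _ in range(probe_length):
        new = [0.0] * run_length
        for j, prob in enumerate(state):
            new[0] += prob * (1.0 - p)      # a mismatch resets the run
            if j + 1 == run_length:
                reached += prob * p         # the run is completed
            else:
                new[j + 1] += prob * p
        state = new
    return reached

n_mismatch = expected_mismatches(25, 0.05)
p_perfect = prob_perfect_match(25, 0.05)
p_run16 = prob_run_of_matches(25, 16, 0.05)

print(f"expected mismatches per 25-base probe: {n_mismatch:.2f}")
print(f"probability of a perfect 25-base match: {p_perfect:.3f}")
print(f"probability of a 16-base matching stretch: {p_run16:.3f}")
```

Under these assumptions only about 28% of 25-base probes would match the other species perfectly at 5% divergence. Real divergence is not uniform along a transcript, so these values are best read as rough orientation rather than as predictions for any particular array.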
The problem of sequence divergence for cross-species
microarray applications is especially troublesome when the
research objective is to compare different species with
respect to evolutionary questions [25,64–68]. Differences
observed between species may reflect either genuine differences
in gene expression levels, or methodological artifacts
related to sequence differences that impair microarray
hybridization. These two effects are entirely confounded,
and hinder the interpretation of data obtained from human
arrays applied to samples obtained from monkeys or apes.
In cross-species comparisons, sequence divergence can also
distort normalization of the signal intensities on each array,
and may result in overestimation of the nonhuman primate
expression levels of genes without sequence differences .
For studies that aim to compare groups of samples within a
single species, sequence differences carry the same effect
across all samples under study, and no longer represent a
systematic bias in microarray studies. Yet even for these
within-species applications, a detailed understanding of
probe performance is valuable because poorly hybridized
probes contribute to background noise, and adversely influence
probe signal summarization for the affected probe sets.
Only recently has the impact of sequence divergence on
microarray applications been systematically examined for
nonhuman primates. For example, using a detection reliability
tool provided by MAS5 software for human
Affymetrix Genechips, Chismar et al.  reported that
genes inconsistently called “Present” among pairs of technical
replicates of frontal lobe tissue had 2-fold lower probe
set signal intensities in rhesus macaque monkeys, and were
more variable in monkeys compared to humans. This presumably
reflects increased variability in probe signal intensities
for probe sets affected by gene sequence mismatches.
Although fewer genes were consistently called “Present”
within pairs of technical replicates for monkey, a similar
percentage of all genes (~8%) was found to switch from
“Present” to “Absent,” or vice versa, in both human and
monkey. These findings suggest that sequence differences
do not affect the consistency of calls (i.e., “Present” or
“Absent”), but result in the loss of coverage of the monkey
transcriptome. Other investigators have likewise reported
that fewer genes are consistently called “Present” when
nonhuman primate tissue is compared to human samples
hybridized on human microarrays [65,67,71,72]. In 11 adult
squirrel monkeys we found, for example, that 16% of 12,666
probe sets were consistently called “Present” in hippocampus
and dorsolateral prefrontal cortex, and 43% of the
probe sets were called “Absent” on all 22 microarrays. In
12 samples of hippocampus and 12 samples of dorsolateral
prefrontal cortex obtained from 12 healthy adult humans,
these same assessments were, respectively, 24% and 36%. Similar
results have been reported in studies of humans and
rhesus macaque monkeys for different tissue types [67,71].
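Summary percentages of this kind can be tallied directly from a table of detection calls. The following is a minimal sketch, assuming the calls have already been exported as one list per probe set with the single-letter codes "P" (Present), "M" (Marginal), and "A" (Absent); the function name `call_consistency`, the probe set identifiers, and the toy data are hypothetical and do not reproduce the data sets described above.

```python
def call_consistency(calls_by_probe_set):
    """Fraction of probe sets called Present on every array, and Absent on every array.

    calls_by_probe_set: dict mapping probe set id -> list of calls
    ("P", "M", or "A"), one call per microarray.
    """
    n = len(calls_by_probe_set)
    if n == 0:
        raise ValueError("no probe sets supplied")
    present_all = sum(
        1 for calls in calls_by_probe_set.values() if all(c == "P" for c in calls)
    )
    absent_all = sum(
        1 for calls in calls_by_probe_set.values() if all(c == "A" for c in calls)
    )
    return {"present_all": present_all / n, "absent_all": absent_all / n}


# Toy example (assumption): 4 probe sets measured on 3 arrays.
toy_calls = {
    "set_1": ["P", "P", "P"],  # consistently Present
    "set_2": ["A", "A", "A"],  # Absent on all arrays
    "set_3": ["P", "A", "P"],  # inconsistent
    "set_4": ["A", "A", "A"],  # Absent on all arrays
}
summary = call_consistency(toy_calls)
print(summary)
```

Probe sets with mixed or Marginal calls fall into neither category, which is why the two fractions need not sum to one.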
Several methods have been proposed to deal with the
sequence divergence problem for studies of nonhuman primates.
In primates with known gene sequences, the Affymetrix
probes on current human platforms can be individually
realigned to the nonhuman primate sequences to identify
probes that perfectly match conserved segments of orthologous
genes. Only these probes are then used to summarize
probe signal intensities for the nonhuman primate samples.
Wang et al. reported that the use of more highly
homologous probe sets reduces the relative discordance
between species with respect to signal intensities, but the
sequence-based strategy is of limited utility in species for
which most gene sequences are unknown. Wang et al.'s
attempt to define interspecies conserved probe sets on the
Affymetrix HG-U133 Plus2.0 Genechip, which covers nearly
thirty thousand human Unigenes, successfully defined only
2704 macaque monkey genes and 1190 chimpanzee genes.
These numbers will undoubtedly increase with the recent
release of the draft genome sequences for rhesus macaque
monkeys and chimpanzees. Recently, for example, we
realigned the probes for different versions of human Affymetrix
Genechips to putative chimpanzee transcripts defined
according to human Unigenes. From the Affymetrix probes
that can be uniquely assigned, we then created custom probe set
definition files corresponding to nearly 19,000 chimpanzee
“Unigenes.” These files are freely available for download
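The core of the sequence-based strategy, retaining only probes that match the nonhuman primate sequence perfectly, can be sketched as a simple exact-match filter. This is an illustration only, not the realignment pipeline used by the authors or by Wang et al.: it assumes that probe and transcript sequences are supplied in the same orientation, it ignores the requirement that a probe map uniquely within the genome, and the function name `perfect_match_probes` and the short toy sequences (real Affymetrix probes are 25-mers) are hypothetical.

```python
def perfect_match_probes(probe_seqs, ortholog_transcript):
    """Return ids of probes whose full sequence occurs exactly in the transcript.

    probe_seqs: dict mapping probe id -> probe sequence (same orientation
    as the transcript; this orientation is an assumption of the sketch).
    ortholog_transcript: nonhuman primate transcript sequence.
    """
    transcript = ortholog_transcript.upper()
    return [pid for pid, seq in probe_seqs.items() if seq.upper() in transcript]


# Toy example (assumption): 8-mers stand in for 25-mer probes.
monkey_transcript = "ATGGCGTACCTGAAGTTCGA"
human_probes = {
    "p1": "GCGTACCT",  # perfectly conserved
    "p2": "GCGTTCCT",  # one mismatch relative to the monkey sequence
    "p3": "GAAGTTCG",  # perfectly conserved
}
kept = perfect_match_probes(human_probes, monkey_transcript)
print(kept)
```

Only the retained probes would then be passed to the probe signal summarization step for the nonhuman primate samples.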
Given the limitations of sequence-based methods for the
majority of nonhuman primates, a potentially fruitful
approach is to exclude problematic probes based solely on
probe hybridization characteristics obtained from actual
microarray data [25,62,74]. Probes harboring between-species
sequence differences are likely to show hybridization
levels in nonhuman primates that are inconsistent with
those in human samples compared to other probes in the
same probe set, as long as the majority of the probes
show comparable characteristics in humans and in nonhuman
primates. Masking out probes based on aberrant
behavior may indirectly lead to the exclusion of probes
containing sequence differences, even in the absence of
actual gene sequence information for nonhuman primates.
This approach has been used to identify a significant proportion
of sequence differences in chimpanzee brain tissue
[25,65,66], and effectively reduces the relative number and
magnitude of gene expression differences between chimpanzees
and humans. Currently, the development of tools of
this type is an active area of research.
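The general idea of data-driven masking can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of any of the cited studies: within one probe set, each probe's log2 ratio of nonhuman to human intensity is compared with the median log2 ratio of the set, and probes that fall well below the median are flagged. The function name `flag_aberrant_probes`, the threshold `max_drop`, and the toy intensities are assumptions introduced for this example.

```python
import math
from statistics import median


def flag_aberrant_probes(human, nonhuman, max_drop=1.0):
    """Flag probes with an unusually low nonhuman/human signal within one probe set.

    human, nonhuman: dicts mapping probe id -> mean intensity across samples.
    max_drop: hypothetical threshold, in log2 units below the probe set median.
    """
    log_ratio = {p: math.log2(nonhuman[p] / human[p]) for p in human}
    # The median reflects the genuine expression difference, provided that
    # most probes in the set are unaffected by sequence differences.
    centre = median(log_ratio.values())
    return [p for p, r in log_ratio.items() if centre - r > max_drop]


# Toy probe set (assumption): probe "c" behaves inconsistently with the others.
human_means = {"a": 200.0, "b": 400.0, "c": 800.0, "d": 300.0, "e": 500.0}
monkey_means = {"a": 190.0, "b": 410.0, "c": 150.0, "d": 310.0, "e": 480.0}
flagged = flag_aberrant_probes(human_means, monkey_means)
print(flagged)
```

Using the probe set median as the reference relies on the same condition stated above, namely that the majority of probes in the set behave comparably in the two species.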
6. Summary and conclusions
Nonhuman primates will continue to play a vital role in
research on the neural basis of emotional, cognitive, and
social aspects of behavior. In recent years, the application
of microarray technology has enabled behavioral neuroscientists
to analyze global patterns of gene expression within
defined brain regions. The use of microarrays to study
brain tissue presents, however, various limitations and
unique challenges. The primate brain is a complex organ,
with a heterogeneous distribution of distinct subpopulations
of cells, intricate signaling and regulatory circuits, and
exquisite lifelong sensitivity to environmental variation.
These factors result in high levels of interindividual variability
in gene expression, and often subtle differences
between the specific conditions under study. These inherent
difficulties highlight the importance of a carefully controlled
experimental design, extensive replication, standard protocols,
and sound analysis practices that encompass key considerations
about normalization, probe summary and
annotation, statistical criteria, and higher-level analysis of
gene ontology. In particular, the lack of primate gene
sequence information and, consequently, the limited availability
of species-specific microarrays, is a major problem
for the near future. The use of arrays designed for humans
is currently the only option available to many primatologists,
and great attention must be paid to the effect of
sequence differences on the quality and interpretation of
cross-species applications of microarrays. Finally, it is critical
that the outcomes of microarray studies of nonhuman
primates are validated and extended by the classic techniques
(e.g., quantitative PCR, in situ hybridization, and
immunohistochemistry) applied to new, independent sets of
brain tissue samples.
Acknowledgments
The authors are members of the Pritzker Neuropsychiat-
ric Disorders Research Consortium, which is supported by
the Pritzker Neuropsychiatric Disorders Research Fund
L.L.C. A shared intellectual property agreement exists
between the Pritzker Neuropsychiatric Disorders Research
Fund L.L.C and the University of Michigan, the University
of California, and Stanford University to encourage the
development of appropriate findings for research and clini-
References
 L.J. Porrino, D. Lyons, Cereb. Cortex 10 (2000) 326–333.
 T.M. Preuss, J. Cogn. Neurosci. 7 (1995) 1–24.
 D.M. Lyons, in: A.F. Schatzberg, C.B. Nemeroff (Eds.), Textbook of
Psychopharmacology, third ed., American Psychiatric Publishing,
Washington, DC, USA, 2003, pp. 117–126.
 G.E. Robinson, Y. Ben-Shahar, Genes Brain Behav. 1 (2002) 197–203.
 D.P. Toma, K.P. White, J. Hirsch, R.J. Greenspan, Nat. Genet. 31
 C.W. Whitfield, A.M. Cziko, G.E. Robinson, Science 302 (2003) 296–299.
 D.J. Lockhart, H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S.
Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, E.L. Brown,
Nat. Biotechnol. 14 (1996) 1675–1680.
 M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Science 270 (1995)
 J.D. Dougherty, D.H. Geschwind, Neuron 45 (2005) 183–185.
 H. Kitano, Science 295 (2002) 1662–1664.
 R.J. Lipshutz, S.P. Fodor, T.R. Gingeras, D.J. Lockhart, Nat. Genet.
21 (1999) 20–24.
 L.K. Nisenbaum, Genes Brain Behav. 1 (2002) 27–34.
 S.L. Karsten, L.C. Kudo, D.H. Geschwind, Int. Rev. Neurobiol. 60
 J.E. Larkin, B.C. Frank, H. Gavras, R. Sultana, J. Quackenbush, Nat.
Methods 2 (2005) 337–344.
 R.A. Irizarry, D. Warren, F. Spencer, I.F. Kim, S. Biswal, B.C. Frank,
E. Gabrielson, J.G. Garcia, J. Geoghegan, G. Germino, C. Griffin, S.C.
Hilmer, E. Hoffman, A.E. Jedlicka, E. Kawasaki, F. Martinez-Murillo,
L. Morsberger, H. Lee, D. Petersen, J. Quackenbush, A. Scott, M. Wil-
son, Y. Yang, S.Q. Ye, W. Yu, Nat. Methods 2 (2005) 345–350.
 G.A. Churchill, Nat. Genet. 32 (Suppl.) (2002) 490–495.
 Y.H. Yang, T. Speed, Nat. Rev. Genet. 3 (2002) 579–588.
 M.K. Kerr, M. Martin, G.A. Churchill, J. Comput. Biol. 7 (2000) 819–
 K. Dobbin, J.H. Shih, R. Simon, Bioinformatics 19 (2003) 803–810.
 R.A. Irizarry, B.M. Bolstad, F. Collin, L.M. Cope, B. Hobbs, T.P.
Speed, Nucleic Acids Res. 31 (2003) e15.
 C. Li, W.H. Wong, Proc. Natl. Acad. Sci. USA 98 (2001) 31–36.
 L. Zhang, M.F. Miles, K.D. Aldape, Nat. Biotechnol. 21 (2003)
 Z.J. Wu, R.A. Irizarry, R. Gentleman, F. Martinez-Murillo, F. Spen-
cer, J. Am. Stat. Assoc. 99 (2004) 909–917.
 R.A. Irizarry, B. Hobbs, F. Collin, Y.D. Beazer-Barclay, K.J. Antonel-
lis, U. Scherf, T.P. Speed, Biostatistics 4 (2003) 249–264.
 W.P. Hsieh, T.M. Chu, R.D. Wolfinger, G. Gibson, Genetics 165
 C. Konradi, Brain Res. Rev. 50 (2005) 142–155.
 K. Shedden, W. Chen, R. Kuick, D. Ghosh, J. Macdonald, K.R. Cho,
T.J. Giordano, S.B. Gruber, E.R. Fearon, J.M. Taylor, S. Hanash,
BMC Bioinformatics 6 (2005) 26.
 P.E. Lachance, A. Chaudhuri, J. Neurochem. 88 (2004) 1455–1469.
 Y. Barash, E. Dehan, M. Krupsky, W. Franklin, M. Geraci, N. Fried-
man, N. Kaminski, Bioinformatics 20 (2004) 839–846.
 Z. Wu, R.A. Irizarry, Nat. Biotechnol. 22 (2004) 656–658.
 P. Wang, M. Dai, R. Thompson, S.J. Watson, F. Meng, in: M. He, G.
Narasimhan, S. Petoukhov, (Eds.), Advances in bioinformatics and its
applications: Proceedings of the International Conference, Fort Lauderdale,
Florida edn, World Scientific Publishing Company, 2005.
 M.L. Lee, F.C. Kuo, G.A. Whitmore, J. Sklar, Proc. Natl. Acad. Sci.
USA 97 (2000) 9834–9839.
 P. Pavlidis, W.S. Noble, Genome Biol. 2 (2001) 0042.1–0042.15.
 Y.C. Ge, S. Dudoit, T.P. Speed, Test 12 (2003) 1–77.
 Y. Benjamini, Y. Hochberg, J. R. Stat. Soc. Ser. B—Methodol. 57
 Y. Benjamini, D. Yekutieli, Ann. Stat. 29 (2001) 1165–1188.
 J.D. Storey, R. Tibshirani, Proc. Natl. Acad. Sci. USA 100 (2003)
 K.H. Pan, C.J. Lih, S.N. Cohen, Proc. Natl. Acad. Sci. USA 102 (2005)
 M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M.
Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris,
D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E.
Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, Nat. Genet. 25
 M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, M. Hattori, Nucleic
Acids Res. 32 (2004) D277–D280.
 K.D. Dahlquist, N. Salomonis, K. Vranizan, S.C. Lawlor, B.R. Conk-
lin, Nat. Genet. 31 (2002) 19–20.
 V.K. Mootha, C.M. Lindgren, K.F. Eriksson, A. Subramanian, S.
Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laur-
ila, N. Houstis, M.J. Daly, N. Patterson, J.P. Mesirov, T.R. Golub, P.
Tamayo, B. Spiegelman, E.S. Lander, J.N. Hirschhorn, D. Altshuler,
L.C. Groop, Nat. Genet. 34 (2003) 267–273.
 D.A. Hosack, G. Dennis Jr., B.T. Sherman, H.C. Lane, R.A. Lempicki,
Genome Biol. 4 (2003) R70.
 G.L. Henry, K. Zito, J. Dubnau, Curr. Opin. Neurobiol. 13 (2003)
 D.M. Lyons, A.F. Schatzberg, Neurobiol. Learn. Mem. 80 (2003)
 C.C. Pritchard, L. Hsu, J. Delrow, P.S. Nelson, Proc. Natl. Acad. Sci.
USA 98 (2001) 13266–13271.
 J.Z. Li, M.P. Vawter, D.M. Walsh, H. Tomita, S.J. Evans, P.V. Chou-
dary, J.F. Lopez, A. Avelar, V. Shokoohi, T. Chung, O. Mesarwi, E.G.
Jones, S.J. Watson, H. Akil, W.E. Bunney Jr., R.M. Myers, Hum. Mol.
Genet. 13 (2004) 609–616.
 H. Tomita, M.P. Vawter, D.M. Walsh, S.J. Evans, P.V. Choudary, J.
Li, K.M. Overman, M.E. Atz, R.M. Myers, E.G. Jones, S.J. Watson, H.
Akil, W.E. Bunney Jr., Biol. Psychiatry 55 (2004) 346–352.
 S.J. Evans, N.A. Datson, M. Kabbaj, R.C. Thompson, E. Vreugdenhil,
E.R. De Kloet, S.J. Watson, H. Akil, Eur. J. Neurosci. 16 (2002) 409–413.
 N.A. Datson, L. Meijer, P.J. Steenbergen, M.C. Morsink, S. van der
Laan, O.C. Meijer, E.R. de Kloet, Eur. J. Neurosci. 20 (2004) 2541–
 K. Mirnics, Nat. Rev. Neurosci. 2 (2001) 444–447.
 H. Ozawa, E. Kushiya, Y. Takahashi, Neurosci. Lett. 18 (1980) 191–196.
 Y. Takahashi, Prog. Neurobiol. 38 (1992) 523–569.
 N.A. Datson, J. van der Perk, E.R. de Kloet, E. Vreugdenhil, Hippo-
campus 11 (2001) 430–444.
 A. Sawatari, E.M. Callaway, Neuron 25 (2000) 459–471.
 D.H. Geschwind, Proc. Natl. Acad. Sci. USA 97 (2000) 10676–10678.
 P.P. Sanna, A.R. King, L.D. van der Stap, V. Repunte-Canonigo,
Brain Res. Brain Res. Protoc. 15 (2005) 66–74.
 V.A. Vincent, J.J. DeVoss, H.S. Ryan, G.M. Murphy Jr., J. Neurosci.
Res. 69 (2002) 578–586.
 C.L. Magness, P.C. Fellin, M.J. Thomas, M.J. Korth, M.B. Agy, S.C.
Proll, M. Fitzgibbon, C.A. Scherer, D.G. Miner, M.G. Katze, S.P.
Iadonato, Genome Biol. 6 (2005) R60.
 D.E. Redmond Jr., J.L. Zhao, J.D. Randall, A.C. Eklund, L.O. Eusebi,
R.H. Roth, S.R. Gullans, R.V. Jensen, Brain Res. Dev. Brain Res. 146
 G. Cheng, M.J. Mustari, S. Khanna, J.D. Porter, Invest. Ophthalmol.
Vis. Sci. 44 (2003) 3842–3855.
 W. Ji, W. Zhou, K. Gregg, N. Yu, S. Davis, S. Davis, Nucleic Acids
Res. 32 (2004) e93.
 Y. Gilad, S.A. Rifkin, P. Bertone, M. Gerstein, K.P. White, Genome
Res. 15 (2005) 674–680.
 M.W. Karaman, M.L. Houck, L.G. Chemnick, S. Nagpal, D. Cha-
wannakul, D. Sudano, B.L. Pike, V.V. Ho, O.A. Ryder, J.G. Hacia,
Genome Res. 13 (2003) 1619–1630.
 M. Caceres, J. Lachuer, M.A. Zapala, J.C. Redmond, L. Kudo, D.H.
Geschwind, D.J. Lockhart, T.M. Preuss, C. Barlow, Proc. Natl. Acad.
Sci. USA 100 (2003) 13030–13035.
 P. Khaitovich, B. Muetzel, X. She, M. Lachmann, I. Hellmann, J. Diet-
zsch, S. Steigele, H.H. Do, G. Weiss, W. Enard, F. Heissig, T. Arendt, K.
Nieselt-Struwe, E.E. Eichler, S. Paabo, Genome Res. 14 (2004) 1462–1473.
 M. Uddin, D.E. Wildman, G. Liu, W. Xu, R.M. Johnson, P.R. Hof, G.
Kapatos, L.I. Grossman, M. Goodman, Proc. Natl. Acad. Sci. USA
101 (2004) 2957–2962.
 W. Enard, P. Khaitovich, J. Klose, S. Zollner, F. Heissig, P. Giavali-
sco, K. Nieselt-Struwe, E. Muchmore, A. Varki, R. Ravid, G.M. Doxi-
adis, R.E. Bontrop, S. Paabo, Science 296 (2002) 340–343.
 T.M. Preuss, M. Caceres, M.C. Oldham, D.H. Geschwind, Nat. Rev.
Genet. 5 (2004) 850–860.
 J.D. Chismar, T. Mondala, H.S. Fox, E. Roberts, D. Langford, E.
Masliah, D.R. Salomon, S.R. Head, Biotechniques 33 (2002) 516–524.
 Z. Wang, M.G. Lewis, M.E. Nau, A. Arnold, M.T. Vahey, BMC Bio-
informatics 5 (2004) 165.
 M. Marvanova, J. Menager, E. Bezard, R.E. Bontrop, L. Pradier, G.
Wong, FASEB J. 17 (2003) 929–931.
 P. Khaitovich, G. Weiss, M. Lachmann, I. Hellmann, W. Enard, B. Muet-
zel, U. Wirkner, W. Ansorge, S. Paabo, PLoS Biol. 2 (2004) E132.
 S. Nagpal, M.W. Karaman, M.M. Timmerman, V.V. Ho, B.L. Pike,
J.G. Hacia, Nucleic Acids Res. 32 (2004) e51.