The non-invasive diagnosis of lymph-node status based on gene expression profiles of primary breast cancer tumors

The ability to predict clinical outcomes of breast cancer such as lymph-node involvement and local or distant recurrence will significantly impact the clinical management of disease progression. For example, the presence or absence of metastatic breast cancer in axillary lymph nodes is a significant predictor of overall survival. We show that whole-genome expression profiling of primary breast tumors results in multi-variate biomarker signatures that are able to predict lymph-node status, and hence, the likelihood of overall survival. Previous attempts to correlate characteristics of primary tumors based on S-phase fraction, tumor grade, ploidy, hormone receptor status and ERBB2 over-expression with lymph-node status have been unsuccessful. Our results show that multi-variate gene expression signatures obtained from whole-genome expression profiling appear to have the resolution necessary to characterize lymph-node status using as few as 35 genes. In addition, the enrichment of such a signature for cancer-related genes, signaling pathways and biological processes provides opportunities to identify novel molecular targets that may contribute to improved treatment and care, and a deeper understanding of the causal mechanisms underlying metastasis and tumor growth.
HAWAI‘I MEDICAL JOURNAL, VOL 66, JANUARY 2007
Cancer Research Center Hotline
Carl-Wilhelm Vogel MD, PhD, Contributing Editor
The Non-invasive Diagnosis of Lymph-node Status Based on Gene Expression Profiles of Primary Breast Cancer Tumors
Gordon S. Okimoto PhD, Director, Informatics Shared Resources,
Cancer Research Center of Hawai‘i
The ability to predict the clinical outcome of breast cancer, including lymph-node metastasis or recurrence, will profoundly affect the clinical decisions made to manage disease progression. For example, the presence of metastatic breast cancer in axillary lymph nodes is probably the most significant factor in overall survival.[1] Although the determination of lymph-node status is relatively routine, the surgical procedure is highly invasive, and selectivity in identifying nodes for examination introduces biases that suggest some reported negatives may in fact be positive.[2] The ability to accurately predict axillary lymph-node status on the basis of a gene expression profile of the primary tumor may obviate the need for axillary lymph node dissection and the significant morbidity associated with this procedure.[3]
The focus of this article is on the non-invasive diagnosis of lymph-node status based on gene expression profiling of the primary tumor. A closely related goal is the identification of the genes and pathways that are highly correlated with changes in lymph-node status using DNA microarrays. Previous attempts to correlate characteristics of primary tumors such as S-phase fraction, tumor grade, ploidy, hormone receptor status and ERBB2 over-expression with lymph-node status have been unsuccessful. Our studies show that gene expression profiling appears to have the resolution necessary to characterize lymph-node status using as few as 35 genes. In addition, the genes, pathways and phenotypic models that result from a genome-wide analysis of gene expression in breast cancer tumors provide hypotheses for highly focused molecular studies with the potential to identify new targets that may contribute to improved treatment and care, and a deeper understanding of the causal mechanisms underlying metastasis and tumor growth.
Microarrays and Microarray Experiments
DNA microarrays (or chips) profile the steady-state messenger-RNA (mRNA) levels of thousands of genes simultaneously in a single biological sample. A microarray experiment consists of multiple microarrays, each profiling a distinct biological sample. The goal of a typical microarray experiment is to characterize changes in a clinical phenotype, such as lymph-node status, in terms of a small number of genes that are differentially expressed between the two conditions (here, lymph-node negative and lymph-node positive). This is accomplished by statistically comparing the global gene expression patterns of two groups of samples with known lymph-node status. Given that the human genome is composed of tens of thousands of genes, and that microarray data are inherently noisy, this comparison poses a difficult analytical problem. We have developed a method of analyzing these data to achieve high accuracy in determining whether a primary tumor has metastasized to the lymph nodes based on the gene expression profile of the tumor.
The Data
The data presented in this article were downloaded from a public repository of microarray data made available by the Duke University Institute for Genome Sciences and Policy. The original data consisted of primary tumor biopsies obtained from the Koo Foundation Sun Yat-Sen Cancer Center in Taipei, Taiwan and Duke University. Tumors were either positive for both the estrogen and progesterone receptors or negative for both receptors. Each tumor was diagnosed as invasive ductal carcinoma and was between 1.5 and 5 cm in maximal dimension. In each case, a diagnostic axillary lymph node dissection was performed and lymph-node status was determined. Total RNA was extracted from tumor tissue, processed and hybridized to Affymetrix U95-AV-5 GeneChip microarrays using standard protocols established by the vendor. Each microarray profiled the steady-state mRNA levels of 12,625 genes simultaneously in a single tumor sample. The final data consisted of 37 microarrays, of which 19 were associated with lymph-node negative (negative) samples and 18 were associated with lymph-node positive (positive) samples.
Statistical Analysis
The 37 GeneChips were arranged as the columns of a data matrix A that had 12,625 rows and 37 columns. The columns of A were ordered so that the chips associated with negative samples occupied columns 1 through 19, while the positive chips occupied columns 20 through 37. The columns of A were normalized to facilitate comparison between chips, and logarithmically transformed to equalize random variation over expression intensity. A novel signal detection algorithm called MANINI was then applied to the rows of A to identify genes that were significantly altered in expression in the positive sample class. Such genes are called significant genes.
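The preprocessing of A can be illustrated with a minimal sketch. The article does not say which normalization was used, so the median scaling, the intensity floor, the function name `normalize_and_log`, and the synthetic intensities below are all assumptions made for illustration. MANINI itself is not reproduced here.

```python
import numpy as np

def normalize_and_log(raw, floor=1.0):
    """Scale each chip (column) to a common median, then take log2.

    Median scaling and the intensity floor are assumptions; the article
    does not state which normalization was used.
    """
    raw = np.maximum(np.asarray(raw, dtype=float), floor)  # keep values positive
    medians = np.median(raw, axis=0)                        # one median per chip
    scaled = raw * (medians.mean() / medians)               # equalize chip medians
    return np.log2(scaled)

rng = np.random.default_rng(0)
n_genes, n_neg, n_pos = 12625, 19, 18
# Synthetic stand-in for the real chip intensities (hypothetical values).
raw = rng.lognormal(mean=7.0, sigma=1.0, size=(n_genes, n_neg + n_pos))
A = normalize_and_log(raw)
# Columns 1-19 are negative samples, columns 20-37 are positive samples.
labels = np.array([0] * n_neg + [1] * n_pos)
```

After this step every column of A has the same median on the log scale, so differences between chips reflect expression rather than overall chip brightness.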
Figure 1 shows the results of a MANINI analysis of the data matrix A in the form of a Ratio/Intensity (R/I) scatter plot. Every point of the R/I plot represents the average fold change (vertical axis) versus average expression (horizontal axis) of a single gene in log-log space. Genes that are significantly up-regulated on positive samples are highlighted with open circles, while genes that are significantly down-regulated on the positive samples are highlighted with open triangles. MANINI found that 448 genes were significantly up-regulated, while 391 genes were found to be significantly down-regulated, for a total of 839 genes significantly altered in expression in the positive breast cancer tumors.
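The coordinates of a Ratio/Intensity plot can be sketched as follows. This assumes the input is already on a log2 scale, that "average fold change" means the difference of class means in log space, and that "average expression" means the mean over all samples. These definitions and the name `ratio_intensity` are assumptions, and the sketch does not perform MANINI's significance calls.

```python
import numpy as np

def ratio_intensity(log_expr, is_positive):
    """Return (ratio, intensity) per gene from a log2 expression matrix.

    ratio     = mean log2 expression in positive minus negative samples
    intensity = mean log2 expression over all samples
    """
    log_expr = np.asarray(log_expr, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    ratio = log_expr[:, is_positive].mean(axis=1) - log_expr[:, ~is_positive].mean(axis=1)
    intensity = log_expr.mean(axis=1)
    return ratio, intensity

# Tiny hypothetical example: 3 genes, 2 negative then 2 positive samples.
demo = np.array([[8.0, 8.0, 10.0, 10.0],   # higher on positive samples
                 [9.0, 9.0, 9.0, 9.0],     # unchanged
                 [7.0, 7.0, 6.0, 6.0]])    # lower on positive samples
ratio, intensity = ratio_intensity(demo, [False, False, True, True])
```

Plotting `ratio` against `intensity` gives one point per gene; a positive ratio corresponds to up-regulation on positive samples and a negative ratio to down-regulation.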
The list of significant genes was further reduced to a list of significant pathways (or gene networks), where each pathway is a collection of interacting genes that accomplishes a specific biological function. The Ingenuity Pathway Analysis (IPA) tool was used to identify the significant pathways contained in the MANINI gene list. IPA is a proprietary knowledgebase containing what is currently known about gene function and gene-gene interactions. We considered only the 35 genes in IPA network #2, which had a p-value of 10^-52 and inferred functions of cellular growth and proliferation and immune response. The process of focusing only on genes in a significant IPA network is known as pathway compression. An interaction diagram of IPA network #2 is shown in Figure 2 and a list of the genes contained in the network is shown in Figure 3.
Figure 1. Ratio/Intensity plot of the Huang breast cancer data set with up-regulated genes highlighted with open circles and down-regulated genes highlighted with open triangles. The other points represent noise in the data. A total of 839 genes were called significantly altered in expression on the positive samples using the MANINI detection algorithm.
Figure 2. Network diagram of the most significant pathway discovered in the 448 up-regulated genes identified by the MANINI algorithm. Each node represents a gene and each edge connecting two genes represents an interaction between them. The ERBB2 gene, which is the target for the cancer drug Herceptin, is highlighted in the square box.
The expression profiles of the 35 genes in IPA network #2 were then wavelet transformed to enhance signal-to-noise ratio and compressed down to 10 features for each sample using singular value decomposition (SVD). SVD compression in the wavelet domain is a novel method of identifying features in microarray data that are useful for pattern recognition applications in cancer diagnosis and prognosis. A neural network classifier was trained to predict the lymph-node status of a sample based on its 10-dimensional feature vector.
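One way to picture the compression step is the sketch below. The article does not specify the wavelet family, the number of levels, or how noise was suppressed, so the one-level Haar transform, the soft threshold, the function names, and the random row indices standing in for the 35 network genes are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def haar_denoise(X, threshold=0.5):
    """One-level Haar transform of each row with soft-thresholded details.

    X has one row per sample and one column per gene. The wavelet family,
    the single level, and the threshold are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    if X.shape[1] % 2:                                   # pad to an even length
        X = np.hstack([X, X[:, -1:]])
    approx = (X[:, 0::2] + X[:, 1::2]) / np.sqrt(2.0)    # pairwise averages
    detail = (X[:, 0::2] - X[:, 1::2]) / np.sqrt(2.0)    # pairwise differences
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    return np.hstack([approx, detail])

def svd_features(W, k=10):
    """Project centered rows of W onto the top-k right singular vectors."""
    mean = W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
    basis = Vt[:k].T
    return (W - mean) @ basis, mean, basis

rng = np.random.default_rng(1)
A = rng.normal(size=(12625, 37))                          # synthetic log-expression matrix
network_rows = rng.choice(12625, size=35, replace=False)  # hypothetical rows of the 35 genes
X = A[network_rows].T                                     # 37 samples x 35 genes
features, mean, basis = svd_features(haar_denoise(X), k=10)
```

Selecting `network_rows` corresponds to pathway compression, and `features` holds one 10-dimensional vector per sample. The returned `mean` and `basis` allow a new sample to be projected into the same feature space without refitting.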
The ability of the neural network classifier to diagnose positive lymph-node involvement was objectively evaluated using leave-one-out cross validation (LOOCV) analysis. Here, a sample is "left out" of the data set and the remaining data are processed as described above, resulting in a trained neural network classifier that is then used to classify the left-out sample. The neural network output is compared with the known lymph-node status of the sample and the result is recorded. The process is repeated for every sample in the data set, and the percentage of left-out samples that were correctly diagnosed represents the correct classification rate (CCR) for the system. LOOCV analysis was repeated 50 times to determine system robustness and the median CCR over the 50 trials was computed.
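The LOOCV procedure can be sketched in a few lines. A nearest-centroid rule is swapped in for the article's neural network purely to keep the example short and deterministic. The function names and the synthetic, well-separated features are assumptions.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    """Stand-in classifier (not the article's neural network)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - x_test, axis=1)
    return classes[np.argmin(distances)]

def loocv_ccr(X, y, predict=nearest_centroid_predict):
    """Leave each sample out once; return the fraction classified correctly."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # every sample except the i-th
        # Any data-dependent step (gene selection, SVD) belongs inside this
        # loop, fitted on X[keep] only, so the left-out sample stays unseen.
        correct += int(predict(X[keep], y[keep], X[i]) == y[i])
    return correct / len(y)

rng = np.random.default_rng(2)
y = np.array([0] * 19 + [1] * 18)
# Synthetic, well-separated 10-dimensional features (hypothetical data).
X = rng.normal(size=(37, 10)) + np.where(y[:, None] == 1, 3.0, -3.0)
ccr = loocv_ccr(X, y)
```

With a randomly initialized classifier such as a neural network, repeating `loocv_ccr` with different seeds and taking `np.median` of the results would mirror the 50-trial summary described above. With a deterministic rule like this one, every repeat gives the same value.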
LOOCV analysis of the neural network described above resulted in a median CCR of 95%; that is, the lymph-node status of 35 out of 37 samples was correctly diagnosed based on the gene expression patterns in the primary tumor. Only 35 genes were used to achieve this high CCR value. This result compares favorably with the result of the original Huang study,[3] which achieved a CCR of 90% using 200 genes. Interestingly, the network of 35 genes used to train the neural network classifier contained the ERBB2 (Her2/neu) gene, which is the target for the cancer drug Herceptin. Moreover, this gene was absent from the 200 significant genes identified in the Huang study. This result suggests that pathways modulated by drugs like Herceptin may be involved in the progression of breast cancer from low to high risk status.
The results summarized above suggest that global patterns of gene expression in primary breast cancer tumors contain sufficient information to diagnose lymph-node status provided proper information processing techniques are employed. Pathway compression, wavelet signal processing and SVD combine to achieve a significant reduction in the number of genes that must be considered in the modeling process without significant loss of relevant information.
For more information on the Cancer Research Center of Hawai‘i,
please visit its website at www.crch.org.
References
1. Krag D., Weaver D., Ashikaga T., Moffat F., Klimberg V.S., Shriver C., et al. The sentinel node in breast cancer. N. Engl. J. Med., 339 (1998), pp. 941-946.
2. Jatoi I., Hilsenbeck S.G., Clark G.M., Osborne C.K. Significance of axillary lymph node metastasis in primary breast cancer. J. Clinical. Oncology, 17 (1999), pp. 2334-2340.
3. Huang E., Cheng S.H., Dressman H., Pittman J., Tsou M., Horng C., Bild A., Iverson E.S., Liao M., Chen C., West M., Nevins J.R., Huang A.T. Gene expression predictors of breast cancer outcomes. Lancet, 361 (2003), pp. 1590-1596.
Name | Description | Drugs
ADD3 | adducin 3 (gamma) |
APC | adenomatosis polyposis coli |
AREG | amphiregulin (schwannoma-derived growth factor) |
AURKA | aurora kinase A |
BAG2 | BCL2-associated athanogene 2 |
BIRC5 | baculoviral IAP repeat-containing 5 (survivin) |
BUB1B | BUB1 budding uninhibited by benzimidazoles 1 homolog beta (yeast) |
CD52 | CD52 molecule | alemtuzumab
CDC2 | cell division cycle 2, G1 to S and G2 to M | flavopiridol
CDC20 | CDC20 cell division cycle 20 homolog (S. cerevisiae) |
CDKN3 | cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase) |
COL5A2 | collagen, type V, alpha 2 | collagenase
CXCL13 | chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant) |
ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | trastuzumab, BMS-599626, lapatinib
EREG | epiregulin |
ETV1 | ets variant gene 1 |
FOXM1 | forkhead box M1 |
IGHD | immunoglobulin heavy constant delta |
IGHG1 | immunoglobulin heavy constant gamma 1 (G1m marker) |
IGHM | immunoglobulin heavy constant mu |
IGJ | immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides |
IGKC | immunoglobulin kappa constant |
IGL@ | immunoglobulin lambda locus |
KIF3A | kinesin family member 3A |
LLGL2 | lethal giant larvae homolog 2 (Drosophila) |
MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) |
NCOA3 | nuclear receptor coactivator 3 |
NR4A3 | nuclear receptor subfamily 4, group A, member 3 |
PPARD | peroxisome proliferative activated receptor, delta |
PRKCI | protein kinase C, iota |
PTTG1 | pituitary tumor-transforming 1 |
S100P | S100 calcium binding protein P |
SQLE | squalene epoxidase |
TFAP2B | transcription factor AP-2 beta (activating enhancer binding protein 2 beta) |
TFF3 | trefoil factor 3 (intestinal) |
Figure 3. A list of significant genes contained in IPA network #2. This network was used to pathway compress the Huang breast cancer data set to select the most informative genes related to lymph-node status. (Note the presence of ERBB2 and the drugs that use this gene as a target.)