Large-scale fine mapping of the HNF1B locus and
prostate cancer risk
Sonja I. Berndt1,∗, Joshua Sampson1, Meredith Yeager1,2, Kevin B. Jacobs3, Zhaoming Wang1,2,
Amy Hutchinson1,2, Charles Chung1,2, Nick Orr1, Sholom Wacholder1, Nilanjan Chatterjee1,
Kai Yu1, Peter Kraft4, Heather Spencer Feigelson5, Michael J. Thun6, W. Ryan Diver6,
Demetrius Albanes1, Jarmo Virtamo7, Stephanie Weinstein1, Fredrick R. Schumacher8,
Geraldine Cancel-Tassin9, Olivier Cussenot9, Antoine Valeri9, Gerald L. Andriole10,
E. David Crawford11, Christopher Haiman8, Brian Henderson8, Laurence Kolonel12, Loic Le
Marchand12, Afshan Siddiq13, Elio Riboli13, Ruth C. Travis14, Rudolf Kaaks15, William Isaacs16,
Sarah Isaacs16, Kathleen E. Wiley16, Henrik Gronberg17, Fredrik Wiklund17, Pär Stattin18,
Jianfeng Xu19, S. Lilly Zheng19, Jielin Sun19, Lars J. Vatten20, Kristian Hveem20, Inger Njølstad21,
Daniela S. Gerhard22, Margaret Tucker1, Richard B. Hayes23, Robert N. Hoover1,
Joseph F. Fraumeni Jr1, David J. Hunter24, Gilles Thomas1 and Stephen J. Chanock1
1Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer
Institute, National Institutes of Health, Bethesda, MD, USA,2Core Genotyping Facility, Advanced Technology
Program, SAIC-Frederick Inc., NCI-Frederick, Frederick, MD, USA,3Bioinformed Consulting Services, Gaithersburg,
MD, USA,4Program in Molecular and Genetic Epidemiology, Department of Epidemiology, Harvard School of Public
Health, Boston, MA, USA,5Institute for Health Research, Kaiser Permanente, Denver, CO, USA,
6Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA, USA,7Department
of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland,8Department of Preventive
Medicine, Keck School of Medicine, Los Angeles, CA, USA,9Centre de Recherche pour les Pathologies Prostatiques,
Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, 75970 Paris, France,10Division of Urologic Surgery,
Washington University School of Medicine, St Louis, MO, USA,11Department of Surgery, University of Colorado at
Denver and Health Sciences Center, Denver, CO, USA,12Epidemiology Program, Cancer Research Center,
University of Hawaii, Honolulu, HI, USA,13Division of Epidemiology, Public Health and Primary Care, Imperial College
London, London, UK,14Cancer Epidemiology Unit, Nuffield Department of Clinical Medicine, University of Oxford,
Oxford, UK,15Division of Cancer Epidemiology, German Cancer Research Centre (DKFZ), Heidelberg, Germany,
16Department of Urology, Johns Hopkins Medical Institutions, Baltimore, MD, USA,17Department of Medical
Epidemiology and Biostatistics, CLINTEC, Karolinska Institute, Stockholm, Sweden,18Department of Surgical and
Perioperative Sciences, Urology and Andrology, Umeå University, Umeå, Sweden,19Center for Cancer Genomics,
Wake Forest University School of Medicine, Winston-Salem, NC, USA,20Department of Public Health and General
Practice, Norwegian University of Science and Technology, Trondheim, Norway,21Institute of Community Medicine,
University of Tromsø, Tromsø, Norway,22Office of Cancer Genomics, Department of Health and Human Services,
National Cancer Institute, National Institutes of Health, Bethesda, MD, USA,23Department of Environmental Medicine,
New York University Medical Center, New York, NY, USA,24Channing Laboratory, Brigham and Women’s Hospital,
Harvard Medical School, Boston, MA, USA
Received February 1, 2011; Revised April 14, 2011; Accepted May 9, 2011
∗To whom correspondence should be addressed at: Sonja Berndt, Division of Cancer Epidemiology and Genetics, National Cancer Institute,
6120 Executive Blvd, EPS 8116, MSC 7240, Bethesda, MD 20892-7240. Tel: +1 3015947898; Fax: +1 3014021819; Email: email@example.com
Published by Oxford University Press 2011.
Human Molecular Genetics, 2011, Vol. 20, No. 16
Advance Access published on May 16, 2011
Previous genome-wide association studies have identified two independent variants in HNF1B as
susceptibility loci for prostate cancer risk. To fine-map common genetic variation in this region, we genotyped
79 single nucleotide polymorphisms (SNPs) in the 17q12 region harboring HNF1B in 10 272 prostate
cancer cases and 9123 controls of European ancestry from 10 case–control studies as part of the Cancer
Genetic Markers of Susceptibility (CGEMS) initiative. Ten SNPs were significantly related to prostate
cancer risk at a genome-wide significance level of P < 5 × 10⁻⁸, with the most significant association for
rs4430796 (P = 1.62 × 10⁻²⁴). However, risk within this first locus was not entirely explained by rs4430796.
Although modestly correlated with rs4430796 (r² = 0.64), rs7405696 was also associated with risk
(P = 9.35 × 10⁻²³), even after adjustment for rs4430796 (P = 0.007). As expected, rs11649743 was related to
prostate cancer risk (P = 3.54 × 10⁻⁸); however, the association within this second locus was stronger for
rs4794758 (P = 4.95 × 10⁻¹⁰), which explained all of the risk observed with rs11649743 when both SNPs were
included in the same model (P = 0.32 for rs11649743; P = 0.002 for rs4794758). Sequential conditional analyses
indicated that five SNPs (rs4430796, rs7405696, rs4794758, rs1016990 and rs3094509) together comprise the
best model for risk in this region. This study demonstrates a complex relationship between variants in the
HNF1B region and prostate cancer risk. Further studies are needed to investigate the biological basis of
the association of variants in 17q12 with prostate cancer.
INTRODUCTION
Of all cancers, prostate cancer is one of the most heritable with
genetic factors estimated to account for 42% of the risk (1).
Genome-wide association studies (GWAS) have been highly
successful in discovering susceptibility loci for prostate
cancer and at least 30 loci have been identified to date
(2–15). One of the earliest loci to be discovered for prostate
cancer was a variant, rs4430796, in HNF1B at chromosome
17q12 in men of European background (6). In a subsequent
GWAS in Japanese men, the same locus was identified (15).
A second independent variant, rs11649743, located at chromo-
some 17q12 and separated by a recombination hotspot from
the first variant, was subsequently found to be associated
with risk (10). The HNF1B locus as well as two other prostate
cancer susceptibility loci, chromosome 7p15.2 (JAZF1) and
chromosome 2p21 (THADA), have also been shown to be
associated with diabetes risk (6,16). Although epidemiologic
studies have shown that diabetes is inversely associated with
prostate cancer (17), variants in HNF1B and JAZF1 do not
explain the association between diabetes and prostate cancer (18).
The strongest variants associated with prostate cancer risk at
chromosome 17q12 localize to a region that harbors HNF1B, a
gene that encodes a POU homeodomain-containing transcrip-
tion factor. POU transcription factors help regulate the devel-
opment of neuroendocrine organs, and HNF1B has been
shown to play a regulatory role in nephron and pancreas devel-
opment (19,20). Rare mutations in HNF1B have been associ-
ated with maturity-onset diabetes of the young type 5
(MODY5), kidney disorders, pancreatic atrophy, and genital
malformations (21,22). Although the biological mechanism
by which HNF1B may affect prostate cancer risk has not
been elucidated, differential expression of HNF1B has been
associated with prostate cancer recurrence (23).
To further characterize genetic variation in the HNF1B
region and the risk of prostate cancer, we conducted a
large-scale fine mapping study using tag single nucleotide
polymorphisms (SNPs) based on HapMap data in 10 272
prostate cancer cases and 9123 controls of European ancestry
from 10 case–control studies as part of the Cancer Genetic
Markers of Susceptibility (CGEMS) initiative. A total of 79
SNPs in the HNF1B region that were genotyped and passed
quality control criteria were analyzed in this study.
RESULTS
We analyzed 79 SNPs located in a 249 kb region surrounding
HNF1B (chromosome 17: 33,010,707–33,259,778) in 10 272
prostate cancer cases and 9123 controls from 10 studies
(Supplementary Material, Table S1). Ten SNPs were associated
with prostate cancer risk at the threshold of genome-wide
significance (P < 5 × 10⁻⁸); the most significant association
observed was for the previously identified SNP
rs4430796 (P = 1.62 × 10⁻²⁴) (Table 1, Fig. 1A). Eight of
the 10 significant SNPs were located in the first HNF1B
region associated with prostate cancer, and all eight were
highly correlated in controls, with D′ ≥ 0.6 and pairwise r²
values between 0.13 and 0.94. However, risk within this first
locus was not entirely explained by rs4430796. Although
modestly correlated with rs4430796 (r² = 0.64), rs7405696 was
also associated with prostate cancer risk (P = 9.35 × 10⁻²³).
After conditioning on rs4430796, rs7405696 retained an
association with risk (P = 0.007). None of the other SNPs in
region 1 remained associated with risk after adjustment for
rs4430796 (Supplementary Material, Table S2).
Two SNPs located in the second region identified by Sun
et al. (10) were also associated with prostate cancer risk
at a significance level of 5 × 10⁻⁸. As expected, the
previously identified SNP in the second region, rs11649743,
was associated with prostate cancer risk (P = 3.54 × 10⁻⁸);
however, the association within this second locus was stronger
for rs4794758 (P = 4.95 × 10⁻¹⁰). The two SNPs were
correlated in controls (r² = 0.61), but when rs11649743 and
rs4794758 were included in the same model, only rs4794758
remained associated with risk (P = 0.002 for rs4794758;
P = 0.32 for rs11649743). Thus, rs4794758 accounted for the
risk associated with rs11649743.
To examine the interdependence of the signals observed on
chromosome 17q12, we first conducted a set of sequential
conditional analyses, conditioning on the most significant SNP
from the unconditional analysis and then on the most significant
SNP from each successive conditional analysis, until no SNPs
remained nominally associated with risk (P < 0.05)
(Supplementary Material, Table S2). After
conditioning on the most significant SNP (rs4430796 in
region 1), six SNPs remained nominally associated with risk
(P < 0.05), with the most significant SNP being rs4794758
in region 2 (Fig. 1B, Supplementary Material, Table S2).
Although rs4430796 and rs4794758 were separated by a
modest recombination hotspot, there was some correlation
between them (r² = 0.04), and consequently the P-value for
rs4794758 was attenuated after conditioning on rs4430796
(P = 3.45 × 10⁻⁵). After conditioning on both rs4430796
and rs4794758, four SNPs were nominally associated with
risk (P < 0.05), with the most significant SNP being
rs1016990 (P = 0.009). This SNP was only nominally
associated with risk in the unconditional model (P = 0.0002) and not
significantly associated with risk after conditioning on
rs4430796 only. Further sequential conditional analyses
yielded rs7405696 (P = 0.01) as the most significant SNP
after conditioning on rs4430796, rs4794758 and rs1016990,
followed by rs3094509 (P = 0.02) after conditioning on
rs4430796, rs4794758, rs1016990 and rs7405696. No other
SNPs remained nominally associated with prostate cancer
risk after conditioning on these five SNPs, suggesting that
these SNPs (i.e. rs4430796, rs4794758, rs1016990,
rs7405696 and rs3094509) capture the risk in this region.
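The sequential conditional procedure can be sketched in a few lines of code. The following is a minimal, illustrative Python sketch and not the analysis code used in this study: it fits unadjusted logistic models by Newton-Raphson, uses a 1-df likelihood ratio test at each step (an assumption, since the test statistic is not specified in the text) and omits the age, study and principal-component covariates. The genotypes are simulated, and all function names, allele frequencies and effect sizes are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(X, y, iters=10):
    """Fit a logistic model by Newton-Raphson; return the log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    etas = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    return sum(yi * e - math.log1p(math.exp(e)) for yi, e in zip(y, etas))


def lrt_p(ll_null, ll_alt):
    """P-value of a 1-df likelihood ratio test (chi-square upper tail)."""
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))


def sequential_conditional(G, y, alpha=0.05):
    """G[j] holds allele counts for SNP j; return SNP indices in order of entry."""
    n = len(y)
    selected = []
    while True:
        # Null model: intercept plus every SNP conditioned on so far.
        base = [[1.0] + [G[j][i] for j in selected] for i in range(n)]
        ll0 = fit_logistic(base, y)
        best_j, best_p = None, alpha
        for j in range(len(G)):
            if j in selected:
                continue
            full = [row + [G[j][i]] for i, row in enumerate(base)]
            p = lrt_p(ll0, fit_logistic(full, y))
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None:          # nothing left below the nominal threshold
            return selected
        selected.append(best_j)


# Simulated data (hypothetical): SNP 0 and SNP 2 are causal, SNP 1 is a
# correlated proxy of SNP 0, and SNP 3 is unrelated to disease.
random.seed(1)


def draw(maf):
    return (random.random() < maf) + (random.random() < maf)


n = 1500
g0 = [draw(0.4) for _ in range(n)]
g1 = [g if random.random() < 0.6 else draw(0.4) for g in g0]
g2 = [draw(0.3) for _ in range(n)]
g3 = [draw(0.2) for _ in range(n)]
y = []
for a, c in zip(g0, g2):
    eta = -1.13 + 1.0 * a + 0.55 * c
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

selected = sequential_conditional([g0, g1, g2, g3], y)
print(selected)
```

In a simulation of this kind, the proxy SNP is strongly associated with disease on its own but would be expected to lose most of its signal once the causal SNP is in the model, which mirrors the attenuation pattern described for correlated SNPs in region 1.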
To ascertain whether the same SNPs were identified using
other statistical approaches, we performed forward stepwise
regression adding SNPs below a significance level of 0.05.
This method resulted in the inclusion of four SNPs:
rs4430796, rs4794758, rs1016990 and rs7405696 in the
model. Using lasso, five SNPs (rs4430796, rs2005705,
rs7405696, rs4794758 and rs11649743) entered the model at
a lambda > 0.01. A comparison of the models selected by
these three methods using the Akaike information criterion
(AIC) indicated that the SNPs selected by the sequential conditional
analysis method yielded the best model (Table 2). The
sequential conditional model was also found to be a better model than
the model containing the most significant SNP from region 1
and region 2 (AIC: 25351.114 versus 25364.42 for the
sequential and two SNP models, respectively). We imputed the SNPs
from this region available from the 1000 Genomes Project and
conducted a sequential conditional analysis with the imputed
data. The results were quite similar, with at least four of the
five SNPs either being the same SNP or a highly correlated
SNP (r² > 0.95) as observed in our sequential conditional
analysis with directly genotyped SNPs (Supplementary
Material, Table S3).
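As a brief illustration of the model comparison, the AIC is defined as 2k − 2 ln L, where k is the number of fitted parameters and L the maximised likelihood, and the model with the smaller AIC is preferred. The sketch below applies that rule to the two AIC values quoted above; the function names are illustrative only and the underlying log-likelihoods are not given in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(models):
    """Return the name of the model with the smallest AIC."""
    return min(models, key=models.get)


# AIC values reported in the text for two of the compared models.
reported = {"sequential conditional": 25351.114, "two SNP": 25364.42}
delta = reported["two SNP"] - reported["sequential conditional"]
print(best_by_aic(reported), round(delta, 3))
```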
Table 1. SNPs in the HNF1B region that were associated with prostate cancer at genome-wide significance levels (P < 5.0 × 10⁻⁸)
Base pair position
OR (95% CI)ᵃ
1.62 × 10⁻²⁴
2.54 × 10⁻²³
9.35 × 10⁻²³
1.22 × 10⁻¹⁶
1.57 × 10⁻¹⁶
1.39 × 10⁻¹⁵
4.45 × 10⁻¹⁵
4.95 × 10⁻¹⁰
1.21 × 10⁻⁸
3.54 × 10⁻⁸
ᵃ Odds ratio per minor allele.
Our sequential conditional model included three SNPs from
region 1 (rs4430796, rs7405696, rs1016990) and two SNPs
from region 2 (rs4794758, rs3094509). A haplotype analysis
of these five SNPs revealed that the most significant haplotype
associated with risk carried the protective allele at rs4430796,
rs7405696 and rs4794758 (P = 2.78 × 10⁻⁸) (Supplementary
Material, Table S4). When the combined risk of all five
SNPs was examined, a dose–response was observed with
increasing number of risk variants (Ptrend = 1.94 × 10⁻²⁶,
Table 3). Men with eight or more risk alleles had a 1.88-fold
increased risk of prostate cancer compared with men with
zero to two risk alleles (P = 4.29 × 10⁻⁹). In comparison, when
only the most significant SNPs from region 1 (rs4430796) and
region 2 (rs4794758) were examined, men with four or more
risk alleles had a 1.69-fold risk compared with men with no
risk alleles (P = 6.66 × 10⁻⁸; Ptrend = 1.80 × 10⁻²⁵). The
P-value for the number of risk alleles for the three other SNPs
from the sequential model, modeled as a continuous variable,
was 1.63 × 10⁻⁵. The variance explained by
the five SNPs was 0.9% compared with 0.7% for the two best
SNPs from region 1 and region 2.
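The risk-allele count underlying this dose–response comparison can be illustrated as follows. This is a minimal sketch under stated assumptions: genotypes are coded as minor-allele counts (0, 1 or 2), the flags indicating whether the minor allele is the risk allele are invented for the example, and only the two extreme categories named in the text (zero to two, and eight or more) are grouped; how intermediate counts were grouped is not assumed.

```python
def count_risk_alleles(minor_allele_counts, minor_is_risk):
    """Sum risk alleles over SNPs, given minor-allele counts (0, 1 or 2)."""
    return sum(g if is_risk else 2 - g
               for g, is_risk in zip(minor_allele_counts, minor_is_risk))


def five_snp_category(n_risk):
    """Label a five-SNP count: '0-2' (reference), '>=8', or the count itself."""
    if n_risk <= 2:
        return "0-2"
    if n_risk >= 8:
        return ">=8"
    return str(n_risk)


# Hypothetical man genotyped at five SNPs; the risk-allele flags are made up.
genotypes = [0, 1, 2, 0, 1]
minor_is_risk = [True, False, True, False, True]
n_risk = count_risk_alleles(genotypes, minor_is_risk)
print(n_risk, five_snp_category(n_risk))
```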
Figure 1. Prostate cancer risk associated with SNPs in the HNF1B region: (A) results from the unconditional analysis; (B) results after conditioning on
rs4430796. The diamond indicates the most statistically significant SNP in the region.
Finally, we conducted stratified analyses by family history
of prostate cancer and observed a qualitative interaction
for rs2107131 (Pinteraction = 4.29 × 10⁻⁵) that remained
statistically significant after adjustment for multiple testing
(Padj = 0.003) (Supplementary Material, Table S5). Among
men with a family history of prostate cancer, the T allele at
rs2107131 was associated with a reduced risk of prostate
cancer (OR = 0.85; 95% CI: 0.74–0.97), whereas among
men without a family history, it was associated with an
increased risk of prostate cancer (OR = 1.13; 95% CI:
1.08–1.19). Stratified analyses were also performed for prostate
cancer aggressiveness (Gleason < 7 and Stage A/B versus
Gleason ≥ 7 or Stage C/D); however, there were no significant
differences beyond what would be expected by chance
(Supplementary Material, Table S6). We did not observe any
significant heterogeneity between studies beyond what would
be expected by chance (Supplementary Material, Table S1),
except for rs1058166 (Pheterogeneity = 0.0002).
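The stratum-specific odds ratios for rs2107131 can be related to each other with a simple calculation. The sketch below converts between log-odds estimates and odds ratios with 95% confidence intervals, and compares the two strata with a Wald z-test. The Wald comparison is a stand-in chosen for illustration; the interaction P-value reported in this study came from a likelihood ratio test on the full data, so the two will not agree exactly. Function names are illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def or_with_ci(beta, se):
    """Convert a log-odds estimate and its standard error to OR (95% CI)."""
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)


def se_from_ci(lower, upper):
    """Recover the log-scale standard error from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def wald_difference(or1, ci1, or2, ci2):
    """Wald z and two-sided P for the difference between two log odds ratios."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    z = (math.log(or1) - math.log(or2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Stratum-specific estimates for the T allele of rs2107131 reported above:
# family history (0.85; 0.74-0.97) versus no family history (1.13; 1.08-1.19).
z, p = wald_difference(0.85, (0.74, 0.97), 1.13, (1.08, 1.19))
print(round(z, 2), p)
```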
DISCUSSION
Our fine mapping study of a region on 17q12 associated with
prostate cancer confirmed the previously established signals
(6,7,10) and found evidence that additional variants contribute
to the risk of prostate cancer. Although rs4430796 was the
most significant SNP associated with risk, rs7405696 was
also significant at a genome-wide significance level and
explained part of the risk associated with the first HNF1B
locus, suggesting a more complex genetic architecture for
common variants in this region. Since this study used SNP
markers, further work is needed to investigate the biological
basis of the association with common variants in 17q12,
which may regulate HNF1B, or perhaps another gene in pros-
tate cancer. It is plausible that multiple variants are directly
associated with prostate cancer susceptibility.
In the second HNF1B locus, we found that rs4794758 was
more strongly associated with risk than the previously
identified variant, rs11649743. When both variants were included
in the same model, rs4794758 explained all of the risk
associated with rs11649743, indicating that this variant more aptly
captured the risk attributable to this locus. Although the first
and second HNF1B loci are separated by a recombination
hotspot (10), the two loci are not completely independent,
and the risk associated with rs4794758 was attenuated after
conditioning on the most significant SNP in the first locus,
rs4430796.
In our study, the best model for prostate cancer risk in the
HNF1B region included five SNPs (rs4430796, rs7405696,
rs4794758, rs1016990 and rs3094509). Although three of
these SNPs reached genome-wide significance in
unconditional models, rs1016990 was only nominally associated
with risk (P = 0.0002) and rs3094509 was not associated
with risk (P = 0.80) in unconditional models. Together,
these five variants provided a better model for the data than
other combinations, suggesting that these SNPs together
capture more of the risk associated with this region.
Furthermore, haplotype analysis revealed that risk was not explained
by a single haplotype, suggesting a role for multiple variants
or combinations of variants. Imputation using data from the
1000 Genomes Project yielded similar results; however,
additional variants discovered in future next-generation
sequencing may also contribute to risk.
Table 2. Final prostate cancer risk models for HNF1B SNPs from the sequential conditional, stepwise regression and lasso analysesᵃ
Sequential conditional model
Forward stepwise regression
Two SNP model
Base pair position
ᵃ For comparison, all analyses were restricted to subjects with complete genotyping for these SNPs, so that the total number of subjects included in each model was the same.
ᵇ Odds ratio per minor allele.
Table 3. Risk of prostate cancer associated with the count of HNF1B risk allelesᵃ
No. of risk
OR (95% CI)
ᵃ Includes the five SNPs identified in the sequential conditional analysis
(rs4430796, rs7405696, rs4794758, rs1016990 and rs3094509).
Interestingly, we observed a significant interaction between
a SNP located in the second HNF1B locus, rs2107131, and
family history of prostate cancer. Family history was only cap-
tured at baseline for participants and could have changed over
time, and the number of subjects with a positive family history
of prostate cancer was limited (N ¼ 2182 men), so this finding
should be interpreted with caution. However, it is possible that
the interaction reflects haplotype segregation with uncommon
disease-causing alleles associated with familial cases and
common disease-causing alleles associated with sporadic cases.
The biological mechanism by which 17q12 variants may
alter the regulation or splicing of a plausible candidate gene,
such as HNF1B, is not clear. Down-regulation of HNF1B
expression has been associated with renal cell cancer
progression (24), and differential expression of HNF1B has
been associated with prostate cancer recurrence (23).
HNF1B encodes a transcription factor with three isoforms
in humans. Isoforms HNF1B(A) and HNF1B(B) appear to be
transcriptional activators, whereas isoform HNF1B(C) is a
transcriptional repressor (25). Differences in isoform-specific
HNF1B expression have also been observed between normal
and malignant prostate tissue, with prostate tumors displaying
greater isoform HNF1B(B) expression but less isoform
HNF1B(C) expression than normal prostate tissue (26). It is
possible that variants in HNF1B contribute to the altered
expression of HNF1B isoforms in prostate cancer. It is also
plausible that variants in this region of 17q12 could
influence the regulation or expression of other genes at a distance.
In conclusion, this large-scale fine mapping study revealed
that the association between variants in the 17q12 region
and prostate cancer risk is more complex than earlier studies
have indicated. Additional sequencing and functional studies
are needed to pinpoint the variants in this region that are
directly associated with prostate cancer risk and the biological
basis of the association.
MATERIALS AND METHODS
As described previously (11), prostate cancer cases and con-
trols of European ancestry were drawn from 10 studies in
the USA and Europe: Prostate, Lung, Colorectal, and
Ovarian (PLCO) Cancer Screening Trial (972 cases/927 con-
trols); Alpha-Tocopherol, Beta-Carotene Cancer Prevention
Study (ATBC) (906 cases/868 controls); American Cancer
Society Cancer Prevention Study II (CPSII) (1643 cases/
(HPFS) (595 cases/589 controls); CeRePP French Prostate
Multiethnic Cohort Study (MEC) (676 cases/682 controls);
European Prospective Investigation into Cancer and Nutrition
(EPIC) (682 cases/990 controls); Cohort of Norway (CONOR)
(606 cases/662 controls); Cancer of the Prostate in Sweden
(CAPS) (2213 cases/1362 controls); and a hospital-based
case–control from the Johns Hopkins Hospital (JHH) (990
cases/451 controls). A total of 10 272 prostate cancer cases
and 9123 controls were available for this study. Aggressive
prostate cancer was defined as Gleason score ≥7 or stage C/D.
Among the cases with information on disease aggressiveness,
4824 cases were defined as aggressive and 4337 as
non-aggressive (Gleason score <7 and stage A/B). Family
history of prostate cancer was obtained by self-report from
the participants and was available for 8 of the 10 studies. A
positive family history of prostate cancer was defined as having
a first-degree relative with prostate cancer. Each study obtained
informed consent from study participants and approval from
its respective institutional review board for this study.
A total of 88 SNPs within a 249 071 bp region encompassing
HNF1B were selected for fine mapping. The SNPs were
chosen based on the 0.2 cM HapMap recombination region
flanking the most significant SNP (rs4430796) in the HNF1B
locus from the second stage of the CGEMS GWAS (7). The
entire region was tagged to capture SNPs with a minor allele
frequency ≥5% at a D′ > 0.6 based on the HapMap CEU
population (Build 26). All three SNPs with a P < 10⁻³ from
the second stage of CGEMS in this region (i.e. rs4430796,
rs7501939 and rs11649743) were selected as obligate SNPs
to be included. The final tag SNPs were selected if they
were found to be correlated with an r² of ≥0.8 in the
HapMap CEU, YRI or JPT + CHB populations with the
obligate SNPs. The tag SNP selection was performed using
glu-genetics/). The chosen SNPs were primarily located in the
first (chromosome 17: 33.161–33.205 Mb, NCBI Build 36)
and second (chromosome 17: 33.116–33.161 Mb, NCBI
Build 36) HNF1B regions identified to be associated with
prostate cancer risk; however, other SNPs outside these
regions were also included.
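Tag SNP selection of this general kind can be illustrated with a simple greedy procedure. The sketch below is a generic illustration and not the algorithm of the software used in this study: obligate SNPs are forced in first, and further tags are then added one at a time, each time choosing the SNP that captures the most still-uncaptured SNPs at r² ≥ 0.8. The SNP names and pairwise r² values are hypothetical.

```python
def greedy_tags(snps, r2, threshold=0.8, obligate=()):
    """Pick tag SNPs so that every SNP is a tag or has r2 >= threshold with one."""

    def captured_by(tag):
        return {s for s in snps
                if s == tag or r2.get(frozenset((s, tag)), 0.0) >= threshold}

    tags = list(obligate)
    untagged = set(snps)
    for t in tags:
        untagged -= captured_by(t)
    while untagged:
        # Choose the SNP capturing the most uncaptured SNPs (ties: name order).
        best = max(sorted(snps), key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region of five SNPs with made-up pairwise r2 values.
snps = ["rsA", "rsB", "rsC", "rsD", "rsE"]
r2 = {
    frozenset(("rsA", "rsB")): 0.90,
    frozenset(("rsB", "rsC")): 0.85,
    frozenset(("rsA", "rsC")): 0.50,
    frozenset(("rsD", "rsE")): 0.95,
}
tags = greedy_tags(snps, r2, threshold=0.8, obligate=("rsA",))
print(tags)
```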
All SNPs for this study were genotyped on a custom Illumina
iSelect assay panel as part of the third stage of CGEMS as
described previously (11). In brief, a total of 6652 SNPs,
including 1400 SNPs to monitor population stratification,
were attempted in 22 057 samples, including quality control
duplicates. Duplicate samples yielded a 99.97% concordance
rate. Subjects with <90% completion (n = 1350), missing
covariate data or sparse group (n = 104), a likely intra-study
duplicate based on genotype concordance ≥99% (n = 18), or
non-European ancestry defined by <0.80 European admixture
as estimated using STRUCTURE (27) (n = 372) were
excluded, leaving 10 272 cases and 9123 controls for analysis.
SNPs with <80% completion within a study were removed
from analysis for that study. SNPs that failed to provide
genotypes for more than six studies (n = 1), had a minor allele
count ≤10 (n = 7) or had genotypes that were inconsistent
with Hardy–Weinberg proportions
(P < 0.001) (n = 1) were excluded from analysis, leaving 79
SNPs for analysis.
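The SNP-level exclusions can be checked with a short calculation. The sketch below tallies the exclusions and shows one common way to test Hardy–Weinberg proportions, a 1-df chi-square test on genotype counts. The choice of a chi-square test is an assumption for illustration, since the text does not state whether an exact or asymptotic test was used, and the genotype counts are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))    # chi-square (1 df) upper tail


def passes_qc(n_studies_failed, minor_allele_count, hwe_p):
    """Apply the three SNP-level exclusion rules described in the text."""
    return n_studies_failed <= 6 and minor_allele_count > 10 and hwe_p >= 0.001


n_selected = 88
n_excluded = 1 + 7 + 1   # failed studies, low minor allele count, HWE
print(n_selected - n_excluded)

# Hypothetical genotype counts: one SNP in exact HWE, one far from it.
print(hwe_chi2_p(250, 500, 250), hwe_chi2_p(400, 200, 400))
```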
To explore the possibility that other variants in the region
could be related to risk, we imputed the 249 071 bp region
encompassing our genotyped SNPs using IMPUTE2 and the
1000 Genomes Project data (June 2010 release). A total of
653 SNPs were imputed from this region, but only SNPs
with a quality score (i.e. info) >0.40 were considered for
analysis (N = 353). All imputed SNPs were analyzed using
SNPTEST v.2.2.0 accounting for imputation uncertainty.
Principal components analysis was conducted using
EIGENSTRAT (28) with 1399 SNPs that were genotyped, passed
quality control criteria and selected for population
stratification; these SNPs were chosen because they had minimal
correlation (r² < 0.004) (29). The Wilcoxon rank test was used
to test the association between the top five eigenvectors and
case–control status. Four eigenvectors displayed a significant
or borderline significant association with prostate cancer
(P < 0.08) and were included in the analysis. The association
between each SNP and prostate cancer risk was estimated
using logistic regression adjusting for age (<50, 50–59,
60–69, 70–79, ≥80), study (including country for EPIC)
and significant principal components. Stratified analyses
were conducted to examine differences by disease
aggressiveness, family history and study. Heterogeneity between
aggressive and non-aggressive disease was assessed in a case-only
analysis using logistic regression. Heterogeneity between
studies and risk modification by family history were assessed
using a likelihood ratio test comparing the models with and
without the cross-product term(s). P-values for interaction
tests were adjusted for the false discovery rate (30).
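A false discovery rate adjustment of this kind can be sketched as follows. The sketch assumes the Benjamini–Hochberg step-up procedure, which is a common choice but is an assumption here because the specific method is given only by citation. Under the further assumption that 79 interaction tests were adjusted together and that the rs2107131 result was the smallest, the adjusted value would be 4.29 × 10⁻⁵ × 79 ≈ 0.0034, which is in line with the reported Padj of 0.003. The other 78 P-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The reported interaction P-value plus 78 hypothetical null-like P-values.
p_vals = [4.29e-5] + [0.05 + 0.012 * k for k in range(78)]
adj = benjamini_hochberg(p_vals)
print(round(adj[0], 4))
```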
To explore the interdependence of the associations
observed, three separate approaches were taken: (i) sequential
conditional analyses, (ii) stepwise regression, and (iii) lasso
(31). Sequential conditional analyses were conducted by
including the most significant SNP in the unconditional logis-
tic regression model followed by sequential inclusion of the
most significant SNP in each conditional model and examining
the association with each of the remaining SNPs indepen-
dently. Forward stepwise regression was performed using the
SNPs that reached a nominal significance level in the uncondi-
tional model. SNPs were sequentially included in the model
based on a minimal P < 0.05. Lasso was conducted in R
using all SNPs without missing data and an unweighted
Linkage disequilibrium measures (D′ and r²) were
estimated in the controls using Haploview. Haplotypes were
estimated using an expectation-maximization algorithm and
analyzed using the generalized linear regression model
implemented in HaploStats (32). Unless otherwise indicated,
all analyses were conducted using PLINK or STATA.
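For reference, the two linkage disequilibrium measures can be computed from haplotype frequencies as shown below. This is a minimal sketch that assumes the haplotype frequency for the two-SNP pair is already known; in practice it would be estimated from unphased genotypes, as was done here with Haploview. The frequencies in the example are hypothetical and were chosen to show that D′ can equal 1 while r² is well below 1, which is why a set of SNPs can share high D′ yet span a wide range of r² values.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r2) from the AB haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical pair: allele B (frequency 0.4) occurs only on haplotypes
# carrying allele A (frequency 0.5).
d_prime, r2 = ld_measures(p_ab=0.4, p_a=0.5, p_b=0.4)
print(round(d_prime, 3), round(r2, 3))
```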
Supplementary Material is available at HMG online.
This study was supported by the Intramural Research Program
of the Division of Cancer Epidemiology and Genetics,
National Cancer Institute, National Institutes of Health
(NIH). The content of this publication does not necessarily
reflect the views or policies of the Department of Health and
Human Services nor does mention of trade names, commercial
products or organization indicate endorsement by the US Gov-
ernment. The authors thank Drs Christine Berg and Philip
Prorok, Division of Cancer Prevention, NCI, the screening
center investigators and staff of the PLCO Cancer Screening
Trial, Mr Thomas Riley and staff at Information Management
Services, Inc., and Ms Barbara O’Brien and staff at Westat,
Inc. for their contributions to the PLCO. Finally, we are grate-
ful to the study participants for donating their time and making
this study possible.
Conflict of Interest statement. None declared.
This study was supported by the Intramural Research Program
of the Division of Cancer Epidemiology and Genetics,
National Cancer Institute, National Institutes of Health and
in part by National Cancer Institute, National Institutes of
REFERENCES
1. Lichtenstein, P., Holm, N.V., Verkasalo, P.K., Iliadou, A., Kaprio, J.,
Koskenvuo, M., Pukkala, E., Skytthe, A. and Hemminki, K. (2000)
Environmental and heritable factors in the causation of cancer—analyses
of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med.,
2. Amundadottir, L.T., Sulem, P., Gudmundsson, J., Helgason, A., Baker, A.,
Agnarsson, B.A., Sigurdsson, A., Benediktsdottir, K.R., Cazier, J.B.,
Sainz, J. et al. (2006) A common variant associated with prostate cancer
in European and African populations. Nat. Genet., 38, 652–658.
3. Yeager, M., Orr, N., Hayes, R.B., Jacobs, K.B., Kraft, P., Wacholder, S.,
Minichiello, M.J., Fearnhead, P., Yu, K., Chatterjee, N. et al. (2007)
Genome-wide association study of prostate cancer identifies a second risk
locus at 8q24. Nat. Genet., 39, 645–649.
4. Gudmundsson, J., Sulem, P., Manolescu, A., Amundadottir, L.T.,
Gudbjartsson, D., Helgason, A., Rafnar, T., Bergthorsson, J.T.,
Agnarsson, B.A., Baker, A. et al. (2007) Genome-wide association study
identifies a second prostate cancer susceptibility variant at 8q24. Nat.
Genet., 39, 631–637.
5. Haiman, C.A., Patterson, N., Freedman, M.L., Myers, S.R., Pike, M.C.,
Waliszewska, A., Neubauer, J., Tandon, A., Schirmer, C., McDonald, G.J.
et al. (2007) Multiple regions within 8q24 independently affect risk for
prostate cancer. Nat. Genet., 39, 638–644.
6. Gudmundsson, J., Sulem, P., Steinthorsdottir, V., Bergthorsson, J.T.,
Thorleifsson, G., Manolescu, A., Rafnar, T., Gudbjartsson, D., Agnarsson,
B.A., Baker, A. et al. (2007) Two variants on chromosome 17 confer
prostate cancer risk, and the one in TCF2 protects against type 2 diabetes.
Nat. Genet., 39, 977–983.
7. Thomas, G., Jacobs, K.B., Yeager, M., Kraft, P., Wacholder, S., Orr, N.,
Yu, K., Chatterjee, N., Welch, R., Hutchinson, A. et al. (2008) Multiple
loci identified in a genome-wide association study of prostate cancer. Nat.
Genet., 40, 310–315.
8. Eeles, R.A., Kote-Jarai, Z., Giles, G.G., Olama, A.A., Guy, M.,
Jugurnauth, S.K., Mulholland, S., Leongamornlert, D.A., Edwards, S.M.,
Morrison, J. et al. (2008) Multiple newly identified loci associated with
prostate cancer susceptibility. Nat. Genet., 40, 316–321.
9. Gudmundsson, J., Sulem, P., Rafnar, T., Bergthorsson, J.T., Manolescu,
A., Gudbjartsson, D., Agnarsson, B.A., Sigurdsson, A., Benediktsdottir,
K.R., Blondal, T. et al. (2008) Common sequence variants on 2p15 and
Xp11.22 confer susceptibility to prostate cancer. Nat. Genet., 40,
10. Sun, J., Zheng, S.L., Wiklund, F., Isaacs, S.D., Purcell, L.D., Gao, Z., Hsu,
F.C., Kim, S.T., Liu, W., Zhu, Y. et al. (2008) Evidence for two
independent prostate cancer risk-associated loci in the HNF1B gene at
17q12. Nat. Genet., 40, 1153–1155.
11. Yeager, M., Chatterjee, N., Ciampa, J., Jacobs, K.B., Gonzalez-Bosquet,
J., Hayes, R.B., Kraft, P., Wacholder, S., Orr, N., Berndt, S. et al. (2009)
Identification of a new prostate cancer susceptibility locus on
chromosome 8q24. Nat. Genet., 41, 1055–1057.
12. Al Olama, A.A., Kote-Jarai, Z., Giles, G.G., Guy, M., Morrison, J., Severi,
G., Leongamornlert, D.A., Tymrakiewicz, M., Jhavar, S., Saunders, E.
et al. (2009) Multiple loci on 8q24 associated with prostate cancer
susceptibility. Nat. Genet., 41, 1058–1060.
13. Eeles, R.A., Kote-Jarai, Z., Al Olama, A.A., Giles, G.G., Guy, M., Severi,
G., Muir, K., Hopper, J.L., Henderson, B.E., Haiman, C.A. et al. (2009)
Identification of seven new prostate cancer susceptibility loci through a
genome-wide association study. Nat. Genet., 41, 1116–1121.
14. Gudmundsson, J., Sulem, P., Gudbjartsson, D.F., Blondal, T., Gylfason,
A., Agnarsson, B.A., Benediktsdottir, K.R., Magnusdottir, D.N.,
Orlygsdottir, G., Jakobsdottir, M. et al. (2009) Genome-wide association
and replication studies identify four variants associated with prostate
cancer susceptibility. Nat. Genet., 41, 1122–1126.
15. Takata, R., Akamatsu, S., Kubo, M., Takahashi, A., Hosono, N.,
Kawaguchi, T., Tsunoda, T., Inazawa, J., Kamatani, N., Ogawa, O. et al.
(2010) Genome-wide association study identifies five new susceptibility
loci for prostate cancer in the Japanese population. Nat. Genet., 42,
16. Zeggini, E., Scott, L.J., Saxena, R., Voight, B.F., Marchini, J.L., Hu, T.,
de Bakker, P.I., Abecasis, G.R., Almgren, P., Andersen, G. et al. (2008)
Meta-analysis of genome-wide association data and large-scale replication
identifies additional susceptibility loci for type 2 diabetes. Nat. Genet., 40,
17. Kasper, J.S. and Giovannucci, E. (2006) A meta-analysis of diabetes
mellitus and the risk of prostate cancer. Cancer Epidemiol. Biomarkers
Prev., 15, 2056–2062.
18. Stevens, V.L., Ahn, J., Sun, J., Jacobs, E.J., Moore, S.C., Patel, A.V.,
Berndt, S.I., Albanes, D. and Hayes, R.B. (2010) HNF1B and JAZF1
genes, diabetes, and prostate cancer risk. Prostate, 70, 601–607.
19. Kato, N. and Motoyama, T. (2009) Hepatocyte nuclear
factor-1beta (HNF-1beta) in human urogenital organs: its expression and
role in embryogenesis and tumorigenesis. Histol. Histopathol., 24,
20. Maestro, M.A., Cardalda, C., Boj, S.F., Luco, R.F., Servitja, J.M. and
Ferrer, J. (2007) Distinct roles of HNF1beta, HNF1alpha, and HNF4alpha
in regulating pancreas development, beta-cell function and growth.
Endocr. Dev., 12, 33–45.
21. Bellanne-Chantelot, C., Chauveau, D., Gautier, J.F., Dubois-Laforgue, D.,
Clauin, S., Beaufils, S., Wilhelm, J.M., Boitard, C., Noel, L.H., Velho, G.
and Timsit, J. (2004) Clinical spectrum associated with hepatocyte nuclear
factor-1beta mutations. Ann. Intern. Med., 140, 510–517.
22. Edghill, E.L., Bingham, C., Ellard, S. and Hattersley, A.T. (2006)
Mutations in hepatocyte nuclear factor-1beta and their related phenotypes.
J. Med. Genet., 43, 84–90.
23. Glinsky, G.V., Glinskii, A.B., Stephenson, A.J., Hoffman, R.M. and
Gerald, W.L. (2004) Gene expression profiling predicts clinical outcome
of prostate cancer. J. Clin. Invest., 113, 913–923.
24. Buchner, A., Castro, M., Hennig, A., Popp, T., Assmann, G., Stief, C.G.
and Zimmermann, W. (2010) Downregulation of HNF-1B in renal cell
carcinoma is associated with tumor progression and poor prognosis.
Urology, 76, 507–511.
25. Bach, I. and Yaniv, M. (1993) More potent transcriptional activators or a
transdominant inhibitor of the HNF1 homeoprotein family are generated
by alternative RNA processing. EMBO J., 12, 4229–4242.
26. Harries, L.W., Perry, J.R., McCullagh, P. and Crundwell, M. (2010)
Alterations in LMTK2, MSMB and HNF1B gene expression are
associated with the development of prostate cancer. BMC Cancer,
27. Pritchard, J.K., Stephens, M. and Donnelly, P. (2000) Inference of
population structure using multilocus genotype data. Genetics, 155,
28. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A.
and Reich, D. (2006) Principal components analysis corrects for
stratification in genome-wide association studies. Nat. Genet., 38,
29. Yu, K., Wang, Z., Li, Q., Wacholder, S., Hunter, D.J., Hoover, R.N.,
Chanock, S. and Thomas, G. (2008) Population substructure and control
selection in genome-wide association studies. PLoS ONE, 3, e2551.
30. Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple testing. J. R. Stat.
Soc. B, 57, 289–300.
31. Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R.
Stat. Soc. B, 58, 267–288.
32. Schaid, D.J., Rowland, C.M., Tines, D.E., Jacobson, R.M. and Poland,
G.A. (2002) Score tests for association between traits and haplotypes
when linkage phase is ambiguous. Am. J. Hum. Genet., 70, 425–434.
Human Molecular Genetics, 2011, Vol. 20, No. 16