Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer.
Ian P M Tomlinson, Luis G Carvajal-Carmona, Sara E Dobbins, Albert Tenesa, Angela M Jones, Kimberley Howarth, Claire Palles, Peter Broderick, Emma E M Jaeger, Susan Farrington, Annabelle Lewis, James G D Prendergast, Alan M Pittman, Evropi Theodoratou, Bianca Olver, Marion Walker, Steven Penegar, Ella Barclay, Nicola Whiffin, Lynn Martin, Stephane Ballereau, Amy Lloyd, Maggie Gorman, Steven Lubbe, Bryan Howie, Jonathan Marchini, Clara Ruiz-Ponte, Ceres Fernandez-Rozadilla, Antoni Castells, Angel Carracedo, Sergi Castellvi-Bel, David Duggan, David Conti, Jean-Baptiste Cazier, Harry Campbell, Oliver Sieber, Lara Lipton, Peter Gibbs, Nicholas G Martin, Grant W Montgomery, Joanne Young, Paul N Baird, Steven Gallinger, Polly Newcomb, John Hopper, Mark A Jenkins, Lauri A Aaltonen, David J Kerr, Jeremy Cheadle, Paul Pharoah, Graham Casey, Richard S Houlston, Malcolm G Dunlop
ABSTRACT Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93×10⁻¹⁰) and BMP2 (rs4813802, P = 4.65×10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33×10⁻⁸) and rs11632715 (P = 2.30×10⁻¹⁰). As low-penetrance predisposition variants become harder to identify, owing to small effect sizes and/or low risk allele frequencies, approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.
Multiple Common Susceptibility Variants near BMP
Pathway Loci GREM1, BMP4, and BMP2 Explain Part of
the Missing Heritability of Colorectal Cancer
Ian P. M. Tomlinson1.*, Luis G. Carvajal-Carmona1., Sara E. Dobbins2, Albert Tenesa3, Angela M. Jones1,
Kimberley Howarth1, Claire Palles1, Peter Broderick2, Emma E. M. Jaeger1, Susan Farrington3, Annabelle
Lewis1, James G. D. Prendergast3, Alan M. Pittman2, Evropi Theodoratou4, Bianca Olver2, Marion
Walker3, Steven Penegar2, Ella Barclay1, Nicola Whiffin2, Lynn Martin1, Stephane Ballereau3, Amy
Lloyd2, Maggie Gorman1, Steven Lubbe2, The COGENT Consortium", The CORGI Collaborators", The
EPICOLON Consortium", Bryan Howie5, Jonathan Marchini5, Clara Ruiz-Ponte6, Ceres Fernandez-
Rozadilla6, Antoni Castells7, Angel Carracedo6, Sergi Castellvi-Bel7, David Duggan8, David Conti9, Jean-
Baptiste Cazier1, Harry Campbell10, Oliver Sieber11, Lara Lipton11, Peter Gibbs11, Nicholas G. Martin12,
Grant W. Montgomery12, Joanne Young13, Paul N. Baird14, Steven Gallinger15, Polly Newcomb16, John
Hopper17, Mark A. Jenkins17, Lauri A. Aaltonen18, David J. Kerr19, Jeremy Cheadle20, Paul Pharoah21,
Graham Casey9, Richard S. Houlston2*, Malcolm G. Dunlop3*
1Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, 2Section of Cancer Genetics, Institute of Cancer Research, Sutton, United
Kingdom, 3Colon Cancer Genetics Group, Institute of Genetics and Molecular Medicine, University of Edinburgh and Medical Research Council Human Genetics Unit,
Edinburgh, United Kingdom, 4The University of Edinburgh Medical School, Edinburgh, United Kingdom, 5Department of Statistics, University of Oxford, Oxford, United
Kingdom, 6Galician Public Foundation of Genomic Medicine (FPGMX), Centro de Investigacion Biomedica en Red de Enfermedades Raras (CIBERER), Genomics Medicine
Group, Hospital Clinico, Santiago de Compostela, University of Santiago de Compostela, Galicia, Spain, 7Department of Gastroenterology, Hospital Clinic, CIBERehd,
IDIBAPS, University of Barcelona, Barcelona, Catalonia, Spain, 8Translational Genomics Research Institute, Phoenix, Arizona, United States of America, 9Department of
Preventive Medicine, University of Southern California, Los Angeles, California, United States of America, 10Public Health Sciences, University of Edinburgh, Edinburgh,
United Kingdom, 11Ludwig Colon Cancer Initiative Laboratory, Ludwig Institute for Cancer Research, Royal Melbourne Hospital, Parkville, Australia, 12Genetic and
Molecular Epidemiology Laboratories, Queensland Institute of Medical Research, Brisbane, Australia, 13Familial Cancer Laboratory, Queensland Institute of Medical
Research, Brisbane, Australia, 14Centre for Eye Research Australia, University of Melbourne, Melbourne, Australia, 15Samuel Lunenfeld Research Institute, Mount Sinai
Hospital, Toronto, Canada, 16Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America, 17Centre for Molecular, Environmental, Genetic,
and Analytic Epidemiology, The University of Melbourne, Australia, 18Department of Medical Genetics, Genome-Scale Biology Research Program, Biomedicum Helsinki,
University of Helsinki, Helsinki, Finland, 19Department of Clinical Pharmacology, University of Oxford, Oxford, United Kingdom, 20Institute of Medical Genetics, School of
Medicine, Cardiff University, Cardiff, United Kingdom, 21Cancer Research UK Laboratories, Strangeways Research Laboratory, Department of Oncology, University of
Cambridge, Cambridge, United Kingdom
Abstract
Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are
associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP)
pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for
disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional
predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk.
To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1
(15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We
identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P=3.93×10⁻¹⁰) and BMP2 (rs4813802,
P=4.65×10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP
rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P=5.33×10⁻⁸) and
rs11632715 (P=2.30×10⁻¹⁰). As low-penetrance predisposition variants become harder to identify, owing to small effect
sizes and/or low risk allele frequencies, approaches based on informed candidate gene selection may become increasingly
attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to
independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing
heritability of common diseases.
PLoS Genetics | www.plosgenetics.org | June 2011 | Volume 7 | Issue 6 | e1002105
Citation: Tomlinson IPM, Carvajal-Carmona LG, Dobbins SE, Tenesa A, Jones AM, et al. (2011) Multiple Common Susceptibility Variants near BMP Pathway Loci
GREM1, BMP4, and BMP2 Explain Part of the Missing Heritability of Colorectal Cancer. PLoS Genet 7(6): e1002105. doi:10.1371/journal.pgen.1002105
Editor: Greg Gibson, Georgia Institute of Technology, United States of America
Received March 5, 2011; Accepted April 8, 2011; Published June 2, 2011
Copyright: © 2011 Tomlinson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Cancer Research UK provided principal funding for this study individually to IPM Tomlinson, MG Dunlop, RS Houlston, P Pharoah, and J Cheadle.
Additional funding was provided by the Oxford Comprehensive Biomedical Research Centre (to IPM Tomlinson) and the EU FP7 CHIBCHA grant (to LG Carvajal-
Carmona and IPM Tomlinson). Core infrastructure support to the Wellcome Trust Centre for Human Genetics, Oxford, was provided by grant 075491/Z/04.
Additional funding (to MG Dunlop) was provided by the Medical Research Council (G0000657-53203), CORE, and Scottish Executive Chief Scientist’s Office (K/OP/
2/2/D333, CZB/4/449). The Colon Cancer Family Registry was supported by the National Cancer Institute, National Institutes of Health, under Request for
Application #CA-95-011, and through cooperative agreements with the Australian Colorectal Cancer Family Registry (UO1 CA097735), the USC Familial Colorectal
Neoplasia Collaborative Group (UO1 CA074799), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (UO1 CA074800), Ontario Registry for Studies
of Familial Colorectal Cancer (UO1 CA074783), Seattle Colorectal Cancer Family Registry (UO1 CA074794), and The University of Hawaii Colorectal Cancer Family
Registry (UO1 CA074806). E Theodoratou was funded by a Cancer Research UK Fellowship (C31250/A10107). COIN and COIN-B were funded by the UK Medical
Research Council. COIN sample analysis (J Cheadle) was also funded by Cancer Research Wales, Tenovus & Wales Gene Park. P Pharoah is a Cancer Research UK
Senior Clinical Research Fellow. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: iant@well.ox.ac.uk (IPMT); richard.houlston@icr.ac.uk (RSH); malcolm.dunlop@hgu.mrc.ac.uk (MGD)
. These authors contributed equally to this work.
" Memberships of the consortia are provided in Text S1.
Introduction
Genome-wide association (GWA) studies of colorectal cancer
(CRC) have so far identified 14 common, low-risk susceptibility
variants [1]. Of these 14 variants, 3 are close to genes that
encode secreted members of the bone morphogenetic protein (BMP)
signalling pathway: GREM1 (rs4779584); BMP4 (rs4444235); and
BMP2 (rs961253). In the colon, GREM1 is one of several BMP
antagonists produced by intestinal sub-epithelial myofibroblasts (ISEMFs).
GREM1 binds to and inactivates the ligands BMP2 and BMP4
that are primarily produced by inter-cryptal stromal cells.
Our GWA studies have utilised a primary phase of genome-
wide typing of tagging single nucleotide polymorphisms (tagSNPs),
followed by larger validation phases of those SNPs with the
strongest signals of association. We have previously used relatively
stringent statistical thresholds to take SNPs forward into the final
validation phases [1]. Whilst such a design has been cost-effective,
the use of a less stringent threshold might have led to the discovery of more
CRC SNPs, albeit at the cost of a relatively high type I error rate.
One means of reducing false positives might be to select SNPs
using a less stringent threshold where there is a priori evidence for
candidacy. We reasoned that the best candidate loci were those
already identified as harbouring CRC risk alleles. Of those 14 loci,
we prioritised GREM1, BMP2 and BMP4 for further analysis
owing to their strongly-related functions.
The GWA studies had identified a single tagSNP associated
with CRC risk close to each of GREM1, BMP2 and BMP4 [1].
Examination of the regions around these genes in public databases
such as HapMap (http://www.hapmap.org/) showed in all cases
that the coding sequence and predicted surrounding regulatory
regions were present within more than one linkage disequilibrium
(LD) block. For each of the 3 genes, therefore, it was possible that
there were additional genetic determinants of CRC risk,
independent of the already-identified SNPs. We proceeded to test
this hypothesis in large sets of CRC cases and controls of
European origin.
Results
The rs4779584 CRC signal results from two independent
SNPs close to GREM1
In order to refine the location of CRC-associated functional
variation close to the GREM1, BMP4 and BMP2 loci, we
genotyped 442 SNPs close to rs4779584, rs4444235 and
rs961253 in 4,878 CRC cases and 4,914 controls from the UK2
and Scotland2 sample sets and imputed other SNPs within these
regions. No significant localisation of a functional variant was
achieved for rs4444235 or rs961253 (Figure S1), but at GREM1,
rs16969681 (chr15:30,780,403 bases) had a notably stronger signal
of association than rs4779584 (pairwise LD: r²=0.18, D′=0.70)
(Figure 1 and Figure S2). We genotyped rs16969681 in additional
independent CRC case-control series (UK1, UK4, VQ58,
Helsinki, Cambridge and EPICOLON; see Methods). After
combined analysis, a significant association between the minor
allele at rs16969681 and CRC risk was seen (P=5.33×10⁻⁸;
Table 1). Unconditional logistic regression analysis, incorporating
sample series as a covariate, showed that rs16969681 was more
strongly associated with CRC than rs4779584, but that the signals
were non-independent (for rs16969681, OR=1.16, 95% CI 1.07–1.25,
P=1.91×10⁻⁴; for rs4779584, OR=1.08, 95% CI 1.02–1.14,
P=5.27×10⁻³). Akaike information criterion values for
rs16969681 and rs4779584 respectively were 25608 and 25922,
consistent with a superior fit of the risk model incorporating the
former SNP. Intriguingly, we found that rs16969681 maps to a site
of open chromatin in GREM1-expressing CRC cell lines, raising
the possibility that it may be directly functional (Figure S3).
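The model comparison in this paragraph can be written out as a short worked equation. This is a sketch that assumes the standard definition of the Akaike information criterion (the text does not restate it), with k the number of fitted parameters and L-hat the maximised likelihood; the two AIC values are those quoted above.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\Delta\mathrm{AIC}
  = \mathrm{AIC}_{\mathrm{rs4779584}} - \mathrm{AIC}_{\mathrm{rs16969681}}
  = 25922 - 25608 = 314 .
```

Lower values indicate the preferred model. If the two single-SNP models were fitted to the same samples with the same covariates (an assumption here), k is identical and the difference of 314 reflects the higher likelihood of the rs16969681 model alone.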
Haplotype risk analysis (Table S2) provided evidence that
rs16969681 alleles do not capture all the CRC risk associated with
rs4779584. In brief, data from UK2 and Scotland2 showed that
the risk alleles at rs16969681 and rs4779584 were defined by a
TGGTC haplotype at rs16969681-rs16969862-rs12594722-rs4779584-rs9888701.
The TT rs16969681-rs4779584 haplotype
was at a frequency of 0.063 in cases and 0.052 in controls
(P=6.29×10⁻⁵). However, there appeared to be a residual effect
of the T allele at rs4779584, since there was also an elevated risk
associated with the CT rs16969681-rs4779584 haplotype
(P=0.026).
We therefore tested the hypothesis that rs4779584 tags two
independent risk SNPs at GREM1. We used reverse stepwise
logistic regression to search the set of GREM1 SNPs genotyped in
the UK2 and Scotland2 samples (Table S1) for associations that
were independent of rs16969681 genotype and that captured the
residual rs4779584 signal. This analysis led to elimination of
rs4779584 from the regression model and identification of a model
in which only rs16969681 (P=1.04×10⁻⁴) and another SNP,
rs11632715 (P=1.00×10⁻³), produced independent signals.
rs11632715 (chr15:30,791,539) is in low LD with rs16969681
(r²=0.009, D′=0.31) and modest LD with rs4779584 (r²=0.18,
D′=0.90; Figure S2). Through genotyping of additional case-control
series, we showed that rs11632715 was significantly
associated with CRC risk (P=2.30×10⁻¹⁰; Table 1). Unconditional
logistic regression in the 21,139 samples typed for both
rs11632715 and rs16969681 provided confirmatory evidence of
the independence of the signals (for rs16969681, P=1.84×10⁻⁶
and for rs11632715, P=6.36×10⁻⁷); these associations were of
very similar magnitude to those obtained when each SNP was
analysed individually in those sample sets (Figure S2). Incorpora-
tion of rs4779584 into the logistic regression model showed that
this SNP had a weaker effect than that of either rs16969681 or
rs11632715 and did not significantly improve the model fit (Table
S3). Inspection of the region containing rs4779584, rs16969681
and rs11632715 (Figure S2) showed that rs4779584 lay within a
recombination hotspot. This finding was consistent with our
discovery that rs4779584 tags two independent functional variants
that are, in turn, tagged by rs16969681 and rs11632715.
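The LD statistics quoted in this section (r² and D′) can be illustrated with a minimal sketch. It uses the textbook definitions for two biallelic SNPs; the function name and the example frequencies are hypothetical and are not taken from the study data. The example shows how a tagSNP can have a high D′ but a low r² with a rarer variant, the pattern reported for rs4779584.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    p_a  : frequency of allele A at the first SNP (0 < p_a < 1)
    p_b  : frequency of allele B at the second SNP (0 < p_b < 1)
    p_ab : frequency of the A-B haplotype
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest magnitude possible for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: a rare allele (5%) found only on haplotypes that
# carry a common allele (50%) at the second SNP.
d, d_prime, r2 = ld_stats(p_a=0.05, p_b=0.50, p_ab=0.05)
```

In this example D′ is 1 while r² is only about 0.05, because r² also depends on how well the allele frequencies match.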
Identification of additional, independent CRC
susceptibility SNPs near BMP4 and BMP2
The regions analysed for fine mapping encompassed only a
minority of the transcribed and flanking regions of GREM1, BMP4
and BMP2. We therefore tested for further independent CRC-
associated SNPs around these loci (Table S4) by undertaking a
pooled analysis of data from 5 CRC GWA studies (UK1, Scotland1,
VQ58, CCFR, Australia) and from the UK2 and Scotland2 samples
that had been genotyped at 55,000 SNPs with the strongest
evidence of association from meta-analysis of UK1 and Scotland1
(Figure S4) [1]. Since each of the 7 sample sets had been genotyped
using different, but overlapping, SNP panels, we performed the
combined analysis irrespective of the number of studies in which
any SNP had been typed. Figure 2 shows the resulting signals of
association from single SNP analysis in this discovery phase.
We prioritised SNPs for further assessment in the replication
data sets if they passed two thresholds. First, we required SNPs to
show association with CRC at P<1×10⁻⁴ under the allelic or
Cochran-Armitage tests; this was a less stringent threshold than
that used in our previously-reported hypothesis-free GWA studies
[1,2,3], reflecting the fact that GREM1, BMP4 and BMP2 were
strong candidate susceptibility genes. Four SNPs at BMP4, 3 at
BMP2 and 9 at GREM1 fulfilled this criterion (Figure 2). Second,
since our aim was to test for novel, independent disease variants
rather than to refine existing signals of association, we required
that SNP genotypes were not correlated with each other or with
previously identified risk SNPs (r²<0.05, D′<0.10). After applying
these criteria, one SNP at BMP4 (rs1957636) and one at BMP2
(rs4813802) were retained for subsequent analyses.
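The two-threshold selection described above can be sketched as a simple filter. This is an illustration only: the `Candidate` record, the `prioritise` function and the SNP names are hypothetical, and the sketch assumes that the maximum r² and D′ of each candidate against all known risk SNPs and other candidates have already been computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    snp: str            # SNP identifier (hypothetical in the example below)
    p_value: float      # discovery-phase association P value
    max_r2: float       # largest r2 with any known risk SNP or other candidate
    max_d_prime: float  # largest D' with any known risk SNP or other candidate


def prioritise(candidates, p_max=1e-4, r2_max=0.05, d_prime_max=0.10):
    """Return the IDs of SNPs that pass both thresholds given in the text."""
    return [
        c.snp
        for c in candidates
        if c.p_value < p_max
        and c.max_r2 < r2_max
        and c.max_d_prime < d_prime_max
    ]


example = [
    Candidate("snpA", 3e-5, 0.004, 0.07),  # passes both thresholds
    Candidate("snpB", 2e-5, 0.60, 1.00),   # associated, but tags a known signal
    Candidate("snpC", 5e-3, 0.001, 0.02),  # independent, but association too weak
]
selected = prioritise(example)
```

The default thresholds mirror those stated in the text (P<1×10⁻⁴, r²<0.05, D′<0.10); everything else is an assumption made for the sketch.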
rs1957636 and rs4813802 were then genotyped in the validation
sample sets (Figure S4), comprising 15,075 CRC cases and 13,296
controls from six independent European case-control series
(COIN/NBS, UK3, UK4, Scotland3, Cambridge, Helsinki). After
combined analysis, significant associations (Table 2) were shown for
both rs1957636, P=1.36×10⁻⁹ (OR=1.08, 95% CI: 1.06–1.11,
Phet=0.009, I²=54%) and rs4813802, P=7.52×10⁻¹¹ (OR=1.09,
95% CI: 1.06–1.12, Phet=0.42, I²=3%). In case-only
analysis, neither SNP showed any evidence of association with age
or sex (P>0.05, details not shown).
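The heterogeneity statistic I² reported alongside each combined estimate can be illustrated as follows. This is a minimal sketch assuming an inverse-variance fixed-effects combination of per-series log odds ratios with Cochran's Q; this section does not state the exact meta-analysis model, and the function name and example inputs are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effects meta-analysis.

    log_ors : per-study log odds ratios
    ses     : per-study standard errors of the log odds ratios
    Returns (pooled log OR, pooled SE, Cochran's Q, I2 in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2 is the share of variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical two-study example with clearly discordant effects.
pooled, pooled_se, q, i2 = fixed_effect_meta([0.0, 1.0], [0.5, 0.5])
```

Phet would be obtained by referring Q to a chi-squared distribution with (number of studies − 1) degrees of freedom, which is omitted here to keep the sketch free of external libraries.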
We used unconditional logistic regression, adjusting for sample
series, to test the independence of the two pairs of SNPs at BMP4 and
at BMP2. In both cases, each signal remained independent, reflecting
the existence of recombination hotspots between the pairs of SNPs at
each locus (Figure S5 and Figure S6). For rs4444235 and rs1957636,
association P-values were respectively 2.09×10⁻⁸ (I²=47.7%) and
3.93×10⁻¹⁰ (I²=0%). For rs961253 and rs4813802, P-values were
1.89×10⁻¹⁵ (I²=5%) and 4.65×10⁻¹¹ (I²=5%). Thus, all 4 SNPs
represented independent signals of association with CRC. Further
imputation around BMP4 and BMP2 provided no evidence for the
alternative possibility that a single variant was tagged by the two
SNPs in each region (details not shown).
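The independence test described above can be summarised by the joint model below. This is a sketch under stated assumptions: an additive (per-allele) genotype coding g in {0, 1, 2} and one indicator term per sample series. The symbols are introduced here for illustration and do not appear in the original text.

```latex
\operatorname{logit}\,\Pr(\mathrm{CRC})
  = \alpha
  + \sum_{s} \gamma_{s}\,\mathbf{1}[\text{series} = s]
  + \beta_{1}\, g_{1}
  + \beta_{2}\, g_{2}
```

Here g_1 and g_2 are the risk-allele counts at the two SNPs in a region (for example rs4444235 and rs1957636). The signals are regarded as independent when both beta_1 and beta_2 remain significantly different from zero in the joint fit, with estimates close to those from the single-SNP models.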
Annotation of the regions around rs1957636 and
rs4813802
rs1957636 (chr14:53,629,768) is 136 kb upstream of the
transcriptional start site of BMP4 and 150 kb telomeric to the
previously-identified CRC susceptibility SNP, rs4444235 (chr14:53,480,669),
which is downstream of BMP4. There is a recombination hotspot at
chr14:53,510,000 (Figure S5) and LD between rs1957636 and rs4444235
is very weak (r²=0.004, D′=0.073 from UK1). rs1957636 is
within a region of LD flanked by SNPs rs431669 (chr14:53,512,418)
and rs10150369 (chr14:53,873,515). This region contains no known
transcripts, and the nearest gene apart from BMP4 is CDKN3
(transcriptional start site, chr14:53,933,476). Using SNAP
(http://www.broadinstitute.org/mpg/snap/) to search HapMap3 release 2
and 1000 Genomes Pilot 1, we identified 265 SNPs that were in moderate
or greater LD (r²>0.20) with rs1957636 in Europeans. Of those SNPs,
several mapped to sites of potential functional importance in BMP4
transcription (H3K4Me1, H3K4Me3, DNAseI hypersensitivity,
transcription factor ChIP-Seq), as evidenced by the ENCODE
regulation tracks
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=171775907&c=chr14&g=wgEncodeReg)
of the UCSC Genome Browser. For example, rs12432287 (r²=0.60, D′=1.00 with
rs1957636) and rs728425 (r²=0.69, D′=1.00) lie within a region of
apparently high transcriptional regulatory activity at
chr14:53,642,340–53,652,937. Another SNP, rs8011813 (r²=0.822,
D′=0.811), maps within a similar region at chr14:53,728,957–53,731,647.
Although none of the SNPs in the region around rs1957636 is the
location of a reported eQTL (http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/),
no studies relating transcription to SNP genotype
have yet been undertaken in the colorectum.
Author Summary
Genome-wide association studies (GWAS) have identified
several colorectal cancer (CRC) susceptibility polymor-
phisms near genes that encode proteins in the bone
morphogenetic protein (BMP) pathway. However, most of
the inherited susceptibility to CRC remains unexplained.
We investigated three of the best candidate BMP genes
(GREM1, BMP4, and BMP2) for additional polymorphisms
associated with CRC. By extensive validation of polymor-
phisms with only modest evidence of association in the
initial phases of the GWAS, we identified new, indepen-
dent CRC predisposition polymorphisms close to BMP4
(rs1957636) and BMP2 (rs4813802). Near GREM1, we used
additional genotyping around the GWAS-identified poly-
morphism rs4779584 to demonstrate two independent
signals represented by rs16969681 and rs11632715.
Common genes with modest effects on disease risk are
becoming harder to identify, and approaches based on
informed candidate gene selection may become increas-
ingly attractive. In addition, genetic fine mapping around
polymorphisms identified in GWAS can deconvolute
associations which have arisen owing to two independent
functional variants. These types of study can identify some
of the apparently missing heritability of common disease.
rs4813802 maps to chr20:6,647,595, about 49 kb upstream of
BMP2 and 295 kb telomeric of the previously-identified BMP2
CRC susceptibility SNP, rs961253 (chr20:6,352,281). There is
very little LD between these two SNPs (r²=0.000, D′=0.017
from UK1) owing to a recombination hotspot at chr20:6,587,000
(Figure S5). rs4813802 lies within a region of LD flanked by
rs727689 (chr20:6,636,405) and rs6117401 (chr20:6,664,097).
This region contains 3 spliced ESTs, BX107852, BG822004 and
DB094697; none of these has any known functional role or
homology to other human or non-human transcripts or genes. The
nearest gene to rs4813802 apart from BMP2 is FERMT1
(transcriptional start site, chr20:6,052,191). From HapMap3
release 2 and 1000 Genomes Pilot 1, 29 SNPs were found to be
in moderate or greater LD (r²>0.20) with rs4813802 in
Europeans. Of those SNPs, several in the region
chr20:6,636,405–6,647,595 mapped to sites of potential functional
importance in BMP2 transcription. None of the SNPs in the area
around rs4813802 is the location of a reported eQTL.
Gene-gene interactions and other SNPs near BMP
pathway loci
Using a case-control logistic regression design, we searched for
pairwise gene-gene interactions between 5 SNPs associated with
CRC risk (rs4444235, rs1957636, rs961253, rs4813802 and
rs4779584). Risks were additive and no evidence of epistasis was
detected (P>0.2 for all SNP pairs).
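One common way to frame a pairwise interaction test of this kind is as a comparison of nested logistic models. The formulation below is an assumption made for illustration (the text states only that a case-control logistic regression design was used); g_j and g_k denote risk-allele counts at two of the SNPs and alpha_s a series-specific intercept.

```latex
M_{0}:\ \operatorname{logit}\,\Pr(\mathrm{CRC}) = \alpha_{s} + \beta_{j} g_{j} + \beta_{k} g_{k}
\qquad
M_{1}:\ \operatorname{logit}\,\Pr(\mathrm{CRC}) = \alpha_{s} + \beta_{j} g_{j} + \beta_{k} g_{k} + \beta_{jk}\, g_{j} g_{k}

H_{0}:\ \beta_{jk} = 0,
\qquad
2\,(\ell_{1} - \ell_{0}) \sim \chi^{2}_{1} \ \text{under } H_{0}
```

Under this framing, "additive" risks correspond to additivity on the log-odds scale, and the reported P>0.2 for every SNP pair indicates no support for a non-zero interaction term.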
Figure 1. Fine mapping around the known CRC risk SNP close to GREM1 (15q13.3). Results for meta-analysis of UK2 and Scotland2 are
shown. Both significance of association (−log10(P)) and effect size (β) are presented. The original CRC-associated tagSNPs are shown in blue. The SNP
with the clearly strongest association signal is the genotyped SNP rs16969681.
doi:10.1371/journal.pgen.1002105.g001
We also searched for evidence of CRC susceptibility alleles at
tagSNPs close to other BMP pathway genes. Using the transcribed
regions of flanking genes as boundaries, we identified 4,361
tagSNPs mapping to 37 BMP agonist, antagonist and receptor loci
(Table S5). However, we found no statistically significant evidence
of associations with disease (P>10⁻³ in all cases).
Discussion
We have identified two new CRC predisposition tagSNPs close
to BMP4 (rs1957636) and BMP2 (rs4813802). To date, few other
loci have been shown at stringent levels of significance to harbour
more than one, independent cancer susceptibility variant. One
notable exception is the locus proximal to MYC on chromosome
8q24.21 that contains multiple regions independently associated
with the risk of prostate and other cancers [4]. Low-penetrance
cancer predisposition loci are becoming increasingly hard to
identify, owing to small effect sizes and/or low risk allele
frequencies – and a return to candidate gene-based approaches
may become increasingly attractive. It is true that in the past,
candidate gene approaches have generally been unsuccessful at
identifying cancer risk loci, but it is now possible to make use of
information, such as expression quantitative trait locus identifica-
tion, that increasingly permits a more considered approach.
Table 1. Genotype counts and statistics of association at rs16969681, rs11632715, and rs4779584.

rs16969681: z=5.44; Poverall=5.33×10⁻⁸; OR=1.181; 95% CI 1.113–1.254; Phet=0.013, I²=60.8%; Preplication phase=2.73×10⁻⁴

                  Case genotypes   Control genotypes
 #  Series        TT    TC    CC      TT    TC    CC   freqT(cases)  freqT(controls)  Odds ratio
 1  UK2           29   500  2335      28   399  2428   0.097         0.080            1.247
 2  Scotland2     16   337  1653      10   293  1754   0.092         0.076            1.230
 3  UK1           12   183   746       4   145   773   0.110         0.083            1.366
 4  VQNBS          8   235  1074      26   491  2418   0.095         0.093            1.033
 5  EPICOLON      13   313  1234      14   229  1025   0.109         0.101            1.081
 6  Helsinki      14   189   742       8   104   726   0.115         0.072            1.682
 7  UK4            6    95   482       6   180   862   0.092         0.092            1.002
 8  Cambridge     21   370  1852      41   279  1818   0.092         0.084            1.097

rs4779584: z=6.30; Poverall=2.98×10⁻¹⁰; OR=1.145; 95% CI 1.098–1.195; Phet=0.056, I²=45.8%

                  Case genotypes   Control genotypes
 #  Series        TT    TC    CC      TT    TC    CC   freqT(cases)  freqT(controls)  Odds ratio
 1  UK2          155   934  1762     102   857  1858   0.218         0.188            1.203
 2  Scotland2     84   603  1276      72   608  1332   0.196         0.187            1.063
 3  UK1           52   316   533      30   288   611   0.233         0.187            1.319
 4  VQNBS         81   564  1155     102   797  1601   0.202         0.200            1.009
 5  EPICOLON      61   434   878      51   396   934   0.202         0.180            1.154
 6  Helsinki      88   362   378      69   352   418   0.325         0.292            1.167
 7  UK4           33   174   361      27   241   426   0.211         0.213            0.992
 8  Scotland1     55   331   591      39   286   676   0.226         0.182            1.312
 9  CCFR          58   412   716      32   319   647   0.223         0.192            1.206
10  Australia     22   149   269      17   136   285   0.219         0.194            1.167

rs11632715: z=6.34; Poverall=2.30×10⁻¹⁰; OR=1.116; 95% CI 1.079–1.155; Phet=0.997, I²=0.0%; Preplication phase=1.79×10⁻⁷

                  Case genotypes   Control genotypes
 #  Series        AA    AG    GG      AA    AG    GG   freqA(cases)  freqA(controls)  Odds ratio
 1  UK2          719  1415   718     600  1450   768   0.500         0.470            1.128
 2  Scotland2    475   999   532     438  1040   579   0.486         0.466            1.084
 3  UK1          244   469   207     222   457   250   0.520         0.485            1.151
 4  VQNBS        446   913   440     567  1252   679   0.502         0.478            1.101
 5  EPICOLON     308   684   401     238   651   378   0.467         0.445            1.092
 6  Helsinki     182   488   272     150   409   276   0.452         0.425            1.119
 7  UK4          150   281   141     222   534   285   0.508         0.470            1.165
 8  Scotland1    238   486   256     219   496   287   0.491         0.466            1.104
 9  CCFR         286   592   311     202   510   287   0.489         0.457            1.137
10  Australia    106   226   105      88   231   119   0.501         0.465            1.158

All data sets in which rs16969681 and/or rs11632715 were genotyped are shown. The sample sets genotyped for the SNPs near GREM1 are overlapping, but non-
identical, largely because rs11632715 and rs4779584 (but not rs16969681) are present on the proprietary Illumina genome-wide arrays, and also because the Cambridge
data set was additionally genotyped for rs16969681. In addition to the overall association test statistics, the P value for the replication phase (excluding UK2 and
Scotland2) is shown for rs16969681 and rs11632715. Although there is considerable overlap, the sample sets genotyped here differ somewhat from those typed for the
BMP2 and BMP4 SNPs. These differences result entirely from sample and data availability and practical issues of genotyping, including the following: GWAS data but not
samples were available from some data sets, so that SNPs such as rs16969681 could not be genotyped in those sample sets; the 1958 Birth Cohort samples were not
available at the time of genotyping rs16969681; and for some sample sets, DNA quantity was limiting.
doi:10.1371/journal.pgen.1002105.t001
Multiple BMP SNPs Associate with Colorectal Cancer
PLoS Genetics | www.plosgenetics.org5June 2011 | Volume 7 | Issue 6 | e1002105
We have also found good evidence that the original CRC-associated
SNP near GREM1, rs4779584 [5], tags two independent
functional SNPs, represented by association signals at
rs16969681 and rs11632715. This finding emphasises that genetic
fine-mapping studies are valuable not only for detecting stronger
association signals, but also for deconvoluting tagSNP associations
that have arisen owing to independent correlation of the tagSNP
with more than one functional SNP. The original rs4779584
tagSNP signal could be described as an example of "synthetic
association", a term that has been used to describe a situation in
which multiple, sometimes rare, variants underlie a tagSNP signal
[6,7]. Synthetic association can explain some of the apparently
missing heritability of complex diseases. Here, we estimate that the
6 SNPs close to the 3 BMP pathway genes contribute
approximately 2% of the heritability of CRC, about double that
estimated before this study.
Finally, our data provide evidence that GREM1, BMP4 and
BMP2 are the targets of the functional variation in each region.
Multiple, independently-acting variants close to these loci
contribute to CRC risk. Perhaps unexpectedly, there are no
detectable genetic interactions among these variants. If the
downstream SMAD effectors that function within both the BMP
and TGF-beta pathways are included, the components of BMP
signalling involved in CRC risk might comprise up to 3 high-penetrance
predisposition genes (SMAD4, BMPR1A, GREM1) and
8 low-penetrance variants at GREM1, BMP4, BMP2, SMAD7 and
LAMA5 (tagged respectively by rs16969681 and rs11632715,
rs4444235 and rs1957636, rs961253 and rs4813802, rs4939827,
and rs4925386) [1,2,3,5,8,9,10,11]. Collectively these data emphasise
the potential importance of genetic variants in the BMP
pathway for CRC predisposition.
Methods
Ethics statement
Collection of blood samples and clinico-pathological information
from patients and controls was undertaken with informed
consent and ethical review board approval in accordance with the
tenets of the Declaration of Helsinki.
Overall strategy
The study had two main components: (i) refinement of existing
GWAS signals at the GREM1, BMP4 and BMP2 loci using a dense
genotyping and imputation approach in several thousand cases
and controls previously used for GWAS validation; and (ii) a
search for new, independent CRC tagSNPs at the same three loci
using a less stringent threshold for validation than used previously,
combined with multiple validation sample sets.
Discovery screen data sets
UK1 (CORGI) [1] comprised 922 cases with colorectal
neoplasia (47% male) ascertained through the Colorectal Tumour
Gene Identification (CoRGI) consortium. All had at least one first-degree
relative affected by CRC and one or more of the following
phenotypes: CRC at age 75 or less; any colorectal adenoma
(CRAd) at age 45 or less; ≥3 colorectal adenomas at age 75 or less;
or a large (>1 cm diameter) or aggressive (villous and/or severely
dysplastic) adenoma at age 75 or less. The 929 controls (45%
male) were spouses or partners unaffected by cancer and without
a personal family history (to 2nd-degree relative level) of colorectal
neoplasia. Known dominant polyposis syndromes, HNPCC/Lynch
syndrome or bi-allelic MYH mutation carriers were
excluded. All cases and controls were of white UK ethnic origin.
Scotland1 (COGS) [1] included 980 CRC cases (51% male;
mean age at diagnosis 49.6 years, SD ±6.1) and 1,002 cancer-free
population controls (51% male; mean age 51.0 years, SD ±5.9).
Cases were selected for early age at onset (age ≤55 years). Known
dominant polyposis syndromes, HNPCC/Lynch syndrome or bi-allelic
MYH mutation carriers were excluded. Control subjects
were sampled from the Scottish population NHS registers,
matched by age (±5 years), gender and area of residence within
Scotland.
VQ58 comprised 1,832 CRC cases (1,099 males; mean age at
diagnosis 62.5 years, SD ±10.9) from the VICTOR [12] and
QUASAR2 (www.octo-oxford.org.uk/alltrials/trials/q2.html) trials.
There were 2,720 population control genotypes (1,391 males)
from the Wellcome Trust Case-Control Consortium 2 (WTCCC2)
1958 birth cohort (also known as the National Child Development
Study), which included all births in England, Wales and Scotland
during a single week in 1958 [13].
The Colon Cancer Family Registry (CCFR) data set [14]
comprised 1,332 familial CRC cases and 1,084 controls from the Colon
Cancer Family Registry (Colon-CFR) (http://epi.grants.cancer.gov/CFR/about_colon.html).
The cases were recently diagnosed
CRC cases reported to population complete cancer registries in
the USA (Puget Sound, Washington State) who were recruited by
the Seattle Familial Colorectal Cancer Registry; in Canada
(Ontario) who were recruited by the Ontario Familial Cancer
Registry; and in Australia (Melbourne, Victoria) who were
recruited by the Australasian Colorectal Cancer Family Study.
Controls were population-based and for this analysis were
restricted to those without a family history of colorectal cancer.
The Australian study [15] comprised 591 patients treated for
CRC at the Royal Melbourne, Western and St Francis Xavier
Cabrini Hospitals in Melbourne from 1999 to 2009. The 2,353
controls were derived from Queensland or Melbourne: for the
former, the controls came from the Brisbane Twin Nevus Study
[16]; for the latter, individuals were participants in the Genes in
Myopia study [17]. There was no overlap between the CFR and
Australian data sets. Owing to potential residual ethnic heterogeneity
within the Melbourne population, for the Australian cohort
only we performed an additional screen to minimise heterogeneity
after performing principal components analysis (PCA) to remove
individuals who clustered with non-CEU individuals (see below).
We achieved this by performing PCA on the Australian cases and
controls without reference samples of known ancestry. We then
paired each case with a control in a 1:1 ratio based on a maximum
separation of 0.050 using the first and second eigenvectors. All
unpaired samples were excluded, leaving 441 cases and 441
controls in the study. The genomic inflation factor, λGC, was 1.02
after this filtering.
UK2 (NSCCG) [1] consisted of 2,854 CRC cases (58% male,
mean age at diagnosis 59.3 years, SD ±8.7) ascertained through
two ongoing initiatives at the Institute of Cancer Research/Royal
Marsden Hospital NHS Trust (RMHNHST) from 1999 onwards -
The National Study of Colorectal Cancer Genetics (NSCCG) and
Figure 2. Search for additional colorectal cancer susceptibility SNPs near GREM1, BMP4, and BMP2. Association signals from discovery
phase around GREM1, BMP4 and BMP2 are shown. For GREM1, the labelled SNPs are highly correlated tagSNPs originally reported as associated with
CRC; these signals are non-independent. For BMP4 and BMP2, the labelled SNPs are the original tagSNPs and the subsequently proven new signals at
rs1957636 and rs4813802 respectively.
doi:10.1371/journal.pgen.1002105.g002
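Table 2. Summary of individual SNP association analysis for rs4444235, rs1957636, rs961253, and rs4813802.

rs4444235: Poverall = 1.95×10⁻¹¹; OR = 1.091 (95% CI 1.064–1.119); Phet = 0.589, I² = 0.0%; Pdiscovery = 1.61×10⁻⁷; Preplication = 1.56×10⁻⁵

| Sample series | Cases CC | Cases CT | Cases TT | Controls CC | Controls CT | Controls TT | freqC (cases) | freqC (controls) | OR |
|---|---|---|---|---|---|---|---|---|---|
| 1 UK1 | 247 | 441 | 233 | 184 | 470 | 274 | 0.508 | 0.452 | 1.252 |
| 2 Scotland1 | 220 | 500 | 256 | 195 | 512 | 294 | 0.482 | 0.451 | 1.133 |
| 3 UK2 | 684 | 1407 | 761 | 639 | 1322 | 857 | 0.487 | 0.461 | 1.106 |
| 4 Scotland2 | 449 | 1017 | 540 | 428 | 999 | 630 | 0.477 | 0.451 | 1.112 |
| 5 VQ58 | 410 | 886 | 503 | 603 | 1312 | 773 | 0.474 | 0.468 | 1.023 |
| 6 CCFR | 298 | 595 | 290 | 227 | 496 | 274 | 0.503 | 0.476 | 1.114 |
| 7 Australia | 108 | 208 | 124 | 76 | 233 | 129 | 0.482 | 0.439 | 1.186 |
| 8 Helsinki | 202 | 459 | 272 | 150 | 405 | 273 | 0.462 | 0.426 | 1.161 |
| 9 Cambridge | 537 | 1083 | 618 | 519 | 1086 | 650 | 0.482 | 0.471 | 1.045 |
| 10 COIN/NBS | 510 | 1044 | 593 | 532 | 1246 | 722 | 0.481 | 0.462 | 1.078 |
| 11 UK3 | 1828 | 3865 | 2012 | 1006 | 2116 | 1247 | 0.488 | 0.472 | 1.065 |
| 12 Scotland3 | 268 | 554 | 305 | 432 | 1130 | 628 | 0.484 | 0.455 | 1.121 |
| 13 UK4 | 127 | 306 | 141 | 210 | 544 | 288 | 0.488 | 0.463 | 1.107 |

rs1957636: Poverall = 1.36×10⁻⁹; OR = 1.084 (95% CI 1.056–1.112); Phet = 0.009, I² = 54.1%; Pdiscovery = 2.79×10⁻⁵; Preplication = 1.24×10⁻⁵

| Sample series | Cases AA | Cases AG | Cases GG | Controls AA | Controls AG | Controls GG | freqA (cases) | freqA (controls) | OR |
|---|---|---|---|---|---|---|---|---|---|
| 1 UK1 | 171 | 440 | 310 | 151 | 435 | 343 | 0.425 | 0.397 | 1.122 |
| 2 Scotland1 | 172 | 484 | 320 | 148 | 499 | 354 | 0.424 | 0.397 | 1.118 |
| 3 UK2 | 475 | 1383 | 1002 | 457 | 1348 | 1044 | 0.408 | 0.397 | 1.046 |
| 4 Scotland2 | 365 | 964 | 677 | 324 | 971 | 761 | 0.422 | 0.394 | 1.125 |
| 5 VQ58 | 302 | 891 | 606 | 445 | 1303 | 941 | 0.416 | 0.408 | 1.032 |
| 6 CCFR | 206 | 594 | 386 | 142 | 486 | 370 | 0.424 | 0.386 | 1.173 |
| 7 Australia | 68 | 220 | 152 | 62 | 208 | 168 | 0.405 | 0.379 | 1.113 |
| 8 Helsinki | 169 | 462 | 292 | 146 | 397 | 253 | 0.433 | 0.433 | 1.002 |
| 9 Cambridge | 364 | 1071 | 788 | 382 | 1053 | 765 | 0.405 | 0.413 | 0.966 |
| 10 COIN/NBS | 405 | 1087 | 690 | 320 | 1056 | 797 | 0.435 | 0.390 | 1.201 |
| 11 UK3 | 1234 | 3443 | 2438 | 660 | 2076 | 1519 | 0.415 | 0.399 | 1.070 |
| 12 Scotland3 | 223 | 601 | 448 | 326 | 1160 | 808 | 0.412 | 0.395 | 1.071 |
| 13 UK4 | 128 | 274 | 182 | 159 | 485 | 400 | 0.454 | 0.385 | 1.329 |

rs961253: Poverall = 4.45×10⁻¹⁶; OR = 1.117 (95% CI 1.088–1.148); Phet = 0.379, I² = 6.8%; Pdiscovery = 1.29×10⁻⁷; Preplication = 4.97×10⁻¹⁰

| Sample series | Cases AA | Cases AC | Cases CC | Controls AA | Controls AC | Controls CC | freqA (cases) | freqA (controls) | OR |
|---|---|---|---|---|---|---|---|---|---|
| 1 UK1 | 160 | 418 | 343 | 133 | 423 | 373 | 0.401 | 0.371 | 1.134 |
| 2 Scotland1 | 151 | 460 | 366 | 127 | 468 | 406 | 0.390 | 0.361 | 1.133 |
| 3 UK2 | 418 | 1335 | 1099 | 351 | 1300 | 1167 | 0.381 | 0.355 | 1.115 |
| 4 Scotland2 | 300 | 939 | 767 | 265 | 898 | 894 | 0.384 | 0.347 | 1.171 |
| 5 VQ58 | 219 | 771 | 620 | 357 | 1249 | 1083 | 0.375 | 0.365 | 1.046 |
| 6 CCFR | 159 | 558 | 469 | 116 | 470 | 410 | 0.369 | 0.352 | 1.076 |
| 7 Australia | 63 | 218 | 159 | 58 | 193 | 187 | 0.391 | 0.353 | 1.178 |
| 8 Helsinki | 115 | 408 | 418 | 72 | 335 | 407 | 0.339 | 0.294 | 1.230 |
| 9 Cambridge | 287 | 928 | 801 | 275 | 971 | 898 | 0.373 | 0.355 | 1.080 |
| 10 COIN/NBS | 305 | 1034 | 843 | 332 | 1171 | 997 | 0.377 | 0.367 | 1.042 |
| 11 UK3 | 1073 | 3665 | 2913 | 551 | 1944 | 1823 | 0.380 | 0.353 | 1.124 |
| 12 Scotland3 | 177 | 503 | 423 | 260 | 975 | 947 | 0.388 | 0.343 | 1.219 |
| 13 UK4 | 95 | 271 | 204 | 147 | 474 | 422 | 0.404 | 0.368 | 1.165 |

rs4813802: Poverall = 7.52×10⁻¹¹; OR = 1.093 (95% CI 1.064–1.122)

| Sample series | Cases GG | Cases TG | Cases TT | Controls GG | Controls TG | Controls TT | freqG (cases) | freqG (controls) | OR |
|---|---|---|---|---|---|---|---|---|---|
| 1 UK1 | 141 | 425 | 350 | 130 | 398 | 393 | 0.386 | 0.357 | 1.131 |
| 2 Scotland1 | 147 | 452 | 373 | 151 | 450 | 393 | 0.384 | 0.378 | 1.023 |
| 3 UK2 | 374 | 1299 | 1026 | 335 | 1294 | 1037 | 0.379 | 0.368 | 1.048 |
| 4 Scotland2 | 359 | 918 | 724 | 277 | 955 | 830 | 0.409 | 0.366 | 1.198 |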
the Royal Marsden Hospital Trust/Institute of Cancer Research
Family History and DNA Registry. The 2,822 controls (41%
males; mean age 59.8 years, SD ±10.8) were the spouses or
unrelated friends of patients with malignancies. None had a
personal history of malignancy at time of ascertainment. All cases
and controls had self-reported European ancestry, and there were
no obvious differences in the demography of cases and controls in
terms of place of residence within the UK.
Scotland2 (SOCCS) [1] comprised 2,024 CRC cases (61%
male; mean age at diagnosis 65.8 years, SD ±8.4) and 2,092
population controls (60% male; mean age 67.9 years, SD ±9.0)
ascertained in Scotland. Cases were taken from an independent,
prospective, incident CRC case series and aged <80 years at
diagnosis. Control subjects were population controls matched by
age (±5 years), gender and area of residence within Scotland.
Replication data sets
UK3 (NSCCG) [1] comprised 7,912 CRC cases (65% male;
mean age at diagnosis 59 years, SD ±8.2) and 4,398 controls (40%
male; mean age 62 years, SD ±11.5) ascertained through NSCCG
post-2005.
Scotland3 (SOCCS) [1] comprised 1,145 CRC cases (50% male;
mean age at diagnosis 53.2 years, SD ±15.4) and 2,203 cancer-free
population controls (47% male; mean age 51.8 years, SD ±11.5).
Controls were recruited as part of the Generation Scotland study.
UK4 (CORGI2BCD) [1] consisted of 621 CRC cases (46%
male; mean age at diagnosis 58.3 years, SD ±14.1) and 1,121
cancer-free population or spouse controls (45% male; mean age
45.1 years, SD ±15.9).
Cambridge/SEARCH consisted of 2,248 CRC cases (56%
male; mean age at diagnosis 59.2 years, SD ±8.1) and 2,209
controls (42% male; mean age 57.6 years, SD ±15.1). Samples
were ascertained through the SEARCH (Studies of Epidemiology
and Risk Factors in Cancer Heredity, http://www.cancerhelp.org.uk/trials/a-study-looking-at-genetic-causes-of-cancer)
study based in Cambridge, UK. Recruitment started in 2000; initial patient
contact was through the general practitioner. Control samples were
collected post-2003. Eligible individuals were sex- and frequency-matched
in five-year age bands to cases.
The COIN samples [18] were 2,151 cases derived from the
COIN and COIN-B clinical trials of metastatic CRC. Median age
was 63 years. COIN cases were compared against genotypes from
2,501 population controls (1,237 males) from the WTCCC2
National Blood Service (NBS) cohort (50% male; mean age at
diagnosis 53.2 years, SD ±15.4).
The Helsinki (FCCPS) study (http://research.med.helsinki.fi/gsb/aaltonen/)
comprised 988 cases from a population-based
collection centred on south-eastern Finland and 864 population
controls from the same collection.
EPICOLON [19] included 1,410 cases matched with the same
number of controls collected in a prospective fashion from centres
in Spain. Exclusion criteria were Mendelian CRC syndromes and
a personal history of inflammatory bowel disease.
In all cases CRC was defined according to the ninth revision of
the International Classification of Diseases (ICD) by codes 153–
154 and all cases had pathologically proven adenocarcinomas.
Sample preparation and genotyping
DNA was extracted from samples using conventional methods
and quantified using PicoGreen (Invitrogen). The VQ, UK1,
Scotland1 and Australia GWA cohorts were genotyped using
Illumina Hap300, Hap370, or Hap550 arrays. 1958BC and NBS
genotyping was performed as part of the WTCCC2 study on
Hap1M arrays. The CCFR samples were genotyped using
Illumina Hap1M or Hap1M-Duo arrays. In UK2 and Scotland2,
genotyping was conducted using custom Illumina Infinium arrays
according to the manufacturer’s protocols. Some COIN SNPs
were typed on custom Illumina GoldenGate arrays. To ensure
quality of genotyping, a series of duplicate samples was genotyped,
resulting in 99.9% concordant calls in all cases.
Other genotyping was conducted using competitive allele-specific
PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire,
UK), TaqMan (Life Sciences, Carlsbad, California) or MassARRAY
(Sequenom Inc., San Diego, USA). All primers, probes and
conditions used are available on request. Genotyping quality
control was tested using duplicate DNA samples within studies and
SNP assays, together with direct sequencing of subsets of samples
to confirm genotyping accuracy. For all SNPs, >99% concordant
results were obtained.
Quality control and sample exclusion
We excluded SNPs from analysis if they failed one or more of
the following thresholds: GenCall scores <0.25; overall call rates
<95%; MAF <0.01; departure from Hardy-Weinberg equilibrium
(HWE) in controls at P<10⁻⁴ or in cases at P<10⁻⁶; outlying in
terms of signal intensity or X:Y ratio; discordance between
duplicate samples; and, for SNPs with evidence of association,
poor clustering on inspection of X:Y plots.
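The numeric SNP exclusion thresholds listed above can be expressed as a simple filter. The following is a minimal sketch, not the authors' pipeline: the `SnpQC` container and `passes_numeric_qc` function are hypothetical names introduced here for illustration, and the criteria that depend on visual inspection (signal intensity, X:Y clustering, duplicate discordance) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SnpQC:
    """Per-SNP quality metrics (hypothetical container, not from the paper)."""
    gencall_score: float
    call_rate: float        # fraction of samples with a genotype call
    maf: float              # minor allele frequency
    hwe_p_controls: float   # HWE test P value in controls
    hwe_p_cases: float      # HWE test P value in cases


def passes_numeric_qc(snp: SnpQC) -> bool:
    """Return True if the SNP survives the numeric exclusion thresholds."""
    return (
        snp.gencall_score >= 0.25        # exclude GenCall < 0.25
        and snp.call_rate >= 0.95        # exclude call rate < 95%
        and snp.maf >= 0.01              # exclude MAF < 0.01
        and snp.hwe_p_controls >= 1e-4   # exclude HWE P < 1e-4 in controls
        and snp.hwe_p_cases >= 1e-6      # exclude HWE P < 1e-6 in cases
    )


# Illustrative values only (not data from the study).
good = SnpQC(0.80, 0.99, 0.20, 0.50, 0.30)
low_call_rate = SnpQC(0.80, 0.94, 0.20, 0.50, 0.30)
print(passes_numeric_qc(good), passes_numeric_qc(low_call_rate))
```

Keeping the thresholds in one predicate makes it easy to see that a SNP is dropped as soon as any single criterion fails.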
We excluded individuals from analysis if they failed one or more
of the following thresholds: duplication or cryptic relatedness to
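Table 2 (continued). rs4813802: Phet = 0.416, I² = 3.1%; Pdiscovery = 2.75×10⁻⁷; Preplication = 3.47×10⁻⁵

| Sample series | Cases GG | Cases TG | Cases TT | Controls GG | Controls TG | Controls TT | freqG (cases) | freqG (controls) | OR |
|---|---|---|---|---|---|---|---|---|---|
| 5 VQ58 | 268 | 832 | 694 | 351 | 1208 | 1131 | 0.381 | 0.355 | 1.120 |
| 6 CCFR | 176 | 546 | 460 | 112 | 445 | 436 | 0.380 | 0.337 | 1.206 |
| 7 Australia | 53 | 208 | 169 | 53 | 212 | 173 | 0.365 | 0.363 | 1.009 |
| 8 Helsinki | 131 | 400 | 395 | 88 | 366 | 375 | 0.357 | 0.327 | 1.145 |
| 9 Cambridge | 280 | 992 | 707 | 284 | 1039 | 832 | 0.392 | 0.373 | 1.085 |
| 10 COIN/NBS | 313 | 982 | 818 | 327 | 1162 | 1012 | 0.381 | 0.363 | 1.078 |
| 11 UK3 | 1037 | 3582 | 2969 | 544 | 2025 | 1751 | 0.373 | 0.360 | 1.055 |
| 12 Scotland3 | 205 | 523 | 451 | 318 | 1043 | 878 | 0.396 | 0.375 | 1.091 |
| 13 UK4 | 92 | 265 | 221 | 102 | 285 | 301 | 0.388 | 0.355 | 1.152 |

Results of allelic test of association in all sample sets are shown.
doi:10.1371/journal.pgen.1002105.t002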
estimated identity by descent (IBD) >6.25%; overall successfully
genotyped SNPs <95%; mismatch between predicted and
reported gender; outliers in a plot of heterozygosity versus
missingness; and evidence of non-white European ancestry by
PCA-based analysis in comparison with HapMap samples (http://
hapmap.ncbi.nlm.nih.gov/). We excluded 6 duplicate samples
using PCA (see below) within the UK samples that had undergone
analysis of over 200 SNPs (UK1, Scotland1, UK2, Scotland2, VQ,
1958BC, NBS, COIN). We excluded duplicates from other UK
cohorts on the basis of names (or initials where release of names
was not possible) and dates of birth. No duplicates were found
from the CCFR or Australian sample sets.
To identify individuals who might have non-northern European
ancestry, we merged our case and control data from all sample sets
with the 60 European (CEU), 60 Nigerian (YRI), 90 Japanese
(JPT) and 90 Han Chinese (CHB) individuals from the International
HapMap Project. For each pair of individuals, we calculated
genome-wide identity-by-state distances based on markers shared
between HapMap2 and our SNP panel, and used these as
dissimilarity measures upon which to perform principal components
analysis. Principal components analysis was performed using
Eigenstrat/SmartPCA using CEU, YRI and CHB HapMap
samples as reference. The first two principal components for each
individual were plotted and any individual not present in the main
CEU cluster (that is, >5% of the PC distance from the HapMap CEU
cluster centroid) was excluded from subsequent analyses.
We had previously shown the adequacy of the case-control
matching and possibility of differential genotyping of cases and
controls using Q-Q plots of test statistics in STATA. The inflation
factor λGC was calculated by dividing the mean of the lower 90%
of the test statistics by the mean of the lower 90% of the expected
values from a χ² distribution with 1 d.f. Deviation of the genotype
frequencies in the controls from those expected under HWE was
assessed by χ² test (1 d.f.), or Fisher’s exact test where an expected
cell count was <5.
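The two calculations described in this paragraph can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' STATA code: the function names are hypothetical, the expected χ² values are taken at plotting positions (i + 0.5)/n (an assumption, since the paper does not state which positions were used), and the Fisher's exact fallback for expected cell counts below 5 is not implemented.

```python
from math import erfc, sqrt
from statistics import NormalDist, mean


def chi2_1df_quantile(p):
    """Quantile of a chi-square (1 d.f.) via the squared normal quantile."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def lambda_gc(test_stats, fraction=0.9):
    """Mean of the lower 90% of observed statistics divided by the mean of
    the lower 90% of expected chi-square (1 d.f.) values."""
    observed = sorted(test_stats)
    n = len(observed)
    k = max(1, int(n * fraction))
    # Assumed plotting positions (i + 0.5) / n for the expected quantiles.
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(n)]
    return mean(observed[:k]) / mean(expected[:k])


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg equilibrium.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value
```

For statistics that follow the null distribution exactly, `lambda_gc` returns 1; values above 1 indicate inflation of the kind the Q-Q plots are meant to reveal.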
SNP selection and genotyping for fine mapping
Regions selected for fine mapping were: chr15:30,733,560–30,802,752;
chr14:53,430,973–53,530,761; and chr20:6,292,730–6,402,661.
These corresponded to the haplotype blocks and
immediately flanking regions harbouring rs4779584, rs4444235,
and rs961253. To define these haplotype blocks and the
recombination hotspots harbouring these CRC-associated SNPs,
we used Haploview and SequenceLDHot. From dbSNP (build 128),
we selected all SNPs between the recombination hotspots flanking
the haplotype block. All these SNPs were submitted to Illumina for
assay design and those with a design score >0.3 were genotyped on
custom arrays in the UK2 and Scotland2 case-control series. In
total, we genotyped 81, 42 and 60 SNPs in the 15q13.3, 14q22.2
and 20p12.3 regions respectively. A list of these SNPs is shown in
Table S1. Association statistics, using an additive model, were
obtained with SNPTEST v2 (www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html).
We used genotype data from the
1000 Genomes CEPH (http://www.1000genomes.org/) and HapMap3
CEPH and TSI samples (www.hapmap.org/) and the
IMPUTE v2 software (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html)
to generate in silico genotypes at additional SNPs
in all three regions. This imputation resulted in the addition of 74,
113 and 255 markers in the chromosome 15q13.3, 14q22.2 and
20p12.3 regions respectively (for details on imputed and genotyped
markers see Table S1). Association meta-analyses only included
markers with proper_info scores >0.5, imputed call rates per SNP
>0.9 and minor allele frequencies (MAFs) >0.01. Meta-analyses of
the two sample sets were carried out with Meta (http://www.stats.ox.ac.uk/~jsliu/meta.html)
using the genotype probabilities from
IMPUTE v2, where a SNP was not directly typed. To test for the
presence of additional independent risk alleles in each region, we
carried out logistic regression analysis within each region, both
pairwise with the original tagSNP and then in a backwards analysis
that included all SNPs with evidence of association in the meta-analysis
at P<5×10⁻⁴.
Statistical analysis
Association between SNP genotype and disease status was
primarily assessed in STATA v10 (http://www.stata.com/) and
PLINK v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/)
using allelic and Cochran-Armitage tests (both with 1df)
respectively, or by Fisher’s exact test where an expected cell
count was <5. Genotypic (2df), dominant (1df) and recessive (1df)
tests were also performed. The risks associated with each SNP
were estimated by allelic, heterozygous and homozygous odds
ratios (ORs) using unconditional logistic regression, and associated
95% confidence intervals (CIs) were calculated.
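As an illustration of the allelic test and allelic odds ratio, the sketch below collapses genotype counts into a 2×2 table of allele counts. It is a simplified stand-in, not the STATA/PLINK analysis used in the study: the function name is hypothetical, the confidence interval uses the Woolf (log-OR) approximation rather than logistic regression, and Fisher's exact test is not included.

```python
from math import erfc, exp, log, sqrt


def allelic_test(case_counts, control_counts):
    """Allelic odds ratio, 95% CI, chi-square (1 d.f.) and P value.

    Each argument is (risk homozygotes, heterozygotes, other homozygotes).
    """
    a = 2 * case_counts[0] + case_counts[1]        # risk alleles, cases
    b = 2 * case_counts[2] + case_counts[1]        # other alleles, cases
    c = 2 * control_counts[0] + control_counts[1]  # risk alleles, controls
    d = 2 * control_counts[2] + control_counts[1]  # other alleles, controls

    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf approximation
    ci = (exp(log(odds_ratio) - 1.96 * se_log_or),
          exp(log(odds_ratio) + 1.96 * se_log_or))

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return odds_ratio, ci, chi2, p_value


# UK2 genotype counts (TT, TC, CC) for rs16969681 as given in Table 1.
or_uk2, ci_uk2, chi2_uk2, p_uk2 = allelic_test((29, 500, 2335), (28, 399, 2428))
print(round(or_uk2, 3))  # 1.247, matching the odds ratio listed in Table 1
```

Working from allele counts treats each chromosome as an independent observation, which is the assumption behind the allelic test and is reasonable when genotypes are in Hardy-Weinberg equilibrium.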
Joint analysis of data generated from multiple phases was
conducted using standard methods for combining raw data based
on the Mantel-Haenszel method in STATA and PLINK. The
reported meta-analysis statistics were derived from analysis of
allele frequencies, and joint ORs and 95% CIs were calculated
assuming fixed- and random-effects models. Tests of the
significance of the pooled effect sizes were calculated using a
standard normal distribution. Cochran’s Q statistic to test for
heterogeneity [20] and the I² statistic [21] to quantify the
proportion of the total variation due to heterogeneity were
calculated. Large heterogeneity is typically defined as I² ≥75%.
Where significant heterogeneity was identified, results from the
random effects model were reported. Alongside, we also
performed meta-analysis based on allele dosage (0, 1, 2) and
incorporated age and sex as co-variates. Although age and sex are
associated with colorectal cancer risk, they were not associated
with SNP genotype and did not materially affect the significance of
any of the 6 reported associations (details not shown).
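The pooled effect, Cochran's Q and I² described above can be illustrated with a short fixed-effects calculation. Note the swap: the study combined data with the Mantel-Haenszel method in STATA and PLINK, whereas this sketch uses inverse-variance weighting of per-study log odds ratios, a closely related fixed-effects approach. The function name and the input values are hypothetical.

```python
from math import erfc, exp, log, sqrt


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effects meta-analysis with Q and I^2."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = sqrt(1.0 / total_w)

    # Significance of the pooled effect from a standard normal distribution.
    z = pooled / pooled_se
    p_value = erfc(abs(z) / sqrt(2.0))

    # Cochran's Q and the I^2 statistic (percentage of variation that is
    # due to between-study heterogeneity), truncated at zero.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"or": exp(pooled), "se": pooled_se, "z": z,
            "p": p_value, "q": q, "i2": i2}


# Hypothetical per-study odds ratios and standard errors of the log OR.
result = fixed_effect_meta([log(1.20), log(1.10), log(1.15)],
                           [0.05, 0.04, 0.06])
print(result)
```

When I² is large, a random-effects model would add a between-study variance term to each weight; that extension is omitted here to keep the sketch short.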
We used Haploview software v4.2 (http://www.broadinstitute.
org/haploview) to infer the LD structure of the genome in the
regions around GREM1, BMP2 and BMP4. The combined effects of
pairs of loci identified as associated with CRC risk were investigated
by multiple logistic regression analysis in PLINK to test for
independent effects of each SNP and stratifying by sample series.
Evidence for interactive effects between SNPs (epistasis) was
assessed by likelihood ratio test assuming an allelic model in PLINK.
The sibling relative risk attributable to a given SNP was
calculated using the formula
\[
\lambda^{*} = \frac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left(p^2 r_2 + 2 p q r_1 + q^2\right)^2}
\]
where p is the population frequency of the minor allele, q = 1 − p,
and r1 and r2 are the relative risks (estimated as OR) for
heterozygotes and rare homozygotes, relative to common
homozygotes [22]. Assuming a multiplicative interaction, the
proportion of the familial risk attributable to a SNP was calculated
as log(λ*)/log(λ0), where λ0 is the overall familial relative risk
estimated from epidemiological studies of CRC, assumed to be 2.2
[23]. UK2/NSCCG2 samples were used for this estimation. The
Akaike information criterion was calculated using the swaic
command in STATA.
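The sibling relative risk and the proportion of familial risk attributable to a SNP can be computed directly from the definitions given above. The sketch below is illustrative: the function names are hypothetical and the example allele frequency and relative risks are invented values, not estimates from the study.

```python
from math import log


def sibling_relative_risk(p, r1, r2):
    """Sibling relative risk attributable to one SNP.

    p  : population frequency of the minor allele
    r1 : relative risk for heterozygotes
    r2 : relative risk for rare homozygotes
    (both relative to common homozygotes)
    """
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p * p * r2 + 2 * p * q * r1 + q * q) ** 2
    return numerator / denominator


def familial_risk_fraction(p, r1, r2, lambda0=2.2):
    """Proportion of the familial risk explained, log(lambda*)/log(lambda0)."""
    return log(sibling_relative_risk(p, r1, r2)) / log(lambda0)


# Hypothetical SNP: minor allele frequency 0.40, per-allele OR of 1.10.
lam = sibling_relative_risk(0.40, 1.10, 1.21)
print(lam, familial_risk_fraction(0.40, 1.10, 1.21))
```

A SNP with no effect (r1 = r2 = 1) gives a sibling relative risk of exactly 1 and therefore contributes nothing to the familial risk, which is a quick sanity check on the formula.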
Genome co-ordinates were taken from the NCBI build 36/hg18
(dbSNP b126).
Supporting Information
Figure S1 Fine mapping around the known CRC risk SNPs close to (a) BMP4 (14q22) and (b) BMP2 (20p12).
(DOCX)
Figure S2 Pairwise linkage disequilibrium between rs4779584, rs16969681 and rs11632715 near GREM1 (upper) and position of recombination hotspot (lower).
(DOCX)
Figure S3 Histone methylation and acetylation marks upstream of GREM1.
(DOCX)
Figure S4 Study design for discovery and validation phases.
(DOCX)
Figure S5 Large-scale LD structure in regions around BMP4 and BMP2.
(DOCX)
Figure S6 Locations of recombination hotspots in regions around BMP4 and BMP2.
(DOCX)
Table S1 SNPs genotyped directly or predicted by imputation in the fine mapping of the regions around rs4779584, rs4444235, and rs961253 in UK2 and Scotland2.
(DOCX)
Table S2 Haplotype risk analysis at rs16969681 and rs4779584.
(DOCX)
Table S3 Logistic regression model analysis of CRC risk and genotypes at rs4779584, rs16969681, and rs11632715.
(DOCX)
Table S4 TagSNPs around GREM1, BMP4, and BMP2 analysed for new associations.
(DOCX)
Table S5 Additional BMP pathway genes around which tagSNP associations with CRC were analysed.
(DOCX)
Text S1 Consortium co-authors.
(DOCX)
Acknowledgments
This study made use of genotyping data on the 1958 Birth Cohort and
NBS samples, kindly made available by the Investigators of those studies
and the Wellcome Trust Case-Control Consortium 2; a full list of the
investigators who contributed to the generation of the data is available
from http://www.wtccc.org.uk/. Finally, we would like to thank all
individuals who participated in the study.
Author Contributions
Conceived and designed the experiments: IPM Tomlinson, RS Houlston,
MG Dunlop. Performed the experiments: LG Carvajal-Carmona, AM
Jones, K Howarth, P Broderick, EEM Jaeger, S Farrington, A Lewis, JGD
Prendergast, AM Pittman, B Olver, M Walker, N Whiffin, C Ruiz-Ponte, C
Fernandez-Rozadilla. Analyzed the data: IPM Tomlinson, LG Carvajal-
Carmona, SE Dobbins, A Tenesa, AM Jones, C Palles, E Theodoratou, S
Ballereau, A Lloyd, J-B Cazier. Contributed reagents/materials/analysis
tools: S Penegar, E Barclay, L Martin, M Gorman, S Lubbe, B Howie, J
Marchini, A Castells, A Carracedo, S Castellvi-Bel, D Duggan, D Conti, H
Campbell, O Sieber, L Lipton, P Gibbs, NG Martin, GW Montgomery, J
Young, PN Baird, P Newcomb, J Hopper, MA Jenkins, LA Aaltonen, DJ
Kerr, J Cheadle, P Pharoah, G Casey, S Gallinger. Wrote the paper: IPM
Tomlinson, RS Houlston, MG Dunlop.
References
1. Houlston RS, Cheadle J, Dobbins SE, Tenesa A, Jones AM, et al. (2010) Meta-
analysis of three genome-wide association studies identifies susceptibility loci for
colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet 42:
973–977.
2. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, et al. (2007)
A genome-wide association scan of tag SNPs identifies a susceptibility variant for
colorectal cancer at 8q24.21. Nat Genet 39: 984–988.
3. Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, et al. (2008)
Meta-analysis of genome-wide association data identifies four new susceptibility
loci for colorectal cancer. Nat Genet 40: 1426–1435.
4. Al Olama AA, Kote-Jarai Z, Giles GG, Guy M, Morrison J, et al. (2009)
Multiple loci on 8q24 associated with prostate cancer susceptibility. Nat Genet
41: 1058–1060.
5. Jaeger E, Webb E, Howarth K, Carvajal-Carmona L, Rowan A, et al. (2008)
Common genetic variants at the CRAC1 (HMPS) locus on chromosome
15q13.3 influence colorectal cancer risk. Nat Genet 40: 26–28.
6. Dickson S, Wang K, Krantz I, Hakonarson H, Goldstein D (2010) Rare variants
create synthetic genome-wide associations. PLoS Biol 8: e1000294. doi:10.1371/
journal.pbio.1000294.
7. Wray N, Purcell SM, Visscher PM (2011) Synthetic associations created by rare
variants do not explain most GWAS results. PLoS Biol 9: e1000579.
doi:10.1371/journal.pbio.1000579.
8. Broderick P, Carvajal-Carmona L, Pittman AM, Webb E, Howarth K, et al.
(2007) A genome-wide association study shows that common alleles of SMAD7
influence colorectal cancer risk. Nat Genet 39: 1315–1317.
9. Howe JR, Bair JL, Sayed MG, Anderson ME, Mitros FA, et al. (2001) Germline
mutations of the gene encoding bone morphogenetic protein receptor 1A in
juvenile polyposis. Nat Genet 28: 184–187.
10. Howe JR, Roth S, Ringold JC, Summers RW, Jarvinen HJ, et al. (1998)
Mutations in the SMAD4/DPC4 gene in juvenile polyposis. Science 280:
1086–1088.
11. Jaeger EE, Woodford-Richens KL, Lockett M, Rowan AJ, Sawyer EJ, et al.
(2003) An ancestral Ashkenazi haplotype at the HMPS/CRAC1 locus on 15q13-
q14 is associated with hereditary mixed polyposis syndrome. Am J Hum Genet
72: 1261–1267.
12. Midgley RS, McConkey CC, Johnstone EC, Dunn JA, Smith JL, et al. (2010)
Phase III randomized trial assessing rofecoxib in the adjuvant setting of
colorectal cancer: final results of the VICTOR trial. J Clin Oncol 28:
4575–4580.
13. Power C, Jefferis BJ, Manor O, Hertzman C (2006) The influence of birth
weight and socioeconomic position on cognitive development: Does the early
home and learning environment modify their effects? J Pediatr 148: 54–61.
14. Newcomb PA, Baron J, Cotterchio M, Gallinger S, Grove J, et al. (2007) Colon
Cancer Family Registry: an international resource for studies of the genetic
epidemiology of colon cancer. Cancer Epidemiol Biomarkers Prev 16:
2331–2343.
15. Tie J, Gibbs P, Lipton L, Christie M, Jorissen RN, et al. (2010) Optimizing
targeted therapeutic development: Analysis of a colorectal cancer patient
population with the BRAF(V600E) mutation. Int J Cancer.
16. Duffy DL, Iles MM, Glass D, Zhu G, Barrett JH, et al. (2010) IRF4 variants
have age-specific effects on nevus count and predispose to melanoma. Am J Hum
Genet 87: 6–16.
17. Baird PN, Schache M, Dirani M (2010) The GEnes in Myopia (GEM) study in
understanding the aetiology of refractive errors. Prog Retin Eye Res 29:
520–542.
18. Adams R, Meade A, Wasan H, Griffiths G, Maughan T (2008) Cetuximab
therapy in first-line metastatic colorectal cancer and intermittent palliative
chemotherapy: review of the COIN trial. Expert Rev Anticancer Ther 8:
1237–1245.
19. Abuli A, Bessa X, Gonzalez JR, Ruiz-Ponte C, Caceres A, et al. (2010)
Susceptibility genetic variants associated with colorectal cancer risk correlate
with cancer phenotype. Gastroenterology 139: 788–796, 796 e781–786.
20. Petitti DB (1994) Coronary heart disease and estrogen replacement therapy. Can
compliance bias explain the results of observational studies? Ann Epidemiol 4:
115–118.
21. Higgins JP, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis.
Stat Med 21: 1539–1558.
22. Houlston RS, Ford D (1996) Genetics of coeliac disease. QJM 89: 737–743.
23. Johns LE, Houlston RS (2001) A systematic review and meta-analysis of familial
colorectal cancer risk. Am J Gastroenterol 96: 2992–3003.