Refinement of the associations between risk
of colorectal cancer and polymorphisms
on chromosomes 1q41 and 12q13.13
Sarah L. Spain1,2, Luis G. Carvajal-Carmona1, Kimberley M. Howarth1, Angela M. Jones1, Zhan Su3, Jean-Baptiste Cazier4, Jennet Williams1, Lauri A. Aaltonen5, Paul Pharoah6, David J. Kerr7, Jeremy Cheadle8, Li Li9, Graham Casey10, Pavel Vodicka11, Oliver Sieber12, Lara Lipton12, Peter Gibbs12, Nicholas G. Martin13, Grant W. Montgomery13, Joanne Young14, Paul N. Baird15, Hans Morreau16, Tom van Wezel16, Clara Ruiz-Ponte17, Ceres Fernandez-Rozadilla17, Angel Carracedo17, Antoni Castells18, Sergi Castellvi-Bel18, Malcolm Dunlop19, Richard S. Houlston20 and Ian P.M. Tomlinson1,∗
1 Nuffield Department of Clinical Medicine, 3 Department of Statistics and 4 Bioinformatics, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
2 Division of Genetics and Molecular Medicine, King’s College London, Guy’s Hospital, London SE1 9RT, UK
5 Department of Medical Genetics, Genome-Scale Biology Research Program, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland
6 Cancer Research UK Laboratories, Strangeways Research Laboratory, Department of Oncology, University of Cambridge, Cambridge CB1 8RN, UK
7 Department of Clinical Pharmacology, Oxford University, Old Road Campus Research Building, Oxford OX3 7DQ, UK
8 Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
9 Department of Family Medicine-Research Division, Case Western Reserve University, 11001 Cedar Avenue, Cleveland, OH 44106-7136, USA
10 Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
11 Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Academy of Science of Czech Republic, Prague 14220, Czech Republic
12 Ludwig Colon Cancer Initiative Laboratory, Ludwig Institute for Cancer Research, Royal Melbourne Hospital, Parkville, Victoria, Australia
13 Genetic and Molecular Epidemiology Laboratories and 14 Familial Cancer Laboratory, Queensland Institute of Medical Research, Herston Q4006, Australia
15 Centre for Eye Research Australia, University of Melbourne, 32 Gisborne Street, East Melbourne, VIC 3002, Australia
16 Department of Pathology, Leiden University Medical Centre, Leiden, The Netherlands
17 Genomic Medicine Group, Fundacion Publica Galega de Medicina Xenomica, Spanish National Genotyping Center (CeGen)-USC, Centro de Investigacion Biomedica en Red de Enfermedades Raras, Hospital Clinico, Santiago de Compostela, Galicia, Spain
18 Department of Gastroenterology, Hospital Clinic, CIBERehd, IDIBAPS, University of Barcelona, Barcelona, Catalonia, Spain
19 Colon Cancer Genetics Group, Institute of Genetics and Molecular Medicine, University of Edinburgh and MRC Human Genetics Unit, Edinburgh EH4 2XU, UK
20 Section of Cancer Genetics, Institute of Cancer Research, Sutton SM2 5NG, UK
Received June 16, 2011; Revised October 26, 2011; Accepted November 7, 2011
In genome-wide association studies (GWASs) of colorectal cancer, we have identified two genomic regions in which pairs of tagging-single nucleotide polymorphisms (tagSNPs) are associated with disease; these comprise chromosomes 1q41 (rs6691170, rs6687758) and 12q13.13 (rs7136702, rs11169552). We investigated these regions further, aiming to determine whether they contain more than one independent association signal and/or to identify the SNPs most strongly associated with disease. Genotyping of additional sample sets at the original tagSNPs showed that, for both regions, the two tagSNPs were unlikely to identify a single haplotype on which the functional variation lay. Conversely, one of the pair of SNPs did not fully capture the association signal in each region. We therefore undertook more detailed analyses, using imputation, logistic regression, genealogical analysis using the GENECLUSTER program and haplotype analysis. In the 1q41 region, the SNP rs11118883 emerged as a strong candidate based on all these analyses, sufficient to account for the signals at both rs6691170 and rs6687758. rs11118883 lies within a region with strong evidence of transcriptional regulatory activity and has been associated with expression of PDGFRB mRNA. For 12q13.13, a complex situation was found: SNP rs7972465 showed stronger association than either rs11169552 or rs7136702, and GENECLUSTER found no good evidence for a two-SNP model. However, logistic regression and haplotype analyses supported a two-SNP model, in which a signal at the SNP rs706793 was added to that at rs11169552. Post-GWAS fine-mapping studies are challenging, but the use of multiple tools can assist in identifying candidate functional variants in at least some cases.

∗To whom correspondence should be addressed. Tel: +44 1865287500; Fax: +44 1865287501; Email: iant@well.ox.ac.uk
© The Author 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Human Molecular Genetics, 2012, Vol. 21, No. 4, 934–946
doi:10.1093/hmg/ddr523
Advance Access published on November 10, 2011
INTRODUCTION
Using genome-wide association studies (GWASs), we have identified 14 regions that contain tagging single nucleotide polymorphisms (tagSNPs) associated with the risk of colorectal cancer (CRC) (1). Within three of these regions (chromosomes 14q22.2, 15q13.3 and 20p12.3), we have shown that there exist two SNPs that are independently associated with disease (2). In two further regions (chromosomes 1q41 and 12q13.13), there are two SNPs associated with CRC risk, but from the original GWA analysis, it was unclear whether these represented independent signals of association (1). At 1q41, these SNPs are rs6691170 (chr1: 220,112,069 bases) and rs6687758 (chr1: 220,231,571); they are in modest pairwise linkage disequilibrium (LD) (r² = 0.22; D′ = 0.71). At 12q13.13, the two SNPs are rs7136702 (chr12: 49,166,483) and rs11169552 (chr12: 49,441,930); these SNPs too are moderately correlated (r² = 0.11, D′ = 0.76). Our previous analyses had not resolved the issue of whether there could be more than one independent CRC SNP in either of these regions (1).
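The two LD measures quoted above can behave quite differently for the same pair of SNPs, which is why D′ may be fairly high while r² stays modest. The following is a minimal Python sketch of the standard definitions, computed from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b
    # D is scaled by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies (illustration only, not study data)
d, d_prime, r2 = ld_stats(p_ab=0.17, p_a=0.37, p_b=0.21)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

Because r² also depends on how well the two allele frequencies match, a pair of tagSNPs with different minor allele frequencies can show substantial D′ and still be poor proxies for one another.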
One of the aims of GWASs is the discovery of functional/
causal variants, the effects of which are manifest in the
tagSNP associations. It is, however, very challenging to
proceed from a tagSNP association to identifying functional
variants, and relatively few such studies have been reported to
date. One reason for this is that the correlation matrix
between tagSNP(s) and functional variant(s) at any locus may
be complex. If two association signals occur at tagSNPs at
the same locus, the possible causes include the following:
(i) the associated tagSNPs are in LD;
(ii) there are two independent functional sites, each in LD
with one tagSNP;
(iii) there are two functional sites, but there is true epistasis;
(iv) there is a single functional site on a haplotype defined by
the two tagSNPs;
(v) there are >2 independent functional sites in LD with one
or more tagSNPs;
(vi) there is a mixture of the above possibilities.
It can be extremely hard to distinguish among these possibilities and our inability to de-convolute association signals may help explain why so much of the heritability of complex diseases is unexplained by GWASs to date (3). Despite these problems, functional variant discovery may be aided by a deeper examination of genetic variation in the LD blocks in which the tagSNPs reside. Such discovery is likely to benefit from efforts such as the 1000 Genomes Project, where a comprehensive discovery of novel variants has been carried out in several populations.
In this study, we had three aims. First, we wished to investigate as fully as possible whether there was likely to be one or more than one functional variant underlying the association signals at 1q41 and 12q13.13 in CRCs. Secondly, we wanted to investigate other tagSNPs in these regions for evidence of further, independent association signals. Thirdly, we wished to use imputation and functional annotation to refine the most likely location of the ‘disease-causing’ variant in both the 1q41 and 12q13.13 regions.
RESULTS
The 1q41 region
We genotyped a total of 48 174 samples (22 832 cases and 25 892 controls) from 17 sample sets at rs6691170 and rs6687758. This analysis included five replication case/control cohorts that were not previously reported (1) for these SNPs: Kentucky; Prague; EPICOLON; Leiden; and Australia. After meta-analysis in STATA, both rs6691170 and rs6687758 were, as expected, significantly associated with CRC risk (Table 1), with no evidence of heterogeneity among studies. Incorporating both SNPs into an unconditional logistic regression model showed that neither of the pair of SNPs fully captured the association signal in the region [odds ratio (OR) = 1.06, P = 1.06 × 10⁻⁴ for rs6691170 and OR = 1.07, P = 2.48 × 10⁻⁴ for rs6687758]. We used PLINK to examine the possibility that the two tagSNPs indicated a single high-risk haplotype on which an unknown functional SNP was present (that is, all the functional risk alleles resided on a haplotype composed solely of one of the four possible pairs of tagSNP alleles). However, the association signal was not simply present on the high-risk haplotype TG (for rs6691170|rs6687758). Instead, the risks for the ‘compound’ (high-low or low-high) haplotypes, GG and TA, were greater than those for the low-low haplotype (GA), inconsistent with a functional SNP being in complete LD with a haplotype indicated by the pair of tagSNPs (Supplementary Material, Table S1). We also tested for evidence of epistasis between rs6691170 and rs6687758 using case–control logistic regression analysis, incorporating interaction between SNPs as a variable, but no evidence of deviation from log-additive SNP effects was found (P = 0.292).

Table 1. Summary of genotyping and association results at the original four tagSNPs on 1q41 and 12q13.13 in the extended data sets

rs6691170; chr1: 220,112,069; z = 6.87; P = 6.42 × 10⁻¹²; OR = 1.10; 95% CI = 1.07–1.12; Phet = 0.39; I² = 5.9%; allele 1 = T; allele 2 = G
Series            Ca11  Ca12  Ca22  Co11  Co12  Co22   Ca1   Ca2   Co1   Co2 MAFca MAFco    OR  Ntot   Nca   Nco
UK1/CORGI          130   435   355   100   429   393   695  1145   629  1215  0.38  0.34 1.172  1215   920   922
Scotland1/COGS     134   463   379   130   433   435   731  1221   693  1303  0.37  0.35 1.126  1303   976   998
UK2/NSCCG          398  1395  1058   355  1304  1159  2191  3511  2014  3622  0.38  0.36 1.122  3622  2851  2818
Scotland2/SOCCS    248   941   817   239   967   851  1437  2575  1445  2669  0.36  0.35 1.031  2669  2006  2057
VQ58               277   833   688   359  1234  1096  1387  2209  1952  3426  0.39  0.36 1.102  3426  1798  2689
CFR                149   581   447   155   436   399   879  1475   746  1234  0.37  0.38 0.986  1234  1177   990
UK3/NSCCG          406  1448  1137   367  1251  1198  2260  3722  1985  3647  0.38  0.35 1.116  3647  2991  2816
Scotland3/SOCCS    103   376   326   117   447   363   582  1028   681  1173  0.36  0.37 0.975  1173   805   927
UK4/CORGI2BCD       70   212   213   129   473   445   352   638   731  1363  0.36  0.35 1.029  1363   495  1047
Cambridge          324  1068   805   280  1013   890  1716  2678  1573  2793  0.39  0.36 1.138  2793  2197  2183
COIN/NBS           300  1054   797   326  1170  1005  1654  2648  1822  3180  0.38  0.36 1.090  3180  2151  2501
Helsinki           143   435   351   105   372   340   721  1137   582  1052  0.39  0.36 1.146  1052   929   817
Prague             147   424   363    82   317   252   718  1150   481   821  0.38  0.37 1.066   821   934   651
Kentucky           156   466   388   244   709   630   778  1242  1197  1969  0.39  0.38 1.030  1969  1010  1583
EPICOLON           193   613   520   184   632   578   999  1653  1000  1788  0.38  0.36 1.081  1788  1326  1394
Australia           64   223   129    58   212   168   351   481   328   548  0.42  0.37 1.219   548   416   438
Leiden             141   404   310    92   291   304   686  1024   475   899  0.40  0.35 1.268   899   855   687

rs6687758; chr1: 220,231,571; z = 5.64; P = 1.70 × 10⁻⁸; OR = 1.09; 95% CI = 1.06–1.13; Phet = 0.28; I² = 14.8%; allele 1 = G; allele 2 = A
Series            Ca11  Ca12  Ca22  Co11  Co12  Co22   Ca1   Ca2   Co1   Co2 MAFca MAFco    OR  Ntot   Nca   Nco
UK1/CORGI           37   312   568    32   299   598   386  1448   363  1495  0.21  0.20 1.098  1495   917   929
Scotland1/COGS      63   308   606    34   325   642   434  1520   393  1609  0.22  0.20 1.169  1609   977  1001
UK2/NSCCG          121   985  1746    98   898  1822  1227  4477  1094  4542  0.22  0.19 1.138  4542  2852  2818
Scotland2/SOCCS     77   694  1235    74   639  1344   848  3164   787  3327  0.21  0.19 1.133  3327  2006  2057
VQ58                86   605  1106   113   832  1742   777  2817  1058  4316  0.22  0.20 1.125  4316  1797  2687
CFR                 50   364   756    51   327   607   464  1876   429  1541  0.20  0.22 0.888  1541  1170   985
UK3/NSCCG          122   947  1920   115   850  1861  1191  4787  1080  4572  0.20  0.19 1.053  4572  2989  2826
Scotland3/SOCCS     48   263   519    51   315   566   359  1301   417  1447  0.22  0.22 0.958  1447   830   932
UK4/CORGI2BCD       24   158   306    45   309   669   206   770   399  1647  0.21  0.20 1.104  1647   488  1023
Cambridge           89   755  1366    76   664  1444   933  3487   816  3552  0.21  0.19 1.165  3552  2210  2184
COIN/NBS           102   701  1330    89   770  1642   905  3361   948  4054  0.21  0.19 1.151  4054  2133  2501
Helsinki            67   385   476    49   317   437   519  1337   415  1191  0.28  0.26 1.114  1191   928   803
Prague              47   335   552    33   230   388   429  1439   296  1006  0.23  0.23 1.013  1006   934   651
Kentucky            41   312   657    57   509  1017   394  1626   623  2543  0.20  0.20 0.989  2543  1010  1583
EPICOLON            57   429   840    46   442   906   543  2109   534  2254  0.20  0.19 1.087  2254  1326  1394
Australia           25   151   264    17   152   269   201   679   186   690  0.23  0.21 1.098   690   440   438
Leiden              45   284   521    28   212   448   374  1326   268  1108  0.22  0.19 1.166  1108   850   688

rs7136702; chr12: 49,166,483; z = 6.69; P = 2.23 × 10⁻¹¹; OR = 1.10; 95% CI = 1.07–1.12; Phet = 0.51; I² = 0.0%; allele 1 = T; allele 2 = C
Series            Ca11  Ca12  Ca22  Co11  Co12  Co22   Ca1   Ca2   Co1   Co2 MAFca MAFco    OR  Ntot   Nca   Nco
UK1/CORGI          131   433   357   113   430   386   695  1147   656  1202  0.38  0.35 1.110  1202   921   929
Scotland1/COGS     146   443   388   126   444   431   735  1219   696  1306  0.38  0.35 1.131  1306   977  1001
UK2/NSCCG          380  1331  1140   329  1306  1183  2091  3611  1964  3672  0.37  0.35 1.083  3672  2851  2818
Scotland2/SOCCS    276   975   755   275   935   847  1527  2485  1485  2629  0.38  0.36 1.088  2629  2006  2057
VQ58               237   869   694   295  1290  1102  1343  2257  1880  3494  0.37  0.35 1.106  3494  1800  2687
CFR                155   604   427   103   444   450   914  1458   650  1344  0.39  0.33 1.296  1344  1186   997
UK3/NSCCG          402  1388  1190   359  1283  1180  2192  3768  2001  3643  0.37  0.35 1.059  3643  2980  2822
Scotland3/SOCCS    118   310   270   122   401   356   546   850   645  1113  0.39  0.37 1.108  1113   698   879
UK4/CORGI2BCD       81   215   190   151   466   444   377   595   768  1354  0.39  0.36 1.117  1354   486  1061
Cambridge          332   955   903   261  1015   906  1619  2761  1537  2827  0.37  0.35 1.079  2827  2190  2182
COIN/NBS           287   893   844   321  1121  1059  1467  2581  1763  3239  0.36  0.35 1.044  3239  2024  2501
Helsinki           103   389   436    72   334   414   595  1261   478  1162  0.32  0.29 1.147  1162   928   820
Prague              85   419   430    57   291   303   589  1279   405   897  0.32  0.31 1.020   897   934   651
Kentucky           140   478   392   215   750   618   758  1262  1180  1986  0.38  0.37 1.011  1986  1010  1583
EPICOLON           198   642   486   187   623   584  1038  1614   997  1791  0.39  0.36 1.155  1791  1326  1394
Leiden             115   388   341    92   269   321   618  1070   453   911  0.37  0.33 1.162   911   844   682

rs11169552; chr12: 49,441,930; z = 6.88; P = 5.99 × 10⁻¹²; OR = 0.90; 95% CI = 0.88–0.93; Phet = 0.50; I² = 0.0%; allele 1 = T; allele 2 = C
Series            Ca11  Ca12  Ca22  Co11  Co12  Co22   Ca1   Ca2   Co1   Co2 MAFca MAFco    OR  Ntot   Nca   Nco
UK1/CORGI           56   328   537    67   350   512   440  1402   484  1374  0.24  0.26 0.891  1374   921   929
Scotland1/COGS      60   369   544    76   406   519   489  1457   558  1444  0.25  0.28 0.869  1444   973  1001
UK2/NSCCG          209  1062  1580   199  1124  1494  1480  4222  1522  4112  0.26  0.27 0.947  4112  2851  2817
Scotland2/SOCCS    111   808  1087   152   821  1084  1030  2982  1125  2989  0.26  0.27 0.918  2989  2006  2057
VQ58               109   665  1026   201  1046  1442   883  2717  1448  3930  0.25  0.27 0.882  3930  1800  2689
CFR                 72   450   663    73   408   516   594  1776   554  1440  0.25  0.28 0.869  1440  1185   997
UK3/NSCCG          167  1179  1625   214  1142  1463  1513  4429  1570  4068  0.25  0.28 0.885  4068  2971  2819
Scotland3/SOCCS     14   127   176    80   321   490   155   479   481  1301  0.24  0.27 0.875  1301   317   891
UK4/CORGI2BCD       34   175   277    80   395   554   243   729   555  1503  0.25  0.27 0.903  1503   486  1029
Cambridge          155   824  1241   163   853  1172  1134  3306  1179  3197  0.26  0.27 0.930  3197  2220  2188
COIN/NBS           135   818  1107   189   973  1338  1088  3032  1351  3649  0.26  0.27 0.969  3649  2060  2500
Helsinki           103   407   401   153   356   303   613  1209   662   962  0.34  0.41 0.737   962   911   812
Prague              51   375   508    38   273   340   477  1391   349   953  0.26  0.27 0.936   953   934   651
Kentucky            58   377   575    93   665   825   493  1527   851  2315  0.24  0.27 0.878  2315  1010  1583
EPICOLON            56   453   817    75   471   848   565  2087   621  2167  0.21  0.22 0.945  2167  1326  1394
Leiden              53   304   492    57   251   378   410  1288   365  1007  0.24  0.27 0.878  1007   849   686

Ca, cases; Co, controls; 11, rare homozygote; 12, heterozygote; 22, common homozygote; 1, minor allele; 2, major allele. Allele 1 is risk allele for rs6691170, rs6687758 and rs7136702; allele 2 is risk allele for rs11169552. MAF, minor allele frequency; OR, odds ratio.
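The per-series odds ratios in Table 1 appear to be simple allelic odds ratios, since they can be recomputed from the allele counts (Ca1, Ca2, Co1, Co2). The Python sketch below does this for three rs6691170 series and then pools them with a standard inverse-variance fixed-effect model, reporting Cochran's Q and I². The function names are illustrative and the fixed-effect model is an assumption; the pooled values are not expected to reproduce the STATA meta-analysis of all 17 series.

```python
import math


def allelic_or(ca1, ca2, co1, co2):
    """Allelic odds ratio for allele 1 versus allele 2 (cases versus controls)."""
    return (ca1 * co2) / (ca2 * co1)


def log_or_se(ca1, ca2, co1, co2):
    """Woolf standard error of the log odds ratio."""
    return math.sqrt(1 / ca1 + 1 / ca2 + 1 / co1 + 1 / co2)


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled OR, z statistic, Cochran's Q, I^2 as a percentage).
    """
    log_ors = [math.log(allelic_or(*s)) for s in studies]
    weights = [1 / log_or_se(*s) ** 2 for s in studies]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    z = pooled * math.sqrt(sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), z, q, i2


# (Ca1, Ca2, Co1, Co2) for rs6691170 in UK1/CORGI, Scotland1/COGS and VQ58 (Table 1)
STUDIES = [
    (695, 1145, 629, 1215),
    (731, 1221, 693, 1303),
    (1387, 2209, 1952, 3426),
]

study_ors = [round(allelic_or(*s), 3) for s in STUDIES]
pooled_or, z, q, i2 = fixed_effect_meta(STUDIES)
print(study_ors)
print(f"pooled OR = {pooled_or:.3f}, z = {z:.2f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

The recomputed per-series values agree with the OR column of Table 1 for these three series, which is a useful sanity check when re-keying the counts.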
Having failed to find evidence for the simplest situations (namely that one of each tagSNP pair captured the great majority of the association signal or that the tagSNPs essentially acted as simple two-locus tags for the functional variants in each region), we attempted to deconvolute the 1q41 signal by association testing of imputed SNPs in the region. The three GWAS sample sets, UK1, Scotland 1 and VQ58, were imputed to the combined 1000 Genomes and HapMap3 reference set. A total of 630 SNPs in the 220–221 Mb region on chromosome 1q41 was successfully imputed from 76 genotyped SNPs. The strongest association signal (Fig. 1, Supplementary Material, Table S2), as measured by association test P-value, was at rs11118883 (chr1: 220,127,645), an imputed SNP in moderate LD with rs6691170 (r² = 0.40, D′ = 0.74) and rs6687758 (r² = 0.31, D′ = 0.77).
We then used reverse stepwise logistic regression analysis to determine whether rs6691170 and rs6687758, or other combinations of SNPs, best accounted for the association between CRC and 1q41 variation. Using a final significance threshold of P = 0.01, we found that two imputed SNPs, rs11118883 and rs12726661, were most strongly associated with the CRC risk (Table 2, Supplementary Material, Table S2). By comparison, a joint analysis of rs6687758 and rs6691170 in the same three GWAS data sets gave much weaker evidence of association, as assessed using the Akaike Information Criterion (AIC). Indeed, a model incorporating rs11118883 alone, although not one with rs12726661 alone, provided a better fit than a model incorporating both rs6687758 and rs6691170; haplotype-based association analysis supported these findings (data not shown).
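The AIC comparison described above can be made explicit with a short Python sketch. It uses the AIC values reported in Table 2 and its footnote, and assumes that the four individual AICs are listed in the same order as the SNPs in the table. The relative-likelihood calculation, exp(−Δ/2), is a common way of reading AIC differences and is added here as an illustration rather than something reported in the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * (number of parameters)."""
    return -2 * log_likelihood + 2 * n_params


# AIC values as reported in Table 2 and its footnote
REPORTED_AIC = {
    "rs6691170 + rs6687758": 10486,
    "rs11118883 + rs12726661": 10475,
    "rs6691170": 10487,
    "rs6687758": 10489,
    "rs11118883": 10484,
    "rs12726661": 10487,
}

best = min(REPORTED_AIC, key=REPORTED_AIC.get)
# Difference from the best-fitting model (lower AIC means better fit)
delta = {model: value - REPORTED_AIC[best] for model, value in REPORTED_AIC.items()}
# Relative likelihood of each model compared with the best one
relative = {model: math.exp(-d / 2) for model, d in delta.items()}

for model in sorted(REPORTED_AIC, key=REPORTED_AIC.get):
    print(f"{model:26s} AIC = {REPORTED_AIC[model]}  delta = {delta[model]:2d}  "
          f"relative likelihood = {relative[model]:.4f}")
```

Because the penalty term grows with the number of parameters, a single-SNP model can beat a two-SNP model on AIC even when its log-likelihood is slightly lower, which is the pattern seen for rs11118883 alone against the original tagSNP pair.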
Figure 1. Individual SNP associations in the 1q41 region. Association testing was performed in SNPtest using typed and imputed genotypes from the three GWAS series (UK1, Scotland1 and VQ58) and displayed using SNAP (http://www.broadinstitute.org/mpg/snap/). The X-axis shows position on chromosome 1 and the Y-axis, −log10(P) from the per allele association test. The most strongly associated SNP, rs11118883, is shown as a large diamond, and the colours of other data points reflect the LD between that SNP and rs11118883. The smaller diamond points indicate genotyped SNPs and the triangles indicate imputed SNPs. The blue line represents recombination rates.

Table 2. Two-SNP logistic regression analysis showing best signal in the 1q41 region in comparison with the originally reported SNPs

rs6691170 + rs6687758 (LD: r² = 0.15, D′ = 0.65; no. cases 3272, no. controls 4572; AIC = 10486)
SNP         Position (bases)  Risk allele (freq cases, freq controls)  OR    95% CI     Z     P-value
rs6691170   220,112,069       T (0.38, 0.36)                           1.09  1.01–1.17  2.95  0.0032
rs6687758   220,231,571       G (0.22, 0.20)                           1.08  0.98–1.18  1.61  0.108

rs11118883 + rs12726661 (LD: r² = 0.92, D′ = 1.00; no. cases 3206, no. controls 4452; AIC = 10475)
SNP         Position (bases)  Risk allele (freq cases, freq controls)  OR    95% CI     Z     P-value
rs11118883  220,127,645       A (0.32, 0.29)                           2.32  1.49–3.62  3.71  2.07 × 10⁻⁴
rs12726661  220,134,411       A (0.68, 0.71)                           0.49  0.32–0.76  3.16  1.58 × 10⁻³

LD is shown for the pair of SNPs being tested. OR, odds ratio; AIC, Akaike information criterion [AIC = −2 × log-likelihood + 2 × (number of parameters)]. Note the lower AIC, showing a better model fit, for the test of rs11118883 + rs12726661 compared with rs6691170 + rs6687758. Individual AICs for these four SNPs were, respectively, 10487, 10489, 10484 and 10487.

We were surprised to note that in a single-SNP analysis the direction of effect for rs12726661 was reversed (the minor allele was associated with disease risk) compared with that in the two-SNP analysis. We determined that rs11118883 and rs12726661 were in strong LD (r² = 0.98, D′ = 1.00) in our samples, consistent with data from the 1000 Genomes Project and HapMap3 that had been used for imputation. Examination of the genotype distribution in our data set showed that deviation from perfect LD between the SNPs resulted from two sets of individuals: (i) 50 homozygous for the major allele at rs12726661 and heterozygous at rs11118883; and (ii) 15 heterozygous at rs12726661 and homozygous for the minor allele at rs11118883. Specifically, 28/50 in category (i) were cases and 4/11 in category (ii) were cases. For these 65 individuals, the risk of CRC was significantly greater than that of individuals with the other genotypes at rs12726661 and rs11118883 (OR = 2.10, P = 0.003, χ²₁ test). A potential explanation for our apparently paradoxical findings is that there exists another allele, almost certainly relatively rare, that is associated with the minor allele of rs12726661 (but not with rs11118883), and that is protective against the CRC risk.

We then analysed our UK1, Scotland 1 and VQ58 individuals using GENECLUSTER with the original GWAS SNP genotypes in the rs6691170/rs6687758 region as inputs. There was no evidence to favour an underlying two-locus model over a one-SNP model (Fig. 2). The predicted most strongly associated single SNP was rs11577023, a SNP that is in very strong LD with rs11118883 (r² = 0.93, D′ = 1.0) in our data.

We genotyped rs11118883 directly in a set of 84 UK control samples and found complete concordance with the imputed genotypes. rs11118883 (chr1: 220,127,645) lies in a gene desert, within a region of LD that extends approximately from 220.0 to 220.3 Mb. The nearest gene, ∼150 kb towards the centromere, is the MAP kinase regulator dual-specificity phosphatase 10 (DUSP10). DUSP10 inactivates p38 and also the Jun N-terminal kinase that phosphorylates c-Jun, which is believed to play a role in CRC pathogenesis. rs11118883 itself lies upstream of DUSP10 within a region with strong evidence of transcriptional regulatory activity (http://genome.ucsc.edu). Using 1000 Genomes data, we found that rs11118883 is in strong LD (r² > 0.7) with at least six SNPs (rs12738322, rs12726661, rs4129271, rs11577023, rs10746414 and rs12137702). Of these, rs10746414 and rs12137702 are also close to regions with potential effects on transcription.
The 12q13.13 region
Analysis of the 12q13.13 region proceeded in parallel with that of the 1q41 region using essentially the same strategy. We initially confirmed the individual associations of SNPs rs7136702 and rs11169552 with the CRC risk in the extended data sets (Table 1). Unconditional logistic regression analysis did not exclude the possibility that the two SNPs had independent effects; for rs7136702 and rs11169552, the association statistics were P = 1.63 × 10⁻⁵ (OR = 1.07) and P = 1.70 × 10⁻⁷ (OR = 0.92), respectively, showing that one SNP did not simply capture all of the association signals. Further analysis showed that the association signal was not derived from a single high-risk haplotype tagged by rs7136702 and rs11169552 (Supplementary Material, Table S3) and there was no evidence of epistasis between the SNPs (P = 0.903).
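The joint two-SNP model and the test for epistasis described above can be sketched in plain Python as a grouped logistic regression fitted by Newton-Raphson, with the interaction assessed by a likelihood-ratio test on 1 degree of freedom. Everything in the example is hypothetical: the genotype counts are generated inside the code from per-allele odds ratios of 1.07 and 0.92 under a log-additive model, the function names are illustrative, and the study's own software and model details may differ.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, iters=25):
    """Newton-Raphson fit for grouped data; each row is (covariates, cases, total)."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y, n in rows:
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                grad[i] += x[i] * (y - n * p)
                for j in range(k):
                    hess[i][j] += n * p * (1 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, y, n in rows:
        p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        loglik += y * math.log(p) + (n - y) * math.log(1 - p)
    return beta, loglik


# Hypothetical control counts by (minor-allele count at SNP 1, at SNP 2); not study data
CONTROLS = {(0, 0): 800, (0, 1): 600, (0, 2): 120,
            (1, 0): 900, (1, 1): 700, (1, 2): 130,
            (2, 0): 260, (2, 1): 200, (2, 2): 40}
# Hypothetical case counts generated under log-additive effects (no epistasis)
cells = [(g1, g2, round(n * 1.07 ** g1 * 0.92 ** g2), n) for (g1, g2), n in CONTROLS.items()]

additive = [((1, g1, g2), ca, ca + co) for g1, g2, ca, co in cells]
full = [((1, g1, g2, g1 * g2), ca, ca + co) for g1, g2, ca, co in cells]

beta_add, ll_add = fit_logistic(additive)
beta_full, ll_full = fit_logistic(full)

or_snp1, or_snp2 = math.exp(beta_add[1]), math.exp(beta_add[2])
lrt = max(0.0, 2 * (ll_full - ll_add))
p_interaction = math.erfc(math.sqrt(lrt / 2))  # chi-square tail probability, 1 d.f.
print(f"OR SNP1 = {or_snp1:.3f}, OR SNP2 = {or_snp2:.3f}, P(interaction) = {p_interaction:.3f}")
```

In the additive model each odds ratio is adjusted for the other SNP, so a SNP that retains a clear effect after adjustment is not simply a proxy for its neighbour; a large interaction P-value, as in this constructed example, is what "no deviation from log-additive effects" looks like.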
We imputed SNPs within the 48.5–50 Mb region of chromosome 12 using the combined 1000 Genomes and HapMap3 reference panel in the 3 GWAS sample sets (UK1, Scotland 1 and VQ58) (Fig. 3, Supplementary Material, Table S4). A total of 2736 SNPs was successfully imputed from 158 genotyped SNPs. The most significant single-SNP association was at the imputed SNP rs7972465 [OR = 1.18, 95% confidence interval (CI) 1.11–1.27, P = 8.22 × 10⁻⁷], a signal slightly stronger than that of rs11169552 (OR = 0.85, 95% CI 0.79–0.91, P = 1.08 × 10⁻⁵) and notably stronger than that of rs7136702 (OR = 1.13, 95% CI 1.06–1.21, P = 3.85 × 10⁻⁴). Direct genotyping in 91 UK control individuals showed that imputation of rs7972465 was very good, although not perfect (r² = 0.93).
Reverse stepwise logistic regression analysis was then used to assess whether rs11169552 and rs7136702, or other combinations of SNPs in the region, best accounted for the association between CRC and 12q13.13 variation (Table 3). Many highly correlated SNPs exist within the region, making this analysis difficult. Nonetheless, while rs11169552 remained in the regression model after stepwise elimination of less strongly associated SNPs, a number of SNPs provided improved or similar associations compared with rs7136702 in a two-SNP model with rs11169552. One of these SNPs was rs7972465 (Table 3, Fig. 4), but another SNP, rs706793, a SNP in very low LD with rs11169552 (Table 3), provided a larger improvement in the AIC (see also Supplementary Material, Table S4).
We then undertook GENECLUSTER analysis of the UK1,
Scotland 1 and VQ58 sample sets in the 12q13.13 region.
There was no good evidence to distinguish between underlying
two-locus and one-locus models (Fig. 5), although the
association signal showed two peaks at ~48.85 Mb (close to
rs706793) and at ~49.45 Mb (very close to rs11169552)
that could not readily be explained by long-range LD
between these two regions (Supplementary Material,
Fig. S1). The predicted most strongly associated SNP under
the one-SNP model was rs3184122 (Supplementary Material,
Table S4), a variant that is in moderate or strong LD
(Fig. 5) with rs11169552 (r² = 0.19, D′ = 0.92), rs7136702
(r² = 0.49, D′ = 0.73) and rs706793 (r² = 0.47, D′ = 0.94),
and strong LD with rs7972465 (r² = 0.87, D′ = 1.00).
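The LD values quoted above combine a high D′ with a low r² for some SNP pairs, which happens when the two alleles differ markedly in frequency. A minimal sketch of the two standard measures, computed from haplotype and allele frequencies, is given below. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime

# Hypothetical example: a rarer allele (10%) found only on the background
# of a commoner allele (40%) gives complete D' but a modest r2.
r2, d_prime = ld_stats(p_ab=0.10, p_a=0.10, p_b=0.40)
```

In this example D′ is 1 because one of the four possible haplotypes is absent, while r² is about 0.17 because the allele frequencies differ.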
Since the various analyses had not resolved the question of
whether there exist one or two independent CRC-associated
SNPs in the 12q13.13 region, we used PLINK to examine
the associations with disease of the haplotypes (Fig. 4) for
rs706793, rs7972465 and rs11169552. As expected, the haplo-
type CGC was most strongly associated with risk (Table 4,
Supplementary Material, Table S5). The G (risk) allele at
rs7972465 was essentially present only on this haplotype,
but it appeared that haplotypes containing the T allele at
rs7972465 were not all low risk and therefore that
rs7972465 did not explain all the association signal. We therefore
considered the association signals when we fixed the
alleles at rs706793 and rs11169552 and allowed those at
rs7972465 to vary, and vice versa. Initially, we undertook
simple comparisons between haplotype frequencies in cases
and controls, and found that the rs706793 and rs11169552
Figure 2. GENECLUSTER output for the 1q41 region. The upper left panel compares the Bayes factors (BFs) for models in which the association signals at
rs6691170 and rs6687758 are derived from either one functional SNP (red) or two functional SNPs (green). Recombination rates are also shown as a red line. The
upper right panel shows the log10(BF) at the focal position (the site of the highest log10(BF), here chr1:220,129,000 bases) under one- and two-SNP models.
The lower right panel shows reconstructed genealogies for UK1, Scotland1 and VQ58 combined, based on each individual’s genotypes in the region from the
Illumina Hap300/370/550 panels and HapMap2 data. The most likely positions of SNP origins under the one-SNP model (blue, rs11577023) and two-SNP model
(green and red) are shown. These result in counts of cases and controls and relative risks as indicated in the upper right panel. The lower left panel shows haplotypes
(rows) and SNPs (columns). Note that the region analysed extends for several Mb flanking rs6691170 and rs6687758; although no signal reaches nominal
significance at log10(BF) = 4, there is some evidence of a second independent region of 1q associated with CRC at ~218.2 Mb, as we have reported previously.
The importance of rs11577023 was supported by the Margarita analysis in which it was the second most strongly associated with disease (P = 3.59 × 10⁻⁴).
risk alleles, but not the rs7972465 risk allele, were found at
significantly higher frequencies in cases than controls (Supplementary
Material, Table S6). Since this analysis suggested that
there might be independent effects of rs706793 and
rs11169552 (and that the signal at rs7972465 resulted from
LD with these two SNPs), we proceeded to a further evaluation
of this possibility using conditional haplotype analysis
in PLINK. We again compared two scenarios, (i) in which
the CGC and CTC haplotypes were equivalent (that is,
varying rs7972465) and (ii) in which the CTC and TTT haplotypes
were equivalent (that is, varying rs706793 and
rs11169552). No effect was seen in the first case (likelihood
ratio test, P = 0.35), whereas there was a significant difference
in the second case (P = 0.023), again supporting effects of
rs706793 and rs11169552 rather than rs7972465.
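As a rough check on the haplotype comparisons, the crude odds ratio for each haplotype against all others can be computed directly from the case and control frequencies given in Table 4. This is only a sketch: the ORs reported in Table 4 come from a logistic regression model, so the crude values are expected to be close but not necessarily identical, and the function name is hypothetical.

```python
def haplotype_or(freq_cases, freq_controls):
    """Crude odds ratio for one haplotype versus all other haplotypes."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# Haplotype frequencies (cases, controls) at rs706793, rs7972465 and
# rs11169552, copied from Table 4.
TABLE4_FREQS = {
    "TTT": (0.0922, 0.1085),
    "CTT": (0.1407, 0.1567),
    "CGC": (0.3829, 0.3443),
    "TTC": (0.3102, 0.3194),
    "CTC": (0.0739, 0.0710),
}

crude_or = {hap: haplotype_or(*freqs) for hap, freqs in TABLE4_FREQS.items()}
```

Under this crude calculation CGC is the haplotype most enriched in cases and TTT the most depleted, in line with the ordering in Table 4.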
Further genotyping in additional sample sets strengthened the
rs706793 association with CRC, although it did not reach formal
significance and there was some evidence of inter-study heterogeneity,
the origins of which remain unclear (Supplementary
Material, Table S7). Logistic regression analysis in the extended
sample set continued to support a model incorporating rs706793
and rs11169552 (P = 8.38 × 10⁻⁴ and P = 7.82 × 10⁻⁶,
respectively, AIC = 27932) over one with rs7136702
and rs11169552 (P = 3.05 × 10⁻⁵ and P = 9.05 × 10⁻³,
AIC = 27999).
rs706793 (chr12:48,754,036) and rs11169552
(chr12:49,441,930) are separated by a predicted recombination
hotspot at ~48.8 Mb in the HapMap data (Fig. 3) but not in
our own data (Fig. 5), although LD in the region is complex
(Supplementary Material, Fig. S1). The 12q13.13 region
Figure 3. Individual SNP associations in the 12q13.13 region. Legend is as for Figure 1.
Table 3. Two-SNP logistic regression analysis showing best signals in the 12q13.13 region in comparison with the original reported SNPs
| SNPs | Genotyped or imputed? | Position (bases) | LD (r², D′) | Risk allele (freq. cases, freq. controls) | No. cases, no. controls | OR | 95% CI | z | P-value | AIC |
|---|---|---|---|---|---|---|---|---|---|---|
| rs11169552 | Genotyped | 49,441,930 | 0.040, 0.57 | C (0.76, 0.73) | 3276, 4576 | 0.92 | 0.81–0.94 | 3.37 | 7.52 × 10⁻⁴ | 10473 |
| rs7136702 | Genotyped | 49,166,483 | | T (0.37, 0.35) | | 1.08 | 1.01–1.16 | 2.13 | 0.033 | |
| rs11169552 | Genotyped | 49,441,930 | 0.19, 0.92 | C (0.76, 0.73) | 3206, 4480 | 0.89 | 0.82–0.97 | 2.69 | 0.007 | 10469 |
| rs3184122 | Imputed | 48,856,394 | | C (0.41, 0.37) | | 1.12 | 1.04–1.21 | 2.96 | 0.003 | |
| rs11169552 | Genotyped | 49,441,930 | 0.06, 1.00 | C (0.76, 0.73) | 3268, 4554 | 0.88 | 0.81–0.95 | 3.40 | 6.74 × 10⁻⁴ | 10468 |
| rs35031884 | Imputed | 49,063,840 | | A (0.32, 0.29) | | 1.15 | 1.05–1.25 | 3.19 | 0.0014 | |
| rs11169552 | Genotyped | 49,441,930 | 0.17, 0.89 | C (0.76, 0.73) | 3268, 4563 | 0.90 | 0.83–0.97 | 2.64 | 0.008 | 10466 |
| rs7972465 | Imputed | 48,832,392 | | G (0.21, 0.18) | | 1.14 | 1.06–1.22 | 3.43 | 6.04 × 10⁻⁴ | |
| rs11169552 | Genotyped | 49,441,930 | 0.002, 0.095 | C (0.76, 0.73) | 3266, 4557 | 0.84 | 0.78–0.91 | 4.56 | 5.12 × 10⁻⁶ | 10426 |
| rs706793 | Genotyped | 48,754,036 | | C (0.60, 0.57) | | 0.49 | 0.32–0.76 | 3.23 | 0.0012 | |
SNP pairs are shown in order of descending Akaike Information Criterion (AIC). Note that in single-SNP analysis, rs706793 provided only slightly worse evidence
of association (OR = 0.90, 95% CI 0.84–0.96, P = 0.002) than in combined analysis with rs11169552. Individual AICs for rs11169552, rs7136702, rs3184122,
rs35031884, rs7972465 and rs706793 were, respectively, 10476, 10483, 10475, 10477, 10471 and 10447. Incorporation of rs7972465 into a regression model with
rs11169552 and rs706793 did not improve the model’s fit (AIC = 10426).
contains coding genes ACCN2, SMARCD1, GPD1, LASS5,
LIMA1 and ATF1. ACCN2 probably encodes an ion channel
protein, SMARCD1 is part of chromatin remodelling
complex SNF/SWI, GPD1 is glycerol-3-phosphate dehydrogenase
and LASS5 is probably a ceramide synthase. LIMA1
codes for EPLIN, a protein downregulated in some cancers.
ATF1 is a transcription factor centrally involved in the stress
response and in the pathogenesis of angiomatoid fibrous histiocytoma
and clear cell sarcoma through translocation. Supplementary
Material, Table S8 lists SNPs in strong LD (r² >
0.70) with rs706793, rs7972465 or rs11169552, and provides
annotation for those with evidence of potential roles in gene
or protein regulation or function.
DISCUSSION
We have undertaken additional genotyping and more detailed
analysis in order to understand better the dual tagSNP association
signals that we observed on chromosomes 1q41
(rs6691170, rs6687758) and 12q13.13 (rs11169552,
rs7136702) in a GWAS of CRC (1). In both cases, genotyping
of additional sample series confirmed the originally reported
associations, without demonstrating good evidence for the
three simplest scenarios: independent functional variants;
capture of the association signal by one of the pair of SNPs;
or two-SNP tagging of a single haplotype on which functional
variation lay. We therefore proceeded to more detailed analyses
in each region, after imputation of genotypes where
appropriate in the data sets with best coverage of each
region (UK1, Scotland1 and VQ58). It is conceivable that
the analysis of these three data sets, which had already been
used in SNP discovery, would introduce a small amount of
bias into the fine mapping. However, we reasoned that the
marginal differences in association that might occur would
be more than outweighed by the power provided by the use
of these data sets.
For 1q41, the single-SNP association test, logistic regression
analysis and GENECLUSTER all found that SNP
rs11118883, or a SNP in strong LD, was most likely to be
responsible for the signal of association. This SNP itself is a
very good functional candidate, lying within or immediately
adjacent to regions bearing histone methylation and
acetylation marks, DNAse I hypersensitive sites and sites of
transcription factor binding
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=195445293&c=chr1&g=wgEncodeReg).
The SCAN expression Quantitative Trait Locus (eQTL) database
(4) reports rs11118883 being associated (P = 8 × 10⁻⁵)
in Europeans with expression of platelet-derived growth
factor receptor β (PDGFRB, chr5q31–q32), although this association
requires confirmation in appropriate cell types for the CRC
risk and is not present in the Genevar eQTL database
(http://www.sanger.ac.uk/resources/software/genevar) (5).
The possibility that the minor allele of rs12726661 is asso-
ciated with a second, presumably rare, variant that is protect-
ive against the CRC risk is intriguing. While speculative, such
a scenario has precedents, such as the MDM2 promoter SNP
rs117039649 (6).
For 12q13.13, a complex situation was found. Single-SNP
analysis found variants with much stronger association
signals than either rs11169552 or rs7136702, notably at
rs7972465, although small imputation inaccuracies may have
inflated this signal. GENECLUSTER analysis found no
greater evidence for a two-SNP than one-SNP model and
detected the best signal for the one-SNP model at a SNP, rs3184122,
that is in strong LD with rs7972465. Logistic regression
analysis, however, supported a two-SNP model, in which
a signal at rs706793 was added to that at
rs11169552. rs706793 and rs11169552 are in very weak LD,
but rs706793 is in moderate LD with rs7136702 (r² = 0.20,
D′ = 0.60). Haplotype analysis supported the logistic regression
analysis, in that the genotype at rs7972465 did not
affect the risk associated with the rs706793–rs11169552 haplotypes,
whereas the reverse scenario (high- versus low-risk
rs706793–rs11169552 haplotypes) did affect risk. As
regards eQTLs for the 12q13.13 SNPs, Genevar
shows rs706793 to be associated with LASS5 expression
(at P < 10⁻⁴), although this association is not reported in
SCAN.
Figure 4. LD and main haplotypes at SNPs with best evidence of association on 12q13.13. Note that in this Haploview output from HapMap3 data, the alleles at
rs706793 are shown on the opposite strand (that is G/A rather than C/T as used in the rest of this manuscript).
Clearly, all post-GWAS fine-mapping studies face intrinsic
difficulties, such as the use of imputed genotypes, despite the
use of stringent criteria for SNP inclusion, and a limited ability
to differentiate among association signals of similar magni-
tudes. The analysis of the 12q13.13 region illustrates some
of these problems well. Although a much more strongly
Figure 5. GENECLUSTER output for the 12q13.13 region. The legend is as for Figure 2, except that the focal position is Chr12:48,849,000 and the double peak
of association at ~48.85 and 49.45 Mb should be noted. The top SNP (blue dot) under the one-SNP model is the imputed SNP rs3184122. The top-genotyped
SNP in the GENECLUSTER analysis was rs7138945, which was the SNP with the second-best association signal in Margarita (P = 1.14 × 10⁻⁵).
CRC-associated SNP than the original tagSNPs was identified
through imputation, the balance of evidence slightly favours
this signal resulting from two independent association
signals, as we have previously found for the GREM1 locus
(2). In the 1q41 region, in contrast, rs11118883—a SNP in
moderate LD with both the original tagSNPs—emerged as
an excellent candidate for the functional variant.
MATERIALS AND METHODS
Sample sets
The Kentucky samples comprised 1020 incident colon
cancer cases and 1598 population controls of white Euro-
pean origin recruited between July 2003 and December
2009. Eligible cases were identified through the population-
based Surveillance, Epidemiology and End Results (SEER)
Kentucky Cancer Registry covering all residents living in
the State of Kentucky at the time of diagnosis. We used
random digit dialling to recruit population controls who
were 40 years of age or older and had no personal history
of cancer other than skin cancer. We excluded those with
known inflammatory bowel diseases, family history of
familial adenomatous polyposis and hereditary non-polyposis
CRC.
The Prague cases (7) were patients with histologically con-
firmed CRC recruited between September 2004 and February
2009 from nine oncology departments in the Czech Republic:
Prague (two), Benesov, Brno, Liberec, Ples, Pribram, Usti nad
Labem and Zlin. During this period, a total of 1554 cases pro-
vided blood samples. This study includes 1001 subjects who
could be interviewed, provided biological samples and were
genotyped. Controls were 683 hospital-based volunteers with
negative colonoscopy results for malignancy or idiopathic
bowel diseases (CFCC, cancer-free colonoscopy inspected
controls). CFCCs were selected from among individuals
admitted to the same hospitals during the same period of the
recruitment of the cases. The reasons for undergoing the
colonoscopy were: (i) positive faecal occult blood test, (ii)
haemorrhoids, (iii) abdominal pain of unknown origin, or
(iv) macroscopic bleeding.
Details of other sample sets have been reported previously
(2) and are provided briefly below.
UK1 (CORGI) comprised 922 cases with colorectal neopla-
sia (47% male) ascertained through the Colorectal Tumour
Gene Identification (CORGI) consortium. All had at least
one first-degree relative affected by CRC and one or more
of the following phenotypes: CRC at age 75 or less; any colo-
rectal adenoma (CRAd) at age 45 or less; ≥3 CRAds at age 75
or less; or a large (>1 cm diameter) or aggressive (villous
and/or severely dysplastic) adenoma at age 75 or less. The
929 controls (45% males, 55% females) were spouses or part-
ners unaffected by cancer and without a personal family
history (to second degree relative level) of colorectal neopla-
sia. Known dominant polyposis syndromes, HNPCC/Lynch
syndrome or bi-allelic MUTYH mutation carriers were
excluded.
Scotland1 (COGS) included 980 CRC cases (51% male;
mean age at diagnosis 49.6 years, SD ± 6.1) and 1002 cancer-free
population controls (51% male; mean age 51.0 years;
SD ± 5.9). Cases were selected for early age at onset (age ≤ 55
years). Known dominant polyposis syndromes, HNPCC/
Lynch syndrome or bi-allelic MUTYH mutation carriers were
excluded. Control subjects were sampled from the Scottish
population NHS registers, matched by age (±5 years),
gender and area of residence within Scotland.
VQ58 comprised 1832 CRC cases (1099 males, mean age of
diagnosis 62.5 years; SD ± 10.9) from the VICTOR and
QUASAR2 (www.octo-oxford.org.uk/alltrials/trials/q2.html)
clinical trials of adjuvant therapy in stage II/III CRC. There
were 2720 population control genotypes (1391 males) from
the Wellcome Trust Case-Control Consortium 2 (WTCCC2)
1958 birth cohort (also known as the National Child Develop-
ment Study), which included all births in England, Wales and
Scotland during a single week in 1958.
The Australian study comprised 591 patients treated for
CRC at the Royal Melbourne, Western and St Francis
Xavier Cabrini Hospitals in Melbourne from 1999 to 2009.
The 2353 controls were derived from Queensland or
Melbourne: for the former, the controls came from the
Brisbane Twin Nevus Study; for the latter, individuals were
participants in the Genes in Myopia study. There was no
overlap between the CFR and Australian data sets. Owing to
potential residual ethnic heterogeneity within the Melbourne
population, for the Australian cohort only we performed an
additional screen to minimize heterogeneity after performing
principal components analysis (PCA) to remove individuals
who clustered with non-CEU individuals (see below). We
achieved this by performing PCA on the Australian cases
and controls without reference samples of known ancestry.
We then paired each case with a control in a 1:1 ratio based
on a maximum separation of 0.050 using the first and
second eigenvectors. All unpaired samples were excluded,
leaving 441 cases and 441 controls in the study. Calculation
of the genomic inflation factor, λGC, showed this to be 1.02
after this filtering.
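The 1:1 pairing of Australian cases and controls on the first two eigenvectors can be sketched as follows. The text specifies only the maximum separation of 0.050; the use of Euclidean distance and of a greedy closest-pair-first strategy are assumptions made for illustration, and the function name and sample identifiers are hypothetical.

```python
import math

def pair_cases_to_controls(cases, controls, max_sep=0.050):
    """Greedy 1:1 matching on (PC1, PC2) coordinates.

    cases, controls: dicts mapping sample ID -> (pc1, pc2).
    Returns a list of (case_id, control_id) pairs; unpaired samples
    are simply left out, mirroring their exclusion from the study.
    """
    candidates = []
    for case_id, (c1, c2) in cases.items():
        for ctrl_id, (k1, k2) in controls.items():
            dist = math.hypot(c1 - k1, c2 - k2)  # assumed Euclidean distance
            if dist <= max_sep:
                candidates.append((dist, case_id, ctrl_id))
    candidates.sort()  # closest pairs are matched first

    used_cases, used_controls, pairs = set(), set(), []
    for _, case_id, ctrl_id in candidates:
        if case_id not in used_cases and ctrl_id not in used_controls:
            pairs.append((case_id, ctrl_id))
            used_cases.add(case_id)
            used_controls.add(ctrl_id)
    return pairs

# Hypothetical coordinates for illustration only.
cases = {"case1": (0.00, 0.00), "case2": (0.20, 0.20)}
controls = {"ctl1": (0.01, 0.02), "ctl2": (0.50, 0.50), "ctl3": (0.03, 0.00)}
pairs = pair_cases_to_controls(cases, controls)
```

Here case2 has no control within 0.050 and is dropped, as are the two unused controls.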
UK2 (NSCCG) consisted of 2854 CRC cases (58% male,
mean age at diagnosis 59.3 years; SD ± 8.7) ascertained
through two ongoing initiatives at the Institute of Cancer
Research/Royal Marsden Hospital NHS Trust (RMHNHST)
from 1999 onwards—The National Study of Colorectal
Cancer Genetics (NSCCG) and the Royal Marsden Hospital
Trust/Institute of Cancer Research Family History and DNA
Table 4. Haplotype analysis in the 12q13.13 region
| Haplotype | Freq. in cases | Freq. in controls | OR | P-value |
|---|---|---|---|---|
| TTT | 0.0922 | 0.1085 | 0.82 | 5.3 × 10⁻⁴ |
| CTT | 0.1407 | 0.1567 | 0.87 | 4.5 × 10⁻³ |
| CGC | 0.3829 | 0.3443 | 1.19 | 6.2 × 10⁻⁷ |
| TTC | 0.3102 | 0.3194 | 0.96 | 0.212 |
| CTC | 0.0739 | 0.0710 | 1.05 | 0.483 |
Haplotypes (cen-tel) at rs706793, rs7972465 and rs11169552 were analysed,
notwithstanding the low LD between the first and last of these SNPs. Five
haplotypes with frequencies of >0.01 were predicted. Ca, cases; Co, controls.
OR, odds ratio relative to all other haplotypes. P-value is from analysis of
effects of all haplotypes on disease risk in a logistic regression model. Odds
ratios and P-values relative to reference haplotype TTT are given in
Supplementary Material, Table S5.
Registry. The 2822 controls (41% males; mean age 59.8
years; SD ± 10.8) were the spouses or unrelated friends of
patients with malignancies. None had a personal history of
malignancy at the time of ascertainment. All cases and controls
had self-reported European ancestry, and there were no
obvious differences in the demography of cases and controls
in terms of place of residence within the UK.
Scotland2 (SOCCS) comprised 2024 CRC cases (61% male;
mean age at diagnosis 65.8 years, SD ± 8.4) and 2092 population
controls (60% males; mean age 67.9 years, SD ± 9.0)
ascertained in Scotland. Cases were taken from an independent,
prospective, incident CRC case series and aged <80
years at diagnosis. Control subjects were population controls
matched by age (±5 years), gender and area of residence
within Scotland.
UK3 (NSCCG) comprised 7912 CRC cases (65% male;
mean age at diagnosis 59 years, SD ± 8.2) and 4398 controls
(40% male; mean age 62 years, SD ± 11.5) ascertained
through NSCCG post-2005.
Scotland3 (SOCCS) comprised 1145 CRC cases (50% male;
mean age at diagnosis 53.2 years, SD ± 15.4) and 2203
cancer-free population controls (47% male; mean age 51.8
years, SD ± 11.5). Controls were recruited as part of the
Generation Scotland study.
UK4 (CORGI2BCD) consisted of 621 CRC or CRAd cases
(46% male; mean age at diagnosis 58.3 years; SD ± 14.1) and
1121 cancer-free population or spouse controls (45% male;
mean age 45.1 years, SD ± 15.9), sampled using the same
criteria as UK1.
Cambridge/SEARCH consisted of 2248 CRC cases (56%
male; mean age at diagnosis 59.2 years, SD ± 8.1) and 2209
controls (42% males; mean age 57.6 years, SD ± 15.1).
Samples were ascertained through the SEARCH (Studies of
Epidemiology and Risk Factors in Cancer Heredity,
http://www.cancerhelp.org.uk/trials/a-study-looking-at-genetic-causes-of-cancer)
study based in Cambridge, UK. Recruitment started
in 2000; initial patient contact was through the general practitioner.
Control samples were collected post-2003. Eligible
individuals were sex and frequency matched in 5-year age
bands to cases.
The COIN samples were 2151 cases derived from the
COIN and COIN-B clinical trials of metastatic CRC.
Median age was 63 years. COIN cases were compared
against genotypes from 2501 population controls (1237
males), from the WTCCC2 National Blood Service (NBS)
cohort (50% male; mean age at diagnosis 53.2 years,
SD ± 15.4).
The Helsinki (FCCPS) study (http://research.med.helsinki.
fi/gsb/aaltonen/) comprised 988 cases from a population-based
collection centred on south-eastern Finland and 864 popula-
tion controls from the same collection.
EPICOLON included 1410 CRC cases matched with the
same number of controls collected in a prospective fashion
from centres in Spain. Exclusion criteria were Mendelian
CRC syndromes and a personal history of inflammatory
bowel disease.
The Leiden sample set included 858 unselected cases with
CRC and 690 controls ascertained through genetic testing pro-
grammes for non-cancer-related conditions from the Leiden
area.
In all cases, CRC was defined according to the ninth revi-
sion of the International Classification of Diseases (ICD) by
codes 153–154 and all cases had pathologically proven
disease. Only individuals of white European origin were
included in the study.
Sample preparation and genotyping
Collection of blood samples and clinico-pathological informa-
tion from patients and controls was undertaken with informed
consent and ethical review board approval in accordance with
the tenets of the Declaration of Helsinki. DNA was extracted
from samples using conventional methods and quantified
using PicoGreen (Invitrogen). The VQ, UK1, Scotland1 and
Australia GWA cohorts were genotyped using Illumina
Hap300, Hap370 or Hap550 arrays. 1958BC and NBS geno-
typing was performed as part of the WTCCC2 study on
Hap1.2M arrays. In UK2 and Scotland2, genotyping was con-
ducted using custom Illumina Infinium arrays according to the
manufacturer’s protocols. Some COIN SNPs were typed on
custom Illumina Goldengate arrays. To ensure quality of geno-
typing, a series of duplicate samples was genotyped, resulting
in 99.9% concordant calls in all cases.
Other genotyping was conducted using competitive
allele-specific PCR KASPar chemistry (KBiosciences Ltd,
Hertfordshire, UK), Taqman (Life Sciences, Carlsbad, CA,
USA) or MassARRAY (Sequenom Inc., San Diego, CA, USA).
All primers, probes and conditions used are available on
request. Genotyping quality control was tested using duplicate
DNA samples within studies and SNP assays, together with
direct sequencing of subsets of samples to confirm genotyping
accuracy. For all SNPs, >99% concordant results were obtained.
We excluded SNPs from analysis if they failed one or more
of the following thresholds: GenCall scores <0.25; overall
call rates <95%; minor allele frequency (MAF) <0.01;
departure from the Hardy–Weinberg equilibrium (HWE) in
controls at P < 10⁻⁴ or in cases at P < 10⁻⁶; outlying in
terms of signal intensity or X:Y ratio; discordance between
duplicate samples; and, for SNPs with evidence of association,
poor clustering on inspection of X:Y plots. We excluded
individuals from the GWA analyses if they had evidence of
non-white European ancestry by PCA-based analysis in comparison
with HapMap samples (http://hapmap.ncbi.nlm.nih.gov/)
or by self-report. Deviation of the genotype frequencies
in the controls from those expected under HWE was assessed
by the χ² test (1 df), or Fisher’s exact test where an expected
cell count was <5.
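The numerical SNP exclusion thresholds and the 1 df HWE χ² test described above can be sketched as below. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, the Fisher's exact fallback for small expected counts is not implemented, and the intensity, duplicate-concordance and cluster-plot criteria are omitted because they need raw array data.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Not appropriate when an expected count is <5,
    where the text specifies Fisher's exact test instead.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def passes_snp_qc(gencall, call_rate, maf, hwe_p_controls, hwe_p_cases):
    """Apply the numerical thresholds listed in the text."""
    return (gencall >= 0.25
            and call_rate >= 0.95
            and maf >= 0.01
            and hwe_p_controls >= 1e-4
            and hwe_p_cases >= 1e-6)

# Hypothetical genotype counts: one set in perfect HWE, one with no heterozygotes.
chi2_ok, p_ok = hwe_chi2_p(25, 50, 25)
chi2_bad, p_bad = hwe_chi2_p(50, 0, 50)
```

A SNP with genotype counts like the second example would fail the control HWE threshold and be excluded.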
Association statistics and imputation
Associations between SNP genotype and disease status were
primarily assessed in STATA v10 (http://www.stata.com/)
and PLINK v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/)
using allelic and Cochran–Armitage tests (both with
1 df) respectively, or by Fisher’s exact test where an expected
cell count was <5. Genotypic (2 df), dominant (1 df) and
recessive (1 df) tests were also performed. The risks associated
with each SNP were estimated by allelic, heterozygous and
homozygous ORs using unconditional logistic regression,
and associated 95% CIs were calculated.
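For a single SNP, the allelic OR and its 95% CI can be sketched from a 2 × 2 table of allele counts. The example below uses the cross-product ratio with a Woolf (log-scale) interval and a Wald test; for one binary predictor this matches what unadjusted logistic regression gives, but it is offered as an illustrative assumption rather than the exact STATA or PLINK procedure. The function name and the counts are hypothetical.

```python
import math

def allelic_or(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with Woolf 95% CI and two-sided Wald P-value.

    All four allele counts must be greater than zero.
    """
    odds_ratio = (case_alt * ctrl_ref) / float(case_ref * ctrl_alt)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1.0 / case_alt + 1.0 / case_ref
                   + 1.0 / ctrl_alt + 1.0 / ctrl_ref)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2.0))
    return odds_ratio, ci, p_value

# Hypothetical allele counts for illustration only.
or_est, (ci_low, ci_high), p_val = allelic_or(300, 700, 250, 750)
```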
Joint analysis of data generated from multiple phases was
conducted using standard methods for combining raw data
based on the Mantel–Haenszel method in STATA and in
PLINK. Joint ORs and 95% CIs were calculated assuming
fixed- and random-effects models. Tests of the significance
of the pooled effect sizes were calculated using a standard
normal distribution. Cochran’s Q statistic to test for heterogeneity
and the I² statistic to quantify the proportion of the total
variation due to heterogeneity were calculated. Large heterogeneity
is typically defined as I² ≥ 75%. Where significant heterogeneity
was identified, results from the random-effects
model were reported. Alongside, we also performed
meta-analysis based on allele dosage (0, 1, 2) and incorporated
age and sex as co-variates. Although age and sex are asso-
ciated with the CRC risk, they were not associated with
SNP genotype and did not materially affect the significance
of any of the reported associations (data not shown).
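The pooling and heterogeneity statistics can be illustrated with a short sketch that works on per-study log ORs and standard errors. Note the substitution: the text describes Mantel–Haenszel pooling of raw data, whereas this example uses inverse-variance weighting with a DerSimonian–Laird random-effects estimate, which is a common alternative and is used here only to show how Q and I² are obtained. The function name and input values are hypothetical.

```python
import math

def meta_analysis(log_ors, ses):
    """Inverse-variance fixed- and random-effects pooling of log ORs."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sum(w)

    # Cochran's Q and the I2 statistic (percentage of variation due to heterogeneity).
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)

    return {"fixed_or": math.exp(fixed), "random_or": math.exp(random),
            "q": q, "df": df, "i2": i2, "tau2": tau2}

# Hypothetical inputs: three concordant studies, then two discordant ones.
homogeneous = meta_analysis([0.1, 0.1, 0.1], [0.05, 0.10, 0.20])
heterogeneous = meta_analysis([0.0, 0.4], [0.1, 0.1])
```

In the second example I² exceeds the 75% level that the text describes as large heterogeneity, which is the situation in which the random-effects estimate would be reported.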
The combined effects of pairs or other multiples of loci
identified as possibly associated with the CRC risk were
investigated by unconditional or conditional logistic regres-
sion analysis in PLINK and STATA to test for independent
effects of each SNP, stratifying by sample series. Logistic re-
gression was undertaken both pairwise with the original
tagSNP and then in a backwards analysis that initially
included all SNPs with good evidence of association in
each region. We used Haploview software v4.2
(http://www.broadinstitute.org/haploview) to infer the LD structure
of the genome in the 1q41 and 12q13.13 regions, and
used the expectation-maximization algorithms in Haploview or
PLINK to infer haplotypes.
To predict genotypes at untyped SNPs in both regions,
imputation of the UK1, Scotland 1 and VQ58 data sets was
performed using the IMPUTE2 software and the combined
CEU 1000 Genomes low-coverage pilot and complete
HapMap3 haplotypes reference set, which was filtered to
remove duplicate haplotypes (both from
https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) (8). Association statistics
for imputed SNPs were calculated in SNPTEST v1.1.5
(www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html)
using the ‘–proper’ option, which is an additive model score
test based on missing data likelihood, to allow for the uncertainty
of imputed genotypes (9). Imputed markers with proper_info
scores <0.5, imputed call rates per SNP <0.9
(using a maximum genotype probability threshold of 0.9 to
call a genotype) and MAFs <0.01 were excluded from the
analyses. Meta-analyses of the sample sets were carried out
with Meta (10) (http://www.stats.ox.ac.uk/~jsliu/meta.html)
and in STATA, using the genotype probabilities from
IMPUTE2 where a SNP was not directly typed.
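The filters applied to imputed markers can be sketched as follows, starting from per-individual genotype probability triplets of the kind IMPUTE2 produces. The function names are hypothetical, the info score is taken as a precomputed input rather than calculated, and treating a probability exactly equal to 0.9 as callable is an assumption.

```python
def call_genotype(probs, threshold=0.9):
    """Return 0, 1 or 2 if the most likely genotype reaches the threshold, else None."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None

def snp_call_rate(prob_rows, threshold=0.9):
    """Fraction of individuals with a callable genotype at one SNP."""
    calls = [call_genotype(p, threshold) for p in prob_rows]
    return sum(c is not None for c in calls) / float(len(calls))

def expected_dosage(probs):
    """Expected allele count (0 to 2), usable where a SNP was not directly typed."""
    return probs[1] + 2.0 * probs[2]

def keep_imputed_snp(info, call_rate, maf):
    """Apply the exclusion thresholds stated in the text."""
    return info >= 0.5 and call_rate >= 0.9 and maf >= 0.01

# Hypothetical genotype probabilities (P(AA), P(AB), P(BB)) for four individuals.
rows = [(0.95, 0.05, 0.00), (0.10, 0.85, 0.05),
        (0.00, 0.02, 0.98), (0.00, 0.08, 0.92)]
rate = snp_call_rate(rows)
```

With these example values one of the four individuals has no genotype above the threshold, so the SNP's call rate falls below 0.9 and it would be excluded.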
The GENECLUSTER program (11) was used to analyse our
UK1, Scotland1 and VQ58 samples specifically in order to test
whether one- or two-SNP models better fitted the association
signals in each SNP region. GENECLUSTER is a Bayesian
method that uses HapMap haplotypes to estimate genealogy
of samples and by jointly testing all SNPs on each branch of
the genealogy in cases and controls, the program indicates
the identities of the SNP(s) most likely to have the strongest
association signal, thus potentially helping to identify
functional variation. The default model parameters were
used, specifically mutation model prior: (0.50, 0.50, 0.00),
max number of trees to consider per location: 1 and beta
risk prior parameters: (5.00, 5.00).
Essentially for comparative purposes, we also ran the Mar-
garita program (12), based on ancestral recombination graphs
(ARGs), in UK1, Scotland1 and VQ58 for the genotyped SNPs
in the 1q and 12q regions. This program aims to maximize
available information as to the location and identity of a func-
tional SNP by reconstructing the genealogical history of the
sample population. For each ARG, a putative risk mutation
is placed on the marginal tree and the frequency of each
branch in cases and controls is assessed. For each region,
30 ARGs were constructed and the significance of a SNP at
each branchpoint assessed by 10 000 permutations. Unlike
GENECLUSTER, Margarita does not specifically address
the issue of whether there are two independent underlying
SNPs in each region, and comparison was therefore restricted
to the single-SNP scenario.
Genome co-ordinates were taken from the NCBI build
36/hg18 (dbSNP b126).
SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG online.
ACKNOWLEDGEMENTS
This study made use of genotyping data on the 1958 Birth
Cohort and NBS samples, kindly made available by the Inves-
tigators of those studies and the Wellcome Trust Case-Control
Consortium 2; a full list of the investigators who contributed to
the generation of the data is available from http://www.wtccc.
org.uk/. We are also grateful to the Spanish National Genotyp-
ing Center (CEGEN-ISCIII)-USC node. The work was carried
out (in part) at the Esther Koplowitz Centre, Barcelona. We
are grateful to colleagues in the EPICOLON, CORGI and
COGENT consortia. Finally, we would like to thank all indi-
viduals who participated in the study.
Conflict of Interest statement. None declared.
FUNDING
Funding was primarily provided by Cancer Research UK. The
EU FP7 CHIBCHA grant supported LGC-C through funding
to IPMT, SC-B and ACar. Core infrastructure support to the
Wellcome Trust Centre for Human Genetics, Oxford was pro-
vided by grant 090532/Z/09/Z. I.P.M.T. received support from
the Oxford NIHR Comprehensive Biomedical Research
Centre. The UK National Cancer Research Network supported
the NSCCG. Additional funding to M.D. was provided by the
Medical Research Council (G0000657-53203), CORE and
Scottish Executive Chief Scientist’s Office (K/OPR/2/2/
D333, CZB/4/449). The EPICOLON work was supported by
grants from the Fondo de Investigación Sanitaria/FEDER
(08/0024, 08/1276, PS09/02368), Ministerio de Ciencia e
Innovación (SAF2010-19273), Asociación Española contra el
Cáncer (Fundación Científica y Junta de Barcelona) and
Fundació Olga Torres (CRP). S.C.-B. and C.F.-R. are supported
by contracts from the Fondo de Investigación Sanitaria
(CP03-0070 and PS09/02368). CIBERehd and CIBERER are
funded by the Instituto de Salud Carlos III. For the Melbourne
cases, work was supported by the Hilton Ludwig Cancer
Metastasis Initiative. The specimens and data from Australian
colon cancer patients were provided by the Victorian Cancer
Biobank and BioGrid Australia with appropriate ethics
approval. The Victorian Cancer Biobank is supported by the
Victorian Government. CERA receives operational infrastruc-
ture support from the Victorian Government. Funding to pay
the Open Access publication charges for this article was pro-
vided by the Wellcome Trust.
REFERENCES
1. Houlston, R.S., Cheadle, J., Dobbins, S.E., Tenesa, A., Jones, A.M.,
Howarth, K., Spain, S.L., Broderick, P., Domingo, E., Farrington, S. et al.
(2010) Meta-analysis of three genome-wide association studies identifies
susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and
20q13.33. Nat. Genet., 42, 973–977.
2. Tomlinson, I., Carvajal-Carmona, L., Dobbins, S., Tenesa, A., Jones, A.,
Howarth, K., Palles, C., Broderick, P., Jaeger, E., Farrington, S. et al.
(2011) Multiple common susceptibility variants near BMP pathway loci
GREM1, BMP4, and BMP2 explain part of the missing heritability of
colorectal cancer. PLoS Genet., 7, e1002105.
3. Hemminki, K., Forsti, A., Houlston, R. and Bermejo, J.L. (2011)
Searching for the missing heritability of complex diseases. Hum. Mutat.,
32, 259–262.
4. Gamazon, E.R., Zhang, W., Konkashbaev, A., Duan, S., Kistner, E.O.,
Nicolae, D.L., Dolan, M.E. and Cox, N.J. (2010) SCAN: SNP and copy
number annotation. Bioinformatics, 26, 259–262.
5. Yang, T.P., Beazley, C., Montgomery, S.B., Dimas, A.S.,
Gutierrez-Arcelus, M., Stranger, B.E., Deloukas, P. and Dermitzakis, E.T.
(2010) Genevar: a database and Java application for the analysis and
visualization of SNP-gene associations in eQTL studies. Bioinformatics,
26, 2474–2476.
6. Knappskog, S., Bjornslett, M., Myklebust, L.M., Huijts, P.E., Vreeswijk,
M.P., Edvardsen, H., Guo, Y., Zhang, X., Yang, M., Ylisaukko-Oja, S.K.
et al. (2011) The MDM2 promoter SNP285C/309G haplotype diminishes
Sp1 transcription factor binding and reduces risk for breast and ovarian
cancer in Caucasians. Cancer Cell, 19, 273–282.
7. Pardini, B., Kumar, R., Naccarati, A., Prasad, R.B., Forsti, A., Polakova,
V., Vodickova, L., Novotny, J., Hemminki, K. and Vodicka, P. (2011)
MTHFR and MTRR genotype and haplotype analysis and colorectal
cancer susceptibility in a case-control study from the Czech Republic.
Mutat. Res., 721, 74–80.
8. Howie, B.N., Donnelly, P. and Marchini, J. (2009) A flexible and accurate
genotype imputation method for the next generation of genome-wide
association studies. PLoS Genet., 5, e1000529.
9. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A
new multipoint method for genome-wide association studies by
imputation of genotypes. Nat. Genet., 39, 906–913.
10. Liu, J.Z., Tozzi, F., Waterworth, D.M., Pillai, S.G., Muglia, P., Middleton,
L., Berrettini, W., Knouff, C.W., Yuan, X., Waeber, G. et al. (2010)
Meta-analysis and imputation refines the association of 15q25 with
smoking quantity. Nat. Genet., 42, 436–440.
11. Su, Z., Cardin, N., The Wellcome Trust Case Control Consortium,
Donnelly, P. and Marchini, J. (2009) A Bayesian method for detecting and
characterizing allelic heterogeneity and boosting signals in genome-wide
association studies. Stat. Sci., 24, 430–450.
12. Minichiello, M.J. and Durbin, R. (2006) Mapping trait loci by use of
inferred ancestral recombination graphs. Am. J. Hum. Genet., 79,
910–922.