Confounding from Cryptic Relatedness in Case-Control Association Studies

Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
PLoS Genetics (Impact Factor: 8.17). 10/2005; 1(3):e32. DOI: 10.1371/journal.pgen.0010032
Source: PubMed

There has long been concern in the human genetics community that case-control association studies may be subject to high rates of false positives if there is unrecognized population structure. After being considered rather suspect in the 1990s for this reason, case-control studies are regaining popularity, and will no doubt be used widely in future genome-wide association studies.
Therefore, it is important to fully understand the types of factors that can lead to excess rates of false positives in case-control studies. Virtually all previous discussion in the literature of excess false positives (confounding) in case-control studies has focused on the role of population structure. Yet a widely cited 1999 paper by Devlin and Roeder, which introduced the genomic control concept, argued that “cryptic relatedness” (the idea that some members of a case-control sample might actually be close relatives, unbeknownst to the investigator) is likely to be a far more important confounder than population structure. Moreover, one of the two main types of statistical approaches for dealing with confounding in case-control studies (i.e., structured association methods) does not correct for cryptic relatedness.
This work provides the first careful model of cryptic relatedness and outlines exactly when cryptic relatedness is and is not likely to be a problem. The authors provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. The analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. In contrast, studies with a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives.
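
The excess false positives discussed above are often summarized with an inflation factor in the spirit of genomic control. The sketch below is illustrative only and is not taken from the paper: it assumes 1-degree-of-freedom chi-square association statistics, uses the common convention of dividing the observed median by the theoretical median (about 0.4549), and the function names `inflation_factor` and `adjust` are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Ratio of the observed median statistic to the median expected under the null."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def adjust(chi2_stats):
    """Divide each statistic by the inflation factor, never inflating (factor floored at 1)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [s / lam for s in chi2_stats]


if __name__ == "__main__":
    # Made-up statistics for illustration; real input would come from null markers.
    stats = [0.1, 0.5, 0.9099, 2.0, 5.0]
    print("lambda:", inflation_factor(stats))
    print("adjusted:", adjust(stats))
```

A value near 1 suggests little confounding, while values well above 1 suggest that the test statistics are inflated, whether by population structure or by cryptic relatedness.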


Available from: Benjamin F Voight, Jul 24, 2014
  • ABSTRACT: The objectives of the study were to assess genome-wide association study (GWAS) for sugarcane on a panel of 183 accessions and to evaluate the impact of population structure and family relatedness on QTL detection. The panel was genotyped with 3327 AFLP, DArT and SSR markers and phenotyped for 13 traits related to agro-morphology, sugar yield, bagasse content and disease resistances. Marker-trait associations were detected using (i) general linear models that took population structure into account with either a Q matrix from STRUCTURE software or principal components from a principal component analysis added as covariates, and (ii) mixed linear models that took into account both population structure and family relatedness estimated using a similarity matrix K* computed using Jaccard’s coefficient. With general linear models analysis, test statistics were inflated in most cases, while mixed linear models analysis allowed the inflation of test statistics to be controlled in most cases. When only detections in which both population structure and family relatedness were correctly controlled were considered, only 11 markers were significantly associated with three out of the 13 traits. Among these 11 markers, six were linked to the major resistance gene Bru1, which has already been identified. Our results confirm that the use of GWAS is feasible for sugarcane in spite of its complex polyploid genome but also underline the need to take into account family relatedness and not only population structure. The small number of significant associations detected suggests that a larger population and/or denser genotyping are required to increase the statistical power of association detection.
    Euphytica 02/2015; 202:269-284. DOI:10.1007/s10681-014-1294-y · 1.69 Impact Factor
  • ABSTRACT: Background: The PCK1 gene, encoding cytosolic phosphoenolpyruvate carboxykinase (PEPCK-C), has previously been implicated as a candidate gene for type 2 diabetes (T2D) susceptibility. Rodent models demonstrate that over-expression of Pck1 can result in T2D development and a single nucleotide polymorphism (SNP) in the promoter region of human PCK1 (-232C/G) has exhibited significant association with the disease in several cohorts. Within the UK-resident South Asian population, T2D is 4 to 6 times more common than in indigenous white Caucasians. Despite this, few studies have reported on the genetic susceptibility to T2D in this ethnic group and none of these has investigated the possible effect of PCK1 variants. We therefore aimed to investigate the association between common variants of the PCK1 gene and T2D in a UK-resident South Asian population of Punjabi ancestry, originating predominantly from the Mirpur area of Azad Kashmir, Pakistan. Methods: We used TaqMan assays to genotype five tagSNPs covering the PCK1 gene, including the -232C/G variant, in 903 subjects with T2D and 471 normoglycaemic controls. Results: Of the variants studied, only the minor allele (G) of the -232C/G SNP demonstrated a significant association with T2D, displaying an OR of 1.21 (95% CI: 1.03 - 1.42, p = 0.019). Conclusion: This study is the first to investigate the association between variants of the PCK1 gene and T2D in South Asians. Our results suggest that the -232C/G promoter polymorphism confers susceptibility to T2D in this ethnic group.
    BMC Medical Genetics 10/2009; 10. DOI:10.1186/1471-2350-10-83 · 2.45 Impact Factor