Population substructure and control selection in genome-wide association studies.

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.
PLoS ONE (Impact Factor: 3.53). 02/2008; 3(7):e2551. DOI: 10.1371/journal.pone.0002551
Source: PubMed

ABSTRACT Determination of the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) represents a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor lambda of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (lambda of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved a better control of type I error (to lambda of 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
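The inflation factor lambda quoted in the abstract is commonly computed as the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The abstract does not spell out the exact estimator used, so the following is only a minimal sketch of that common definition. The function names and the evenly spaced p-values are hypothetical illustrations, not CGEMS data.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Median observed chi-square divided by the null median (assumed definition)."""
    stats = [p_to_chi2(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


# Hypothetical p-values: evenly spaced on (0, 1), standing in for a null scan.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam_null = inflation_factor(null_p)

# Hypothetical inflated scan: every p-value shrunk by 10%.
lam_inflated = inflation_factor([0.9 * p for p in null_p])
```

Under this definition a scan with no stratification gives a lambda close to 1, and systematically smaller p-values push it above 1, which is the sense in which values such as 1.090 indicate more confounding than 1.025.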

  • ABSTRACT: Some investigators argue that controlling for self-reported race or ethnicity, either in statistical analysis or in study design, is sufficient to mitigate unwanted influence from population stratification. In this report, we evaluated the effectiveness of a study design involving matching on self-reported ethnicity and race in minimizing bias due to population stratification within an ethnically admixed population in California. We estimated individual genetic ancestry using structured association methods and a panel of ancestry informative markers, and observed no statistically significant difference in distribution of genetic ancestry between cases and controls (P=0.46). Stratification by Hispanic ethnicity showed similar results. We evaluated potential confounding by genetic ancestry after adjustment for race and ethnicity for 1260 candidate gene SNPs, and found no major impact (>10%) on risk estimates. In conclusion, we found no evidence of confounding of genetic risk estimates by population substructure using this matched design. Our study provides strong evidence supporting the race- and ethnicity-matched case-control study design as an effective approach to minimizing systematic bias due to differences in genetic ancestry between cases and controls.
    09/2011; 1:101. DOI:10.4172/2161-1165.1000101
  • ABSTRACT: Dubowitz syndrome is a rare disorder characterized by multiple congenital anomalies, cognitive delay, growth failure, an immune defect, and an increased risk of blood dyscrasia and malignancy. There is considerable phenotypic variability, suggesting genetic heterogeneity. We clinically characterized and performed exome sequencing and high-density array SNP genotyping on three individuals with Dubowitz syndrome, including a pair of previously-described siblings (Patients 1 and 2, brother and sister) and an unpublished patient (Patient 3). Given the siblings' history of bone marrow abnormalities, we also evaluated telomere length and performed radiosensitivity assays. In the siblings, exome sequencing identified compound heterozygosity for a known rare nonsense substitution in the nuclear ligase gene LIG4 (rs104894419, NM_002312.3:c.2440C>T) that predicts p.Arg814X (MAF:0.0002) and an NM_002312.3:c.613delT variant that predicts a p.Ser205Leufs*29 frameshift. The frameshift mutation has not been reported in 1000 Genomes, ESP, or ClinSeq. These LIG4 mutations were previously reported in the sibling sister; her brother had not been previously tested. Western blotting showed an absence of a ligase IV band in both siblings. In the third patient, array SNP genotyping revealed a de novo ∼3.89 Mb interstitial deletion at chromosome 17q24.2 (chr 17:62,068,463-65,963,102, hg18), which spanned the known Carney complex gene PRKAR1A. In all three patients, a median lymphocyte telomere length of ≤1st centile was observed and radiosensitivity assays showed increased sensitivity to ionizing radiation. Our work suggests that, in addition to dyskeratosis congenita, LIG4 and 17q24.2 syndromes also feature shortened telomeres; to confirm this, telomere length testing should be considered in both disorders. Taken together, our work and other reports on Dubowitz syndrome, as currently recognized, suggest that it is not a unitary entity but instead a collection of phenotypically similar disorders. As a clinical entity, Dubowitz syndrome will need continual re-evaluation and re-definition as its constituent phenotypes are determined.
    PLoS ONE 06/2014; 9(6):e98686. DOI:10.1371/journal.pone.0098686 · 3.53 Impact Factor
  • ABSTRACT: Background: The precise etiology of rotator cuff disease is unknown, but prior evidence suggests a role for genetic factors. Variants of estrogen-related receptor-β (ESRRB) have been previously associated with rotator cuff disease. The purpose of the present study was to confirm the association between multiple candidate genes, including ESRRB, and rotator cuff disease in an independent set of patients with rotator cuff tear. Materials and methods: The Illumina 5M (Illumina Inc, San Diego, CA, USA) single nucleotide polymorphism (SNP) platform was used to genotype 175 patients with rotator cuff tear. Genotypes were used to select a set of 2595 genetically matched Caucasian controls available from the Illumina iControls database. Tests of association were performed with Genome-wide Efficient Mixed Model Association (GEMMA) software at 69 SNPs that fell within 20 kb of 6 candidate genes (DEFB1, DENND2C, ESRRB, FGF3, FGF10, and FGFR1). Results: Tests of association revealed 1 significantly associated SNP occurring in ESRRB (rs17583842; P = 4.4E–4). Another SNP within ESRRB (rs7157192) had a nominal P value of 7.8E–3. FastPHASE software estimated 2 frequent haplotypes among 54 individuals who carried both risk alleles at these 2 SNPs. The first haplotype had a frequency of 13.9% (n = 15) in risk-allele carriers and only 2.2% in controls (odds ratio, 6.9; 95% confidence interval, 3.9-12.2). The second haplotype had a frequency of 12.9% in risk-allele carriers and only 2.7% in controls (odds ratio, 5.3; 95% confidence interval, 3.0-9.5). Conclusions: The significant association and the presence of high-risk haplotypes identified in the ESRRB gene confirm the association of variants in ESRRB and rotator cuff disease.
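A quick consistency check on reported odds ratios: a Wald-type 95% confidence interval is symmetric on the log scale, so the geometric mean of its bounds should land close to the point estimate. The sketch below applies this to the second haplotype reported above (odds ratio 5.3; interval 3.0-9.5). It assumes the interval was a Wald interval, which the abstract does not state, and the function name is hypothetical.

```python
import math


def implied_odds_ratio(ci_lower, ci_upper):
    """Point estimate implied by a log-symmetric (Wald-type) interval:
    the geometric mean of the two bounds."""
    return math.sqrt(ci_lower * ci_upper)


# Second haplotype from the abstract: reported odds ratio 5.3, 95% CI 3.0-9.5.
implied = implied_odds_ratio(3.0, 9.5)

# The difference from the reported value is within rounding of the inputs.
difference = abs(implied - 5.3)
```

Because the published bounds are rounded to one decimal place, agreement to within about 0.1 is all that can be expected from such a check.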
