Basic statistical analysis in genetic case-control studies

Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Nature Protocols (Impact Factor: 9.67). 02/2011; 6(2):121-33. DOI: 10.1038/nprot.2010.182
Source: PubMed


This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take ~1 h to complete.
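As a rough illustration of steps (i) and (ii), the sketch below computes an allelic odds ratio and a Cochran-Armitage trend test for a single SNP from genotype counts. It is not the protocol's own procedure, which relies on PLINK, R and Haploview; the function names, the ordering of the counts (0, 1 or 2 copies of the risk allele) and the example numbers are assumptions made for this example only.

```python
import math


def allelic_odds_ratio(cases, controls):
    """Allelic odds ratio from genotype counts.

    cases/controls: counts of individuals carrying 0, 1 and 2 copies
    of the risk allele (hypothetical input layout for this sketch).
    """
    case_risk = cases[1] + 2 * cases[2]
    case_other = 2 * cases[0] + cases[1]
    ctrl_risk = controls[1] + 2 * controls[2]
    ctrl_other = 2 * controls[0] + controls[1]
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 degree of freedom).

    The default weights (0, 1, 2) correspond to an additive model.
    Returns (chi-square statistic, p value). Assumes at least two
    genotype classes are observed, otherwise the variance is zero.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]

    # Weighted difference between case and control genotype counts
    t_stat = sum(
        w * (r * n_controls - s * n_cases)
        for w, r, s in zip(weights, cases, controls)
    )

    # Variance of the statistic under the null hypothesis
    spread = sum(w * w * n * (n_total - n) for w, n in zip(weights, totals))
    cross = sum(
        weights[i] * weights[j] * totals[i] * totals[j]
        for i in range(len(totals))
        for j in range(i + 1, len(totals))
    )
    variance = (n_cases * n_controls / n_total) * (spread - 2 * cross)

    chi2 = t_stat * t_stat / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts, for illustration only
example_cases = (30, 50, 20)
example_controls = (50, 40, 10)
odds_ratio = allelic_odds_ratio(example_cases, example_controls)
chi2, p_value = armitage_trend_test(example_cases, example_controls)
print(f"allelic OR = {odds_ratio:.3f}, trend chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Changing `weights` to `(0, 1, 1)` or `(0, 0, 1)` would give the dominant and recessive versions of the same test, which is one way to relate the choice of test to the assumed disease model.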

Available from: Krina T Zondervan
    • "Each SNP was separately assessed for the Hardy–Weinberg equilibrium in cases and controls using Chi square test. The additive model was conducted by the Cochran Armitage trend test (Clarke et al. 2011). As the risk of developing PD varied for males and females and for subjects of different ages, p values and ORs were also adjusted by age and sex to minimize the effect of age and sex using the logistic regression method. "
    ABSTRACT: Recently, a series of studies found that the single-nucleotide polymorphisms (SNPs) rs6812193 in the family with sequence similarity 47, member E (FAM47E), rs6825004 in the scavenger receptor class B member 2 (SCARB2) and rs4889603 in the Syntaxin1B (STX1B) genes increase the risk for Parkinson's disease (PD). However, the results of subsequent independent studies were inconsistent. To explore the associations between the three SNPs and PD in the Chinese population, a large cohort was analyzed in a case-control study. A total of 1994 subjects, including 1179 PD and 815 healthy controls (HCs), were investigated. All subjects were genotyped for rs6812193, rs6825004 and rs4889603 using the Sequenom iPLEX Assay. There was no significant difference in additive genetic model of rs6812193, rs6825004 and rs4889603 between PD and controls, even after being stratified by sex and age. In addition, no significant differences were found between other subgroups of PD patients with regard to clinical presentation. Our findings suggested that FAM47E rs6812193, SCARB2 rs6825004 and STX1B rs4889603 do not confer a significant risk for PD in Chinese population.
    Journal of Neural Transmission 07/2015; DOI:10.1007/s00702-015-1430-4 · 2.40 Impact Factor
    • "The traditional approach to test for association of T2D susceptibility with a genetic variant, irrespective of whether it is assayed through re-sequencing, array genotyping or imputation, is to compare the frequencies of the three possible genotypes between cases and controls [25]. In this setting, the most flexible approach makes use of a logistic regression model, typically assuming an additive effect of each allele on the disease (i.e. a multiplicative effect on the odds ratio). "
    ABSTRACT: Genome-wide association studies of type 2 diabetes have been extremely successful in discovering loci that contribute genetic effects to susceptibility to the disease. However, at the vast majority of these loci, the variants and transcripts through which these effects on type 2 diabetes are mediated are unknown, limiting progress in defining the pathophysiological basis of the disease. In this review, we will describe available approaches for assaying genetic variation across loci and discuss statistical methods to determine the most likely causal variants in the region. We will consider the utility of trans-ethnic meta-analysis for fine mapping by leveraging the differences in the structure of linkage disequilibrium between diverse populations. Finally, we will discuss progress in fine-mapping type 2 diabetes susceptibility loci to date and consider the prospects for future efforts to localise causal variants for the disease.
    Current Diabetes Reports 11/2014; 14(11):549. DOI:10.1007/s11892-014-0549-2 · 3.08 Impact Factor
    • "Typically, hundreds of thousands of single-nucleotide polymorphisms (SNPs) are tested for association in these studies. Associations between the SNPs and the phenotypes are determined on the basis of differences in allele frequencies between cases and controls [1]. Several statistical methods have been proposed to control the family-wise error rate (FWER) for multiple comparison testing. "
    ABSTRACT: Background: Several methods have been proposed to account for multiple comparisons in genetic association studies. However, investigators typically test each of the SNPs using multiple genetic models. Association testing using the Cochran-Armitage test for trend assuming an additive, dominant, or recessive genetic model, is commonly performed. Thus, each SNP is tested three times. Some investigators report the smallest p-value obtained from the three tests corresponding to the three genetic models, but such an approach inherently leads to inflated type 1 errors. Because of the small number of tests (three) and high correlation (functional dependence) among these tests, the procedures available for accounting for multiple tests are either too conservative or fail to meet the underlying assumptions (e.g., asymptotic multivariate normality or independence among the tests). Results: We propose a method to calculate the exact p-value for each SNP using different genetic models. We performed simulations, which demonstrated the control of type 1 error and power gains using the proposed approach. We applied the proposed method to compute p-value for a polymorphism eNOS -786T>C which was shown to be associated with breast cancer risk. Conclusions: Our findings indicate that the proposed method should be used to maximize power and control type 1 errors when analyzing genetic data using additive, dominant, and recessive models.
    BMC Genetics 06/2014; 15(1):75. DOI:10.1186/1471-2156-15-75 · 2.40 Impact Factor
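
The excerpts above refer to controlling the family-wise error rate (FWER) when many SNPs, or several genetic models per SNP, are tested. As a minimal sketch of two generic FWER corrections (Bonferroni and Holm), and not the exact p-value method proposed in the BMC Genetics paper, adjusted p values could be computed as follows; the function names and the example p values are assumptions made for illustration.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1)
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up p values, e.g. one SNP tested under three genetic models
raw = [0.01, 0.04, 0.03]
print("Bonferroni:", bonferroni_adjust(raw))
print("Holm:      ", holm_adjust(raw))
```

Both corrections ignore the correlation between tests, so they tend to be conservative when the tests are strongly dependent, as is the case for additive, dominant and recessive models fitted to the same SNP.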