A Robust Statistical Method for Association-Based eQTL Analysis

School of Biosciences, University of Birmingham, Birmingham, United Kingdom.
PLoS ONE (Impact Factor: 3.23). 08/2011; 6(8):e23192. DOI: 10.1371/journal.pone.0023192
Source: PubMed


It is well established that the theoretical kernel of the recently surging genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either by predicting the structure parameters or by correcting the inflation in the test statistic caused by stratification, these may not be feasible or may introduce further statistical problems in practical implementation.
We propose here a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into the test for significance of genetic association between a polymorphic marker and phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for the varying effect of population stratification on different regions of the genome under study. The utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.
The analyses show that, compared with two other widely implemented methods in the GWAS literature, the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure.
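The abstract does not spell out the test statistic, so the following is only a minimal sketch of the general idea of using a control marker, not the authors' method. It assumes a generic least-squares adjustment: the trait and the test marker are each regressed on a control marker, and the association is then measured between the residuals. The function names and the toy genotypes and trait values are hypothetical and chosen so that the test marker has no effect within either subpopulation.

```python
def simple_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = simple_slope(x, y)
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]


def adjusted_slope(trait, test_marker, control_marker):
    """Slope of trait on test marker after removing the control marker
    from both (an assumed, generic covariate adjustment)."""
    return simple_slope(residuals(control_marker, test_marker),
                        residuals(control_marker, trait))


# Hypothetical toy data: two subpopulations of four individuals each.
# The trait mean differs between subpopulations (0 vs 10), the test marker
# is more frequent in the second one, and the control marker tracks
# subpopulation membership. Within each subpopulation the test marker
# is unrelated to the trait.
trait = [1, -1, 1, -1, 11, 9, 11, 9]
test_marker = [0, 0, 1, 1, 1, 1, 2, 2]
control_marker = [0, 0, 0, 0, 2, 2, 2, 2]

naive = simple_slope(test_marker, trait)
adjusted = adjusted_slope(trait, test_marker, control_marker)
print(f"naive slope: {naive:.3f}, adjusted slope: {adjusted:.3f}")
```

In this constructed example the pooled analysis reports a non-zero slope purely because of stratification, while the adjusted slope vanishes. Real control markers track population structure only imperfectly, so the correction in practice would be partial.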

Available from: Lindsey Leach, Apr 10, 2015
    • "The eQTLs thus defined were further classified based on their physical distance from the associated gene, either as cis eQTLs if the SNP locates within 500 kb upstream of the transcript start and 500 kb downstream of 3’ end of the gene or otherwise as trans eQTLs. The 11,290 significant associations gave rise to 1,043 eQTLs, of which about two third (671) were trans eQTLs while only 372 were in cis, consistent with previous eQTL studies (for example, [7,22]). "
    ABSTRACT: While the possible sources underlying the so-called 'missing heritability' evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying heritability of genome-wide gene expression traits can shed light on the goal of understanding the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits. Heritability levels for expression of 10,720 genes were estimated by applying variance component model analyses and 1,043 expression quantitative loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak close to 0% and the other approaching 100%. Such a pattern of within-population variability of gene expression heritability is common among different HapMap populations of unrelated individuals but differs from that obtained in the CEU and YRI trio samples. Higher heritability levels are shown by housekeeping genes and genes associated with cis eQTLs. Both cis and trans eQTLs make comparable cumulative contributions to the heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and found that epistasis was not prevalent in all genes but made a substantial contribution to explaining total heritability for some of the genes analysed. We utilised a mixed effect model analysis for estimating genetic components from population-based samples. On the basis of analyses of genome-wide gene expression from four HapMap populations, we demonstrated detailed exploitation of the distribution of genetic heritabilities for expression traits from different populations, and highlighted the importance of studying interaction at the gene expression level as an important source of variation underlying missing heritability.
    Full-text · Article · Jan 2014 · BMC Genomics
    • "A clear diagonal is visualized, which demonstrates that the expression of many genes is cis regulated. As described in other studies (Schadt et al., 2008; Holloway et al., 2011; Jiang et al., 2011), generally cis eQTLs exert stronger effects than trans eQTLs. In this study, cis eQTLs have LOD scores ranging from 3.0 to 30.4 (average 5.4, median 4.2), while trans eQTLs range from 3.0 to 58.6 (average 4.3, median 3.5). "
    ABSTRACT: The roles of many genes and gene interactions involved in flowering time have been studied extensively in Arabidopsis, and the purpose of this study was to investigate how effectively results obtained with the model species Arabidopsis can be applied to the Brassicaceae, which often have larger and more complex genomes. Brassica rapa is a very close relative with a triplicated genome whose subgenomes have evolved by genome fractionation. The study addresses whether this genome fractionation is a random process or whether specific genes are preferentially retained, such as flowering time (Ft) genes that play a role in the extreme morphological variation within the B. rapa species (displayed by the diverse morphotypes). Data are presented showing that Ft genes are indeed preferentially retained, so the next question is whether these different orthologues of Arabidopsis Ft genes play roles similar to those in Arabidopsis, and what the role of each orthologue is in B. rapa. Using a genetical-genomics approach, co-location of flowering quantitative trait loci (QTLs) and expression QTLs (eQTLs) resulted in identification of candidate genes for flowering QTLs and visualization of co-expression networks of Ft genes and flowering time. A major flowering QTL on A02 at the BrFLC2 locus co-localized with cis eQTLs for BrFLC2, BrSSR1, and BrTCP11, and trans eQTLs for the photoperiod gene BrCO and two paralogues of the floral integrator genes BrSOC1 and BrFT. It is concluded that the BrFLC2 Ft gene is a major regulator of flowering time in the studied doubled haploid population.
    Full-text · Article · Sep 2013 · Journal of Experimental Botany
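
The cis/trans rule quoted in the BMC Genomics excerpt above (a SNP within 500 kb upstream of the transcript start or 500 kb downstream of the 3' end of the gene is cis, otherwise trans) can be sketched as follows. This is an illustrative sketch, not code from either paper. The function name and argument layout are hypothetical. It assumes gene coordinates are given as base-pair positions with gene_start <= gene_end regardless of strand, and that a SNP on a different chromosome is always trans.

```python
CIS_WINDOW_BP = 500_000  # 500 kb, as stated in the quoted excerpt


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Return 'cis' if the SNP lies within `window` bp of the gene span,
    otherwise 'trans'. Assumes gene_start <= gene_end (hypothetical layout)."""
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"
    return "trans"


# Hypothetical gene on chromosome 1 spanning 1,000,000 to 1,050,000 bp.
print(classify_eqtl("1", 1_400_000, "1", 1_000_000, 1_050_000))
print(classify_eqtl("1", 1_600_000, "1", 1_000_000, 1_050_000))
print(classify_eqtl("2", 1_020_000, "1", 1_000_000, 1_050_000))
```

Treating the window symmetrically around the whole gene span keeps the check strand-agnostic, which is a simplification of the wording in the quoted passage.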