Robust linear regression methods in association studies

Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Bioinformatics (Impact Factor: 4.62). 03/2011; 27(6):815-21. DOI: 10.1093/bioinformatics/btr006
Source: PubMed

ABSTRACT It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of those deficiencies, allowing for reliable results under various conditions. We analyze the case of statistical tests to detect associations between individual genomic variations (SNPs) and quantitative traits when deviations from the normality assumption are observed. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination.
Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. In the presence of this type of data, classical least squares statistical methods perform poorly, giving biased estimates, raising the number of spurious associations and often failing to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful, and thus better suited to association studies, than the classical approach.
The code of the robustified version of function lmekin() from the R package kinship is provided as Supplementary Material.
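
The contrast between classical least squares and M-regression described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' robustified lmekin() code: it fits a plain single-SNP regression with no kinship random effect, in Python rather than R, and every name in it (`weighted_line_fit`, `ols_fit`, `huber_fit`, `simulate`) is hypothetical. It assumes a Huber M-estimator with tuning constant 1.345, a MAD-based residual scale, and contamination applied only to non-carriers so that the bias in the least squares slope is easy to see.

```python
import random
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ols_fit(x, y):
    """Classical least squares: every observation gets weight 1."""
    return weighted_line_fit(x, y, [1.0] * len(x))


def huber_fit(x, y, c=1.345, max_iter=100, tol=1e-8):
    """Huber M-regression via iteratively reweighted least squares.

    Assumptions of this sketch: tuning constant c = 1.345 and a
    MAD-based scale that is re-estimated at every iteration.
    """
    a, b = ols_fit(x, y)
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 for small residuals, downweighted beyond c*scale.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        new_a, new_b = weighted_line_fit(x, y, w)
        converged = max(abs(new_a - a), abs(new_b - b)) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


def simulate(n=300, maf=0.3, effect=0.5, contamination=0.1, shift=20.0, seed=0):
    """Simulate an additive SNP (0/1/2 copies) and a quantitative trait.

    A fraction `contamination` of the sample, drawn from non-carriers
    only (an assumption made for illustration), is shifted by `shift`.
    """
    rng = random.Random(seed)
    genotype = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    trait = [1.0 + effect * g + rng.gauss(0.0, 1.0) for g in genotype]
    noncarriers = [i for i, g in enumerate(genotype) if g == 0]
    n_out = min(int(contamination * n), len(noncarriers))
    for i in rng.sample(noncarriers, n_out):
        trait[i] += shift
    return genotype, trait


if __name__ == "__main__":
    g, t = simulate()
    print("true slope: 0.5")
    print("OLS slope:  ", ols_fit(g, t)[1])
    print("Huber slope:", huber_fit(g, t)[1])
```

Under this contamination scheme the least squares slope is pulled far from the simulated effect, while the Huber estimate stays much closer to it. A formal comparison of power and level, as in the abstract, would additionally need test statistics and repeated simulation runs, which this sketch omits.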

  • Source
    ABSTRACT: Background: Increased postprandial lipid (PPL) response to dietary fat intake is a heritable risk factor for cardiovascular disease (CVD). Variability in postprandial lipids results from the complex interplay of dietary and genetic factors. We hypothesized that detailed lipid profiles (eg, sterols and fatty acids) may help elucidate specific genetic and dietary pathways contributing to the PPL response. Methods and Results: We used gas chromatography mass spectrometry to quantify the change in plasma concentration of 35 fatty acids and 11 sterols between fasting and 3.5 hours after the consumption of a high-fat meal (PPL challenge) among 40 participants from the GOLDN study. Correlations between sterols, fatty acids and clinical measures were calculated. Mixed linear regression was used to evaluate associations between lipidomic profiles and genomic markers including single nucleotide polymorphisms (SNPs) and methylation markers derived from the Affymetrix 6.0 array and the Illumina Methyl450 array, respectively. After the PPL challenge, fatty acids increased as well as sterols associated with cholesterol absorption, while sterols associated with cholesterol synthesis decreased. PPL saturated fatty acids strongly correlated with triglycerides, very low-density lipoprotein, and chylomicrons. Two SNPs (rs12247017 and rs12240292) in the sorbin and SH3 domain containing 1 (SORBS1) gene were associated with β-sitosterol after correction for multiple testing (P ≤ 4.5 × 10⁻¹⁰). SORBS1 has been linked to obesity and insulin signaling. No other markers reached the genome-wide significance threshold, yet several other biologically relevant loci are highlighted (eg, PRIC285, a co-activator of PPARα). Conclusions: Integration of lipidomic and genomic data has the potential to identify new biomarkers of CVD risk.
    PLoS ONE 06/2014; 9(6):e99509. DOI:10.1371/journal.pone.0099509 · 3.53 Impact Factor
  • Source
    ABSTRACT: Epistasis (synergistic interaction) among SNPs governing gene expression is likely to arise within transcriptional networks. However, the power to detect it is limited by the large number of combinations to be tested and the modest sample sizes of most datasets. By limiting the interaction search space first to cis-trans and then to cis-cis SNP pairs where both SNPs had an independent effect on the expression of the most variable transcripts in the liver and brain, we greatly reduced the size of the search space. Within the cis-trans search space we discovered three transcripts with significant epistasis. Surprisingly, all interacting SNP pairs were located near each other on the chromosome (within 290 kb to 2.16 Mb). Despite their proximity, the interacting SNPs were outside the range of linkage disequilibrium (LD), which was absent between the pairs (r² < 0.01). Accordingly, we redefined the search space to detect cis-cis interactions, where a cis-SNP was located within 10 Mb of the target transcript. The results show evidence for the epistatic regulation of 50 transcripts across the tissues studied. Three transcripts, namely HLA-G, PSORS1C1 and HLA-DRB5, share common regulatory SNPs in the pre-frontal cortex and their expression is significantly correlated. This pattern of epistasis is consistent with mediation via long-range chromatin structures rather than the binding of transcription factors in trans. Accordingly, some of the interactions map to regions of the genome known to physically interact in lymphoblastoid cell lines while others map to known promoter and enhancer elements. SNPs involved in interactions appear to be enriched for promoter markers. In the context of gene expression and its regulation, our analysis indicates that the study of cis-cis or local epistatic interactions may have a more important role than interchromosomal interactions.
    BMC Genomics 02/2015; 16(1):109. DOI:10.1186/s12864-015-1300-3 · 4.04 Impact Factor
  • Source
    ABSTRACT: Glycolytic potential (GP) in skeletal muscle is economically important in the pig industry because of its effect on pork processing yield. We have previously mapped a major quantitative trait locus (QTL) for GP on chromosome 3 in a White Duroc × Erhualian F2 intercross. We herein performed a systems genetic analysis to identify the causal variant underlying the phenotype QTL (pQTL). We first conducted genome-wide association analyses in the F2 intercross and an F19 Sutai pig population. The QTL was then refined to a 180-kb interval based on the 2-LOD drop method. We then performed expression QTL (eQTL) mapping using muscle transcriptome data from 497 F2 animals. Within the QTL interval, only one gene (PHKG1) has a cis-eQTL that colocalized with the pQTL, peaking at the same SNP. The PHKG1 gene encodes a catalytic subunit of the phosphorylase kinase (PhK), which functions in the cascade activation of glycogen breakdown. Deep sequencing of PHKG1 revealed a point mutation (C>A) in a splice acceptor site of intron 9, resulting in a 32-bp deletion in the open reading frame and generating a premature stop codon. The aberrant transcript induces nonsense-mediated decay, leading to lower protein level and weaker enzymatic activity in affected animals. The mutation causes an increase of 43% in GP and a decrease of >20% in water-holding capacity of pork. These effects were consistent across the F2 and Sutai populations, as well as Duroc × (Landrace × Yorkshire) hybrid pigs. The unfavorable allele exists predominantly in Duroc-derived pigs. The findings provide new insights into understanding risk factors affecting glucose metabolism, and would greatly contribute to the genetic improvement of meat quality in Duroc-related pigs.
    PLoS Genetics 10/2014; 10(10):e1004710. DOI:10.1371/journal.pgen.1004710 · 8.17 Impact Factor