Article

Robust linear regression methods in association studies.

Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Bioinformatics (Impact Factor: 5.47). 01/2011; 27(6):815-21. DOI: 10.1093/bioinformatics/btr006
Source: PubMed

ABSTRACT It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of those deficiencies, allowing for reliable results under various conditions. We analyze the case of statistical tests to detect associations between genomic individual variations (SNP) and quantitative traits when deviations from the normality assumption are observed. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination.
Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. In the presence of this type of data, classical least squares statistical methods perform poorly, giving biased estimates, raising the number of spurious associations and often failing to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful and thus more adequate for association studies than the classical approach.
The code of the robustified version of function lmekin() from the R package kinship is provided as Supplementary Material.
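
To make the contrast between least squares and M-regression concrete, here is a minimal sketch in Python. It is not the authors' code, which is a robustified lmekin() in R for mixed models. It assumes a plain linear model of trait on genotype, a Huber loss with the conventional tuning constant 1.345, and a rescaled MAD as the residual scale. The function names (ols_fit, huber_m_fit, simulate) and the contamination scheme (a fixed shift added to a fraction of the trait values) are hypothetical choices made for illustration only.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def huber_m_fit(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients.

    Computed by iteratively reweighted least squares (IRLS), starting
    from the OLS fit. The tuning constant c is an assumed default.
    """
    beta = ols_fit(X, y)
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale: median absolute deviation, rescaled
        # to be consistent with the standard deviation under normality.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta = ols_fit(X * sw[:, None], y * sw)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def simulate(n=2000, effect=0.5, contamination=0.1, seed=0):
    """Hypothetical SNP/trait data with a fraction of gross errors."""
    rng = np.random.default_rng(seed)
    # Additive genotype coding (0, 1, 2) with minor allele frequency 0.3.
    genotype = rng.binomial(2, 0.3, size=n).astype(float)
    y = 1.0 + effect * genotype + rng.normal(0.0, 1.0, size=n)
    # Contaminate a random subset of trait values with a large shift.
    bad = rng.choice(n, size=int(contamination * n), replace=False)
    y[bad] += 10.0
    X = np.column_stack([np.ones(n), genotype])
    return X, y
```

Calling `X, y = simulate()` and comparing `ols_fit(X, y)` with `huber_m_fit(X, y)` shows how the shifted observations pull the least squares intercept away from its true value of 1.0, while the downweighting in the Huber fit limits their influence. A full power and level comparison such as the one described in the abstract would repeat this over many simulated data sets and test the genotype coefficient in each.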

  • ABSTRACT: BACKGROUND: IgE is both a marker and mediator of allergic inflammation. Despite reported differences in serum total IgE levels by race-ethnicity, African American and Latino subjects have not been well represented in genetic studies of total IgE. OBJECTIVE: We sought to identify the genetic predictors of serum total IgE levels. METHODS: We used genome-wide association data from 4292 subjects (2469 African Americans, 1564 European Americans, and 259 Latinos) in the EVE Asthma Genetics Consortium. Tests for association were performed within each cohort by race-ethnic group (ie, African American, Latino, and European American) and asthma status. The resulting P values were meta-analyzed, accounting for sample size and direction of effect. Top single nucleotide polymorphism associations from the meta-analysis were reassessed in 6 additional cohorts comprising 5767 subjects. RESULTS: We identified 10 unique regions in which the combined association statistic was associated with total serum IgE levels (P < 5.0 × 10^-6) and the minor allele frequency was 5% or greater in 2 or more population groups. Variant rs9469220, corresponding to HLA-DQB1, was the single nucleotide polymorphism most significantly associated with serum total IgE levels when assessed in both the replication cohorts and the discovery and replication sets combined (P = .007 and 2.45 × 10^-7, respectively). In addition, findings from earlier genome-wide association studies were also validated in the current meta-analysis. CONCLUSION: This meta-analysis independently identified a variant near HLA-DQB1 as a predictor of total serum IgE levels in multiple race-ethnic groups. This study also extends and confirms the findings of earlier genome-wide association analyses in African American and Latino subjects.
    The Journal of allergy and clinical immunology 11/2012; · 12.05 Impact Factor
  • ABSTRACT: Glycolytic potential (GP) in skeletal muscle is economically important in the pig industry because of its effect on pork processing yield. We have previously mapped a major quantitative trait locus (QTL) for GP on chromosome 3 in a White Duroc × Erhualian F2 intercross. We herein performed a systems genetic analysis to identify the causal variant underlying the phenotype QTL (pQTL). We first conducted genome-wide association analyses in the F2 intercross and an F19 Sutai pig population. The QTL was then refined to a 180-kb interval based on the 2-LOD drop method. We then performed expression QTL (eQTL) mapping using muscle transcriptome data from 497 F2 animals. Within the QTL interval, only one gene (PHKG1) has a cis-eQTL that colocalizes with the pQTL, peaking at the same SNP. The PHKG1 gene encodes a catalytic subunit of the phosphorylase kinase (PhK), which functions in the cascade activation of glycogen breakdown. Deep sequencing of PHKG1 revealed a point mutation (C>A) in a splice acceptor site of intron 9, resulting in a 32-bp deletion in the open reading frame and generating a premature stop codon. The aberrant transcript induces nonsense-mediated decay, leading to lower protein level and weaker enzymatic activity in affected animals. The mutation causes an increase of 43% in GP and a decrease of >20% in water-holding capacity of pork. These effects were consistent across the F2 and Sutai populations, as well as Duroc × (Landrace × Yorkshire) hybrid pigs. The unfavorable allele exists predominantly in Duroc-derived pigs. The findings provide new insights into understanding risk factors affecting glucose metabolism, and would greatly contribute to the genetic improvement of meat quality in Duroc-related pigs.
    PLoS Genetics 10/2014; 10(10):e1004710. · 8.52 Impact Factor
  • ABSTRACT: Increased postprandial lipid (PPL) response to dietary fat intake is a heritable risk factor for cardiovascular disease (CVD). Variability in postprandial lipids results from the complex interplay of dietary and genetic factors. We hypothesized that detailed lipid profiles (eg, sterols and fatty acids) may help elucidate specific genetic and dietary pathways contributing to the PPL response.
    PLoS ONE 01/2014; 9(6):e99509. · 3.53 Impact Factor