Robust linear regression methods in association studies.

Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Bioinformatics (Impact Factor: 5.47). 01/2011; 27(6):815-21. DOI:10.1093/bioinformatics/btr006
Source: PubMed

ABSTRACT It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of these deficiencies, allowing for reliable results under various conditions. We analyze the case of statistical tests to detect associations between individual genomic variations (single nucleotide polymorphisms, SNPs) and quantitative traits when deviations from the normality assumption are observed. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination.
Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. In the presence of this type of data, classical least squares statistical methods perform poorly, giving biased estimates, raising the number of spurious associations and often failing to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful, and thus more adequate for association studies, than the classical approach.
The code of the robustified version of function lmekin() from the R package kinship is provided as Supplementary Material.
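The contrast between least squares and M-regression described in the abstract can be illustrated with a small sketch. It is not the authors' lmekin()-based implementation and it does not reproduce their tests of power and level: it only compares slope estimates from ordinary least squares and from a Huber M-estimator fitted by iteratively reweighted least squares. The helper names (`weighted_fit`, `huber_fit`, `simulate`), the additive 0/1/2 genotype coding, the effect size of 0.5, the tuning constant 1.345, and the contamination scheme (a fixed shift added to one in five carriers of genotype 2) are all assumptions made for illustration.

```python
import random
import statistics


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_fit(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-regression by iteratively reweighted least squares (IRLS).

    The residual scale is re-estimated at each step from the MAD.
    """
    a, b = weighted_fit(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0.0:  # exact fit for most points: nothing to reweight
            break
        # Huber weights: 1 inside the threshold, downweighted outside it
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        a_new, b_new = weighted_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def simulate(n=300, beta=0.5, shift=10.0, seed=1):
    """Hypothetical data: trait = 1 + beta*genotype + N(0, 1) noise,
    with a shift added to every fifth individual carrying genotype 2."""
    rng = random.Random(seed)
    g = [i % 3 for i in range(n)]  # balanced genotypes 0, 1, 2
    y = [1.0 + beta * gi + rng.gauss(0.0, 1.0) for gi in g]
    for i in range(n):
        if g[i] == 2 and (i // 3) % 5 == 0:
            y[i] += shift  # contaminated observation
    return g, y


g, y = simulate()
_, b_ols = weighted_fit(g, y, [1.0] * len(g))
_, b_rob = huber_fit(g, y)
print(f"true slope 0.5 | OLS {b_ols:.3f} | Huber {b_rob:.3f}")
```

Because the contamination here is concentrated in one genotype group, it pulls the least squares slope away from the simulated effect, while the Huber weights limit the influence of the shifted observations. A symmetric or genotype-independent contamination would mainly inflate the variance of the least squares estimate rather than bias it.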

  • ABSTRACT: Rationale: Left ventricular hypertrophy (LVH) is a heritable predictor of cardiovascular disease, particularly in blacks. Objective: Determine the feasibility of combining evidence from two distinct but complementary experimental approaches to identify novel genetic predictors of increased LV mass. Methods: Whole-exome sequencing (WES) was conducted in seven African-American sibling trios ascertained on high average familial LV mass indexed to height (LVMHT) using Illumina HiSeq technology. Identified missense or nonsense (MS/NS) mutations were examined for association with LVMHT using linear mixed models adjusted for age, sex, body weight, and familial relationship. To functionally assess WES findings, human induced pluripotent stem cell-derived cardiomyocytes (induced pluripotent stem cell-CM) were stimulated to induce hypertrophy; mRNA sequencing (RNA-seq) was used to determine gene expression differences associated with hypertrophy onset. Statistically significant findings under both experimental approaches identified LVH candidate genes. Candidate genes were further prioritized by seven supportive criteria that included additional association tests (two criteria), regional linkage evidence in the larger HyperGEN cohort (one criterion), and publicly available gene and variant based annotations (four criteria). Results: WES reads covered 91% of the target capture region (of size 37.2 MB) with an average coverage of 65×. WES identified 31,426 MS/NS mutations among the 21 individuals. A total of 295 MS/NS variants in 265 genes were associated with LVMHT with q-value <0.25. Of the 265 WES genes, 44 were differentially expressed (P < 0.05) in hypertrophied cells. Among the 44 candidate genes identified, 5, including HLA-B, HTT, MTSS1, SLC5A12, and THBS1, met 3 of 7 supporting criteria. THBS1 encodes an adhesive glycoprotein that promotes matrix preservation in pressure-overload LVH. THBS1 gene expression was 34% higher in hypertrophied cells (P = 0.0003) and a predicted conserved and damaging NS variant in exon 13 (A2099G) was significantly associated with LVMHT (P = 4 × 10^-6). Conclusion: Combining evidence from cutting-edge genetic and cellular experiments can enable identification of novel LVH risk loci.
    Frontiers in Genetics 01/2012; 3:92.
  • ABSTRACT: BACKGROUND: IgE is both a marker and mediator of allergic inflammation. Despite reported differences in serum total IgE levels by race-ethnicity, African American and Latino subjects have not been well represented in genetic studies of total IgE. OBJECTIVE: We sought to identify the genetic predictors of serum total IgE levels. METHODS: We used genome-wide association data from 4292 subjects (2469 African Americans, 1564 European Americans, and 259 Latinos) in the EVE Asthma Genetics Consortium. Tests for association were performed within each cohort by race-ethnic group (i.e., African American, Latino, and European American) and asthma status. The resulting P values were meta-analyzed, accounting for sample size and direction of effect. Top single nucleotide polymorphism associations from the meta-analysis were reassessed in 6 additional cohorts comprising 5767 subjects. RESULTS: We identified 10 unique regions in which the combined association statistic was associated with total serum IgE levels (P < 5.0 × 10^-6) and the minor allele frequency was 5% or greater in 2 or more population groups. Variant rs9469220, corresponding to HLA-DQB1, was the single nucleotide polymorphism most significantly associated with serum total IgE levels when assessed in both the replication cohorts and the discovery and replication sets combined (P = .007 and 2.45 × 10^-7, respectively). In addition, findings from earlier genome-wide association studies were also validated in the current meta-analysis. CONCLUSION: This meta-analysis independently identified a variant near HLA-DQB1 as a predictor of total serum IgE levels in multiple race-ethnic groups. This study also extends and confirms the findings of earlier genome-wide association analyses in African American and Latino subjects.
    The Journal of Allergy and Clinical Immunology (Impact Factor: 12.05). 11/2012.