Robust linear regression methods in association studies

Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Bioinformatics (Impact Factor: 4.62). 03/2011; 27(6):815-21. DOI: 10.1093/bioinformatics/btr006
Source: PubMed

ABSTRACT It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of these deficiencies, allowing for reliable results under various conditions. We analyze the case of statistical tests to detect associations between individual genomic variations (SNPs) and quantitative traits when deviations from the normality assumption are observed. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination.
Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. In the presence of this type of data, classical least squares statistical methods perform poorly, giving biased estimates, raising the number of spurious associations and often failing to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful, and thus more adequate for association studies, than the classical approach.
The code of the robustified version of function lmekin() from the R package kinship is provided as Supplementary Material.
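
The contrast between least squares and M-regression described in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not the robustified lmekin(); it is a plain NumPy illustration under stated assumptions: an additive SNP coding (0/1/2), a Huber M-estimator fitted by iteratively reweighted least squares with a MAD scale estimate, and a hypothetical contamination scheme in which 5% of the traits are shifted upward. All names (huber_fit, trait, genotype) and parameter values are illustrative, and the sketch compares only the slope estimates, not the tests, power, or level studied in the paper.

```python
import numpy as np


def ols_fit(X, y):
    """Classical least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def huber_fit(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-regression via iteratively reweighted least squares.

    c is the usual Huber tuning constant; the residual scale is
    re-estimated at each step with the normalized MAD (an assumption
    of this sketch, not taken from the paper).
    """
    beta = ols_fit(X, y)  # start from the least squares fit
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        scale = max(scale, 1e-12)  # guard against a zero scale
        u = np.abs(r) / scale
        # Huber weights: 1 inside [-c, c], c/|u| outside
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta = ols_fit(X * sw[:, None], y * sw)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Simulated data (hypothetical values chosen for illustration only)
rng = np.random.default_rng(0)
n = 300
genotype = np.tile([0.0, 1.0, 2.0], n // 3)  # additive SNP coding
true_intercept, true_slope = 1.0, 0.5
trait = true_intercept + true_slope * genotype + rng.normal(0.0, 1.0, n)

# Contaminate 5% of the sample: shift 15 traits of genotype 2 upward
outlier_idx = np.flatnonzero(genotype == 2.0)[:15]
trait[outlier_idx] += 20.0

X = np.column_stack([np.ones(n), genotype])
beta_ols = ols_fit(X, trait)
beta_huber = huber_fit(X, trait)

print("true slope :", true_slope)
print("OLS slope  :", beta_ols[1])
print("Huber slope:", beta_huber[1])
```

Because the contaminated observations all sit at one genotype, they pull the least squares slope away from the true value, while the Huber weights limit their influence. A full comparison in the spirit of the abstract would repeat such simulations many times and record how often each test rejects under the null and under the alternative.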

  • "In order to identify missense or nonsense (MS/NS) single-nucleotide variants (SNVs) associated with LVMHT we conducted a mixed model regression with LVMHT as the dependent variable controlling for kinship structures as well as age, sex, and weight as covariates using the kinship R package program LMEKIN (Lourenco et al., 2011). We used a false discovery rate (FDR) criterion of q-value <0.25 (P-value <0.00258) for significance; this is more flexible than the usual Bonferroni criterion given our small sample size (N = 21)."
    ABSTRACT: Rationale: Left ventricular hypertrophy (LVH) is a heritable predictor of cardiovascular disease, particularly in blacks. Objective: Determine the feasibility of combining evidence from two distinct but complementary experimental approaches to identify novel genetic predictors of increased LV mass. Methods: Whole exome sequencing (WES) was conducted in 7 African American sibling trios ascertained on high average familial LV mass indexed to height (LVMHT). WES identified 31,426 missense or nonsense mutations (MS/NS) which were examined for association with LVMHT using linear mixed models adjusted for age, sex, body weight, and family relationship. To functionally assess WES findings, human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) were stimulated to induce hypertrophy; mRNA sequencing was used to determine expression differences associated with hypertrophy onset. Results: After correction for multiple testing, 295 MS/NS variants in 265 genes were associated with LVMHT. We identified 44 of 265 WES genes differentially expressed (P<0.05) in hypertrophied cells. To further prioritize these 44 candidates, 7 supportive statistical and annotation-based criteria were used to evaluate the relevance of these genes. Five genes, HLA-B, HTT, MTSS1, SLC5A12, THBS1, were each supported by 3 criteria. THBS1 encodes an adhesive glycoprotein that promotes matrix preservation in pressure-overload LVH and harbors conserved and predicted damaging variants. Conclusions: Combining evidence from cutting-edge genetic and cellular experiments can enable identification of novel LVH risk loci.
    Frontiers in Genetics 05/2012; 3:92. DOI:10.3389/fgene.2012.00092
  • ABSTRACT: One recent study indicates a significant association between certain single nucleotide polymorphisms (SNPs) in the genomic sequence of feline p53 and feline injection-site sarcoma (FISS). The aim of this study was to investigate the correlation between a specific nucleotide insertion in the p53 gene and FISS in a German cat population. Blood samples from 150 German cats were allocated to a control group consisting of 100 healthy cats and a FISS group consisting of 50 cats with FISS. All blood samples were examined for the presence of the SNP in the p53 gene. Results found the T-insertion at SNP 3 in 20.0% of the cats in the FISS group and 19.2% of cats in the control group. No statistically significant difference was observed in allelic distribution between the two groups. Further investigations are necessary to determine the association of SNPs in the feline p53 gene and the occurrence of FISS.
    Veterinary and Comparative Oncology 08/2012; 9999(9999). DOI:10.1111/j.1476-5829.2012.00344.x · 1.45 Impact Factor
  • ABSTRACT: BACKGROUND: IgE is both a marker and mediator of allergic inflammation. Despite reported differences in serum total IgE levels by race-ethnicity, African American and Latino subjects have not been well represented in genetic studies of total IgE. OBJECTIVE: We sought to identify the genetic predictors of serum total IgE levels. METHODS: We used genome-wide association data from 4292 subjects (2469 African Americans, 1564 European Americans, and 259 Latinos) in the EVE Asthma Genetics Consortium. Tests for association were performed within each cohort by race-ethnic group (ie, African American, Latino, and European American) and asthma status. The resulting P values were meta-analyzed, accounting for sample size and direction of effect. Top single nucleotide polymorphism associations from the meta-analysis were reassessed in 6 additional cohorts comprising 5767 subjects. RESULTS: We identified 10 unique regions in which the combined association statistic was associated with total serum IgE levels (P < 5.0 × 10^-6) and the minor allele frequency was 5% or greater in 2 or more population groups. Variant rs9469220, corresponding to HLA-DQB1, was the single nucleotide polymorphism most significantly associated with serum total IgE levels when assessed in both the replication cohorts and the discovery and replication sets combined (P = .007 and 2.45 × 10^-7, respectively). In addition, findings from earlier genome-wide association studies were also validated in the current meta-analysis. CONCLUSION: This meta-analysis independently identified a variant near HLA-DQB1 as a predictor of total serum IgE levels in multiple race-ethnic groups. This study also extends and confirms the findings of earlier genome-wide association analyses in African American and Latino subjects.
    The Journal of allergy and clinical immunology 11/2012; 131(4). DOI:10.1016/j.jaci.2012.10.002 · 11.25 Impact Factor