Robust linear regression methods in association studies

Department of Mathematics, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Bioinformatics (Impact Factor: 4.98). 03/2011; 27(6):815-21. DOI: 10.1093/bioinformatics/btr006
Source: PubMed


It is well known that data deficiencies, such as coding/rounding errors, outliers or missing values, may lead to misleading results for many statistical methods. Robust statistical methods are designed to accommodate certain types of those deficiencies, allowing for reliable results under various conditions. We analyze the case of statistical tests to detect associations between individual genomic variations (SNPs) and quantitative traits when deviations from the normality assumption are observed. We consider the classical analysis of variance tests for the parameters of the appropriate linear model and a robust version of those tests based on M-regression. We then compare their empirical power and level using simulated data with several degrees of contamination.
Data normality is nothing but a mathematical convenience. In practice, experiments usually yield data with non-conforming observations. With such data, classical least squares methods perform poorly: they give biased estimates, raise the number of spurious associations and often fail to detect true ones. We show, through a simulation study and a real data example, that the robust methodology can be more powerful and thus more adequate for association studies than the classical approach.
The code of the robustified version of function lmekin() from the R package kinship is provided as Supplementary Material.
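
The contrast between least squares and M-regression described above can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce lmekin() or the robust tests themselves; it only compares slope estimates. The Huber weight function, the tuning constant 1.345, the MAD scale estimate, the additive 0/1/2 genotype coding and the contamination model (a fraction of errors drawn with a tenfold larger standard deviation) are all assumptions chosen for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def huber_fit(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-regression via iteratively reweighted least squares.

    Assumption: Huber weights with tuning constant c and a MAD-based
    residual scale; the paper's exact M-estimator may differ.
    """
    beta = ols_fit(X, y)
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0:
            break  # perfect fit, nothing to reweight
        u = np.abs(r) / scale
        # Weight 1 for small residuals, c/u for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta = ols_fit(X * sw[:, None], y * sw)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def simulate(n=300, effect=0.5, contamination=0.1, seed=0):
    """Hypothetical SNP/trait data with contaminated normal errors."""
    rng = np.random.default_rng(seed)
    genotype = rng.binomial(2, 0.3, size=n)  # additive coding: 0, 1, 2
    noise = rng.normal(0.0, 1.0, size=n)
    outlier = rng.random(n) < contamination
    noise[outlier] = rng.normal(0.0, 10.0, size=outlier.sum())
    trait = 1.0 + effect * genotype + noise
    X = np.column_stack([np.ones(n), genotype])
    return X, trait


def compare_mse(n_rep=100, effect=0.5, contamination=0.1):
    """Mean squared error of the slope estimate for OLS and Huber."""
    err_ols, err_hub = [], []
    for seed in range(n_rep):
        X, y = simulate(effect=effect, contamination=contamination, seed=seed)
        err_ols.append((ols_fit(X, y)[1] - effect) ** 2)
        err_hub.append((huber_fit(X, y)[1] - effect) ** 2)
    return float(np.mean(err_ols)), float(np.mean(err_hub))


if __name__ == "__main__":
    mse_ols, mse_hub = compare_mse()
    print(f"slope MSE  OLS: {mse_ols:.4f}  Huber: {mse_hub:.4f}")
```

Under this contamination model the Huber slope estimate should vary less around the true effect than the least squares estimate. This is consistent with the abstract's claim, though it says nothing about the level or power of the associated tests.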

  • Source
    • "For diploid organisms, this implies three possible genotypes at each polymorphic site. For example, an SNP with alleles A (adenine) and G (guanine) would lead to three possible genotypes in a diploid organism: AA, AG, and GG [4]. Statistical analysis of the effect of this type of polymorphism on a phenotype of interest would therefore involve representing the SNP as a three-category variable, though we discuss other modes of representation in this article."
    ABSTRACT: The field of chemometrics has its origin in chemistry and has been widely applied to the evaluation of analytical chemical data and quantitative structure-activity relationships. Chemometric techniques apply statistical and algorithmic methods to extract information from analytical multivariate data, including fused, heterogeneous data. These techniques are now widely applied across fields as varied as food technology, environmental chemistry, process control, medical diagnostics, and metabolomics. In the mid-1980s, cross-disciplinary interaction between genetics and epidemiology led to the emergence of genetic epidemiology as a new discipline. Chemometric techniques are extremely appropriate for, and have been widely applied to, this discipline. Here, we present a broad review of the application of chemometric techniques to the fields of genetic epidemiology and statistical genetics. We also consider some future directions. We focus on chemometrics-based regression methodologies in genetic association studies.
    TrAC Trends in Analytical Chemistry 10/2015; 74. DOI:10.1016/j.trac.2015.05.007 · 6.47 Impact Factor
  • Source
    • "In order to identify missense or nonsense (MS/NS) single-nucleotide variants (SNVs) associated with LVMHT we conducted a mixed model regression with LVMHT as the dependent variable controlling for kinship structures as well as age, sex, and weight as covariates using the kinship R package program LMEKIN (Lourenco et al., 2011). We used a false discovery rate (FDR) criterion of q-value <0.25 (P-value <0.00258) for significance; this is more flexible than the usual Bonferroni criterion given our small sample size (N = 21)."
    ABSTRACT: Rationale: Left ventricular hypertrophy (LVH) is a heritable predictor of cardiovascular disease, particularly in blacks. Objective: Determine the feasibility of combining evidence from two distinct but complementary experimental approaches to identify novel genetic predictors of increased LV mass. Methods: Whole exome sequencing (WES) was conducted in 7 African American sibling trios ascertained on high average familial LV mass indexed to height (LVMHT). WES identified 31,426 missense or nonsense mutations (MS/NS) which were examined for association with LVMHT using linear mixed models adjusted for age, sex, body weight, and family relationship. To functionally assess WES findings, human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) were stimulated to induce hypertrophy; mRNA sequencing was used to determine expression differences associated with hypertrophy onset. Results: After correction for multiple testing, 295 MS/NS variants in 265 genes were associated with LVMHT. We identified 44 of 265 WES genes differentially expressed (P<0.05) in hypertrophied cells. To further prioritize these 44 candidates, 7 supportive statistical and annotation-based criteria were used to evaluate the relevance of these genes. Five genes, HLA-B, HTT, MTSS1, SLC5A12, and THBS1, were each supported by 3 criteria. THBS1 encodes an adhesive glycoprotein that promotes matrix preservation in pressure-overload LVH and harbors conserved and predicted damaging variants. Conclusions: Combining evidence from cutting-edge genetic and cellular experiments can enable identification of novel LVH risk loci.
    Frontiers in Genetics 05/2012; 3:92. DOI:10.3389/fgene.2012.00092
  • ABSTRACT: One recent study indicates a significant association between certain single nucleotide polymorphisms (SNPs) in the genomic sequence of feline p53 and feline injection-site sarcoma (FISS). The aim of this study was to investigate the correlation between a specific nucleotide insertion in the p53 gene and FISS in a German cat population. Blood samples from 150 German cats were allocated to a control group consisting of 100 healthy cats and a FISS group consisting of 50 cats with FISS. All blood samples were examined for the presence of the SNP in the p53 gene. The T-insertion at SNP 3 was found in 20.0% of the cats in the FISS group and 19.2% of cats in the control group. No statistically significant difference was observed in allelic distribution between the two groups. Further investigations are necessary to determine the association of SNPs in the feline p53 gene and the occurrence of FISS.
    Veterinary and Comparative Oncology 08/2012; 9999(9999). DOI:10.1111/j.1476-5829.2012.00344.x · 2.73 Impact Factor