FaST Linear Mixed Models for Genome-Wide Association Studies

Microsoft Research, Los Angeles, California, USA.
Nature Methods (Impact Factor: 32.07). 09/2011; 8(10):833-5. DOI: 10.1038/nmeth.1681
Source: PubMed


We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (
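The "spectrally transformed" part of the name can be illustrated with a minimal sketch. It assumes the standard linear mixed model y ~ N(Xb, sg2 (K + delta I)), where K is a genetic similarity matrix and delta is the ratio of residual to genetic variance. After one eigendecomposition of K, the rotated data have a diagonal covariance, so each likelihood evaluation reduces to a weighted least-squares fit. The function names `spectral_rotate` and `neg_log_likelihood` are hypothetical and are not taken from the FaST-LMM software. The sketch uses a full eigendecomposition and does not implement the low-rank factorization that yields the linear scaling reported in the abstract.

```python
import numpy as np


def spectral_rotate(K, y, X):
    """Eigendecompose K = U diag(S) U^T and rotate the data by U^T.

    K: (n, n) symmetric similarity matrix, y: (n,) phenotype,
    X: (n, d) fixed-effect design matrix (assumed full column rank).
    """
    S, U = np.linalg.eigh(K)
    return S, U.T @ y, U.T @ X


def neg_log_likelihood(delta, S, y_rot, X_rot):
    """Negative maximum-likelihood log-likelihood for a given delta.

    In the rotated space the covariance is sg2 * diag(S + delta), so the
    fixed effects and sg2 have closed-form estimates (weighted least squares).
    """
    n = y_rot.shape[0]
    var = S + delta                       # diagonal of the rotated covariance
    XtW = X_rot.T / var                   # X^T diag(1 / var)
    beta = np.linalg.solve(XtW @ X_rot, XtW @ y_rot)
    resid = y_rot - X_rot @ beta
    sg2 = np.sum(resid ** 2 / var) / n    # profiled genetic variance
    log_lik = -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(var))
                      + n + n * np.log(sg2))
    return -log_lik
```

Because the rotation is computed once, delta can then be tuned with a one-dimensional search (for example over a grid of values) at a cost of O(n) per evaluation for a fixed number of covariates.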



Available from: David Heckerman, Dec 08, 2014
    • "Combining several SNPs in a region into a single indicator variable as a composite genotype can reduce the detection of rare variants[44]. The use of mixed models has also minimized the detection of false positive associations by accounting for the phenotypic covariance that is due to genetic relatedness[45,46]. The success of GWAS in detecting genes of agronomic importance in rice, such as those for grain quality, grain yield, morphology, stress tolerance, and nutritional quality, has demonstrated its usefulness in identifying more genome-wide genes contributing to seed dormancy in rice[47,48,49,50]. "
    ABSTRACT: Background: Seed dormancy is an adaptive trait employed by flowering plants to avoid harsh environmental conditions for the continuity of their next generations. In cereal crops, moderate seed dormancy could help prevent pre-harvest sprouting and improve grain yield and quality. We performed a genome-wide association study (GWAS) for dormancy, based on seed germination percentage (GP) in freshly harvested seeds (FHS) and after-ripened seeds (ARS) in 350 worldwide accessions that were characterized by strong population structure of indica, japonica and Aus subpopulations. Results: The germination tests revealed that Aus and indica rice had stronger seed dormancy than japonica rice in FHS. Association analysis revealed 16 loci significantly associated with GP in FHS and 38 in ARS. Three of the 38 loci detected in ARS were also detected in FHS, and 13 of the ARS loci were detected near previously mapped dormancy QTL. In FHS, three of the association loci were located within 100 kb of previously cloned GA/IAA inactivation genes such as GA2ox3, EUI1 and GH3-2, and one near the dormancy gene Sdr4. In ARS, an association signal was detected near the ABA signaling gene ABI5. No association peaks were commonly detected among the sub-populations in FHS, and only one association peak was detected in both indica and japonica populations in ARS. Sdr4 and GA2ox3 haplotype analysis showed that Aus and indica II (IndII) varieties had stronger dormancy alleles, whereas indica I (IndI) and japonica had weak or non-dormancy alleles. Conclusion: The association study and haplotype analysis together indicate an involvement of independent genes and alleles contributing towards regulation and natural variation of seed dormancy among the rice sub-populations. Electronic supplementary material: The online version of this article (doi:10.1186/s12863-016-0340-2) contains supplementary material, which is available to authorized users.
    Full-text · Article · Dec 2016 · BMC Genetics
    • "Compared with GLM, MLM is much more computing intensive. Many algorithms have been developed to reduce the computational burden, including EMMA[15] (Efficient Mixed-Model Association), EMMAX[16] (EMMA eXpedited), P3D[17] (Population Parameters Previously Determined), GEMMA[18] (Genome-Wide Efficient Mixed-Model Association), FaST-LMM[19] (Factored Spectrally Transformed Linear Mixed Model), and GRAMMAR-Gamma[20] (fast variance components-based two-step method). However, the statistical power of these algorithms remains the same as the regular MLM. "
    ABSTRACT: False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid the model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days.
    Preview · Article · Feb 2016 · PLoS Genetics
    • "The SNPs with minor allele frequencies of ≥ 0.05 and the varieties with minor allele frequencies of ≥ 6 in a population were used. LMM was used for association analysis by running the FaST-LMM program (Lippert et al. 2011). Using a method described by Li et al. (2012), the effective number of independent SNPs was calculated as 757,578, 571,843 and 245,348 for the whole population, indica and japonica, respectively. "
    ABSTRACT: Awn is one of the most important domesticated traits in rice (Oryza sativa). Understanding the genetic basis of awn length is important for grain harvest and production because long awn length is disadvantageous for both grain harvest and milling. We investigated the awn length of 529 rice cultivars and performed a genome-wide association study (GWAS) in the indica and japonica subpopulations and in the whole population. In total, we found 17 loci associated with awn length. Of these loci, seven were linked to previously reported quantitative trait loci (QTL), and one was linked to the awn gene An-1. Nine novel loci were repeatedly identified in different environments. One of the nine associations was identified in both the whole and japonica populations. Of special interest was the detection of the most significant association SNP, sf0136352825, which was less than 95 kb from the seed shattering gene qSH1. These results may provide potentially favorable haplotypes for molecular breeding in rice.
    Full-text · Article · Jan 2016