Genetic Control of Human Brain Transcript Expression in Alzheimer Disease

Neurogenomics Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA.
The American Journal of Human Genetics (Impact Factor: 10.99). 05/2009; 84(4):445-58. DOI: 10.1016/j.ajhg.2009.03.011
Source: PubMed

ABSTRACT: We recently surveyed the relationship between the human brain transcriptome and genome in a series of neuropathologically normal postmortem samples. We have now analyzed additional samples with a confirmed pathologic diagnosis of late-onset Alzheimer disease (LOAD; final n = 188 controls, 176 cases). Nine percent of the cortical transcripts that we analyzed had expression profiles correlated with their genotypes in the combined cohort, and approximately 5% of transcripts had SNP-transcript relationships that could distinguish LOAD samples. Two of these transcripts have been previously implicated in LOAD candidate-gene SNP-expression screens. This study shows how the relationship between common inherited genetic variants and brain transcript expression can be used in the study of human brain disorders. We suggest that studying the transcriptome as a quantitative endophenotype has greater power for discovering risk SNPs influencing expression than the use of discrete diagnostic categories such as presence or absence of disease.
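The abstract above does not specify the statistical model used, so the following is only a minimal sketch of what a SNP-transcript relationship can look like in practice: a least-squares fit of transcript expression on allele dosage, run separately in controls and cases. The function names, the additive 0/1/2 dosage coding, and the toy data are all assumptions made for illustration and are not taken from the study.

```python
from math import sqrt


def snp_expression_fit(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1, or 2 copies).

    Returns (slope, pearson_r). Hypothetical helper, not the study's method.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    return sxy / sxx, sxy / sqrt(sxx * syy)


def slopes_by_diagnosis(dosages, expression, is_case):
    """Fit the same model separately in controls and in cases."""
    result = {}
    for label, flag in (("control", False), ("case", True)):
        idx = [i for i, c in enumerate(is_case) if c == flag]
        result[label] = snp_expression_fit(
            [dosages[i] for i in idx], [expression[i] for i in idx]
        )
    return result


# Toy data (invented): expression tracks dosage in controls but not in cases.
dosages = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.1, 3.0, 3.1, 2.0, 2.1, 2.0, 2.2, 2.1, 2.0]
is_case = [False] * 6 + [True] * 6

fits = slopes_by_diagnosis(dosages, expression, is_case)
for label, (slope, r) in fits.items():
    print(f"{label}: slope={slope:.3f}, r={r:.3f}")
```

A SNP whose slope differs markedly between the two groups is the kind of relationship that could, in principle, help distinguish LOAD samples; a real analysis would also need covariates and multiple-testing correction, which this sketch omits.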


Available from: Amanda J Myers, Jun 15, 2015
  •
    ABSTRACT: Previous studies have indicated a heritable component of the etiology of neurodegenerative diseases such as Alzheimer disease (AD), frontotemporal dementia (FTD), and progressive supranuclear palsy (PSP). However, few have examined the contribution of low-frequency coding variants on a genome-wide level. We aimed to identify low-frequency coding variants that affect susceptibility to AD, FTD, and PSP. We used the Illumina HumanExome BeadChip array to genotype a large number of variants (most of which are low-frequency coding variants) in a cohort of patients with neurodegenerative disease (224 with AD, 168 with FTD, and 48 with PSP) and in 224 control individuals without dementia enrolled between 2005 and 2012 from multiple centers participating in the Genetic Investigation in Frontotemporal Dementia and Alzheimer's Disease (GIFT) Study. An additional multiancestral replication cohort of 240 patients with AD and 240 controls without dementia was used to validate suggestive findings. Variant-level association testing and gene-based testing were performed. The main outcome was the statistical association of genetic variants with clinical diagnosis of AD, FTD, and PSP. Genetic variants typed by the exome array explained 44%, 53%, and 57% of the total phenotypic variance of AD, FTD, and PSP, respectively. An association with the known AD gene ABCA7 was replicated in several ancestries (discovery P = .0049, European P = .041, African American P = .043, and Asian P = .027), suggesting that exonic variants within this gene modify AD susceptibility. In addition, 2 suggestive candidate genes, DYSF (P = 5.53 × 10^-5) and PAXIP1 (P = 2.26 × 10^-4), were highlighted in patients with AD and differentially expressed in AD brain. Corroborating evidence from other exome array studies and gene expression data points toward potential involvement of these genes in the pathogenesis of AD. Low-frequency coding variants with intermediate effect size may account for a significant fraction of the genetic susceptibility to AD and FTD. Furthermore, we found evidence that coding variants in the known susceptibility gene ABCA7, as well as candidate genes DYSF and PAXIP1, confer risk for AD.
    JAMA Neurology 02/2015; DOI:10.1001/jamaneurol.2014.4040 · 7.01 Impact Factor
  •
    ABSTRACT: Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) to identify genetic variants that have relatively large effects in some common, complex diseases. Among them, the most successful is Random Forests (RF). Despite performing well in terms of prediction accuracy on some moderately sized data sets, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNP group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node in a tree. This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed state-of-the-art RFs, including Breiman's RF, GRRF, and wsRF methods.
    BMC Genomics 10/2014; 16 Suppl 2. DOI:10.1186/1471-2164-16-S2-S5 · 4.04 Impact Factor
  •
    ABSTRACT: In many contexts the predictive validation of models or their associated prediction strategies is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high-dimensional data where model selection, or more generally predictor selection, is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.
    06/2010; 5(2). DOI:10.1214/10-BA604
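
The two-stage SNP subspace sampling described in the BMC Genomics abstract above (ts-RF) can be illustrated with a rough sketch. The abstract does not say how the cut-off point or the split between highly and weakly informative SNPs is computed, so the fixed thresholds `cutoff` and `strong`, the function names, and the SNP identifiers below are assumptions made purely for illustration; this is not the authors' implementation.

```python
import random


def partition_snps(p_values, cutoff=0.05, strong=0.001):
    """Split SNP ids into highly and weakly informative groups by p-value.

    SNPs with p > cutoff are treated as irrelevant and dropped.
    Both thresholds are hypothetical placeholders.
    """
    high = [snp for snp, p in p_values.items() if p <= strong]
    weak = [snp for snp, p in p_values.items() if strong < p <= cutoff]
    return high, weak


def sample_subspace(high, weak, size, rng):
    """Draw a feature subspace that always contains a highly informative SNP."""
    if not high:
        raise ValueError("no highly informative SNPs to sample from")
    if size < 1:
        raise ValueError("subspace size must be at least 1")
    pool = high + weak
    size = min(size, len(pool))
    # Guarantee one highly informative SNP, then fill from both sub-groups.
    first = rng.choice(high)
    rest = [snp for snp in pool if snp != first]
    return [first] + rng.sample(rest, size - 1)


# Toy p-values with invented SNP ids.
toy_p = {"rs1": 1e-6, "rs2": 0.0004, "rs3": 0.01, "rs4": 0.03, "rs5": 0.4, "rs6": 0.9}
high, weak = partition_snps(toy_p)
subspace = sample_subspace(high, weak, 3, random.Random(0))
print(high, weak, subspace)
```

In a full random forest, a subspace like this would be drawn at each node split in place of uniform sampling over all SNPs, which is the step the abstract credits with lowering prediction error.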