FitSNPs: Highly differentially expressed genes are more likely to have variants associated with disease

Stanford Center for Biomedical Informatics Research, 251 Campus Drive, Stanford, CA 94305, USA.
Genome Biology (Impact Factor: 10.81). 01/2009; 9(12):R170. DOI: 10.1186/gb-2008-9-12-r170
Source: PubMed


Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs.
We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and propose candidate genes to sequence for their association with 597 syndromes with unknown molecular basis.
Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.
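
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how differential expression was called within each study, so the input format (one pair of gene sets per study), the function names, and the fixed threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def differential_expression_ratio(studies):
    """Return {gene: DER}, the fraction of studies measuring a gene
    in which that gene was called differentially expressed.

    `studies` is an iterable of (measured_genes, de_genes) pairs of sets.
    This input format is an assumption, not the paper's data model.
    """
    measured = defaultdict(int)
    differential = defaultdict(int)
    for measured_genes, de_genes in studies:
        measured_genes = set(measured_genes)
        for gene in measured_genes:
            measured[gene] += 1
        # Count only calls for genes the study actually measured.
        for gene in set(de_genes) & measured_genes:
            differential[gene] += 1
    return {gene: differential[gene] / n for gene, n in measured.items()}


def rank_snps(snp_to_gene, der):
    """Order SNPs by the DER of their associated gene, highest first.

    SNPs whose gene has no DER are given 0.0 (an assumption).
    """
    return sorted(
        snp_to_gene,
        key=lambda snp: der.get(snp_to_gene[snp], 0.0),
        reverse=True,
    )


def sensitivity_specificity(der, disease_genes, threshold):
    """Treat genes with DER >= threshold as predicted disease genes and
    compare against a curated set of known disease genes."""
    tp = fp = tn = fn = 0
    for gene, ratio in der.items():
        predicted = ratio >= threshold
        known = gene in disease_genes
        if predicted and known:
            tp += 1
        elif predicted:
            fp += 1
        elif known:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Under these assumptions, raising the threshold trades sensitivity for specificity, which is the kind of operating point the reported 79% specificity and 37% sensitivity describe.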

Available from: Keiichi Kodama
    • "The mean FitSNPs DER of the 116 genes was 0.547, which was ~10% higher than the mean DER of the rest of the genome (DER = 0.505 for the genome based on 19,876 genes in the database). Moreover, the precision for rediscovering disease-related genes reportedly increases with increasing DER [Chen et al., 2008]."
    ABSTRACT: The involvement of SNPs in miRNA target sites remains poorly investigated in neurodegenerative disease. In addition to associations with disease risk, such genetic variations can also provide novel insight into mechanistic pathways that may be responsible for disease etiology and/or pathobiology. To identify SNPs associated specifically with degenerating neurons we restricted our analysis to genes that are dysregulated in CA1 hippocampal neurons of mice during early, pre-clinical phase of Prion disease. The 125 genes chosen are also implicated in numerous other degenerative and neurological diseases and disorders and are therefore likely to be of fundamental importance. We predicted those SNPs that could increase, decrease or have neutral effects on miRNA binding. This group of genes was more likely to possess DNA variants than were genes chosen at random. Furthermore, many of the SNPs are common within the human population, and could contribute to the growing awareness that miRNAs and associated SNPs could account for detrimental neurological states. Interestingly, SNPs that overlapped miRNA binding sites in the 3’ UTR of GABA-receptor subunit coding genes were particularly enriched. Moreover, we demonstrated that SNP rs9291296 would strengthen miR-26a-5p binding to a highly conserved site in the 3’ UTR of GABRα4.
    Full-text · Article · Oct 2014 · Human Mutation
    • "Commonalities were observed in immune-, DC-, NK-cells, TLR- and SLE related signaling pathways, along with evidence for a Th1/Th2 related T-cell signature and a lack of Th17 associated genes in cutaneous disease, strengthening the likely importance of these findings. Chen et al. [70] demonstrated that DEGs are likely to harbor disease causing variants and proposed using gene expression data to inform and prioritize targeting candidate genes for single nucleotide polymorphism (SNP) in genome wide association or linkage studies. In this study, functional annotations of the 176 DEGs that mapped to the 13 CCLE transcriptional 'hot spots' in the genome are prominently related to immune response, classical and alternate complement pathways, HMGB1/TLR signaling, NK cell mediated response, defense response, leukocyte chemotaxis and cell adhesion, in addition to interferon signaling and apoptosis/survival pathways."
    ABSTRACT: Lupus Erythematosus is a heterogeneous autoimmune condition affecting multiple organs including skin, which remains poorly understood. To investigate pathogenetic processes relevant to cutaneous lupus as compared to systemic disease, we generated genome-wide expression data from lesional and non-lesional skin of chronic cutaneous LE (CCLE) patients. We reveal LE skin-associated transcriptional profiles and identify prominent functional pathways. A subset of CCLE differentially expressed genes (DEGs) was found to overlap with systemic lupus, including those linked to interferon and apoptosis. We identified 13 skin associated transcriptional "hot spots" that represent activated chromosomal regions. Seventeen CCLE DEGs (eight within "hot spots") were found to overlap with previously reported SLE-associated susceptibility loci. Additionally, we identify chromosomal regions not previously associated with lupus, potentially harboring distinct susceptibility loci for CCLE. This study suggests that overlapping as well as distinct genetic factors underlie disease pathogenesis in systemic and cutaneous lupus.
    Full-text · Article · Jun 2014 · Genomics
    • "FitSNPs calculated a differential expression ratio for all genes in the genome, and prioritize SNPs by the differential expression ratio of their associated genes (Chen et al., 2008). "
    ABSTRACT: In the recent decade, high-throughput genotyping and next-generation sequencing platforms have enabled genome-wide association studies (GWAS) of many complex human diseases. These studies have discovered many disease susceptible loci, and unveiled unexpected disease mechanisms. Despite these successes, these identified variants only explain a small proportion of the genetic contributions to these diseases and many more remain to be found. This is largely due to the small effect sizes of most disease-associated variants and limited sample size. As a result, it is critical to leverage other information to more effectively prioritize GWAS signals to increase replication rates and better understand disease mechanisms. In this review, we introduce the biological/genomic features that have been found to be informative for post-GWAS prioritization, and discuss available tools to utilize these features for prioritization.
    Full-text · Article · Dec 2013 · Frontiers in Genetics