FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease

Stanford Center for Biomedical Informatics Research, 251 Cmpus Drive, Stanford, CA 94305, USA.
Genome biology (Impact Factor: 10.47). 01/2009; 9(12):R170. DOI: 10.1186/gb-2008-9-12-r170
Source: PubMed

ABSTRACT Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs.
We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis.
Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.


Available from: Keiichi Kodama, Jun 14, 2015
  • [Show abstract] [Hide abstract]
    ABSTRACT: Gene expression and disease-associated variants are often used to prioritize candidate genes for target validation. However, the success of these gene features alone or in combination in the discovery of therapeutic targets is uncertain. Here we evaluated the effectiveness of the differential expression (DE), the disease-associated single nucleotide polymorphisms (SNPs) and the combination of the two in recovering and predicting known therapeutic targets across 56 human diseases. We demonstrate that the performance of each feature varies across diseases and generally the features have more recovery power than predictive power. The combination of the two features, however, has significantly higher predictive power than each feature alone. Our study provides a systematic evaluation of two common gene features, DE and SNPs, for prioritization of candidate targets and identified an improved predictive power of coupling these two features.
    Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing 01/2015; 20:383-94.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Identification of differentially expressed genes from microarray datasets is one of the most important analyses for microarray data mining. Popular algorithms such as statistical t-test rank genes based on a single statistics. The false positive rate of these methods can be improved by considering other features of differentially expressed genes. We proposed a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two dimension feature space composed of average difference of gene expression and average expression levels. A density based pruning algorithm (DB Pruning) is developed to screen out potential differentially expressed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change. Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website
    BMC Genomics 11/2010; 11 Suppl 2(Suppl 2):S3. DOI:10.1186/1471-2164-11-S2-S3 · 4.04 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In the recent decade, high-throughput genotyping and next-generation sequencing platforms have enabled genome-wide association studies (GWAS) of many complex human diseases. These studies have discovered many disease susceptible loci, and unveiled unexpected disease mechanisms. Despite these successes, these identified variants only explain a small proportion of the genetic contributions to these diseases and many more remain to be found. This is largely due to the small effect sizes of most disease-associated variants and limited sample size. As a result, it is critical to leverage other information to more effectively prioritize GWAS signals to increase replication rates and better understand disease mechanisms. In this review, we introduce the biological/genomic features that have been found to be informative for post-GWAS prioritization, and discuss available tools to utilize these features for prioritization.
    Frontiers in Genetics 12/2013; 4:280. DOI:10.3389/fgene.2013.00280