Association of genes to genetically inherited diseases using data mining. Nat Genet

Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
Nature Genetics (Impact Factor: 29.35). 08/2002; 31(3):316-9. DOI: 10.1038/ng895
Source: PubMed


Although approximately one-quarter of the roughly 4,000 genetically inherited diseases currently recorded in respective databases (LocusLink, OMIM) are already linked to a region of the human genome, about 450 have no known associated gene. Finding disease-related genes requires laborious examination of hundreds of possible candidate genes (sometimes, these are not even annotated; see, for example, refs 3,4). The public availability of the human genome draft sequence has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies, we have developed a scoring system for the possible functional relationships of human genes to 455 genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. In a benchmark of the system with 100 known disease-associated genes, the disease-associated gene was among the 8 best-scoring genes with a 25% chance, and among the best 30 genes with a 50% chance, showing that there is a relationship between the score of a gene and its likelihood of being associated with a particular disease. The scoring also indicates that for some diseases, the chance of identifying the underlying gene is higher.
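
The benchmark figures quoted in the abstract (true gene among the 8 best-scoring genes in 25% of cases, among the best 30 in 50%) are top-k hit rates. The following is a minimal sketch of how such a rate could be computed from per-disease gene scores. The function names, gene names, and scores are hypothetical illustrations and are not taken from the paper; the paper's actual scoring system is not reproduced here.

```python
from typing import Dict, List


def rank_of_true_gene(scores: Dict[str, float], true_gene: str) -> int:
    """Return the 1-based rank of true_gene when genes are sorted by descending score."""
    ordered = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return ordered.index(true_gene) + 1


def top_k_hit_rate(ranks: List[int], k: int) -> float:
    """Fraction of benchmark cases whose true gene is among the k best-scoring genes."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(rank <= k for rank in ranks) / len(ranks)


# Hypothetical toy benchmark: (candidate scores for one disease, known disease gene).
benchmark = [
    ({"geneA": 0.9, "geneB": 0.4, "geneC": 0.1}, "geneA"),  # true gene ranks 1st
    ({"geneA": 0.2, "geneB": 0.7, "geneC": 0.5}, "geneC"),  # true gene ranks 2nd
    ({"geneA": 0.8, "geneB": 0.6, "geneC": 0.3}, "geneC"),  # true gene ranks 3rd
    ({"geneA": 0.1, "geneB": 0.3, "geneC": 0.2}, "geneB"),  # true gene ranks 1st
]

ranks = [rank_of_true_gene(scores, gene) for scores, gene in benchmark]
hit_rate_top1 = top_k_hit_rate(ranks, 1)
hit_rate_top2 = top_k_hit_rate(ranks, 2)
```

With real data, `k` would be set to 8 or 30 and the benchmark would hold one entry per known disease-associated gene.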



Available from: Miguel Andrade
    • "Concretely, most of these methods are based on similarities between the genomic data of known disease genes and the genomic data of the candidate gene. The genomic data include sequence-based features [7,8], gene ontology (GO) annotation information [9,10], expression patterns [11-13], and protein interaction data [14,15]. In most cases, multiple sources of genomic data are combined to find causal genes, e.g., the combinations of GO annotation information with protein interaction data [16], GO annotation information with sequence-based features [17], and metabolic pathway data with protein interaction data [18]. "
    ABSTRACT: The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by a candidate gene and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performance. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactorial diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015. Published by Elsevier Inc.
    Full-text · Article · Jul 2015 · Journal of Biomedical Informatics
    • "The concept of gene prioritization was first introduced in 2002 by Perez-Iratxeta et al. [1]. In that paper, Perez-Iratxeta et al. described the first computational approach to this problem. "
    ABSTRACT: Disease gene prioritization is the process of ranking candidate genes according to their relevance to a disease phenotype, thus facilitating the identification of disease genes by narrowing down the set of genes to be tested experimentally. Many methods have been proposed for disease gene prioritization based on relationships between proteins encoded in protein-protein interaction networks using various graph-based algorithms. In this paper, we propose a novel method for prioritizing candidate disease genes by combining reinforcement learning with the PageRank algorithm and assigning priors for known disease genes. We experimentally evaluate the proposed method on a human protein interaction network and compare its performance with state-of-the-art methods, namely PageRank with priors, Random Walk with Restart and K-Step Markov. The experimental results show that our method achieves relatively high performance in terms of AUC values and outperforms the comparative methods.
    Full-text · Article · Jun 2015
    • "In this category some methods used a random walk or a heat kernel [19], while others applied Web and social networks methods on a protein–protein interaction (PPI) network [20], and other approaches exploited PPI and pathway information to prioritize candidate genes [21] [15]. Most gene prioritization methods exploited different sources of information and gene networks [22] [23], ranging from phenotypic similarities between diseases and functional similarity between genes [24], to GO ontology and InterPro domain annotations [25] and protein–protein interactions, gene expression and common membership to KEGG pathways [26], and also to several other sets of data sources [15] [27] [28] (see [22] for a more detailed presentation of the different combinations of sources of evidence exploited by recent disease genes prioritization methods). "
    ABSTRACT: Objective: In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. Materials and methods: We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. Results: The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Conclusions: Network integration is necessary to boost the performance of gene prioritization methods. Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network.
    Full-text · Article · Jun 2014 · Artificial Intelligence in Medicine
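
Several of the citing abstracts above rely on random walk with restart (RWR) over a protein-protein interaction network. The sketch below shows the commonly used iteration p(t+1) = (1 - r) W p(t) + r p(0), where W spreads each node's probability evenly over its neighbours, r is the restart probability, and p(0) is the normalized seed vector. It is an illustration only, not the implementation of any cited paper; the toy network, node names, seed weights, and parameter defaults are all assumptions. Weighted seeds are allowed so that the initial vector can encode something like a similarity score rather than a plain 0/1 label.

```python
from typing import Dict, Set


def random_walk_with_restart(
    adjacency: Dict[str, Set[str]],
    seeds: Dict[str, float],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Return steady-state visiting probabilities for every node in the network."""
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seed weights must sum to a positive value")
    # Normalized initial probability vector p(0).
    p0 = {node: seeds.get(node, 0.0) / total for node in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction `restart` of the mass jumps back to the seeds.
        nxt = {node: restart * p0[node] for node in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                # Isolated node: the walking mass has nowhere to go, so it stays.
                nxt[node] += (1 - restart) * p[node]
                continue
            share = (1 - restart) * p[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        change = sum(abs(nxt[node] - p[node]) for node in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D with A as the known disease gene.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(network, seeds={"A": 1.0})
# Candidates (non-seed genes) ranked by descending steady-state probability.
candidates = sorted((g for g in scores if g != "A"), key=lambda g: scores[g], reverse=True)
```

In this toy chain, candidates closer to the seed receive higher scores, which is the guilt-by-association behaviour that network-based prioritization methods exploit.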