Association of genes to genetically inherited diseases using data mining

European Molecular Biology Laboratory, Meyerhofstr.1, Heidelberg 69012, Germany.
Nature Genetics (Impact Factor: 29.35). 08/2002; 31(3):316-9. DOI: 10.1038/ng895
Source: PubMed


Although approximately one-quarter of the roughly 4,000 genetically inherited diseases currently recorded in respective databases (LocusLink, OMIM) are already linked to a region of the human genome, about 450 have no known associated gene. Finding disease-related genes requires laborious examination of hundreds of possible candidate genes (sometimes, these are not even annotated; see, for example, refs 3,4). The public availability of the human genome draft sequence has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies, we have developed a scoring system for the possible functional relationships of human genes to 455 genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. In a benchmark of the system with 100 known disease-associated genes, the disease-associated gene was among the 8 best-scoring genes with a 25% chance, and among the best 30 genes with a 50% chance, showing that there is a relationship between the score of a gene and its likelihood of being associated with a particular disease. The scoring also indicates that for some diseases, the chance of identifying the underlying gene is higher.
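The benchmark figures above (true gene among the best 8 or best 30 candidates) are top-k success rates over a set of known disease genes. The following is a minimal sketch of how such a rate could be computed from per-disease rankings. It is not the authors' scoring system: the function names, the gene scores, and the list of ranks are all hypothetical and chosen only for illustration.

```python
def rank_of(true_gene, scores):
    """Return the 1-based rank of true_gene among all scored candidates.

    Candidates are ordered by descending score. Ties count against the
    true gene, so the reported rank is pessimistic.
    """
    target = scores[true_gene]
    better_or_equal = sum(
        1 for gene, value in scores.items()
        if gene != true_gene and value >= target
    )
    return 1 + better_or_equal


def top_k_recall(ranks, k):
    """Fraction of benchmark cases whose true gene is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical scores for the candidate genes in one chromosomal region.
toy_scores = {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7}

# Hypothetical ranks of the true gene in eight benchmark cases.
toy_ranks = [1, 5, 8, 12, 30, 45, 100, 200]

if __name__ == "__main__":
    print("rank of GENE_C:", rank_of("GENE_C", toy_scores))
    for k in (8, 30):
        print(f"top-{k} recall:", top_k_recall(toy_ranks, k))
```

Under this reading, the abstract's benchmark of 100 known disease-associated genes corresponds to a top-8 recall of about 0.25 and a top-30 recall of about 0.50.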

Available from: Miguel Andrade
  • Source
    • "The concept of gene prioritization was first introduced in 2002 by Perez-Iratxeta et al. [1]. In that paper, Perez-Iratxeta et al. described the first computational approach to solving this problem."
    ABSTRACT: Disease gene prioritization is the process of ranking candidate genes according to their relevance to a disease phenotype, thus facilitating the identification of disease genes by narrowing down the set of genes to be tested experimentally. Many methods have been proposed for disease gene prioritization based on relationships between proteins encoded in protein-protein interaction networks using various graph-based algorithms. In this paper, we propose a novel method for prioritizing candidate disease genes by combining reinforcement learning with the PageRank algorithm and assigning priors for known disease genes. We experimentally evaluate the proposed method on a human protein interaction network and compare its performance with state-of-the-art methods, namely PageRank with Priors, Random Walk with Restart and K-Step Markov. The experimental results show that our method achieves relatively high performance in terms of AUC values and outperforms the comparative methods.
  • Source
    • "In this category some methods used a random walk or a heat kernel [19], while others applied Web and social networks methods on a protein–protein interaction (PPI) network [20], and other approaches exploited PPI and pathway information to prioritize candidate genes [21] [15]. Most gene prioritization methods exploited different sources of information and gene networks [22] [23], ranging from phenotypic similarities between diseases and functional similarity between genes [24], to GO ontology and InterPro domain annotations [25] and protein–protein interactions, gene expression and common membership to KEGG pathways [26], and also to several other sets of data sources [15] [27] [28] (see [22] for a more detailed presentation of the different combinations of sources of evidence exploited by recent disease genes prioritization methods). "
    ABSTRACT: Objective: In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies have focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim to provide an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. Materials and methods: We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. Results: The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at the 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Conclusions: Network integration is necessary to boost the performance of gene prioritization methods. Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results by adopting both local and global learning strategies that are able to exploit the overall topology of the network.
    Artificial intelligence in medicine 06/2014; 61(2). DOI:10.1016/j.artmed.2014.03.003 · 2.02 Impact Factor
  • Source
    • "The data and information regarding such molecular interactions and their regulation from the molecular to the organism level are called the interactome (Tiffin et al. 1980; Botstein et al. 1980; Perez-Iratxeta et al. 2002; Macé et al. 2005; Sam et al. 2007). The analysis of the interactome makes it possible to obtain insights into the link between genotype and phenotype in terms of bio-molecular interactions."
    ABSTRACT: Human physiology is an ensemble of various biological processes spanning from intracellular molecular interactions to the whole-body phenotypic response. Systems biology endeavors to decipher these multi-scale biological networks and to bridge genotype and phenotype. The structure and dynamic properties of these networks are responsible for controlling and deciding the phenotypic state of a cell. Several cells and various tissues coordinate to generate an organ-level response, which further regulates the ultimate physiological state. The overall network embeds a hierarchical regulatory structure which, when unusually perturbed, can lead to an undesirable physiological state termed disease. Here, we treat disease diagnosis as analogous to a fault diagnosis problem in engineering systems. Accordingly, we review the application of engineering methodologies to address human diseases from a systems biology perspective. The review highlights potential networks and modeling approaches used for analyzing human diseases. The application of such analysis is illustrated in the case of cancer and diabetes. We put forth the concept of a cell-to-human framework comprising five modules (data mining, networking, modeling, experimental and validation) for addressing human physiology and diseases based on a paradigm of system-level analysis. The review emphasizes the importance of multi-scale biological networks and subsequent modeling and analysis for drug target identification and the design of efficient therapies.
    Systems and Synthetic Biology 03/2014; 8(1):99-116. DOI:10.1007/s11693-013-9125-3
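
Two of the citing abstracts above name Random Walk with Restart as a network-based baseline for gene prioritization. The following is a minimal pure-Python sketch of that general technique on a toy undirected network, assuming a restart probability of 0.3. The gene names, the network, and the function name are hypothetical, and the code does not reproduce the implementation of any of the cited papers.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that, at every step,
    jumps back to the seed genes with probability `restart` and otherwise
    moves to a uniformly chosen neighbour.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds:     set of known disease genes (must be non-empty).
    """
    if not seeds:
        raise ValueError("at least one seed gene is required")
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D - E, with A the known gene.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
toy_seeds = {"A"}

rwr_scores = random_walk_with_restart(toy_network, toy_seeds)
# Rank the non-seed candidates by descending steady-state probability.
rwr_ranking = sorted(
    (g for g in toy_network if g not in toy_seeds),
    key=lambda g: rwr_scores[g],
    reverse=True,
)

if __name__ == "__main__":
    print(rwr_ranking)
```

In this toy chain, candidates closer to the seed gene receive higher scores, which is the guilt-by-association intuition these network methods rely on. Real applications would run on a protein-protein interaction network with many seeds and would choose the restart probability by validation.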