Association of genes to genetically inherited diseases using data mining

European Molecular Biology Laboratory, Meyerhofstr.1, Heidelberg 69012, Germany.
Nature Genetics (Impact Factor: 29.65). 08/2002; 31(3):316-9. DOI: 10.1038/ng895
Source: PubMed

ABSTRACT Although approximately one-quarter of the roughly 4,000 genetically inherited diseases currently recorded in respective databases (LocusLink, OMIM) are already linked to a region of the human genome, about 450 have no known associated gene. Finding disease-related genes requires laborious examination of hundreds of possible candidate genes (sometimes, these are not even annotated; see, for example, refs 3,4). The public availability of the human genome draft sequence has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies, we have developed a scoring system for the possible functional relationships of human genes to 455 genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. In a benchmark of the system with 100 known disease-associated genes, the disease-associated gene was among the 8 best-scoring genes with a 25% chance, and among the best 30 genes with a 50% chance, showing that there is a relationship between the score of a gene and its likelihood of being associated with a particular disease. The scoring also indicates that for some diseases, the chance of identifying the underlying gene is higher.
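The benchmark figures in the abstract (known gene among the 8 best-scoring candidates in 25% of cases, among the best 30 in 50%) are top-k hit rates over ranked candidate lists. The following is a minimal sketch of how such a rate could be computed, assuming one already has the 1-based rank of the known disease gene for each benchmark case. The function name `top_k_hit_rate` and the example ranks are hypothetical and for illustration only; they are not data or code from the paper.

```python
from typing import Sequence


def top_k_hit_rate(ranks: Sequence[int], k: int) -> float:
    """Return the fraction of benchmark cases whose known gene ranks within the top k.

    `ranks` holds the 1-based rank of the known disease gene in each
    case's scored candidate list (rank 1 = best-scoring gene).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for rank in ranks if rank <= k)
    return hits / len(ranks)


# Hypothetical ranks for eight benchmark cases (made up, not from the paper).
example_ranks = [3, 12, 45, 7, 30, 101, 1, 58]

rate_at_8 = top_k_hit_rate(example_ranks, 8)    # genes ranked 3, 7, 1 count as hits
rate_at_30 = top_k_hit_rate(example_ranks, 30)  # genes ranked 3, 12, 7, 30, 1 count as hits

print(f"hit rate within top 8:  {rate_at_8:.3f}")
print(f"hit rate within top 30: {rate_at_30:.3f}")
```

Reporting the rate at several cutoffs, as the abstract does for 8 and 30, shows how quickly the known genes accumulate near the top of the ranking.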


Available from: Miguel Andrade, Jul 06, 2015
  • Source
    ABSTRACT: In the context of “network medicine”, gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization.
    Artificial intelligence in medicine 06/2014; 61(2). DOI:10.1016/j.artmed.2014.03.003 · 1.36 Impact Factor
  • Source
    ABSTRACT: Human physiology is an ensemble of various biological processes spanning from intracellular molecular interactions to the whole-body phenotypic response. Systems biology endeavors to decipher these multi-scale biological networks and to bridge the link between genotype and phenotype. The structure and dynamic properties of these networks are responsible for controlling and deciding the phenotypic state of a cell. Several cells and various tissues coordinate to generate an organ-level response, which further regulates the ultimate physiological state. The overall network embeds a hierarchical regulatory structure which, when unusually perturbed, can lead to an undesirable physiological state termed disease. Here, we treat the disease diagnosis problem as analogous to a fault diagnosis problem in engineering systems. Accordingly, we review the application of engineering methodologies to address human diseases from a systems biology perspective. The review highlights potential networks and modeling approaches used for analyzing human diseases. The application of such analysis is illustrated in the case of cancer and diabetes. We put forth the concept of a cell-to-human framework comprising five modules (data mining, networking, modeling, experimental and validation) for addressing human physiology and diseases based on a paradigm of system-level analysis. The review emphasizes the importance of multi-scale biological networks and of subsequent modeling and analysis for drug target identification and the design of efficient therapies.
    Systems and Synthetic Biology 03/2014; 8(1):99-116. DOI:10.1007/s11693-013-9125-3
  • Source
    ABSTRACT: Identification of cancer-associated proteins is a crucial problem in cancer research. Recently, various techniques have been developed to discover novel cancer genes and proteins. The topology of protein-protein interaction networks, together with gene ontology annotation, is a good predictor of cancer proteins. Protein-protein interaction information has provided a basis for studying the cancer cellular network. In this study, we applied a clique percolation clustering approach to lung cancer protein-protein interaction information to identify cancer-associated proteins. We also observed the enriched protein biological functions in the molecular networks of the clique motif and the enriched KEGG pathways.
    Communications and Information Technologies (ISCIT), 2013 13th International Symposium on; 01/2013