Biomine: predicting links between biological entities using network models of heterogeneous databases.

BMC Bioinformatics (Impact Factor: 2.67). 06/2012; 13(1):119. DOI: 10.1186/1471-2105-13-119
Source: PubMed

ABSTRACT BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted.
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionalities are available in a public query interface that allows searching for and visualizing connections between given biological entities.
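
To make the idea of proximity-based link prediction on a weighted graph concrete, here is a minimal sketch. The abstract does not say which proximity measures Biomine uses, so the measure below (the probability of the single best path, taken as the largest product of edge weights) is an assumption chosen for illustration, and the node names, edge weights, and function names are all hypothetical.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Return the largest product of edge weights over any path source -> target.

    graph maps node -> {neighbor: weight}, with weights in (0, 1].
    Dijkstra is run on costs -log(weight), so the cheapest path is the
    most probable one. Returns 0.0 if the target is unreachable.
    """
    if source == target:
        return 1.0
    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0


def rank_candidates(graph, query, candidates):
    """Score each candidate by proximity to the query node, best first."""
    scored = [(c, best_path_probability(graph, query, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def add_edge(graph, u, v, weight):
    """Insert an undirected weighted edge."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


# Toy heterogeneous graph; every name and weight here is made up.
toy = {}
add_edge(toy, "gene:A", "protein:P1", 0.9)
add_edge(toy, "protein:P1", "protein:P2", 0.8)
add_edge(toy, "protein:P2", "disease:D", 0.5)
add_edge(toy, "gene:B", "go:term", 0.3)
add_edge(toy, "go:term", "disease:D", 0.4)

ranking = rank_candidates(toy, "disease:D", ["gene:A", "gene:B"])
for candidate, score in ranking:
    print(candidate, round(score, 3))
```

In this toy setting, prioritizing candidate genes for a disease amounts to ranking them by proximity to the disease node. Reweighting edge types, as the abstract describes, would correspond to rescaling the weights before the search.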

  • ABSTRACT: Cancer genome and transcriptome analyses have advanced our understanding of cancer biology. We performed transcriptome analysis of all known genes encoding peptidases (also called proteases) and their endogenous inhibitors in glioblastoma multiforme (GBM), one of the most aggressive and deadly types of brain cancer, in which unbalanced proteolysis is associated with tumor progression.
    PLoS ONE 10/2014; 9(10):e111819. · 3.53 Impact Factor
  • ABSTRACT: The paper presents an approach to the holistic analysis of transcriptomic data which integrates two state-of-the-art methodologies into a coherent framework. The aim of the proposed approach is to give insight into the discovered patterns, help explain the observed phenomena, enable the creation of new research hypotheses and assist in the design of new experiments. We have integrated a methodology for semantic analysis of transcriptomic data, a system for automated extraction of biological relations from the literature, and a number of supporting components. The approach is demonstrated and evaluated on a publicly available dataset from a clinical trial in acute lymphoblastic leukaemia and a document corpus of full-text articles from the PubMed Open Access Subset.
    2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
  • ABSTRACT: Subgroup discovery (SD) methods can be used to find interesting subsets of objects of a given class. While subgroup-describing rules are themselves good explanations of the subgroups, domain ontologies can provide additional descriptions of the data and alternative explanations of the constructed rules. Such explanations in terms of higher-level ontology concepts have the potential to provide new insights into the domain of investigation. We show that this additional explanatory power can be ensured by using recently developed semantic SD methods. We present a new approach to explaining subgroups through ontologies and demonstrate its utility on a motivational use case and on a gene expression profiling use case where groups of patients, identified through SD in terms of gene expression, are further explained through concepts from the Gene Ontology and KEGG orthology. We qualitatively compare the methodology with the supporting factors technique for characterizing subgroups. The developed tools are implemented within a new browser-based data mining platform, ClowdFlows.
    Journal of Intelligent Information Systems 04/2013; · 0.63 Impact Factor

