Biomine: predicting links between biological entities using network models of heterogeneous databases.

BMC Bioinformatics (Impact Factor: 3.02). 06/2012; 13(1):119. DOI: 10.1186/1471-2105-13-119
Source: PubMed

ABSTRACT BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases.
RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes.
CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at, allowing searching for and visualizing connections between given biological entities.
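
The link-prediction setup described in the abstract (type-weighted edges, a proximity measure computed on the integrated graph, ranking of candidate links) can be illustrated with a minimal Python sketch. It assumes that each edge weight is a probability obtained by multiplying a per-type weight by a per-edge reliability, and it uses best-path probability (the maximum, over all paths, of the product of edge probabilities) as the proximity measure. The abstract does not say which measure Biomine ultimately uses, so this is one plausible choice rather than the paper's method; the edge-type names, the TYPE_WEIGHTS values and the node names are hypothetical.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical per-type weights (assumption). In the paper's setting these would
# be the parameters tuned to optimize link-prediction accuracy.
TYPE_WEIGHTS = {"interacts_with": 0.9, "annotated_with": 0.6, "associated_with": 0.7}


def build_graph(edges):
    """edges: iterable of (u, v, edge_type, reliability), reliability in (0, 1]."""
    graph = defaultdict(list)
    for u, v, etype, reliability in edges:
        p = TYPE_WEIGHTS[etype] * reliability  # combined edge probability
        graph[u].append((v, p))
        graph[v].append((u, p))  # relations treated as undirected
    return graph


def best_path_probability(graph, source, target):
    """Max over paths of the product of edge probabilities.

    Minimizing the sum of -log(p) is equivalent to maximizing the product of p,
    so Dijkstra's algorithm on -log weights finds the best path.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, p in graph[node]:
            nd = d - math.log(p)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return 0.0  # no path: the nodes are unrelated in the graph


def rank_candidates(graph, candidates):
    """Rank candidate (u, v) pairs by proximity; higher means more likely linked."""
    scored = [(best_path_probability(graph, u, v), (u, v)) for u, v in candidates]
    return [pair for score, pair in sorted(scored, key=lambda s: -s[0])]
```

Note that an indirect path can beat a direct edge: if a weak direct annotation edge has probability 0.3 but a two-hop path through a strongly interacting neighbour has probability 0.9 × 0.48 = 0.432, the two-hop path determines the proximity score.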

  • ABSTRACT: Cancer genome and transcriptome analyses have advanced our understanding of cancer biology. We performed transcriptome analysis of all known genes of peptidases (also called proteases) and their endogenous inhibitors in glioblastoma multiforme (GBM), one of the most aggressive and deadly types of brain cancer, in which unbalanced proteolysis is associated with tumor progression.
    PLoS ONE 01/2014; 9(10):e111819. · 3.53 Impact Factor
  • ABSTRACT: GoMapMan ( is an open, web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate the improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized as a hierarchical tree of biological concepts that describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) provision of the gathered information to high-throughput analysis tools via dynamically generated export files. All GoMapMan functionalities are openly available, except for the curation functions, which require prior registration to ensure traceability of the implemented changes.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
  • ABSTRACT: In recent years, there has been a rapid increase in the number of medical articles. The number of articles in PubMed has increased exponentially, and the workload for biocurators has increased exponentially as well. Under these circumstances, a system that can automatically determine in advance which articles have a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has therefore become an important task. In the triage task of BioCreative 2012, we proposed the Co-occurrence Interaction Nexus (CoIN) for learning and exploring relations in articles. We constructed a co-occurrence analysis system that is applicable to PubMed articles and suitable for gene, chemical and disease queries. CoIN uses co-occurrence features and their network centralities to assess the influence of curatable articles from the Comparative Toxicogenomics Database. The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles. CoIN also allows biocurators to survey the ranking lists for specific queries without reviewing irrelevant information. At BioCreative 2012, CoIN achieved a mean average precision of 0.778 in the triage task, finishing second among all participants. Database URL:
    Database: The Journal of Biological Databases and Curation 01/2013; 2013:bat076. · 4.20 Impact Factor
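
CoIN is described as scoring articles with co-occurrence features and their network centralities. The Python sketch below shows one simple version of that idea, under assumptions the abstract does not state: two entities co-occur when they are mentioned in the same article, centrality is plain degree centrality on the resulting co-occurrence graph, and an article's score is the mean centrality of its entities. The function names and the scoring rule are hypothetical and are not CoIN's actual features or classifier.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(articles):
    """articles: dict article_id -> set of entity names (genes, chemicals, diseases).

    Returns a Counter mapping each sorted entity pair to the number of
    articles in which both entities appear.
    """
    edges = Counter()
    for entities in articles.values():
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree: fraction of other entities each entity co-occurs with."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def rank_articles(articles):
    """Score each article by the mean centrality of its entities, highest first.

    Entities that never co-occur with anything get centrality 0.0 (assumption).
    """
    centrality = degree_centrality(cooccurrence_graph(articles))
    scores = {}
    for article_id, entities in articles.items():
        values = [centrality.get(e, 0.0) for e in entities]
        scores[article_id] = sum(values) / len(values) if values else 0.0
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch, an article that mentions only isolated entities ranks last, while articles whose entities sit at well-connected positions in the co-occurrence network rank first, which mirrors the intuition of using network centrality as a signal of curation priority.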

