Article

Biomine: predicting links between biological entities using network models of heterogeneous databases.

BMC Bioinformatics (Impact Factor: 3.02). 06/2012; 13(1):119. DOI: 10.1186/1471-2105-13-119
Source: PubMed

ABSTRACT

BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases.

RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes.

CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
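The abstract describes ranking candidate links by a proximity measure computed on a weighted graph, without naming the measures used. As a rough illustration only, the sketch below assumes one plausible measure: the probability of the most reliable path between two nodes, where each edge weight is read as a probability in (0, 1]. The function names, node names, and weights are hypothetical and are not taken from Biomine.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Return the largest product of edge weights over any source-target path.

    graph maps each node to a dict of {neighbor: weight}, weights in (0, 1].
    This is an assumed proximity measure, not necessarily the one in the paper.
    """
    # Dijkstra on costs -log(w): the cheapest path has the largest weight product.
    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0  # target unreachable


def rank_candidates(graph, query, candidates):
    """Sort candidate nodes by decreasing proximity to the query node."""
    scored = [(best_path_probability(graph, query, c), c) for c in candidates]
    return sorted(scored, reverse=True)


def build_graph(edges):
    """Build an undirected adjacency dict from (node, node, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


# Hypothetical toy data: mixed edge types collapsed into a single weight each.
graph = build_graph([
    ("geneA", "proteinA", 0.9),
    ("proteinA", "proteinB", 0.6),
    ("proteinB", "geneB", 0.9),
    ("geneA", "GO:term", 0.5),
    ("geneC", "GO:term", 0.4),
    ("geneB", "disease", 0.7),
])

ranking = rank_candidates(graph, "disease", ["geneA", "geneB", "geneC"])
for score, gene in ranking:
    print(f"{gene}: {score:.4f}")
```

In this toy setting, candidate genes that reach the query disease through fewer or more reliable edges rank higher. Weighting edge types differently, as the abstract describes, would amount to rescaling the weights before the search.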

  • Source
    ABSTRACT: In recent years, there was a rapid increase in the number of medical articles. The number of articles in PubMed has increased exponentially. Thus, the workload for biocurators has also increased exponentially. Under these circumstances, a system that can automatically determine in advance which article has a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has become an important task. In the triage task of BioCreative 2012, we proposed the Co-occurrence Interaction Nexus (CoIN) for learning and exploring relations in articles. We constructed a co-occurrence analysis system, which is applicable to PubMed articles and suitable for gene, chemical and disease queries. CoIN uses co-occurrence features and their network centralities to assess the influence of curatable articles from the Comparative Toxicogenomics Database. The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles. CoIN also allows biocurators to survey the ranking lists for specific queries without reviewing meaningless information. At BioCreative 2012, CoIN achieved a 0.778 mean average precision in the triage task, thus finishing in second place out of all participants. Database URL: http://ikmbio.csie.ncku.edu.tw/coin/home.php.
    Database: The Journal of Biological Databases and Curation 01/2013; 2013:bat076. · 4.20 Impact Factor
  • Source
    ABSTRACT: Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only gives a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central node/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
    Pharmacology & Therapeutics 02/2013; · 7.79 Impact Factor
  • Source
    ABSTRACT: GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the restriction on the curation functions, which require prior registration to ensure traceability of the implemented changes.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
