An ontology-based search engine for protein-protein interactions

School of Computer Science and Engineering, Inha University, Incheon 402-751, South Korea.
BMC Bioinformatics (Impact Factor: 2.58). 01/2010; 11 Suppl 1(Suppl 1):S23. DOI: 10.1186/1471-2105-11-S1-S23
Source: PubMed Central


Keyword matching and ID matching are the most common search methods for large databases of protein-protein interactions. Both are purely syntactic: they retrieve the database records that contain a keyword or ID specified in the query. Such syntactic methods often return too few results, or none at all, even when the database contains many potential matches.
We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users, but it lets a search engine search protein-protein interactions efficiently and in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions.
Representing the biological relations of proteins and their GO annotations by modified Gödel numbers allows a search engine to find all matching protein-protein interactions efficiently by prime factorization of the numbers. Keyword-matching and ID-matching search methods often miss interactions involving a protein that has no explicit annotation matching the search condition. Our search engine also retrieves such interactions when the protein satisfies the search condition through a more specific term in the ontology.
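
The idea of encoding ontology terms as products of primes can be illustrated with a minimal sketch. It is not the authors' modified Gödel numbering, whose details are not given in this abstract. It assumes a simplified scheme in which each term gets its own prime and a term's code is that prime multiplied by the primes of all its ancestors. The mini-ontology, the term names used as keys, and every function name here are hypothetical.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a toy ontology)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

# Hypothetical mini-ontology: term -> direct parents (is_a edges).
PARENTS = {
    "binding": [],
    "protein binding": ["binding"],
    "kinase binding": ["protein binding"],
    "catalytic activity": [],
}

def ancestors(term, parents):
    """Collect every term reachable by following parent edges upward."""
    seen, stack = set(), list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def encode(term, parents, prime_of):
    """Code of a term = its own prime times the primes of all its ancestors."""
    code = prime_of[term]
    for a in ancestors(term, parents):
        code *= prime_of[a]
    return code

def satisfies(code, condition, prime_of):
    """A code satisfies a condition term if the condition's prime divides it."""
    return code % prime_of[condition] == 0

def factorize(code, prime_of):
    """Recover the set of terms whose primes divide the code."""
    return {t for t, p in prime_of.items() if code % p == 0}

# Assign primes in insertion order, then encode every term.
_gen = primes()
PRIME_OF = {t: next(_gen) for t in PARENTS}
CODES = {t: encode(t, PARENTS, PRIME_OF) for t in PARENTS}
```

Under these assumptions, a protein annotated only with "kinase binding" still satisfies the broader condition "binding", because the prime for "binding" divides its code. A plain keyword match on "binding" annotations would depend on the annotation string instead.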



Available from: Kyungsook Han, Dec 10, 2014
  • "2006), which is focused on the public administration domain, semantic search engines (Byungkyu and Kyungsook, 2010; Ding et al., 2004), and question-answering systems (Heinemann, 2010; Vargas-Vera and Lytras, 2010), to name but a few."
    ABSTRACT: The identification of valid terms in any domain is fundamental to its computerization. In this paper we present a method for automatically obtaining morphosyntactic patterns. These patterns help researchers extract valid terms in order to build quality ontologies, to translate from one language to another, or to find relevant terms in short sentences that can be used as parameters in question-answering systems. We first use statistical methods to produce candidates in a pattern vector. A heuristic process then refines the pattern vector based on two main parameters: the statistical results previously obtained and the length of the pattern analyzed. As a result, we obtain the collection of the best patterns for the detection of real multiword terms.
    Article
  • ABSTRACT: The prediction of new protein-protein interactions is important to the discovery of the currently unknown function of various biological pathways. In addition, many databases of protein-protein interactions contain different types of interactions, including protein associations, physical protein associations and direct protein interactions. Only a few studies consider the issues inherent to the prediction of direct protein-protein interactions, that is, interactions between proteins that are actually in direct physical contact and are listed in known protein interaction databases. Predicting these interactions is a crucial and challenging task. Therefore, it is increasingly important to discover not only protein associations but also direct interactions. Many studies have predicted protein-protein interactions directly, by using biological features such as Gene Ontology (GO) functions and protein structural domains of two proteins with unknown interactions. In this article, we propose an augmented transitive relationships predictor (ATRP), a new method of predicting potential direct protein-protein interactions by using transitive relationships and annotations of protein interactions. Our results demonstrate that ATRP can effectively predict unknown direct protein-protein interactions from existing protein interaction relationships. The average accuracy of this method exceeded that of GO-based prediction methods by 28% to 62%.
    Conference Paper · Aug 2011
  • ABSTRACT: This paper presents a new search engine called PPISearchEngine, which finds protein-protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms below it in the GO hierarchy, and finds all the interactions of the query protein that satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like 'for protein p with GO [Formula: see text], find p's interaction partners with GO [Formula: see text]'. PPISearchEngine is freely available to academics at .
    Article · Feb 2012 · Computer Methods in Biomechanics and Biomedical Engineering
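
A query of the form "for protein p with one GO term, find p's interaction partners with another GO term" can be sketched as two divisibility checks over an interaction list. This is an illustration under stated assumptions, not PPISearchEngine's implementation. The primes, protein identifiers, annotation codes and interaction pairs below are all hypothetical, and each protein's code is assumed to be the product of the primes of its annotated term and that term's ancestors.

```python
# Hypothetical prime assigned to each GO term.
PRIME_OF = {
    "binding": 2,
    "protein binding": 3,
    "kinase binding": 5,
    "catalytic activity": 7,
    "kinase activity": 11,
}

# Hypothetical annotation code per protein: prime of the annotated term
# multiplied by the primes of its ancestors.
PROTEIN_CODE = {
    "P1": 2 * 3 * 5,  # annotated with "kinase binding"
    "P2": 7 * 11,     # annotated with "kinase activity"
    "P3": 2 * 3,      # annotated with "protein binding"
    "P4": 7,          # annotated with "catalytic activity"
}

# Hypothetical undirected interaction pairs.
INTERACTIONS = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4")]

def has_term(protein, term):
    """True if the term's prime divides the protein's annotation code."""
    return PROTEIN_CODE[protein] % PRIME_OF[term] == 0

def partners(query, query_term, partner_term):
    """Partners of `query` that satisfy `partner_term`, provided that
    `query` itself satisfies `query_term`; otherwise an empty list."""
    if not has_term(query, query_term):
        return []
    result = []
    for a, b in INTERACTIONS:
        if query in (a, b):
            other = b if a == query else a
            if has_term(other, partner_term):
                result.append(other)
    return sorted(result)
```

In this toy data, `partners("P1", "binding", "catalytic activity")` includes P2 even though P2 is annotated only with the more specific "kinase activity", which is the case a keyword match on "catalytic activity" would miss.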