Discovering patterns to extract protein-protein interactions from full texts

University of Waterloo, Waterloo, Ontario, Canada
Bioinformatics (Impact Factor: 4.62). 12/2004; 20(18):3604-12. DOI: 10.1093/bioinformatics/bth451
Source: PubMed

ABSTRACT Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.
We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%.
The program is available on request from the authors.
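
The abstract does not give the scoring scheme or the pattern representation, so the following is only a minimal sketch of the general idea of deriving a pattern by aligning two sentences with dynamic programming. It assumes a longest-common-subsequence alignment over tokens, a PROTEIN placeholder for dictionary hits, and a "*" wildcard for stretches where the sentences differ. The function names `tag_proteins` and `align_pattern` are hypothetical and are not taken from the authors' program.

```python
def tag_proteins(tokens, protein_names):
    """Replace tokens found in the protein dictionary with a PROTEIN
    placeholder; lowercase every other token."""
    return ["PROTEIN" if t in protein_names else t.lower() for t in tokens]


def align_pattern(a, b):
    """Align two token lists by longest common subsequence (a simple
    dynamic-programming alignment, assumed here for illustration) and
    return the shared pattern, with '*' marking stretches that differ."""
    n, m = len(a), len(b)
    # score[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])

    # Walk the table from the start, emitting matched tokens and
    # collapsing each run of unmatched tokens into a single '*'.
    pattern, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pattern.append(a[i])
            i += 1
            j += 1
        else:
            if not pattern or pattern[-1] != "*":
                pattern.append("*")
            if score[i + 1][j] >= score[i][j + 1]:
                i += 1
            else:
                j += 1
    # Leftover tokens in either sentence also become a wildcard.
    if (i < n or j < m) and (not pattern or pattern[-1] != "*"):
        pattern.append("*")
    return pattern


# Illustrative dictionary and sentences (made up for this sketch).
proteins = {"RAD51", "BRCA2", "p53", "MDM2"}
s1 = tag_proteins("RAD51 interacts directly with BRCA2".split(), proteins)
s2 = tag_proteins("p53 interacts with MDM2 in vivo".split(), proteins)
print(align_pattern(s1, s2))
```

For these two made-up sentences the sketch prints ['PROTEIN', 'interacts', '*', 'with', 'PROTEIN', '*'], a pattern built around the key verb "interacts". A matching step would then test new sentences against such patterns; that step is not shown here.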

  • Source
    ABSTRACT: In this study, Change Making Problem (CMP) and Subset-Sum Problem (SSP), which can arise, in practice, in some classes of one dimensional cargo loading and cutting stock problems, are researched. These problems are often used in computer science, as well. CMP and SSP are NP-hard problems and these problems can be seen as types of the knapsack problem in some ways. The complementary problems for the change making problem and the subset-sum problem are defined in this study, and it is aimed to examine the CMP and SSP by means of the complementary problems.
    Third International Conference on Computational Science, Engineering and Information Technology, Konya Turkey; 06/2013
  • Source
    ABSTRACT: It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. The understanding of their function individually, but also in the form of protein complexes is of a great importance. Nowadays, despite the plethora of various high-throughput experimental approaches for detecting protein–protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining based computational methodologies, aiming to extract information for proteins and their interactions from public repositories such as literature and various biological databases. We discuss their strengths, their weaknesses and how they complement existing experimental techniques by simultaneously commenting on the biological databases which hold such information and the benchmark datasets that can be used for evaluating new tools.
    Methods 10/2014; 74. DOI:10.1016/j.ymeth.2014.10.026 · 3.22 Impact Factor