Discovering patterns to extract protein-protein interactions from full texts.

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China.
Bioinformatics (Impact Factor: 5.32). 12/2004; 20(18):3604-12. DOI: 10.1093/bioinformatics/bth451
Source: PubMed

ABSTRACT Although several databases store protein-protein interactions, most such data still exist only in the scientific literature, scattered across texts written in natural language and resistant to data mining efforts. Much time and labor must be spent on extracting protein pathways from the literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.
We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%.
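The abstract does not specify the alignment scoring or the matching procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: protein names are replaced by a PROTEIN slot using a dictionary, two example sentences are aligned with a plain longest-common-subsequence dynamic program (a stand-in for the paper's alignment algorithm), and the shared tokens serve as a pattern that is then matched against new sentences. The function names, the example sentences, and the small protein dictionary are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

SLOT = "PROTEIN"  # placeholder token for any dictionary protein name (assumption)


def tag(tokens: List[str], proteins: Set[str]) -> List[str]:
    """Replace every token found in the protein dictionary with the slot token."""
    return [SLOT if t in proteins else t for t in tokens]


def align(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists via dynamic programming."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace forward through the table to recover the shared tokens.
    i = j = 0
    shared: List[str] = []
    while i < n and j < m:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


def extract(pattern: List[str], tokens: List[str],
            proteins: Set[str]) -> Optional[Tuple[str, ...]]:
    """Return the proteins filling the pattern's slots, or None if no match.

    The pattern matches when it occurs as a subsequence of the tagged sentence.
    """
    tagged = tag(tokens, proteins)
    filled: List[str] = []
    k = 0  # index of the next pattern token to match
    for tok, tg in zip(tokens, tagged):
        if k < len(pattern) and tg == pattern[k]:
            if tg == SLOT:
                filled.append(tok)
            k += 1
    return tuple(filled) if k == len(pattern) else None


# Hypothetical dictionary and training sentences, for illustration only.
proteins = {"RAD51", "BRCA2", "p53", "MDM2", "Grb2", "Sos1"}
s1 = "RAD51 interacts with BRCA2 in vivo".split()
s2 = "We show that p53 interacts with MDM2".split()
pattern = align(tag(s1, proteins), tag(s2, proteins))
pair = extract(pattern, "Grb2 interacts with Sos1 directly".split(), proteins)
```

Here `pattern` keeps only the tokens shared by both tagged sentences, and `extract` returns `None` for a sentence that does not contain the full pattern. A realistic system would also need weighted alignment scores, part-of-speech information, and filtering of overly general patterns, none of which this sketch attempts.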
The program is available on request from the authors.

  • Source
    ABSTRACT: Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, interest has shifted toward the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations: they involve more than two arguments and may also take another relation as an argument. In this paper, we conduct a thorough survey of the research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, with a focus on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges facing complex relation extraction and outline possible solutions and future directions.
    Computational and Mathematical Methods in Medicine 01/2014; 2014:298473. · 0.79 Impact Factor
  • Source
    ABSTRACT: The biomedical literature represents a rich source of biomarker information. However, both the size of literature databases and their lack of standardization hamper the automatic exploitation of the information contained in these resources. Text mining approaches have proven to be useful for the exploitation of information contained in the scientific publications. Here, we show that a knowledge-driven text mining approach can exploit a large literature database to extract a dataset of biomarkers related to diseases covering all therapeutic areas. Our methodology takes advantage of the annotation of MEDLINE publications pertaining to biomarkers with MeSH terms, narrowing the search to specific publications and, therefore, minimizing the false positive ratio. It is based on a dictionary-based named entity recognition system and a relation extraction module. The application of this methodology resulted in the identification of 131,012 disease-biomarker associations between 2,803 genes and 2,751 diseases, and represents a valuable knowledge base for those interested in disease-related biomarkers. Additionally, we present a bibliometric analysis of the journals reporting biomarker related information during the last 40 years.
    BioMed Research International 01/2014; 2014:253128. · 2.71 Impact Factor
  • Source
    ABSTRACT: Traditional text-based image retrieval does not fully support queries on semantic relationships between two entities. To better help the exploratory search on image collections, this paper presents a system for automatically extracting the relations between entities by analyzing the sentence dependency on the descriptions of the images. Our results demonstrate that using the extracted relations is not only beneficial for understanding the data set but also an effective way to facilitate users' exploratory searches.
    Proceedings of the American Society for Information Science and Technology 01/2011; 48(1):1-4.
