An integrated pharmacokinetics ontology and corpus for text mining

BMC Bioinformatics (Impact Factor: 2.58). 02/2013; 14(1):35. DOI: 10.1186/1471-2105-14-35
Source: PubMed


Drug pharmacokinetics parameters, drug interaction parameters, and pharmacogenetics data have been unevenly collected in different databases and published extensively in the literature. Without appropriate pharmacokinetics ontology and a well annotated pharmacokinetics corpus, it will be difficult to develop text mining tools for pharmacokinetics data collection from the literature and pharmacokinetics data integration from multiple databases.

A comprehensive pharmacokinetics ontology was constructed. It can annotate all aspects of in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. It covers all drug metabolism and transportation enzymes. Using our pharmacokinetics ontology, a PK-corpus was constructed to present four classes of pharmacokinetics abstracts: in vivo pharmacokinetics studies, in vivo pharmacogenetic studies, in vivo drug interaction studies, and in vitro drug interaction studies. A novel hierarchical three level annotation scheme was proposed and implemented to tag key terms, drug interaction sentences, and drug interaction pairs. The utility of the pharmacokinetics ontology was demonstrated by annotating three pharmacokinetics studies; and the utility of the PK-corpus was demonstrated by a drug interaction extraction text mining analysis.

The pharmacokinetics ontology annotates both in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. The PK-corpus is a highly valuable resource for the text mining of pharmacokinetics parameters and drug interactions.

Download full-text


Available from: Santosh Philips, Sep 16, 2015
47 Reads
  • [Show abstract] [Hide abstract]
    ABSTRACT: Scientific communication in biomedicine is, by and large, still text based. Text mining technologies for the automated extraction of useful biomedical information from unstructured text that can be directly used for systems biology modeling have been substantially improved over the past few years. In this review, we underline the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology. Furthermore, we emphasize the role of publicly organized scientific benchmarking challenges that reflect the current status of text-mining technology and are important in moving the entire field forward. Given further interdisciplinary development of systems biology-orientated ontologies and training corpora, we expect a steadily increasing impact of text-mining technology on systems biology in the future.
    Drug discovery today 09/2013; 19(2). DOI:10.1016/j.drudis.2013.09.012 · 6.69 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Collection of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and show an analysis on the semantic annotations they contain. Annotations for entity types were classified into six semantic groups and an overview on the semantic entities which can be found in each corpus is shown. Results show that while some semantic entities, such as genes, proteins and chemicals are consistently annotated in many collections, corpora available for diseases, variations and mutations are still few, in spite of their importance in the biological domain.
    F1000 Research 04/2014; 3:96. DOI:10.12688/f1000research.3216.1
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a conditional random fields-based chemical entity recogniser. In order to optimise its performance, we introduced customisations in various aspects of our solution. These include the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules. Our evaluation shows that optimal performance is obtained when our customisations are integrated into the chemical entity recogniser. When its performance is compared with that of state-of-the-art methods, under comparable experimental settings, our solution achieves competitive advantage. We also show that our recogniser that uses a model trained on the CHEMDNER corpus is suitable for recognising names in a wide range of corpora, consistently outperforming two popular chemical NER tools. The contributions resulting from this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology that has demonstrated performance at a competitive, if not superior, level as that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks.
    Journal of Cheminformatics 03/2015; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S6. DOI:10.1186/1758-2946-7-S1-S6 · 4.55 Impact Factor
Show more