Article

Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts.

Biomedical Informatics Training Program, Stanford Medical Informatics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2006;
Source: PubMed

ABSTRACT Randomized clinical trials (RCT) papers provide reliable information about efficacy of medical interventions. Current keyword based search methods to retrieve medical evidence,overload users with irrelevant information as these methods often do not take in to consideration semantics encoded within abstracts and the search query. Personalized semantic search, intelligent clinical question answering and medical evidence summarization aim to solve this information overload problem. Most of these approaches will significantly benefit if the information available in the abstracts is structured into meaningful categories (e.g., background, objective, method, result and conclusion). While many journals use structured abstract format, majority of RCT abstracts still remain unstructured.We have developed a novel automated approach to structure RCT abstracts by combining text classification and Hidden Markov Modeling(HMM) techniques. Results (precision: 0.98, recall: 0.99) of our approach significantly outperform previously reported work on automated categorization of sentences in RCT abstracts.

Download full-text

Full-text

Available from: Yang Huang, Dec 27, 2013
0 Followers
 · 
71 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Objective Information overload is a significant problem facing online clinical trial searchers. We present eTACTS, a novel interactive retrieval framework using common eligibility tags to dynamically filter clinical trial search results. Materials and Methods eTACTS mines frequent eligibility tags from free-text clinical trial eligibility criteria and uses these tags for trial indexing. After an initial search, eTACTS presents to the user a tag cloud representing the current results. When the user selects a tag, eTACTS retains only those trials containing that tag in their eligibility criteria and generates a new cloud based on tag frequency and co-occurrences in the remaining trials. The user can then select a new tag or unselect a previous tag. The process iterates until a manageable number of trials is returned. We evaluated eTACTS in terms of filtering efficiency, diversity of the search results, and user eligibility to the filtered trials using both qualitative and quantitative methods. Results eTACTS (1) rapidly reduced search results from over a thousand trials to ten; (2) highlighted trials that are generally not top-ranked by conventional search engines; and (3) retrieved a greater number of suitable trials than existing search engines. Discussion eTACTS enables intuitive clinical trial searches by indexing eligibility criteria with effective tags. User evaluation was limited to one case study and a small group of evaluators due to the long duration of the experiment. Although a larger-scale evaluation could be conducted, this feasibility study demonstrated significant advantages of eTACTS over existing clinical trial search engines. Conclusion A dynamic eligibility tag cloud can potentially enhance state-of-the-art clinical trial search engines by allowing intuitive and efficient filtering of the search result space.
    Journal of Biomedical Informatics 08/2013; 46(6). DOI:10.1016/j.jbi.2013.07.014 · 2.48 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Recent progress in high-throughput genomic technologies has shifted pharmacogenomic research from candidate gene pharmacogenetics to clinical pharmacogenomics (PGx). Many clinical related questions may be asked such as 'what drug should be prescribed for a patient with mutant alleles?' Typically, answers to such questions can be found in publications mentioning the relationships of the gene-drug-disease of interest. In this work, we hypothesize that ClinicalTrials.gov is a comparable source rich in PGx related information. In this regard, we developed a systematic approach to automatically identify PGx relationships between genes, drugs and diseases from trial records in ClinicalTrials.gov. In our evaluation, we found that our extracted relationships overlap significantly with the curated factual knowledge through the literature in a PGx database and that most relationships appear on average 5years earlier in clinical trials than in their corresponding publications, suggesting that clinical trials may be valuable for both validating known and capturing new PGx related information in a more timely manner. Furthermore, two human reviewers judged a portion of computer-generated relationships and found an overall accuracy of 74% for our text-mining approach. This work has practical implications in enriching our existing knowledge on PGx gene-drug-disease relationships as well as suggesting crosslinks between ClinicalTrials.gov and other PGx knowledge bases.
    Journal of Biomedical Informatics 04/2012; 45(5):870-8. DOI:10.1016/j.jbi.2012.04.005 · 2.48 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Natural language processing has been successfully leveraged to extract patient information from unstructured clinical text. However the majority of the existing work targets at obtaining a specific category of clinical information through individual efforts. In the midst of the Health 2.0 wave, online health forums increasingly host abundant and diverse health-related information regarding the demographics and medical information of patients who are either actively participating in or passively reported at these forums. The potential categories of such information span a wide spectrum, whose extraction requires a systematic and comprehensive approach beyond the traditional isolated efforts that specialize in harvesting information of single categories. In this paper, we develop a new integrated biomedical NLP pipeline that automatically extracts a comprehensive set of patient demographics and medical information from online health forums. The pipeline can be adopted to construct structured personal health profiles from unstructured user-contributed content on eHealth social media sites. This paper describes key aspects of the pipeline as well as reports experimental results that show the system's satisfactory performance in accomplishing a series of NLP tasks of extracting patient information from online health forums.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2014; 2014:1825-34.