Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts

Biomedical Informatics Training Program, Stanford Medical Informatics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2006;
Source: PubMed


Randomized clinical trial (RCT) papers provide reliable information about the efficacy of medical interventions. Current keyword-based search methods for retrieving medical evidence overload users with irrelevant information, because these methods often do not take into consideration the semantics encoded within abstracts and the search query. Personalized semantic search, intelligent clinical question answering, and medical evidence summarization aim to solve this information overload problem. Most of these approaches would benefit significantly if the information available in abstracts were structured into meaningful categories (e.g., background, objective, method, result, and conclusion). While many journals use a structured abstract format, the majority of RCT abstracts remain unstructured. We have developed a novel automated approach to structure RCT abstracts by combining text classification and Hidden Markov Modeling (HMM) techniques. The results of our approach (precision: 0.98, recall: 0.99) significantly outperform previously reported work on automated categorization of sentences in RCT abstracts.
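
The abstract gives no implementation details, so the following is only an illustrative sketch of one common way to combine a sentence classifier with an HMM, not the authors' method. It assumes that a text classifier has already produced per-sentence category probabilities, which are used as HMM emission scores, and that the transition probabilities encode the usual left-to-right order of abstract sections. The function names, the toy probabilities, and the transition scheme are all hypothetical.

```python
import math

# The five categories named in the abstract, in their usual order.
LABELS = ["background", "objective", "method", "result", "conclusion"]


def _log(p):
    # Floor probabilities so that zero-probability transitions do not raise.
    return math.log(max(p, 1e-12))


def left_to_right_transitions(labels, stay=0.5):
    """Hypothetical transition table: a section may continue or move forward,
    but never move back to an earlier section."""
    trans = {}
    n = len(labels)
    for i, prev in enumerate(labels):
        later = n - i - 1
        row = {}
        for j, cur in enumerate(labels):
            if j < i:
                row[cur] = 0.0
            elif j == i:
                row[cur] = stay if later else 1.0
            else:
                row[cur] = (1.0 - stay) / later
        trans[prev] = row
    return trans


def viterbi(emissions, start, trans, labels=LABELS):
    """Return the most likely label sequence for one abstract.

    emissions: one dict per sentence mapping label -> classifier probability
    start:     dict mapping label -> probability of labelling the first sentence
    trans:     nested dict, trans[prev][cur] -> transition probability
    """
    if not emissions:
        return []
    scores = {l: _log(start[l]) + _log(emissions[0][l]) for l in labels}
    back = []
    for em in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: scores[p] + _log(trans[p][cur]))
            new_scores[cur] = scores[prev] + _log(trans[prev][cur]) + _log(em[cur])
            pointers[cur] = prev
        scores = new_scores
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda l: scores[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy classifier outputs for a four-sentence abstract (made-up numbers).
EMISSIONS = [
    {"background": 0.7, "objective": 0.1, "method": 0.1, "result": 0.05, "conclusion": 0.05},
    {"background": 0.1, "objective": 0.2, "method": 0.6, "result": 0.05, "conclusion": 0.05},
    {"background": 0.4, "objective": 0.05, "method": 0.3, "result": 0.2, "conclusion": 0.05},
    {"background": 0.05, "objective": 0.05, "method": 0.05, "result": 0.15, "conclusion": 0.7},
]
START = {"background": 0.5, "objective": 0.3, "method": 0.1, "result": 0.05, "conclusion": 0.05}
TRANS = left_to_right_transitions(LABELS)

print(viterbi(EMISSIONS, START, TRANS))
```

In this toy input, labelling each sentence independently by its highest classifier score would call the third sentence background, even though it follows a method sentence. Decoding the whole sequence with the ordering constraint labels it method instead, which is the kind of correction a sequence model can add on top of a per-sentence classifier.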

Available from: Yang Huang, Dec 27, 2013
  • Source
    • "Prior studies proposed automatic techniques to transform clinical trial specifications into a computable form that can be efficiently reused for classification, clustering, and retrieval [17] [18] [19] [20] [21] [22]. A number of efforts also focused on formally representing free-text clinical trial eligibility criteria for computational processing [10,16,23–27]. "
    ABSTRACT: Objective: Information overload is a significant problem facing online clinical trial searchers. We present eTACTS, a novel interactive retrieval framework using common eligibility tags to dynamically filter clinical trial search results. Materials and Methods: eTACTS mines frequent eligibility tags from free-text clinical trial eligibility criteria and uses these tags for trial indexing. After an initial search, eTACTS presents to the user a tag cloud representing the current results. When the user selects a tag, eTACTS retains only those trials containing that tag in their eligibility criteria and generates a new cloud based on tag frequency and co-occurrences in the remaining trials. The user can then select a new tag or unselect a previous tag. The process iterates until a manageable number of trials is returned. We evaluated eTACTS in terms of filtering efficiency, diversity of the search results, and user eligibility to the filtered trials using both qualitative and quantitative methods. Results: eTACTS (1) rapidly reduced search results from over a thousand trials to ten; (2) highlighted trials that are generally not top-ranked by conventional search engines; and (3) retrieved a greater number of suitable trials than existing search engines. Discussion: eTACTS enables intuitive clinical trial searches by indexing eligibility criteria with effective tags. User evaluation was limited to one case study and a small group of evaluators due to the long duration of the experiment. Although a larger-scale evaluation could be conducted, this feasibility study demonstrated significant advantages of eTACTS over existing clinical trial search engines. Conclusion: A dynamic eligibility tag cloud can potentially enhance state-of-the-art clinical trial search engines by allowing intuitive and efficient filtering of the search result space.
    Journal of Biomedical Informatics 08/2013; 46(6). DOI:10.1016/j.jbi.2013.07.014 · 2.19 Impact Factor
  • Source
    • "On the other hand, drug-disease associations in objective sections are weaker. We previously developed an algorithm by combining text classification and hidden Markov modeling techniques to automatically structure MEDLINE abstracts [31]. In the future, we plan to assign a confidence score to each extracted association by taking sentence type into account. "
    ABSTRACT: Background: A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs for new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature. Results: In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, therefore may have high potential in drug discovery. Conclusions: We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base that is resultant of our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks.
    BMC Bioinformatics 06/2013; 14(1):181. DOI:10.1186/1471-2105-14-181 · 2.58 Impact Factor
  • Source
    • "A supervised domain-adaptation approach is adopted. [19] exploits a similar idea in a finer-grained level of syntactic units. The authors classify sentences of RCT abstracts in meaningful categories, i.e. introduction, objective, method, result and conclusion, combining text classification and Hidden Markov Modelling techniques. "
    ABSTRACT: Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols.
    BMC Medical Informatics and Decision Making 04/2012; 12 Suppl 1(Suppl 1):S3. DOI:10.1186/1472-6947-12-S1-S3 · 1.83 Impact Factor