Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts.

Biomedical Informatics Training Program, Stanford Medical Informatics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2006;
Source: PubMed

ABSTRACT Randomized clinical trial (RCT) papers provide reliable information about the efficacy of medical interventions. Current keyword-based search methods for retrieving medical evidence overload users with irrelevant information, as these methods often do not take into consideration the semantics encoded within abstracts and the search query. Personalized semantic search, intelligent clinical question answering, and medical evidence summarization aim to solve this information overload problem. Most of these approaches would benefit significantly if the information available in abstracts were structured into meaningful categories (e.g., background, objective, method, result, and conclusion). While many journals use a structured abstract format, the majority of RCT abstracts remain unstructured. We have developed a novel automated approach to structure RCT abstracts by combining text classification and Hidden Markov Modeling (HMM) techniques. The results of our approach (precision: 0.98, recall: 0.99) significantly outperform previously reported work on automated categorization of sentences in RCT abstracts.
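The abstract does not spell out how the classifier and the HMM are combined. One common arrangement, shown here only as a minimal sketch and not as the authors' method, treats the per-sentence classifier probabilities as emission scores and uses Viterbi decoding to pick the most likely label sequence. The label set comes from the abstract; the function names, the transition weights, the initial distribution, and the classifier scores are all hypothetical values chosen for illustration.

```python
import math

# Section labels named in the abstract.
LABELS = ["background", "objective", "method", "result", "conclusion"]


def make_transitions(stay=0.5, advance=0.4, other=0.02):
    """Build a hypothetical transition table that favors staying in the
    current section or advancing to the next one. Rows are normalized."""
    table = {}
    for i, prev in enumerate(LABELS):
        weights = [
            stay if j == i else advance if j == i + 1 else other
            for j in range(len(LABELS))
        ]
        total = sum(weights)
        for j, nxt in enumerate(LABELS):
            table[(prev, nxt)] = weights[j] / total
    return table


def viterbi(emissions, transitions, initial):
    """Return the most likely label sequence for one abstract.

    emissions:   list of dicts (one per sentence), label -> classifier score
    transitions: dict (previous label, next label) -> probability
    initial:     dict label -> probability of labeling the first sentence
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    scores = [{lab: log(initial[lab]) + log(emissions[0][lab]) for lab in LABELS}]
    back = []
    for em in emissions[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for lab in LABELS:
            best = max(LABELS, key=lambda p: prev[p] + log(transitions[(p, lab)]))
            cur[lab] = prev[best] + log(transitions[(best, lab)]) + log(em[lab])
            ptr[lab] = best
        scores.append(cur)
        back.append(ptr)

    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: scores[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical classifier scores for a five-sentence abstract. Taken alone,
# the classifier labels the fourth sentence "background".
emissions = [
    {"background": 0.80, "objective": 0.10, "method": 0.04, "result": 0.03, "conclusion": 0.03},
    {"background": 0.15, "objective": 0.70, "method": 0.10, "result": 0.03, "conclusion": 0.02},
    {"background": 0.05, "objective": 0.10, "method": 0.80, "result": 0.03, "conclusion": 0.02},
    {"background": 0.45, "objective": 0.05, "method": 0.05, "result": 0.40, "conclusion": 0.05},
    {"background": 0.03, "objective": 0.03, "method": 0.04, "result": 0.10, "conclusion": 0.80},
]
# Hypothetical prior over the label of the first sentence.
initial = {"background": 0.60, "objective": 0.30, "method": 0.05, "result": 0.03, "conclusion": 0.02}

classifier_only = [max(em, key=em.get) for em in emissions]
decoded = viterbi(emissions, make_transitions(), initial)
print(classifier_only)
print(decoded)
```

In this toy input the sequence model overrides the classifier's isolated guess for the fourth sentence, because a jump from "method" back to "background" is much less likely under the assumed transitions than an advance to "result".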

  • Source
    ABSTRACT: Clinical text, such as clinical trial eligibility criteria, is largely underused in state-of-the-art medical search engines due to difficulties of accurate parsing. This paper proposes a novel methodology to derive a semantic index for clinical eligibility documents based on a controlled vocabulary of frequent tags, which are automatically mined from the text. We applied this method to eligibility criteria on and report that frequent tags (1) define an effective and efficient index of clinical trials and (2) are unlikely to grow radically when the repository increases. We proposed to apply the semantic index to filter clinical trial search results and we concluded that frequent tags reduce the result space more efficiently than an uncontrolled set of UMLS concepts. Overall, unsupervised mining of frequent tags from clinical text leads to an effective semantic index for the clinical eligibility documents and promotes their computational reuse.
    Journal of Biomedical Informatics 09/2013; DOI:10.1016/j.jbi.2013.08.012
  • Source
    ABSTRACT: Health information technology (HIT) is one of the most significant developments in health care in recent years. However, there is still a large gap between how HIT could support clinical work and how it actually does. In this project, we developed a visionary scenario to identify opportunities for improving patient care in dentistry. In the scenario, patients and care providers are supported by a ubiquitous, embedded computing infrastructure that captures and processes data streams from multiple sources. Practical decision support, as well as automated background data processing (e.g., to screen for common conditions), helps clinicians provide quality care. A holistic view of clinical information technology (IT) focuses on supporting clinicians and patients in a user-centered manner. While clinical IT is still very much a work in progress, scenarios such as the one presented may help keep us focused on the possibilities of tomorrow, not on the limitations of today.
