Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts.

Biomedical Informatics Training Program, Stanford Medical Informatics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2006;
Source: PubMed

ABSTRACT Randomized clinical trial (RCT) papers provide reliable information about the efficacy of medical interventions. Current keyword-based search methods for retrieving medical evidence overload users with irrelevant information, as these methods often do not take into consideration the semantics encoded within abstracts and the search query. Personalized semantic search, intelligent clinical question answering, and medical evidence summarization aim to solve this information overload problem. Most of these approaches would benefit significantly if the information available in the abstracts were structured into meaningful categories (e.g., background, objective, method, result, and conclusion). While many journals use a structured abstract format, the majority of RCT abstracts remain unstructured. We have developed a novel automated approach to structure RCT abstracts by combining text classification and Hidden Markov Modeling (HMM) techniques. The results of our approach (precision: 0.98, recall: 0.99) significantly outperform previously reported work on automated categorization of sentences in RCT abstracts.
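
The abstract does not say how the classifier and the HMM are combined, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes a per-sentence classifier has already produced a score for each category, treats those scores as emission scores, and uses Viterbi decoding with a left-to-right transition matrix to pick the most likely label sequence. The function names, the `stay` parameter, and all numeric values are hypothetical and chosen for illustration.

```python
import math

# Sentence categories named in the abstract, in their usual order.
LABELS = ["background", "objective", "method", "result", "conclusion"]


def make_transitions(stay=0.6):
    """Build an illustrative left-to-right transition matrix (assumed values).

    A label may repeat or move forward to any later label, never backward.
    """
    n = len(LABELS)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        later = n - 1 - i  # number of labels that come after label i
        if later == 0:
            trans[i][i] = 1.0
            continue
        trans[i][i] = stay
        for j in range(i + 1, n):
            trans[i][j] = (1.0 - stay) / later
    return trans


def _log(p):
    # Log-probability that maps zero to negative infinity instead of failing.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(scores, trans, start):
    """Return the most likely label sequence for one abstract.

    scores[t][k] is the classifier score for sentence t having label k
    (used here as an emission score); start[k] is the prior for the
    first sentence.
    """
    if not scores:
        return []
    n = len(LABELS)
    best = [_log(start[k]) + _log(scores[0][k]) for k in range(n)]
    back = []  # back[t-1][k] = best predecessor of label k at sentence t
    for t in range(1, len(scores)):
        new_best, ptr = [], []
        for k in range(n):
            prev = max(range(n), key=lambda j: best[j] + _log(trans[j][k]))
            new_best.append(best[prev] + _log(trans[prev][k]) + _log(scores[t][k]))
            ptr.append(prev)
        best = new_best
        back.append(ptr)
    # Trace the best path backward from the best final label.
    k = max(range(n), key=lambda j: best[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [LABELS[k] for k in path]


if __name__ == "__main__":
    # Made-up classifier scores for a four-sentence abstract.
    scores = [
        [0.70, 0.10, 0.10, 0.05, 0.05],
        [0.02, 0.08, 0.80, 0.05, 0.05],
        [0.50, 0.05, 0.05, 0.35, 0.05],  # classifier alone would say "background"
        [0.05, 0.05, 0.10, 0.20, 0.60],
    ]
    start = [0.60, 0.30, 0.05, 0.03, 0.02]
    print(viterbi(scores, make_transitions(), start))
```

In this toy input the third sentence scores highest as "background" when taken alone, but the transition constraints make a return to "background" after "method" impossible, so the decoded sequence labels it "result" instead. This is the kind of ordering information a sequence model can add on top of independent sentence classification.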

Available from: Yang Huang, Dec 27, 2013
  • Source
    • "Prior studies proposed automatic techniques to transform clinical trial specifications into a computable form that can be efficiently reused for classification, clustering, and retrieval [17] [18] [19] [20] [21] [22]. A number of efforts also focused on formally representing free-text clinical trial eligibility criteria for computational processing [10,16,23–27]. "
    ABSTRACT: Objective: Information overload is a significant problem facing online clinical trial searchers. We present eTACTS, a novel interactive retrieval framework using common eligibility tags to dynamically filter clinical trial search results. Materials and Methods: eTACTS mines frequent eligibility tags from free-text clinical trial eligibility criteria and uses these tags for trial indexing. After an initial search, eTACTS presents to the user a tag cloud representing the current results. When the user selects a tag, eTACTS retains only those trials containing that tag in their eligibility criteria and generates a new cloud based on tag frequency and co-occurrences in the remaining trials. The user can then select a new tag or unselect a previous tag. The process iterates until a manageable number of trials is returned. We evaluated eTACTS in terms of filtering efficiency, diversity of the search results, and user eligibility to the filtered trials using both qualitative and quantitative methods. Results: eTACTS (1) rapidly reduced search results from over a thousand trials to ten; (2) highlighted trials that are generally not top-ranked by conventional search engines; and (3) retrieved a greater number of suitable trials than existing search engines. Discussion: eTACTS enables intuitive clinical trial searches by indexing eligibility criteria with effective tags. User evaluation was limited to one case study and a small group of evaluators due to the long duration of the experiment. Although a larger-scale evaluation could be conducted, this feasibility study demonstrated significant advantages of eTACTS over existing clinical trial search engines. Conclusion: A dynamic eligibility tag cloud can potentially enhance state-of-the-art clinical trial search engines by allowing intuitive and efficient filtering of the search result space.
    Journal of Biomedical Informatics 08/2013; 46(6). DOI:10.1016/j.jbi.2013.07.014 · 2.48 Impact Factor
  • Source
    • "captures important scientific and clinical investigations in biomedicine. As a result, the knowledge buried in those trial records has shown to be valuable for researchers, clinicians, and the pharmaceutical industry [12] [13] [14]. "
    ABSTRACT: Recent progress in high-throughput genomic technologies has shifted pharmacogenomic research from candidate gene pharmacogenetics to clinical pharmacogenomics (PGx). Many clinically related questions may be asked, such as 'what drug should be prescribed for a patient with mutant alleles?' Typically, answers to such questions can be found in publications mentioning the relationships of the gene-drug-disease of interest. In this work, we hypothesize that ClinicalTrials.gov is a comparable source rich in PGx related information. In this regard, we developed a systematic approach to automatically identify PGx relationships between genes, drugs and diseases from trial records in ClinicalTrials.gov. In our evaluation, we found that our extracted relationships overlap significantly with the curated factual knowledge through the literature in a PGx database and that most relationships appear on average 5 years earlier in clinical trials than in their corresponding publications, suggesting that clinical trials may be valuable for both validating known and capturing new PGx related information in a more timely manner. Furthermore, two human reviewers judged a portion of computer-generated relationships and found an overall accuracy of 74% for our text-mining approach. This work has practical implications in enriching our existing knowledge on PGx gene-drug-disease relationships as well as suggesting crosslinks between ClinicalTrials.gov and other PGx knowledge bases.
    Journal of Biomedical Informatics 04/2012; 45(5):870-8. DOI:10.1016/j.jbi.2012.04.005 · 2.48 Impact Factor
  • Source
    ABSTRACT: Health information technology (HIT) is one of the most significant developments in health care in recent years. However, there is still a large gap between how HIT could support clinical work and how it actually does. In this project, we developed a visionary scenario to identify opportunities for improving patient care in dentistry. In the scenario, patients and care providers are supported by a ubiquitous, embedded computing infrastructure that captures and processes data streams from multiple sources. Practical decision support, as well as automated background data processing (e.g., to screen for common conditions), helps clinicians provide quality care. A holistic view of clinical information technology (IT) focuses on supporting clinicians and patients in a user-centered manner. While clinical IT is still very much a work in progress, scenarios such as the one presented may be helpful to keep us focused on the possibilities of tomorrow, not on the limitations of today.