Extracting subject demographic information from abstracts of randomized clinical trial reports.

Biomedical Informatics Training Program, Stanford University School of Medicine, USA.
Studies in health technology and informatics 02/2007; 129(Pt 1):550-4. DOI: 10.3233/978-1-58603-774-1-550
Source: PubMed

ABSTRACT In order to make more informed healthcare decisions, consumers need information systems that deliver accurate and reliable information about their illnesses and potential treatments. Reports of randomized clinical trials (RCTs) provide reliable medical evidence about the efficacy of treatments. Current methods to access, search for, and retrieve RCTs are keyword-based, time-consuming, and suffer from poor precision. Personalized semantic search and medical evidence summarization aim to solve this problem. The performance of these approaches may improve if they have access to study subject descriptors (e.g. age, gender, and ethnicity), trial sizes, and diseases/symptoms studied. We have developed a novel method to automatically extract such subject demographic information from RCT abstracts. We used text classification augmented with a Hidden Markov Model to identify sentences containing subject demographics, and subsequently these sentences were parsed using Natural Language Processing techniques to extract relevant information. Our results show accuracy levels of 82.5%, 92.5%, and 92.0% for extraction of subject descriptors, trial sizes, and diseases/symptoms descriptors respectively.
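The sentence-identification step described above (a per-sentence classifier whose output is refined by a Hidden Markov Model) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the model's states, features, or parameters. The sketch assumes a hypothetical upstream classifier that returns, for each sentence, a probability that it contains subject demographics, and it treats that probability as an emission score in a two-state HMM decoded with the Viterbi algorithm. The function name `viterbi_smooth` and the values of `stay` and `start_pos` are illustrative assumptions.

```python
import math


def viterbi_smooth(probs, stay=0.8, start_pos=0.2):
    """Decode the most likely 0/1 label sequence for a list of sentences.

    probs     -- per-sentence P(demographic | sentence) from some classifier
                 (hypothetical input; the source does not specify one)
    stay      -- assumed probability of keeping the same label on the next sentence
    start_pos -- assumed probability that the first sentence is demographic
    """
    if not probs:
        return []

    eps = 1e-9  # keeps log() finite when a score is exactly 0 or 1
    log_trans = [
        [math.log(stay), math.log(1 - stay)],  # from state 0 to (0, 1)
        [math.log(1 - stay), math.log(stay)],  # from state 1 to (0, 1)
    ]
    log_start = [math.log(1 - start_pos), math.log(start_pos)]

    def log_emit(p, state):
        # Assumption: the classifier score stands in for the emission likelihood.
        p = min(max(p, eps), 1 - eps)
        return math.log(p if state == 1 else 1 - p)

    # Forward pass: best log-score of any path ending in each state.
    score = [log_start[s] + log_emit(probs[0], s) for s in (0, 1)]
    back = []
    for p in probs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit(p, s))
        score = new_score
        back.append(pointers)

    # Backward pass: follow the stored pointers from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up scores for five consecutive sentences of one abstract.
    print(viterbi_smooth([0.1, 0.9, 0.4, 0.9, 0.1]))
```

The point of the sequence model in this sketch is that a weakly scored sentence sitting between two strongly scored ones can still be labeled as demographic, since label switches are penalized. Sentences selected this way would then be passed to the parsing step the abstract mentions.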

  • Source
    ABSTRACT: Background: The health sciences literature incorporates a relatively large subset of epidemiological studies that focus on population-level findings, including various determinants, outcomes and correlations. Extracting structured information about those characteristics would be useful for more complete understanding of diseases and for meta-analyses and systematic reviews. Results: We present an information extraction approach that enables users to identify key characteristics of epidemiological studies from MEDLINE abstracts. It extracts six types of epidemiological characteristic: design of the study, population that has been studied, exposure, outcome, covariates and effect size. We have developed a generic rule-based approach that has been designed according to semantic patterns observed in text, and tested it in the domain of obesity. Identified exposure, outcome and covariate concepts are clustered into health-related groups of interest. On a manually annotated test corpus of 60 epidemiological abstracts, the system achieved precision, recall and F-score between 79-100%, 80-100% and 82-96% respectively. We report the results of applying the method to a large-scale epidemiological corpus related to obesity. Conclusions: The experiments suggest that the proposed approach could identify key epidemiological characteristics associated with a complex clinical problem from related abstracts. When integrated over the literature, the extracted data can be used to provide a more complete picture of epidemiological efforts, and thus support understanding via meta-analysis and systematic reviews.
    Journal of Biomedical Semantics 05/2014; 5:22. DOI:10.1186/2041-1480-5-22
  • Source
    12/2013, Degree: Ph.D., Supervisor: Shlomo Argamon
  • Source
    ABSTRACT: To date, the scientific process for generating, interpreting, and applying knowledge has received less informatics attention than operational processes for conducting clinical studies. The activities of these scientific processes - the science of clinical research - are centered on the study protocol, which is the abstract representation of the scientific design of a clinical study. The Ontology of Clinical Research (OCRe) is an OWL 2 model of the entities and relationships of study design protocols for the purpose of computationally supporting the design and analysis of human studies. OCRe's modeling is independent of any specific study design or clinical domain. It includes a study design typology and a specialized module called ERGO Annotation for capturing the meaning of eligibility criteria. In this paper, we describe the key informatics use cases of each phase of a study's scientific lifecycle, present OCRe and the principles behind its modeling, and describe applications of OCRe and associated technologies to a range of clinical research use cases. OCRe captures the central semantics that underlies the scientific processes of clinical research and can serve as an informatics foundation for supporting the entire range of knowledge activities that constitute the science of clinical research.
    Journal of Biomedical Informatics 11/2013; DOI:10.1016/j.jbi.2013.11.002 · 2.48 Impact Factor