Extracting timing and status descriptors for colonoscopy testing from electronic medical records

Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.5). 07/2010; 17(4):383-8. DOI: 10.1136/jamia.2010.004804
Source: PubMed


Colorectal cancer (CRC) screening rates are low despite confirmed benefits. The authors investigated the use of natural language processing (NLP) to identify previous colonoscopy screening in electronic records from a random sample of 200 patients at least 50 years old. The authors developed algorithms to recognize temporal expressions and 'status indicators', such as 'patient refused' or 'test scheduled'. The new methods were added to the existing KnowledgeMap concept identifier system, and the resulting system was used to parse electronic medical records (EMR) to detect completed colonoscopies. Using expert physicians' manual review of EMR notes as the 'gold standard', the system identified timing references with a recall of 0.91 and precision of 0.95, colonoscopy status indicators with a recall of 0.82 and precision of 0.95, and references to actually completed colonoscopies with a recall of 0.93 and precision of 0.95. The system was superior to using colonoscopy billing codes alone. Health services researchers and clinicians may find NLP a useful adjunct to traditional methods to detect CRC screening status. Further investigations must validate the extension of NLP approaches to other types of CRC screening applications.
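The abstract does not spell out the KnowledgeMap rules, so the following is only a minimal, hypothetical Python sketch of the general idea: find sentences that mention colonoscopy, tag each with a status indicator and a temporal expression using regular expressions, and score predictions against a gold standard with precision, recall, and F-measure. The cue lists, the naive sentence splitter, and the names `analyze`, `precision_recall`, and `f_measure` are assumptions for illustration, not the authors' method.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical cue lists for illustration; NOT the KnowledgeMap rules.
# Checked in insertion order, so earlier labels win when several cues match.
STATUS_PATTERNS = {
    "refused": re.compile(r"\b(?:refused|declined)\b", re.IGNORECASE),
    "scheduled": re.compile(r"\b(?:scheduled|due for)\b", re.IGNORECASE),
    "completed": re.compile(r"\b(?:had|underwent|s/p|status post|completed)\b", re.IGNORECASE),
}

# A few simple temporal expressions (assumed, far from exhaustive).
TIMING_PATTERN = re.compile(
    r"\b\d{1,2}/\d{2,4}\b"                   # numeric dates such as 07/2010
    r"|\b(?:19|20)\d{2}\b"                   # four-digit years such as 2008
    r"|\b\d+\s+(?:years?|months?)\s+ago\b"   # relative phrases such as 3 years ago
    r"|\blast\s+(?:year|month)\b",           # relative phrases such as last year
    re.IGNORECASE,
)

COLONOSCOPY = re.compile(r"\bcolonoscop(?:y|ies)\b", re.IGNORECASE)


@dataclass
class Finding:
    sentence: str
    status: Optional[str]   # first matching status label, or None
    timing: Optional[str]   # first temporal expression found, or None


def analyze(note: str) -> List[Finding]:
    """Return one Finding for each sentence that mentions colonoscopy."""
    findings = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if not COLONOSCOPY.search(sentence):
            continue
        status = next(
            (label for label, pattern in STATUS_PATTERNS.items() if pattern.search(sentence)),
            None,
        )
        timing = TIMING_PATTERN.search(sentence)
        findings.append(Finding(sentence, status, timing.group(0) if timing else None))
    return findings


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted items (e.g. note IDs) against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    note = ("Patient underwent colonoscopy in 2008 with no polyps. "
            "Repeat colonoscopy scheduled for 2018. "
            "She previously refused colonoscopy.")
    for finding in analyze(note):
        print(finding.status, finding.timing)
    # prints, one per line: completed 2008 / scheduled 2018 / refused None
```

Because the status cues are tried in a fixed order, a sentence containing both "had" and "scheduled" is labeled as scheduled. A realistic system would also need negation handling, section context, and the note date to decide whether a colonoscopy was actually completed.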



Available from: Joshua C Denny
    • Excerpt (table of cancer type, text source, and references, followed by the start of the running text):
      Cancer type | Text source | References
      Breast neoplasm | Pathology reports | [16] [20] [21]
      Breast neoplasm | PubMed abstracts | [15]
      Cervical neoplasm | PubMed abstracts | [22]
      Colon neoplasm | Pathology reports | [23] [24]
      Colorectal neoplasm | EMR notes | [25] [26] [28] [29]
      Colorectal neoplasm | Pathology reports | [27]
      Colorectal neoplasm | Histopathology reports | [30]
      Colorectal neoplasm | Colonoscopy reports | [5]
      Lung neoplasm | Radiographic reports | [31]
      Lung neoplasm | EMR | [26]
      Lung neoplasm | Pathology reports | [32]
      Ovarian neoplasm | GPRD records | [33]
      Pancreatic neoplasm | PubMed abstracts, EMRs | [34]
      Prostate neoplasm | Clinical records: all available paper, electronic, radiologic, radiation therapy and pathology records | [37]
      Prostate neoplasm | Pathology reports | [21] [36]
      Prostate neoplasm | EMR | [26]
      Skin neoplasm | Pathology reports | [36]
      "...cancer. In particular, two types of reports are relevant for recording cancer-related information: pathology and imaging reports."
    ABSTRACT: Purpose: This paper reviews the research literature on text mining (TM) with the aim of finding out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. Methods: A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. Results: A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM.
    Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
    Full-text · Article · Sep 2014 · International Journal of Medical Informatics
    • "can only be answered and interpreted if the relative temporal relations between the events are considered. In general, temporal reasoning has applications in several tasks in the clinical domain such as information extraction [2] [3], question answering [4] [5], patient timeline visualization [6], clinical guideline development [7] [8] and others. Automatic extraction of temporal information can facilitate processing of patient information in the narrative text, and this can contribute to the decision making process in fundamental patient care tasks such as prevention, diagnosis and forecasting the effects of the treatments [9] [10]. "
    ABSTRACT: Clinical records include both coded and free-text fields that interact to reflect complicated patient stories. The information often covers not only the present medical condition and events experienced by the patient, but also refers to relevant events in the past (such as signs, symptoms, tests or treatments). In order to automatically construct a timeline of these events, we first need to extract the temporal relations between pairs of events or time expressions presented in the clinical notes. We designed separate extraction components for different types of temporal relations, utilizing a novel hybrid system that combines machine learning with a graph-based inference mechanism to extract the temporal links. The temporal graph is a directed graph based on parse tree dependencies of the simplified sentences and frequent pattern clues. We generalized the sentences in order to discover patterns that, given the complexities of natural language, might not be directly discoverable in the original sentences. The proposed hybrid system reached an F-measure of 0.63, with precision at 0.76 and recall at 0.54, on the 2012 i2b2 Natural Language Processing corpus for the temporal relation (TLink) extraction task, achieving the highest precision and the third-highest F-measure among the teams participating in the TLink track.
    Full-text · Article · Nov 2013 · Journal of Biomedical Informatics
    • "Temporal information in clinical narratives plays an important role in medical decision-making and care assessment [3]. Some examples of clinical applications that utilize temporal information include: diagnosis, prognosis and treatment decision support [3] [4], time specific clinical information extraction [5] [6] [7], and time-related question answering [8] [9] [10]. These applications rely on temporal reasoning systems which extract temporal information from natural language, and perform temporal inference over the extracted information. "
    ABSTRACT: Temporal information in clinical narratives plays an important role in patients' diagnosis, treatment and prognosis. In order to represent narrative information accurately, medical natural language processing (MLP) systems need to correctly identify and interpret temporal information. To promote research in this area, the Informatics for Integrating Biology and the Bedside (i2b2) project developed a temporally annotated corpus of clinical narratives. This corpus contains 310 de-identified discharge summaries, with annotations of clinical events, temporal expressions and temporal relations. This paper describes the process followed for the development of this corpus and discusses annotation guideline development, annotation methodology, and corpus quality.
    Full-text · Article · Jul 2013 · Journal of Biomedical Informatics
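The 2013 abstract above mentions a graph-based inference mechanism over temporal links. As a hypothetical illustration only (not that team's system), one basic inference step is transitive closure: if A is BEFORE B and B is BEFORE C, then A is BEFORE C. The sketch assumes only BEFORE links; the names `before_closure` and `is_consistent` are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (a, b) means event or time expression a is BEFORE b


def before_closure(links: Set[Link]) -> Set[Link]:
    """Return every BEFORE pair implied by transitivity over the given links."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
    closure: Set[Link] = set()
    for start in list(graph):
        # Depth-first search collects every node reachable from `start`.
        stack = list(graph[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


def is_consistent(closure: Set[Link]) -> bool:
    """A BEFORE graph is contradictory if some node ends up before itself (a cycle)."""
    return all(a != b for a, b in closure)


if __name__ == "__main__":
    inferred = before_closure({("admission", "colonoscopy"), ("colonoscopy", "discharge")})
    print(sorted(inferred))
    # [('admission', 'colonoscopy'), ('admission', 'discharge'), ('colonoscopy', 'discharge')]
```

Real TLink annotations also include other relation types such as OVERLAP, whose interaction with BEFORE needs composition rules beyond this simple closure.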