Extracting timing and status descriptors for colonoscopy testing from electronic medical records.

Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.57). 07/2010; 17(4):383-8. DOI: 10.1136/jamia.2010.004804
Source: PubMed

ABSTRACT: Colorectal cancer (CRC) screening rates are low despite confirmed benefits. The authors investigated the use of natural language processing (NLP) to identify previous colonoscopy screening in electronic records from a random sample of 200 patients at least 50 years old. The authors developed algorithms to recognize temporal expressions and 'status indicators', such as 'patient refused' or 'test scheduled'. The new methods were added to the existing KnowledgeMap concept identifier system, and the resulting system was used to parse electronic medical records (EMR) to detect completed colonoscopies. Using expert physicians' manual review of EMR notes as the 'gold standard', the system identified timing references with a recall of 0.91 and precision of 0.95, colonoscopy status indicators with a recall of 0.82 and precision of 0.95, and references to actually completed colonoscopies with a recall of 0.93 and precision of 0.95. The system was superior to using colonoscopy billing codes alone. Health services researchers and clinicians may find NLP a useful adjunct to traditional methods to detect CRC screening status. Further investigations must validate extension of NLP approaches for other types of CRC screening applications.
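The kind of processing the abstract describes can be illustrated with a minimal sketch. This is not the KnowledgeMap implementation or the authors' algorithm: the cue phrases in `STATUS_PATTERNS`, the year-only notion of a timing reference, and the function names `classify_mention` and `precision_recall` are all assumptions made for illustration. The second function shows how recall and precision figures such as those reported above are computed against a manually reviewed gold standard.

```python
import re

# Hypothetical cue phrases; the abstract does not list the actual lexicon.
STATUS_PATTERNS = {
    "refused": re.compile(r"\b(refused|declined)\b", re.IGNORECASE),
    "scheduled": re.compile(r"\b(scheduled|will schedule)\b", re.IGNORECASE),
    "completed": re.compile(r"\b(had|underwent|completed|performed)\b", re.IGNORECASE),
}
# Simplifying assumption: a timing reference is a four-digit year.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
COLONOSCOPY = re.compile(r"\bcolonoscopy\b", re.IGNORECASE)


def classify_mention(sentence):
    """Return (status, year) for a sentence mentioning colonoscopy, else None."""
    if not COLONOSCOPY.search(sentence):
        return None
    status = "unknown"
    # Patterns are checked in insertion order, so "refused" wins over "had".
    for label, pattern in STATUS_PATTERNS.items():
        if pattern.search(sentence):
            status = label
            break
    year_match = YEAR.search(sentence)
    year = int(year_match.group(0)) if year_match else None
    return status, year


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A real system would need far richer temporal handling (relative dates, ages, intervals) and negation detection; the sketch only shows the overall shape of status and timing extraction followed by evaluation.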

  • ABSTRACT: Purpose: This paper reviews the research literature on text mining (TM) with the aim of finding out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. Methods: A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. Results: A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
    International Journal of Medical Informatics 09/2014; · 2.72 Impact Factor
  • ABSTRACT: A calculation grid developed by an international expert group was tested across biobanks in six countries to evaluate costs for collections of various types of biospecimens. The assessment yielded a tool for setting specimen-access prices that were transparently related to biobank costs, and the tool was applied across three models of collaborative partnership.
    Science Translational Medicine 11/2014; 6(261):261fs45. · 14.41 Impact Factor
  • ABSTRACT: We explore methods for effectively extracting information from clinical narratives captured in a public health phone consultation service called HealthLink. The currently available data consist of dialogues recorded by nurses while consulting patients on the phone. Because the data are interviews transcribed by nurses during phone conversations, they include a significant volume and variety of noise. The first kind is explicit noise, which includes spelling errors, unfinished sentences, omitted sentence delimiters, variants of terms, etc. The second is implicit noise, which includes information that is not about the patient and negated patient information. To filter explicit noise, we propose a biomedical term detection/normalization method that resolves misspellings, term variations, and arbitrary abbreviations of terms by nurses. To detect temporal terms and other types of named entities (which convey patients' personal information such as age and sex), we propose a bootstrapping-based pattern learning method that detects arbitrary variations of the named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information. The experimental results show that we achieve reasonable performance with our noise reduction methods.
    2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
