Extracting timing and status descriptors for colonoscopy testing from electronic medical records
ABSTRACT Colorectal cancer (CRC) screening rates are low despite confirmed benefits. The authors investigated the use of natural language processing (NLP) to identify previous colonoscopy screening in electronic records from a random sample of 200 patients at least 50 years old. The authors developed algorithms to recognize temporal expressions and 'status indicators', such as 'patient refused', or 'test scheduled'. The new methods were added to the existing KnowledgeMap concept identifier system, and the resulting system was used to parse electronic medical records (EMR) to detect completed colonoscopies. Using as the 'gold standard' expert physicians' manual review of EMR notes, the system identified timing references with a recall of 0.91 and precision of 0.95, colonoscopy status indicators with a recall of 0.82 and precision of 0.95, and references to actually completed colonoscopies with recall of 0.93 and precision of 0.95. The system was superior to using colonoscopy billing codes alone. Health services researchers and clinicians may find NLP a useful adjunct to traditional methods to detect CRC screening status. Further investigations must validate extension of NLP approaches for other types of CRC screening applications.
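As a rough illustration of the kind of rule the abstract describes, the sketch below tags a sentence that mentions colonoscopy with a status indicator and a timing reference. It is a minimal sketch under stated assumptions, not the KnowledgeMap implementation: the pattern lists, the status labels, and the function name `describe_mention` are all hypothetical, and the real system's rules are not given in the abstract.

```python
import re

# Hypothetical status patterns, checked in order; the first match wins.
# These are illustrative only and are not taken from KnowledgeMap.
STATUS_PATTERNS = {
    "refused": re.compile(r"\b(refus\w*|declin\w*)\b", re.IGNORECASE),
    "scheduled": re.compile(r"\b(schedul\w*|referred\s+for)\b", re.IGNORECASE),
    "completed": re.compile(r"\b(underwent|had|completed|performed)\b", re.IGNORECASE),
}

# Hypothetical temporal patterns: a four-digit year, or "N years/months ago".
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")
RELATIVE = re.compile(r"\b(\d+)\s+(?:year|month)s?\s+ago\b", re.IGNORECASE)


def describe_mention(sentence):
    """Return status and timing for a colonoscopy mention, or None if absent."""
    if "colonoscopy" not in sentence.lower():
        return None

    status = "unknown"
    for label, pattern in STATUS_PATTERNS.items():
        if pattern.search(sentence):
            status = label
            break

    year = YEAR.search(sentence)
    relative = RELATIVE.search(sentence)
    if year:
        timing = year.group(1)
    elif relative:
        timing = relative.group(0)
    else:
        timing = None

    return {"status": status, "timing": timing}


if __name__ == "__main__":
    for text in (
        "Patient refused colonoscopy.",
        "Colonoscopy scheduled for next month.",
        "Had a colonoscopy in 2005, normal.",
    ):
        print(text, "->", describe_mention(text))
```

Only mentions tagged "completed" would count toward prior screening in this sketch; a real system would also need negation handling and concept normalization, which the regular expressions here do not attempt.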
Available from: Joshua C Denny, May 28, 2015
Conference Paper: Patient information extraction in noisy tele-health texts
ABSTRACT: We explore methods for effectively extracting information from clinical narratives captured by a public health consulting phone service called HealthLink. The currently available data consist of dialogues recorded by nurses while consulting patients on the phone. Because the data are interviews transcribed by nurses during phone conversations, they contain a significant volume and variety of noise. The first kind is explicit noise, which includes spelling errors, unfinished sentences, omitted sentence delimiters, variants of terms, etc. The second is implicit noise, which includes information that does not concern the patient and negation of the patient's information. To filter explicit noise, we propose a biomedical term detection/normalization method that resolves misspellings, term variations, and arbitrary abbreviations of terms by nurses. To detect temporal terms and other types of named entities (which convey patients' personal information such as age and sex), we propose a bootstrapping-based pattern-learning method that detects arbitrary variations of the named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information. The experimental results show that we achieve reasonable performance with our noise reduction methods. 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
ABSTRACT: A calculation grid developed by an international expert group was tested across biobanks in six countries to evaluate costs for collections of various types of biospecimens. The assessment yielded a tool for setting specimen-access prices that were transparently related to biobank costs, and the tool was applied across three models of collaborative partnership. Science Translational Medicine 11/2014; 6(261):261fs45. DOI:10.1126/scitranslmed.3010444 · 14.41 Impact Factor
ABSTRACT: Statin medications are often prescribed to reduce a patient's risk of cardiovascular events, in part through cholesterol reduction. We developed and evaluated an algorithm that can accurately identify subjects with major adverse cardiac events (MACE) while on statins using electronic medical record (EMR) data. The algorithm also identifies subjects experiencing their first MACE while on statins for primary prevention. The algorithm achieved positive predictive values (PPVs) of 90% to 97% in identifying MACE cases, as compared against physician review. By applying the algorithm to EMR data in BioVU, cases and controls were identified and subsequently used to replicate known associations with eight genetic variants. We replicated 6/8 previously reported genetic associations with cardiovascular diseases or lipid metabolism disorders. Our results demonstrated that the algorithm can be used to accurately identify subjects with MACE and MACE while on statins. Consequently, future studies can be conducted to investigate and validate the relationship between statins and MACE using real-world clinical data. 04/2014; 2014:112-9.