Identifying QT prolongation from ECG impressions using natural language processing and negation detection.
ABSTRACT: Electrocardiogram (ECG) impressions provide significant information for decision support and clinical research. We investigated the detection of QT prolongation, an important risk factor for sudden cardiac death, and compared it with the automated calculation of the corrected QT interval (QTc) by ECG machines. We integrated a negation tagging algorithm into the KnowledgeMap concept identifier (KMCI), then applied it to impressions from 44,080 ECGs to identify Unified Medical Language System concepts. We compared the instances of QT prolongation identified by KMCI to the calculated QTc. The negation detection algorithm had a recall of 0.973 and a precision of 0.982 over 10,490 concepts. A concept query for QT prolongation matched 2,364 ECGs with a precision of 1.00. The positive predictive value of the common QTc cutoffs was 6-21%. ECGs not identified by KMCI as prolonged but with QTc > 450 ms revealed potential causes of miscalculated QTc intervals in 96% of cases; no definite concept query false negatives were detected. We conclude that a natural language processing system can effectively identify QT prolongation and other cardiac diagnoses from ECG impressions for potential decision support and clinical research.
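The negation tagging described above follows the general pattern of trigger-phrase algorithms such as NegEx: a concept mention is treated as negated if a negation trigger appears within a short window of preceding tokens. A minimal sketch of that idea follows; the trigger list, window size, and function name are illustrative assumptions, not the actual KMCI rules.

```python
import re

# Illustrative trigger phrases; the real algorithm uses a curated list.
NEG_TRIGGERS = ["no", "not", "without", "denies", "ruled out",
                "negative for", "absence of"]

def is_negated(text, concept, window=5):
    """Return True if any mention of `concept` in `text` is preceded,
    within `window` tokens, by a negation trigger phrase."""
    tokens = text.lower().split()
    concept_tokens = concept.lower().split()
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = " ".join(tokens[max(0, i - window):i])
            # Word-boundary match so "no" does not fire inside "normal".
            if any(re.search(r"\b" + re.escape(trig) + r"\b", preceding)
                   for trig in NEG_TRIGGERS):
                return True
    return False
```

On an impression like "no evidence of QT prolongation" this flags the concept as negated, while "marked QT prolongation" is left as an affirmed mention; real systems also handle scope termination and post-coordinated negation, which this sketch omits.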
ABSTRACT: Epidemiologic studies contribute greatly to evidence-based medicine by identifying risk factors for diseases and determining optimal treatments for clinical practice. However, there has been very limited effort on the automatic extraction of knowledge from epidemiologic articles, such as exposures, outcomes, and their relations. In this initial study, we developed a system consisting of a natural language processing (NLP) engine and a rule-based classifier to automatically extract exposure-related terms from the titles of epidemiologic articles. Evaluation on 450 titles annotated by an epidemiologist showed a highest F-measure of 0.646 (precision 0.610, recall 0.688) using inexact matching, which indicates the feasibility of automated methods for mining the epidemiologic literature. Further analysis of terms related to epidemiologic exposures suggested that although the UMLS would have reasonable coverage, more appropriate semantic classifications of epidemiologic exposures would be required. AMIA Annual Symposium Proceedings 2010; 2010:897-901.
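The F-measure reported above is the harmonic mean of precision and recall (F1), and the stated 0.646 is consistent with the stated precision and recall up to rounding. A one-line check:

```python
def f_measure(precision, recall):
    # F1 score: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# f_measure(0.610, 0.688) is approximately 0.647, matching the
# reported 0.646 once rounding of P and R is accounted for.
```

Small discrepancies like this are expected when precision and recall are themselves rounded to three decimals before the F-measure is reported.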
ABSTRACT: We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR). We reviewed the research literature published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as relevant publications referenced in papers already included. 174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual feature detection and analysis, extraction of information in general, extraction of codes and of information for decision support and enrichment of the EHR, information extraction for surveillance, research, automated terminology management, and data mining, and de-identification of clinical text. The performance of information extraction systems on clinical text has improved since the last systematic review in 1995, but such systems are still rarely applied outside of the laboratories in which they were developed. Competitive challenges for information extraction from clinical text, the availability of annotated clinical text corpora, and further improvements in system performance are important factors for stimulating advances in this field and increasing the acceptance and usage of these systems in concrete clinical and biomedical research contexts. Yearbook of Medical Informatics 2008.
ABSTRACT: OBJECTIVE: To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs). MATERIALS AND METHODS: We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected for manual accuracy review using dual-physician review. We identified a subset of medication-indication pairs that optimizes recall while maintaining high precision. RESULTS: MEDI contains 3,112 medications and 63,343 medication-indication pairs. Wikipedia was the largest resource, with 2,608 medications and 34,911 pairs. For each resource, estimated precision and recall, respectively, were 94% and 20% for RxNorm, 75% and 33% for MedlinePlus, 67% and 31% for SIDER 2, and 56% and 51% for Wikipedia. The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13,304 unique indication pairs covering 2,136 medications. The mean±SD number of indications per medication in MEDI-HPS is 6.22±6.09. The estimated precision of MEDI-HPS is 92%. CONCLUSIONS: MEDI is a publicly available, computable resource that links medications with their indications as represented by concepts and billing codes. MEDI may benefit clinical EMR applications and the reuse of EMR data for research. Journal of the American Medical Informatics Association, April 2013.
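The MEDI-HPS inclusion rule stated in the abstract (a medication-indication pair is kept if it appears in RxNorm, or in at least two of the three other resources) reduces to a simple set test. A minimal sketch, with resource labels chosen for illustration:

```python
# Illustrative resource labels; MEDI itself uses its own identifiers.
OTHER_RESOURCES = {"SIDER2", "MedlinePlus", "Wikipedia"}

def in_high_precision_subset(sources):
    """MEDI-HPS rule: keep a medication-indication pair if its set of
    supporting resources contains RxNorm, or at least two of the other
    three resources (SIDER 2, MedlinePlus, Wikipedia)."""
    sources = set(sources)
    return "RxNorm" in sources or len(sources & OTHER_RESOURCES) >= 2
```

For example, a pair supported only by Wikipedia is excluded, while one supported by both Wikipedia and MedlinePlus, or by RxNorm alone, is retained; this agreement-based filter is what trades recall for the reported 92% precision.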