Natural language processing in the electronic medical record: assessing clinician adherence to tobacco treatment guidelines.

Center for Health Research, Kaiser Permanente, Portland, Oregon 97227, USA.
American Journal of Preventive Medicine (Impact Factor: 3.95). 01/2006; 29(5):434-9. DOI: 10.1016/j.amepre.2005.08.007
Source: PubMed

ABSTRACT Comprehensively assessing care quality with electronic medical records (EMRs) is not currently possible because much data reside in clinicians' free-text notes.
We evaluated the accuracy of MediClass, an automated, rule-based classifier of the EMR that incorporates natural language processing, in assessing whether clinicians: (1) asked if the patient smoked; (2) advised them to stop; (3) assessed their readiness to quit; (4) assisted them in quitting by providing information or medications; and (5) arranged for appropriate follow-up care (i.e., the 5A's of smoking-cessation care).
We analyzed 125 medical records of known smokers at each of four HMOs in 2003 and 2004. One trained abstractor at each HMO manually coded all 500 records according to whether or not each of the 5A's of smoking cessation care was addressed during routine outpatient visits.
For each patient's record, we compared the presence or absence of each of the 5A's as assessed by each human coder and by MediClass. We measured the chance-corrected agreement between the human raters and MediClass using the kappa statistic.
For "ask" and "assist," agreement among human coders was indistinguishable from agreement between humans and MediClass (p>0.05). For "assess" and "advise," the human coders agreed more with each other than they did with MediClass (p<0.01); however, MediClass performance was sufficient to assess quality in these areas. The frequency of "arrange" was too low to be analyzed.
MediClass performance appears adequate to replace human coders of the 5A's of smoking-cessation care, allowing for automated assessment of clinician adherence to one of the most important, evidence-based guidelines in preventive health care.
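
The chance-corrected agreement mentioned above can be illustrated with a small worked example. The following is a minimal sketch of Cohen's kappa for two raters, not the study's actual analysis code; the function name `cohen_kappa` and the ten example codings are hypothetical and exist only for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 1 = "advise" documented in the record, 0 = not documented.
human_coder = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
classifier = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

print(round(cohen_kappa(human_coder, classifier), 3))
```

In this made-up example the raters agree on 8 of 10 records (observed agreement 0.80), while their marginals imply a chance agreement of 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52), or about 0.58. The abstract does not state which variant of kappa or which significance test was used to compare human-human with human-MediClass agreement, so this sketch covers only the basic two-rater statistic.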

  •
    ABSTRACT: OBJECTIVE: To evaluate the validity of, characterize the usage of, and propose potential research applications for International Classification of Diseases, Ninth Revision (ICD-9) tobacco codes in clinical populations. MATERIALS AND METHODS: Using data on cancer cases and cancer-free controls from Vanderbilt's biorepository, BioVU, we evaluated the utility of ICD-9 tobacco use codes to identify ever-smokers in general and high smoking prevalence (lung cancer) clinic populations. We assessed potential biases in documentation, and performed temporal analysis relating transitions between smoking codes to smoking cessation attempts. We also examined the suitability of these codes for use in genetic association analyses. RESULTS: ICD-9 tobacco use codes can identify smokers in a general clinic population (specificity of 1, sensitivity of 0.32), and there is little evidence of documentation bias. Frequency of code transitions between 'current' and 'former' tobacco use was significantly correlated with initial success at smoking cessation (p<0.0001). Finally, code-based smoking status assignment is a comparable covariate to text-based smoking status for genetic association studies. DISCUSSION: Our results support the use of ICD-9 tobacco use codes for identifying smokers in a clinical population. Furthermore, with some limitations, these codes are suitable for adjustment of smoking status in genetic studies utilizing electronic health records. CONCLUSIONS: Researchers should not be deterred by the unavailability of full-text records to determine smoking status if they have ICD-9 code histories.
    Journal of the American Medical Informatics Association 02/2013; · 3.57 Impact Factor
  •
    ABSTRACT: Numerous population-based surveys indicate that overweight and obese patients can benefit from lifestyle counseling during routine clinical care. This study aimed to determine whether natural language processing (NLP) could be applied to information in the electronic health record (EHR) to automatically assess delivery of weight management-related counseling in clinical healthcare encounters. The MediClass system with NLP capabilities was used to identify weight-management counseling in EHRs. Knowledge for the NLP application was derived from the 5As framework for behavior counseling: Ask (evaluate weight and related disease), Advise at-risk patients to lose weight, Assess patients' readiness to change behavior, Assist through discussion of weight-loss methods and programs, and Arrange follow-up efforts including referral. Using samples of EHR data between January 1, 2007, and March 31, 2011, from two health systems, the accuracy of the MediClass processor for identifying these counseling elements was evaluated in postpartum visits of 600 women with gestational diabetes mellitus (GDM), with manual chart review as the gold standard. Data were analyzed in 2013. Mean sensitivity and specificity for each of the 5As compared to the gold standard were at or above 85%, with the exception of sensitivity for Assist, which was 40% in one health system and 60% in the other. The automated method identified many valid Assist cases not identified in the gold standard. The MediClass processor has performance capability sufficiently similar to human abstractors to permit automated assessment of counseling for weight loss in postpartum encounter records.
    American journal of preventive medicine 05/2014; 46(5):457-64. · 4.24 Impact Factor
  •
    ABSTRACT: Electronic health records (EHRs) and social media have the potential to enrich public health surveillance of diabetes. Clinical and patient-facing data sources for diabetes surveillance are needed given its profound public health impact, opportunity for primary and secondary prevention, persistent disparities, and requirement for self-management. Initiatives to employ data from EHRs and social media for diabetes surveillance are in their infancy. With their transformative potential come practical limitations and ethical considerations. We explore applications of EHR and social media for diabetes surveillance, limitations to approaches, and steps for moving forward in this partnership between patients, health systems, and public health.
    Current Diabetes Reports 03/2014; 14(3):468. · 3.17 Impact Factor
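
Two of the abstracts listed above report sensitivity and specificity against a reference standard (for example, manual chart review). As a minimal sketch of how those two figures are computed from binary labels, assuming 1 means the element is present and 0 means it is absent; the function name and the sample data are hypothetical and are not taken from any of the studies.

```python
import math


def sensitivity_specificity(gold, predicted):
    """Return (sensitivity, specificity) of predicted labels against gold labels.

    sensitivity = TP / (TP + FN): share of true positives that were detected.
    specificity = TN / (TN + FP): share of true negatives correctly rejected.
    A measure is NaN when the gold standard contains no cases of that class.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)

    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical data: chart review (gold) versus an automated method.
chart_review = [1, 1, 1, 0, 0]
automated = [1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(chart_review, automated)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A method that never raises a false positive but misses many true cases has high specificity and low sensitivity, which is the pattern the ICD-9 tobacco code abstract describes (specificity of 1, sensitivity of 0.32).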