Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques

Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.5). 09/2006; 13(5):516-25. DOI: 10.1197/jamia.M2077
Source: PubMed


Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the certainty of each assignment to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries; these code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that were coded only infrequently in the past. The least certain codes are generated by a naïve Bayes classifier. Codes of the latter two types are subsequently reviewed manually.
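The three certainty levels described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TieredCoder`, the `FREQUENT_THRESHOLD` cutoff, the whitespace and lowercase normalization, the Laplace-smoothed bag-of-words naïve Bayes model, and the codes `C1` and `C2` are all assumptions made for illustration, since the abstract does not specify any of them.

```python
import math
from collections import Counter, defaultdict

# Hypothetical cutoff: the abstract does not say how "frequent" is defined.
FREQUENT_THRESHOLD = 2


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


class TieredCoder:
    """Assigns a code plus a certainty level: 'high', 'medium' or 'low'."""

    def __init__(self, coded_examples):
        # coded_examples: iterable of (diagnosis_text, code) pairs
        self.examples = defaultdict(Counter)     # normalized text -> code counts
        self.code_counts = Counter()             # code -> number of examples
        self.word_counts = defaultdict(Counter)  # code -> word counts
        self.vocab = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.examples[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def assign(self, text):
        key = normalize(text)
        if key in self.examples:
            code, count = self.examples[key].most_common(1)[0]
            if count >= FREQUENT_THRESHOLD:
                return code, "high"    # frequent example: no manual review
            return code, "medium"      # infrequent example: manual review
        return self._naive_bayes(key), "low"  # classifier: manual review

    def _naive_bayes(self, key):
        # Multinomial naive Bayes with add-one (Laplace) smoothing.
        total = sum(self.code_counts.values())
        vocab_size = len(self.vocab)

        def log_score(code):
            counts = self.word_counts[code]
            n_words = sum(counts.values())
            score = math.log(self.code_counts[code] / total)
            for word in key.split():
                score += math.log((counts[word] + 1) / (n_words + vocab_size))
            return score

        return max(self.code_counts, key=log_score)


def needs_manual_review(certainty):
    """Only the highest-certainty assignments skip manual review."""
    return certainty != "high"
```

In this sketch an exact match against a frequently coded example is returned with "high" certainty, an exact match against a rarely coded example with "medium", and anything else falls through to the classifier with "low"; `needs_manual_review` then routes the latter two to a human verifier.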
Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
Results: At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
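The difference between micro- and macro-averaging can be shown with a short Python sketch. It assumes, for simplicity, that every entry has exactly one gold code and one predicted code, and it computes the macro f-score as the mean of the per-code f-scores; the abstract does not say which convention the authors used, and the function names `prf` and `averaged_scores` are hypothetical.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and f-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def averaged_scores(gold, predicted):
    """Return (micro, macro) triples of (precision, recall, f-measure)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code was wrong for this entry
            fn[g] += 1  # gold code was missed for this entry
    codes = sorted(set(gold) | set(predicted))
    per_code = [prf(tp[c], fp[c], fn[c]) for c in codes]
    # Macro: average the per-code scores, so every code counts equally.
    macro = tuple(sum(s[i] for s in per_code) / len(per_code) for i in range(3))
    # Micro: pool the counts first, so every entry counts equally.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


gold = ["A", "A", "A", "B"]
predicted = ["A", "A", "B", "B"]
micro, macro = averaged_scores(gold, predicted)
```

Macro-averaging gives rare codes the same weight as common ones, while micro-averaging is dominated by the most frequent codes, which is why the two are usually reported together.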
Over two-thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, reducing the staff engaged in manual coding from thirty-four coders to seven verifiers.



Available from: Christopher G Chute, Jul 14, 2014
    • "Researchers and developers of clinical information systems have used a range of technologies to try to achieve complete and accurate coded clinical data using post-hoc text processing. Some have used natural language processing (Long 2005; Meystre and Haug 2006), others have used data mining and machine learning techniques (Pakhomov et al. 2006; Wright et al. 2010). Rosenbloom et al. (2011) suggest that we need to develop hybrid systems that combine structured entry with later text-processing."
    ABSTRACT: Clinical auditing requires codified data for aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult, as the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges, and often requires a fresh approach. This research is devoted to creating a novel semiautomated method for generating codified auditing data from clinical notes recorded in a neurosurgical department in an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed by dictionary-based matching of the remaining text.
    Full-text · Conference Paper · Nov 2015
    • "This approach is feasible for a small code set but is questionable in real-life settings where thousands of codes need to be considered. Similar to our scheme, Pakhomov et al. [19] is the first work that attempts to improve the coding performance by combining the advantages of rule-based and machine learning approaches. It describes Autocoder, an automatic encoding system implemented at Mayo Clinic."
    ABSTRACT: The vocabulary gap between health seekers and providers has hindered cross-system operability and inter-user reusability. To bridge this gap, this paper presents a novel scheme to code medical records by jointly utilizing local mining and global learning approaches, which are tightly linked and mutually reinforced. Local mining attempts to code the individual medical record by independently extracting the medical concepts from the medical record itself and then mapping them to authenticated terminologies. A corpus-aware terminology vocabulary is naturally constructed as a byproduct, which is used as the terminology space for global learning. The local mining approach, however, may suffer from information loss and lower precision, which are caused by the absence of key medical concepts and the presence of irrelevant medical concepts. Global learning, on the other hand, works towards enhancing the local medical coding by collaboratively discovering missing key terminologies and filtering out the irrelevant terminologies through analysis of the social neighbors. Comprehensive experiments validate the proposed scheme and each of its components. Practically, this unsupervised scheme holds potential for large-scale data.
    No preview · Article · Feb 2015 · IEEE Transactions on Knowledge and Data Engineering
    • "Clinical documentations in computer-based records are found to be more complete and appropriate for clinical decisions than those in paper-based records [18]. Likewise, automated coding and classification encompasses a variety of computer-based approaches that are faster, reduce error rates, and are more efficient and accurate [4] [19] [20] [21]. Similarly, improvement in clinical documentation will be necessary to ensure complete automated coding [22]"
    ABSTRACT: Background: Clinical coding is an integral part of health information management (HIM) practice which provides valuable data for healthcare quality evaluation, health resource allocation, health services research, medical billing, public health programming, and Case-Mix/DRG funding. The International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) is a veritable tool for the effectiveness of clinical coding practices. Objective: This study determined implementation levels of ICD-10 as well as ICD-10-PCS and clinical coding practices in both public and for-profit hospitals in Nigeria. Methods: We used Chi square (χ2) and Cramer’s V (φc) to assess the level of association between type of workplace and implementations of ICD-10 and clinical coding practices. Statistical significance was set at .05. Results: The study discovered nationwide implementation of ICD-10 (179, 88.2%) and fair adoption of its procedure counterpart (79, 38.9%). Most hospitals in Nigeria, especially for-profit facilities (3, 100%) and tertiary healthcare settings (148, 93.1%), employed HIM professionals (214, 91.5%) to manage their clinical coding processes. Conversely, the study observed that challenges confronting clinical coding processes were enormous. Notable among these were absence of automation (70, 34.5%), lack of political will (51, 48.1%), inadequate clinical coders (153, 74.4%) and suboptimal documentation (186, 91.6%). Suggestions to improve clinical coding practices range from continuing professional coding education (33, 10.3%) to initiation of a Nigerian modification of ICD such that ICD-10 will become ICD-10-NGM (1, 0.3%). Conclusion: Most healthcare systems in Nigeria have implemented ICD-10 for coding and classification of diagnoses and procedures, and the process is being managed by the right workforce (i.e. HIM professionals), which reassures effectiveness. However, lack of political will, an inadequate and unmotivated workforce and suboptimal clinical documentation were among the challenges confronting the practice in Nigeria. Therefore, this study suggests advocacy and coding education with a view to modifying the orientation of all stakeholders and sensitizing relevant authorities to the benefits of clinical coding practices in order to maximize its outcome and, in effect, improve public health in the country. Keywords: Automated Coding, Clinical Coding, Clinical Documentation, Data Quality, Discharge Summary, Health Information Technology, Health Information Management Professionals, ICD-10
    Full-text · Article · Dec 2014