Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques

Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.5). 09/2006; 13(5):516-25. DOI: 10.1197/jamia.M2077
Source: PubMed

ABSTRACT Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes.
We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to decide what subsequent processing each code assignment needs. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries. These code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to examples that have been coded only infrequently. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently reviewed manually.
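The three certainty tiers described above can be sketched roughly as follows. This is only an illustration, not the Mayo Clinic system: the class name TieredCoder, the frequent_threshold cutoff, the whitespace and lowercase normalization, the add-one smoothing, and the code labels in the usage note are all assumptions, since the abstract does not give those details.

```python
import math
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


class TieredCoder:
    """Hypothetical three-tier coder: frequent match, rare match, naive Bayes."""

    def __init__(self, coded_examples, frequent_threshold=2):
        # coded_examples: iterable of (diagnostic_text, code) pairs.
        # frequent_threshold is an assumed cutoff for a "frequent" example.
        self.frequent_threshold = frequent_threshold
        self.example_counts = defaultdict(Counter)  # text -> code -> count
        self.code_counts = Counter()                # code -> number of entries
        self.word_counts = defaultdict(Counter)     # code -> word -> count
        self.vocab = set()
        for text, code in coded_examples:
            key = normalize(text)
            self.example_counts[key][code] += 1
            self.code_counts[code] += 1
            for word in key.split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def _naive_bayes(self, words):
        """Return the most probable code under multinomial naive Bayes
        with add-one smoothing, or None if there is no training data."""
        total = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            if denom == 0:
                continue
            score = math.log(n / total)
            for word in words:
                score += math.log((self.word_counts[code][word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code

    def assign(self, text):
        """Return (code, certainty, needs_manual_review)."""
        key = normalize(text)
        if key in self.example_counts:
            code, count = self.example_counts[key].most_common(1)[0]
            if count >= self.frequent_threshold:
                return code, "high", False   # frequent example: no review
            return code, "medium", True      # infrequent example: review
        return self._naive_bayes(key.split()), "low", True  # classifier: review
```

For instance, with three entries of "essential hypertension" under a made-up label CODE_HTN and single entries of "diabetes mellitus" and "type 2 diabetes" under CODE_DM, assign("Essential Hypertension") falls in the high tier and skips review, assign("diabetes mellitus") falls in the medium tier, and an unseen text such as "diabetes with neuropathy" is routed to the naive Bayes fallback and flagged for review.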
Standard information retrieval accuracy measurements of precision, recall and f-measure were used. Micro- and macro-averaged results were computed.
RESULTS At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall and 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%.
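As a rough illustration of the measurements named above, the sketch below computes micro- and macro-averaged precision, recall and f-measure. It assumes exactly one gold code and one predicted code per entry, and it takes the macro f-measure as the mean of the per-code f-measures; the abstract does not say which conventions the authors used, and the function names are hypothetical.

```python
from collections import Counter


def per_code_counts(gold, predicted):
    """Count true positives, false positives and false negatives per code."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code p was wrong for this entry
            fn[g] += 1  # gold code g was missed for this entry
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall and balanced f-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_average(gold, predicted):
    """Pool the counts over all codes, then compute the measures once."""
    tp, fp, fn = per_code_counts(gold, predicted)
    return prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_average(gold, predicted):
    """Compute the measures per code, then take the unweighted mean."""
    tp, fp, fn = per_code_counts(gold, predicted)
    codes = sorted(set(gold) | set(predicted))
    if not codes:
        return 0.0, 0.0, 0.0
    scores = [prf(tp[c], fp[c], fn[c]) for c in codes]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
```

Micro-averaging weights every entry equally, so frequent codes dominate the result, while macro-averaging weights every code equally, so rare codes count as much as common ones.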
Over two thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.

Available from: Christopher G Chute, Jul 14, 2014
  • Source
    • "Clinical documentations in computer-based records are found to be more complete and appropriate for clinical decisions than those in paper-based records [18]. Likewise, automated coding and classification encompasses a variety of computer-based approaches that are faster, reduce error rates, and are more efficient and accurate [4] [19] [20] [21]. Similarly, improvement in clinical documentation will be necessary to ensure complete automated coding [22]"
    ABSTRACT: Background: Clinical coding is an integral part of health information management (HIM) practice which provides valuable data for healthcare quality evaluation, health resource allocation, health services research, medical billing, public health programming, and Case-Mix/DRG funding. The International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) is a veritable tool for the effectiveness of clinical coding practices. Objective: The present study determined implementation levels of ICD-10 as well as ICD-10-PCS and clinical coding practices in both public and for-profit hospitals in Nigeria. Methods: We used chi-square (χ2) and Cramer's V (φc) to assess the level of association between type of workplace and implementations of ICD-10 and clinical coding practices. Statistical significance was set at .05. Result: The study discovered nationwide implementation of ICD-10 (179, 88.2%) and fair adoption of its procedure counterpart (79, 38.9%). Most hospitals in Nigeria, especially for-profit facilities (3, 100%) and tertiary healthcare settings (148, 93.1%), employed HIM professionals (214, 91.5%) to manage their clinical coding processes. Conversely, the study observed that challenges confronting clinical coding processes were enormous. Notable among these were absence of automation (70, 34.5%), lack of political will (51, 48.1%), inadequate clinical coders (153, 74.4%) and suboptimal documentation (186, 91.6%). Suggestions to improve clinical coding practices range from continuing professional coding education (33, 10.3%) to initiation of a Nigerian modification of ICD such that ICD-10 will become ICD-10-NGM (1, 0.3%). Conclusion: Most healthcare systems in Nigeria have implemented ICD-10 for coding and classification of diagnoses and procedures, and the process is being managed by the right workforce (i.e. HIM professionals), which reassures effectiveness.
However, lack of political will, inadequate and unmotivated workforce and suboptimal clinical documentation were among challenges confronting the practice in Nigeria. Therefore, this study suggests advocacy and coding education with a view to modifying the orientation of all stakeholders and to sensitize relevant authorities on the benefits of clinical coding practices in order to maximize its outcome and in effect, improve public health in the country. Keywords: Automated Coding, Clinical Coding, Clinical Documentation, Data Quality, Discharge Summary, Health Information Technology, Health Information Management Professionals, ICD-10
  • Source
    • "One pertinent example is the automatic categorization of informally written medical diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models including LSA have been successfully used to this end (Lee et al., 2006; Pakhomov et al., 2006). Nonetheless, results from this type of model are at the mercy of the vectorial dynamics involved and the representational bias of some terms."
    ABSTRACT: There is currently a widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as a heuristic tool for helping doctors. Vector space models have been successfully used to this end (Lee, Cimino, Zhu, Sable, Shanker, Ely & Yu, 2006; Pakhomov, Buntrock & Chute, 2006). In this study we use a computational model known as Latent Semantic Analysis (LSA) on a diagnostic corpus with the aim of retrieving definitions (in the form of lists of semantic neighbors) of common structures it contains (e.g. "storm phobia", "dog phobia") or less common structures that might be formed by logical combinations of categories and diagnostic symptoms (e.g. "gun personality" or "germ personality"). In the quest to bring definitions into line with the meaning of structures and make them in some way representative, various problems commonly arise while recovering content using vector space models. We propose some approaches which bypass these problems, such as Kintsch's (2001) predication algorithm and some corrections to the way lists of neighbors are obtained, which have already been tested on semantic spaces in a non-specific domain (Jorge-Botana, León, Olmos & Hassan-Montero, under review). The results support the idea that the predication algorithm may also be useful for extracting more precise meanings of certain structures from scientific corpora, and that the introduction of some corrections based on vector length may increase its efficiency on non-representative terms.
    The Spanish Journal of Psychology 11/2009; 12(2):424-40. DOI:10.1017/S1138741600001815 · 0.74 Impact Factor
  • Source
    • "In most chief complaint classifier studies, the performance of chief complaint classification methods is measured by sensitivity, specificity, positive predictive value (PPV), F measure, and F2 measure [4] [5] [8] [13] [14]. The F measure is a weighted harmonic mean of PPV and sensitivity."