High agreement but low kappa: I. The problems of two paradoxes.

Yale University School of Medicine, New Haven, CT 06510.
Journal of Clinical Epidemiology (Impact Factor: 5.33). 02/1990; 43(6):543-9. DOI: 10.1016/0895-4356(90)90158-L
Source: PubMed

ABSTRACT In a fourfold table showing binary agreement of two observers, the observed proportion of agreement, p0, can be paradoxically altered by the chance-corrected ratio that creates kappa as an index of concordance. In one paradox, a high value of p0 can be drastically lowered by a substantial imbalance in the table's marginal totals either vertically or horizontally. In the second paradox, kappa will be higher with an asymmetrical rather than symmetrical imbalance in marginal totals, and with imperfect rather than perfect symmetry in the imbalance. An adjustment that substitutes kappa max for kappa does not repair either problem, and seems to make the second one worse.
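Both paradoxes can be reproduced numerically. The sketch below is a minimal illustration, assuming the standard definition of Cohen's kappa, (p0 - pe) / (1 - pe), where pe is the agreement expected by chance from the marginal totals. The function name `cohen_kappa` and the four tables of counts are hypothetical, chosen only for illustration; they are not stated in the abstract.

```python
def cohen_kappa(a, b, c, d):
    """Return (p0, kappa) for a fourfold table [[a, b], [c, d]].

    a and d are the cells where the two observers agree;
    b and c are the cells where they disagree.
    """
    n = a + b + c + d
    p0 = (a + d) / n  # observed proportion of agreement
    # Chance-expected agreement from the row and column totals.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return p0, (p0 - pe) / (1 - pe)


# Hypothetical counts (assumptions for illustration only).
tables = {
    # First paradox: same p0 = 0.85, very different kappa.
    "balanced marginals": (40, 9, 6, 45),
    "imbalanced marginals": (80, 10, 5, 5),
    # Second paradox: same p0 = 0.60, asymmetry raises kappa.
    "symmetrical imbalance": (45, 15, 25, 15),
    "asymmetrical imbalance": (25, 35, 5, 35),
}

for name, cells in tables.items():
    p0, kappa = cohen_kappa(*cells)
    print(f"{name:24s} p0 = {p0:.2f}  kappa = {kappa:.2f}")
```

With these counts, the first pair shares p0 = 0.85 but kappa drops from about 0.70 to about 0.32 once the marginal totals become imbalanced. The second pair shares p0 = 0.60, yet kappa is about 0.13 when both observers' marginals lean the same way and about 0.26 when they lean in opposite directions.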

  • ABSTRACT: Objective: The interpretation of critical care electroencephalography (EEG) studies is challenging because of the presence of many periodic and rhythmic patterns of uncertain clinical significance. Defining the clinical significance of these patterns requires standardized terminology with high interrater agreement (IRA). We sought to evaluate IRA for the final, published American Clinical Neurophysiology Society (ACNS)–approved version of the critical care EEG terminology (2012 version). Our evaluation included terms not assessed previously and incorporated raters with a broad range of EEG reading experience. Methods: After reviewing a set of training slides, 49 readers independently completed a Web-based test consisting of 11 identical questions for each of 37 EEG samples (407 questions). Questions assessed whether a pattern was an electrographic seizure; pattern location (main term 1); pattern type (main term 2); and presence and classification of eight other key features (“plus” modifiers, sharpness, absolute and relative amplitude, frequency, number of phases, fluctuation/evolution, and the presence of “triphasic” morphology). Results: IRA statistics (κ values) were almost perfect (90–100%) for seizures, main terms 1 and 2, the +S modifier (superimposed spikes/sharp waves or sharply contoured rhythmic delta activity), sharpness, absolute amplitude, frequency, and number of phases. Agreement was substantial for the +F (superimposed fast activity) and +R (superimposed rhythmic delta activity) modifiers (66% and 67%, respectively), moderate for triphasic morphology (58%), and fair for evolution (21%). Significance: IRA for most terms in the ACNS critical care EEG terminology is high. These terms are suitable for multicenter research on the clinical significance of critical care EEG patterns. A PowerPoint slide summarizing this article is available for download in the Supporting Information section.
    Epilepsia 05/2014; · 3.96 Impact Factor
  • ABSTRACT: To compare previously used algorithms to identify anovulatory menstrual cycles in women self-reporting regular menses.
    Fertility and Sterility 05/2014;
  • ABSTRACT: Concordance between survey reports and claims data is not well established. We compared them for disease histories, preventative, and other health services use in a large, nationally representative sample of older Medicare beneficiaries with special attention given to evaluating age, aging, memory, and respondent status effects. Baseline (1993) and biennial follow-up data (through 2010) from the Survey on Assets and Health Dynamics among the Oldest-Old were linked to Medicare claims from 1991 to 2010, for 4910 participants yielding 19,556 person-periods. Concordance was measured by simple, weighted, and prevalence and bias-adjusted κ, and Lin's concordance statistics. Generalized estimating equation negative binomial models were used to predict the summary counts of concordant reports, survey underreports, and survey overreports. Concordance was highly variable overall, unacceptably low for arthritis and physician visits, and less than substantial for angina, heart disease, hypertension, and outpatient surgery. Generalized estimating equation negative binomial models revealed reductions in reporting accuracy (more underreporting and overreporting) associated with both age (interindividual) and aging (intraindividual) effects, countervailing memory effects on concordance due to less underreporting but more overreporting, and countervailing proxy-respondent effects on concordance due to less underreporting but more overreporting. Further research should explore whether these findings are time or cohort bound, address the potential heterogeneity of the proxy-respondent effects based on the reason for and relationship of the proxy to the target person, and evaluate the effects of a broader spectrum of performance-based cognitive abilities. In the interim, the significant predictors identified here should be included in future studies.
    Medical care 05/2014; 52(5):462-8. · 3.24 Impact Factor