DSM-5 Field Trials in the United States and Canada, Part II: Test-Retest Reliability of Selected Categorical Diagnoses

University of Pittsburgh, Pittsburgh, Pennsylvania, United States
American Journal of Psychiatry (Impact Factor: 12.3). 10/2012; 170(1). DOI: 10.1176/appi.ajp.2012.12070999
Source: PubMed


The DSM-5 Field Trials were designed to obtain precise (standard error <0.1) estimates of the intraclass kappa as a measure of the degree to which two clinicians could independently agree on the presence or absence of selected DSM-5 diagnoses when the same patient was interviewed on separate occasions, in clinical settings, and evaluated with usual clinical interview methods.

Eleven academic centers in the United States and Canada were selected, and each was assigned several target diagnoses frequently treated in that setting. Consecutive patients visiting a site during the study were screened and stratified on the basis of DSM-IV diagnoses or symptomatic presentations. Patients were randomly assigned to two clinicians for a diagnostic interview; clinicians were blind to any previous diagnosis. All data were entered directly via an Internet-based software system to a secure central server. Detailed research design and statistical methods are presented in an accompanying article.

There were a total of 15 adult and eight child/adolescent diagnoses for which adequate sample sizes were obtained to report adequately precise estimates of the intraclass kappa. Overall, five diagnoses were in the very good range (kappa=0.60–0.79), nine in the good range (kappa=0.40–0.59), six in the questionable range (kappa=0.20–0.39), and three in the unacceptable range (kappa values <0.20). Eight diagnoses had insufficient sample sizes to generate precise kappa estimates at any site.
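As a rough illustration of the statistic and the interpretive ranges reported above, the following minimal sketch computes an intraclass kappa for two binary ratings per patient and maps it to the listed ranges. It assumes the common two-rater form in which chance agreement is derived from the pooled prevalence across both interviews, and it ignores the stratified sampling weights used in the field trials. The function names and the example ratings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def intraclass_kappa(first: Sequence[int], second: Sequence[int]) -> float:
    """Intraclass kappa for two binary ratings (1 = diagnosis present) per patient.

    Assumption: unweighted data; chance agreement uses the pooled prevalence
    of positive ratings across both interviews.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Pooled prevalence over all 2n ratings.
    p = (sum(first) + sum(second)) / (2 * n)
    if p == 0.0 or p == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    observed = sum(a == b for a, b in zip(first, second)) / n
    expected = p * p + (1 - p) * (1 - p)
    return (observed - expected) / (1 - expected)


def reliability_band(kappa: float) -> str:
    """Map a kappa value to the ranges listed in the abstract."""
    if kappa >= 0.80:
        return "above the ranges listed in the abstract"
    if kappa >= 0.60:
        return "very good"
    if kappa >= 0.40:
        return "good"
    if kappa >= 0.20:
        return "questionable"
    return "unacceptable"


# Hypothetical ratings for eight patients, one list per clinician interview.
interview_1 = [1, 1, 0, 0, 1, 0, 1, 0]
interview_2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = intraclass_kappa(interview_1, interview_2)
print(round(kappa, 2), reliability_band(kappa))
```

In this made-up example the clinicians agree on six of eight patients with a pooled prevalence of 0.5, so chance agreement is 0.5 and the kappa falls in the good range.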

Most diagnoses adequately tested had good to very good reliability with these representative clinical populations assessed with usual clinical interview methods. Some diagnoses that were revised to encompass a broader spectrum of symptom expression or had a more dimensional approach tested in the good to very good range.

  • Source
    • "The recent 'fieldwork trials' for DSM-5 are a rare recent example looking at categorical diagnoses in sequentially recruited rather than highly selected patient samples and without structured clinical interviews. BD-I and BPD diagnosis showed very good reliability (kappa: 0.75) in some centres, but not in others [30]. Thus, even when diagnosis is under explicit scrutiny, it cannot be assumed that diagnostic agreement is high."
    • "and MDD, attention deficit/hyperactivity disorder (ADHD), and anxiety disorders are reported to be the most common comorbid diagnoses (American Psychiatric Association 2013). The diagnosis of DMDD is criticized because of its potential to pathologize physiological behavior (i.e., temper tantrums) with a consequent elevation in use of psychotropic medications, paucity of empirical evidence supporting the validity of diagnosis, low test-retest reliability and supporting studies focusing at selected centers, and a not entirely overlapping diagnosis (i.e., SMDD) (Parens et al. 2010; Regier et al. 2013; McGough 2014). On the other hand, there are also studies supporting its validity as a distinct diagnosis (Copeland et al. 2013; Deveney et al. 2013; Copeland et al. 2014; Dougherty et al. 2014)."
    ABSTRACT: Objective: Disruptive mood dysregulation disorder (DMDD) is a novel diagnosis listed in the Diagnostic and Statistical Manual of Mental Disorders, 5th ed. (DSM-5) to encompass chronic and impairing irritability in youth, and to help differentiate it from bipolar disorders. Because it is a new entity, treatment guidelines, as well as its sociodemographic and clinical features among diverse populations, have not yet been elucidated. Here, DMDD cases from three centers in Turkey are reported and the implications are discussed. Methods: The study was conducted at the Abant Izzet Baysal University Medical Faculty Department of Child and Adolescent Psychiatry (Bolu), and American Hospital and Bengi Semerci Institute (Istanbul) between August 2014 and October 2014. Records of patients were reviewed and features of patients who fulfilled criteria for DMDD were recorded. Data were analyzed with SPSS Version 17.0 for Windows. Descriptive analyses, the χ² test, and the Mann-Whitney U test were used for analyses. Diagnostic consensus was determined via Cohen's κ. p was set at 0.01. Results: Thirty-six patients (77.8% male) fulfilled criteria for DMDD. The κ value for consensus between clinicians was 0.68 (p = 0.00). Mean age of patients was 9.0 years (S.D. = 2.5), whereas the mean age of onset for DMDD symptoms was 4.9 years (S.D. = 2.2). Irritability, temper tantrums, verbal rages, and physical aggression toward family members were the most common presenting complaints. Conclusions: Diagnostic consensus could not be reached for almost one fourth of cases. The most common reasons for lack of consensus were problems in clarifying patients' moods between episodes, problems in differentiating normality from pathology (i.e., symptoms mainly reported in one setting vs. pervasiveness), and inability to fulfill the frequency criterion for tantrums.
    Journal of child and adolescent psychopharmacology 10/2015; DOI:10.1089/cap.2015.0004 · 2.93 Impact Factor
  • Source
    • "Reliable diagnosis is an essential prerequisite for the study of mental disorders. The question of how to reliably measure Major Depression (MD) is unresolved: depression biomarkers have very limited explanatory power (Cai et al., 2015; Schmaal et al., 2015), and MD was among the least reliable diagnoses in the DSM-5 field trials (Regier et al., 2013)."
    ABSTRACT: Background: The symptoms for Major Depression (MD) defined in the DSM-5 differ markedly from symptoms assessed in common rating scales, and the empirical question about core depression symptoms is unresolved. Here we conceptualize depression as a complex dynamic system of interacting symptoms to examine what symptoms are most central to driving depressive processes. Methods: We constructed a network of 28 depression symptoms assessed via the Inventory of Depressive Symptomatology (IDS-30) in 3,463 depressed outpatients from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. We estimated the centrality of all IDS-30 symptoms, and compared the centrality of DSM and non-DSM symptoms; centrality reflects the connectedness of each symptom with all other symptoms. Results: A network with 28 intertwined symptoms emerged, and symptoms differed substantially in their centrality values. Both DSM symptoms (e.g., sad mood) and non-DSM symptoms (e.g., anxiety) were among the most central symptoms, and DSM criteria were not more central than non-DSM symptoms. Limitations: Many subjects enrolled in STAR*D reported comorbid medical and psychiatric conditions which may have affected symptom presentation. Conclusion: The network perspective neither supports the standard psychometric notion that depression symptoms are equivalent indicators of MD, nor the common assumption that DSM symptoms of depression are of higher clinical relevance than non-DSM depression symptoms. The findings suggest the value of research focusing on especially central symptoms to increase the accuracy of predicting outcomes such as the course of illness, probability of relapse, and treatment response.
    Journal of Affective Disorders 10/2015; DOI:10.1016/j.jad.2015.09.005 · 3.38 Impact Factor