Migrating from a legacy fixed-format measure to CAT administration: Calibrating the PHQ-9 to the PROMIS depression measures

General Internal Medicine, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Ave, Seattle, WA 98104, USA
Quality of Life Research (Impact Factor: 2.49). 03/2011; 20(9):1349-57. DOI: 10.1007/s11136-011-9882-y
Source: PubMed


We provide detailed instructions for analyzing patient-reported outcome (PRO) data collected with an existing (legacy) instrument so that scores can be calibrated to the PRO Measurement Information System (PROMIS) metric. This calibration facilitates migration to computerized adaptive test (CAT) PROMIS data collection and allows historical legacy data to be used alongside new PROMIS data in research.
A cross-sectional convenience sample (n = 2,178) from the Universities of Washington and Alabama at Birmingham HIV clinics completed the PROMIS short form and Patient Health Questionnaire (PHQ-9) depression symptom measures between August 2008 and December 2009. We calibrated the tests using item response theory. We compared measurement precision of the PHQ-9, the PROMIS short form, and simulated PROMIS CAT.
Dimensionality analyses confirmed the PHQ-9 could be calibrated to the PROMIS metric. We provide code used to score the PHQ-9 on the PROMIS metric. The mean standard errors of measurement were 0.49 for the PHQ-9, 0.35 for the PROMIS short form, and 0.37, 0.28, and 0.27 for simulated 3-, 8-, and 9-item CATs, respectively.
The strategy described here facilitated migration from a fixed-format legacy scale to PROMIS CAT administration and may be useful in other settings.
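
To make the reported standard errors of measurement (SEMs) easier to compare, the sketch below converts them to an approximate reliability using the classical relation reliability = 1 - SEM^2 / SD^2. It assumes the SEMs are expressed on a standardized metric with SD = 1; the abstract does not state this, and the function name `approx_reliability` is purely illustrative. The numbers are a rough reading aid, not results from the paper.

```python
# SEMs as reported in the abstract.
# Assumption: they are on a standardized metric with SD = 1.
reported_sem = {
    "PHQ-9": 0.49,
    "PROMIS short form": 0.35,
    "CAT, 3 items": 0.37,
    "CAT, 8 items": 0.28,
    "CAT, 9 items": 0.27,
}


def approx_reliability(sem, sd=1.0):
    """Classical approximation: reliability = 1 - SEM^2 / SD^2."""
    return 1.0 - (sem / sd) ** 2


# Approximate reliability for each instrument, keyed by name.
reliability = {name: approx_reliability(sem) for name, sem in reported_sem.items()}

for name, value in reliability.items():
    print(f"{name}: SEM = {reported_sem[name]:.2f}, approx. reliability = {value:.2f}")
```

Under this assumption, a smaller SEM maps directly to a higher approximate reliability, so the ordering of the instruments is the same as in the abstract.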



Available from: Heidi M Crane
    • "Currently eight mental health CATs/CAT systems have been applied to real patients. So far the mental health CATs cover only a few domains like depression (Fliege et al., 2005a, 2009; Gibbons et al., 2008, 2012; Pilkonis et al., 2011; 2014), anxiety (Becker et al., 2008; Gibbons et al., 2008, 2014; Walter et al., 2005, 2007), stress (Kocalevent et al., 2009), anger (Pilkonis et al., 2011), and generic health-related quality of life (HRQOL, Rebollo et al., 2010). "
    ABSTRACT: Computerized adaptive testing (CAT) based on item response theory (IRT) offers an efficient way to measure patient-reported outcomes accurately. The efficiency lies in a minimal response burden and high measurement precision over a broad measurement range. The objective of the study was to evaluate and compare the responsiveness of CATs measuring anxiety, depression, and stress reaction to standard static self-assessment tools. Longitudinal data from n=595 psychosomatic inpatients were analyzed to evaluate retest reliability and sensitivity to change of the CATs compared to static measures (GAD-7, PHQ-9, and PSQ) using correlational and ANOVA statistics. The study hypothesized that CATs are at least as retest-reliable and as sensitive to change as static tools. The three CATs show a low burden for patients, administering on average 5-7 (±2-6SD) items with similar retest reliability compared to the static tools applied (A-CAT: r=.78 vs. GAD-7: r=.75, D-CAT: r=.71 vs. PHQ-9: r=.75, S-CAT: r=.80 vs. PSQ worries scale: r=.80). The CATs were overall as sensitive to change as the static tools (Cohen's d ranged between .19 and .69). This is a monocenter, observational, longitudinal study without external clinical criteria; thus generalization to other settings may be limited. The tested CATs belong to the first generation of CATs, which have been used in daily routine for more than a decade. They are as retest-reliable and sensitive to change as static tools. Newer CATs may provide further practical advantages. Copyright © 2014 Elsevier B.V. All rights reserved.
    No preview · Article · Nov 2014 · Journal of Affective Disorders
    • "The PROMIS depression equivalent for the PHQ-9 threshold of 5 (mild depression) is 52.5; for the threshold of 10 (moderate depression), 59.9; for the threshold of 15 (moderately severe depression), 65.8; and for the threshold of 20 (severe depression), 71.5. Gibbons et al. (2011) also reported analyses linking PROMIS depression and the PHQ-9 in a sample of HIV patients. Their results were generally comparable to the PROsetta Stone linkages. "
    ABSTRACT: The Patient-Reported Outcomes Measurement Information System (PROMIS®) is an NIH Roadmap initiative devoted to developing better measurement tools for assessing constructs relevant to the clinical investigation and treatment of all diseases (constructs such as pain, fatigue, emotional distress, sleep, physical functioning, and social participation). Following creation of item banks for these constructs, our priority has been to validate them, most often in short-term observational studies. We report here on a three-month prospective observational study with depressed outpatients in the early stages of a new treatment episode (with assessments at intake, one-month follow-up, and three-month follow-up). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a computerized adaptive test, CAT) with two legacy self-report instruments: the Center for Epidemiological Studies Depression scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999). PROMIS depression demonstrated strong convergent validity with the CESD and the PHQ-9 (with correlations in a range from .72 to .84 across all time points), as well as responsiveness to change when characterizing symptom severity in a clinical outpatient sample. Identification of patients as "recovered" varied across the measures, with the PHQ-9 being the most conservative. The use of calibrations based on models from item response theory (IRT) provides advantages for PROMIS depression both psychometrically (creating the possibility of adaptive testing, providing a broader effective range of measurement, and generating greater precision) and practically (these psychometric advantages can be achieved with fewer items, a median of 4 items administered by CAT, resulting in less patient burden).
    Full-text · Article · May 2014 · Journal of Psychiatric Research
  • ABSTRACT: Purpose: Parents of children undergoing hematopoietic stem cell transplantation (HSCT) may face emotional distress while managing intense treatments with uncertain outcomes. We evaluated a brief parental emotional functioning (PREMO) screener from a health-related quality of life instrument to identify parental emotional distress, as measured by the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID). Methods: As part of a longitudinal pediatric HSCT study, parents (N = 165) completed the Child Health Ratings Inventories, which contain the 7-item PREMO screener. Some parents (n = 117) also completed SCID modules for Anxiety, Mood, and Adjustment disorders at baseline and/or 12 months. A composite outcome was created for threshold or subthreshold levels of any of these disorders. Receiver operating characteristic (ROC) analysis assessed how well the PREMO screener predicted emotional distress as measured by the SCID. A prediction model was then built. Results: Fifty-two percent of parents completing the SCID had an Axis I disorder at baseline, while 41% had an Axis I disorder at 12 months. The area under the ROC curve was 0.75 for the PREMO screener and 0.81 for the prediction model. Conclusions: The PREMO screener may identify parents with, or at risk for, emotional distress and facilitate further evaluation and intervention.
    No preview · Article · Jul 2012 · Quality of Life Research
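
The PHQ-9 thresholds and their PROMIS depression equivalents quoted in the citing excerpt above (5 → 52.5, 10 → 59.9, 15 → 65.8, 20 → 71.5) can be read as a small lookup table. The following is a minimal sketch of such a lookup. The function name, the "below mild threshold" label, and the choice to treat a score exactly at a cut point as belonging to the higher band are all assumptions made for illustration, not part of the cited work.

```python
from bisect import bisect_right

# PHQ-9 cut point -> PROMIS depression equivalent, as quoted in the excerpt.
PHQ9_TO_PROMIS = {5: 52.5, 10: 59.9, 15: 65.8, 20: 71.5}

# Severity labels from the excerpt, plus an assumed label for scores
# below the lowest cut point.
LABELS = [
    "below mild threshold",  # assumption: the excerpt gives no label here
    "mild",
    "moderate",
    "moderately severe",
    "severe",
]


def severity_from_promis(t_score):
    """Map a PROMIS depression score to a PHQ-9 severity band.

    Assumption: a score exactly at a cut point falls in the higher band.
    """
    cutoffs = sorted(PHQ9_TO_PROMIS.values())
    return LABELS[bisect_right(cutoffs, t_score)]


for score in (50.0, 52.5, 61.2, 66.0, 75.0):
    print(score, "->", severity_from_promis(score))
```

Keeping the cut points in a single dictionary makes it straightforward to swap in a different linkage table if another crosswalk is preferred.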