Migrating from a legacy fixed-format measure to CAT administration: Calibrating the PHQ-9 to the PROMIS depression measures

General Internal Medicine, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Ave, Seattle, WA 98104, USA
Quality of Life Research (Impact Factor: 2.49). 03/2011; 20(9):1349-57. DOI: 10.1007/s11136-011-9882-y


We provide detailed instructions for analyzing patient-reported outcome (PRO) data collected with an existing (legacy) instrument so that scores can be calibrated to the PRO Measurement Information System (PROMIS) metric. This calibration facilitates migration to computerized adaptive test (CAT) PROMIS data collection and allows historical legacy data to be analyzed alongside new PROMIS data.
A cross-sectional convenience sample (n = 2,178) from HIV clinics at the University of Washington and the University of Alabama at Birmingham completed the PROMIS short form and Patient Health Questionnaire (PHQ-9) depression symptom measures between August 2008 and December 2009. We calibrated the tests using item response theory. We compared the measurement precision of the PHQ-9, the PROMIS short form, and a simulated PROMIS CAT.
Dimensionality analyses confirmed the PHQ-9 could be calibrated to the PROMIS metric. We provide code used to score the PHQ-9 on the PROMIS metric. The mean standard errors of measurement were 0.49 for the PHQ-9, 0.35 for the PROMIS short form, and 0.37, 0.28, and 0.27 for simulated 3-, 8-, and 9-item CATs.
The strategy described here facilitated migration from a fixed-format legacy scale to PROMIS CAT administration and may be useful in other settings.
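
To make the scoring step concrete, the following is a minimal Python sketch of how a fixed-format questionnaire with four response categories per item (0 to 3, as in the PHQ-9) could be scored on an item response theory metric. It is not the authors' code. It assumes a graded response model with expected a posteriori (EAP) estimation under a standard normal prior, and the item parameters in `HYPOTHETICAL_ITEMS` are invented placeholders rather than published PHQ-9 calibrations. The function names are also illustrative. The final conversion uses the usual PROMIS T-score convention (mean 50, standard deviation 10).

```python
import numpy as np

# HYPOTHETICAL parameters for illustration only: one (discrimination,
# thresholds) pair per item. Real use requires calibrated values.
HYPOTHETICAL_ITEMS = [(2.0, [0.0, 1.0, 2.0])] * 9


def grm_category_probs(theta, a, b):
    """Graded response model: P(X = k | theta) for k = 0..len(b)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k | theta) for k = 1..K.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    upper = np.hstack([np.ones((theta.size, 1)), p_star])   # P(X >= 0) = 1
    lower = np.hstack([p_star, np.zeros((theta.size, 1))])  # P(X >= K+1) = 0
    return upper - lower  # shape: (n_theta, K + 1)


def eap_score(responses, items, grid=np.linspace(-4.0, 4.0, 161)):
    """Return the EAP estimate of theta and its posterior standard deviation."""
    prior = np.exp(-0.5 * grid**2)  # unnormalized standard normal prior
    likelihood = np.ones_like(grid)
    for resp, (a, b) in zip(responses, items):
        if resp is None:  # skip unanswered items
            continue
        likelihood *= grm_category_probs(grid, a, b)[:, resp]
    posterior = prior * likelihood
    posterior /= posterior.sum()
    theta_hat = float((grid * posterior).sum())
    se = float(np.sqrt((((grid - theta_hat) ** 2) * posterior).sum()))
    return theta_hat, se


def to_t_score(theta):
    """Convert theta to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


if __name__ == "__main__":
    theta_hat, se = eap_score([1, 2, 0, 1, 1, 0, 2, 1, 0], HYPOTHETICAL_ITEMS)
    print(f"theta = {theta_hat:.2f}, SE = {se:.2f}, T = {to_t_score(theta_hat):.1f}")
```

The posterior standard deviation returned by `eap_score` plays the role of the standard error of measurement for a single respondent. Averaging it over a sample would give a summary comparable in kind to the mean standard errors reported above, although the numbers produced by these placeholder parameters are not meaningful.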

Available from: Heidi M Crane
  • "The PROMIS depression equivalent for the PHQ-9 threshold of 5 (mild depression) is 52.5; for the threshold of 10 (moderate depression), 59.9; for the threshold of 15 (moderately severe depression), 65.8; and for the threshold of 20 (severe depression), 71.5. Gibbons et al. (2011) also reported analyses linking PROMIS depression and the PHQ-9 in a sample of HIV patients. Their results were generally comparable to the PROsetta Stone linkages."
    ABSTRACT: The Patient-Reported Outcomes Measurement Information System (PROMIS®) is an NIH Roadmap initiative devoted to developing better measurement tools for assessing constructs relevant to the clinical investigation and treatment of all diseases: constructs such as pain, fatigue, emotional distress, sleep, physical functioning, and social participation. Following creation of item banks for these constructs, our priority has been to validate them, most often in short-term observational studies. We report here on a three-month prospective observational study with depressed outpatients in the early stages of a new treatment episode (with assessments at intake, one-month follow-up, and three-month follow-up). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a computerized adaptive test, CAT) with two legacy self-report instruments: the Center for Epidemiological Studies Depression scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999). PROMIS depression demonstrated strong convergent validity with the CESD and the PHQ-9 (with correlations in a range from .72 to .84 across all time points), as well as responsiveness to change when characterizing symptom severity in a clinical outpatient sample. Identification of patients as "recovered" varied across the measures, with the PHQ-9 being the most conservative. The use of calibrations based on models from item response theory (IRT) provides advantages for PROMIS depression both psychometrically (creating the possibility of adaptive testing, providing a broader effective range of measurement, and generating greater precision) and practically (these psychometric advantages can be achieved with fewer items, a median of 4 items administered by CAT, resulting in less patient burden).
    Journal of Psychiatric Research 05/2014; 56(1). DOI:10.1016/j.jpsychires.2014.05.010 · 3.96 Impact Factor
  • ABSTRACT: Purpose: Parents of children undergoing hematopoietic stem cell transplantation (HSCT) may face emotional distress while managing intense treatments with uncertain outcomes. We evaluated a brief parental emotional functioning (PREMO) screener from a health-related quality of life instrument to identify parental emotional distress, as measured by the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID). Methods: As part of a longitudinal pediatric HSCT study, parents (N = 165) completed the Child Health Ratings Inventories, which contain the 7-item PREMO screener. Some parents (n = 117) also completed SCID modules for Anxiety, Mood, and Adjustment disorders at baseline and/or 12 months. A composite outcome was created for threshold or subthreshold levels of any of these disorders. Receiver operating characteristic (ROC) analysis assessed how the PREMO screener predicted emotional distress as measured by the SCID. A prediction model was then built. Results: Fifty-two percent of parents completing the SCID had an Axis I disorder at baseline, while 41% had an Axis I disorder at 12 months. The area under the ROC curve was 0.75 for the PREMO screener and 0.81 for the prediction model. Conclusions: The PREMO screener may identify parents with, or at risk for, emotional distress and facilitate further evaluation and intervention.
    Quality of Life Research 07/2012; 22(6). DOI:10.1007/s11136-012-0240-5 · 2.49 Impact Factor
  • ABSTRACT: Background: Lack of coordination between screening studies for common mental disorders in primary care and community epidemiological samples impedes progress in clinical epidemiology. Short screening scales based on the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI), the diagnostic interview used in community epidemiological surveys throughout the world, were developed to address this problem. Method: Expert reviews and cognitive interviews generated CIDI screening scale (CIDI-SC) item pools for 30-day DSM-IV-TR major depressive episode (MDE), generalized anxiety disorder (GAD), panic disorder (PD) and bipolar disorder (BPD). These items were administered to 3058 unselected patients in 29 US primary care offices. Blinded SCID clinical reinterviews were administered to 206 of these patients, oversampling screened positives. Results: Stepwise regression selected optimal screening items to predict clinical diagnoses. Excellent concordance [area under the receiver operating characteristic curve (AUC)] was found between continuous CIDI-SC and DSM-IV/SCID diagnoses of 30-day MDE (0.93), GAD (0.88), PD (0.90) and BPD (0.97), with only 9-38 questions needed to administer all scales. CIDI-SC versus SCID prevalence differences are insignificant at the optimal CIDI-SC diagnostic thresholds (χ²₁ = 0.0-2.9, p = 0.09-0.94). Individual-level diagnostic concordance at these thresholds is substantial (AUC 0.81-0.86, sensitivity 68.0-80.2%, specificity 90.1-98.8%). Likelihood ratio positive (LR+) exceeds 10 and LR- is 0.1 or less at informative thresholds for all diagnoses. Conclusions: CIDI-SC operating characteristics are equivalent (MDE, GAD) or superior (PD, BPD) to those of the best alternative screening scales. CIDI-SC results can be compared directly to general population CIDI survey results or used to target and streamline second-stage CIDIs.
    Psychological Medicine 10/2012; 43(8):1-13. DOI:10.1017/S0033291712002334 · 5.94 Impact Factor