Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

National Cancer Institute, NIH, Bethesda, Maryland 20892, USA.
Medical Care (Impact Factor: 3.23). 06/2007; 45(5 Suppl 1):S22-31. DOI: 10.1097/01.mlr.0000250483.85507.04
Source: PubMed


The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project.
Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks.
Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking.
Key analytic issues are summarized, and recommendations are provided for future evaluations of item banks in HRQOL assessment.
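
As an illustration of the data-quality and descriptive steps listed in the abstract (range checking, spread of the response distribution within an item, frequencies, means), a minimal Python sketch follows. It is not the PROMIS analysis code. The function name `check_item_quality`, the response coding, and the `min_count` sparseness cutoff are assumptions made for this example.

```python
from collections import Counter


def check_item_quality(responses, valid_codes, min_count=5):
    """Range-check one item's responses and summarize their spread.

    responses   : list of observed response codes (None marks a missing answer)
    valid_codes : iterable of allowed codes, e.g. range(1, 6) for 5 options
    min_count   : hypothetical cutoff below which a category is flagged sparse
    """
    valid = list(valid_codes)
    observed = [r for r in responses if r is not None]

    # Range check: any code outside the allowed set is reported.
    out_of_range = [r for r in observed if r not in valid]
    in_range = [r for r in observed if r in valid]

    # Response distribution: frequency of every allowed category,
    # including categories that nobody endorsed.
    counts = Counter(in_range)
    frequencies = {code: counts.get(code, 0) for code in valid}
    sparse = [code for code, n in frequencies.items() if n < min_count]

    mean = sum(in_range) / len(in_range) if in_range else None
    return {
        "n_missing": len(responses) - len(observed),
        "out_of_range": out_of_range,
        "frequencies": frequencies,
        "sparse_categories": sparse,
        "mean": mean,
    }


# Example with made-up responses to a 5-option item.
report = check_item_quality([1, 2, 2, 3, 5, 5, 5, 7, None], range(1, 6), min_count=2)
print(report)
```

Categories flagged as sparse would typically be reviewed before calibration, for example by considering whether adjacent response options should be collapsed; that decision rule is not specified in the abstract.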

  • Source
    • "PROMIS item banks provide a comprehensive profile of health status, including physical functioning, pain, fatigue, sleep disturbance, emotional distress, alcohol use, and social participation (Buysse et al., 2010; Cella et al., 2010; Fries et al., 2009; Pilkonis et al., 2011, 2013; Revicki et al., 2009). PROMIS is the most ambitious attempt to date to apply models from item response theory (IRT) to health-related assessment (Cella et al., 2010; Hilton, 2011; Reeve et al., 2007). We report here on the development and calibration of two item banks measuring severity of substance use and the positive appeal of substance use, an important motivational factor influencing the development and "
    ABSTRACT: Background: Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods: Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results: Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions: Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings.
    Drug and alcohol dependence 10/2015; 156. DOI:10.1016/j.drugalcdep.2015.09.008 · 3.42 Impact Factor
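
The two-parameter graded response model named in the abstract above can be sketched in a few lines of Python. This is a generic textbook form of the model, not the calibration code used in that study. The function name `grm_category_probs` and the parameter values in the example are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a two-parameter graded response model.

    theta      : latent trait level of the respondent
    a          : item discrimination parameter
    thresholds : increasing threshold parameters b_1 < ... < b_{K-1}
    Returns K probabilities, lowest response category first.
    """
    # Cumulative probability of responding in category k or higher;
    # it is 1 for the lowest category and 0 beyond the highest.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Example: a 5-option item (4 thresholds) with made-up parameters.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

With 5 response options, as in the items described above, each item has one discrimination parameter and four thresholds, and the category probabilities sum to 1 at every trait level.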
  • Source
    • "It is sufficiently flexible to model item parameter drift and response dependence across time points. Recently IRT models have been used increasingly in health status measurement and evaluation of patient reported outcomes (PROs) like physical functioning and psychological well-being (Reeve et al. 2007). E.g., the simplest IRT model, the Rasch (1960) model (Fischer and Molenaar 1995; Christensen, Kreiner, and Mesbah 2013), is increasingly used for validation of measurement instruments (Tennant and Conaghan 2007) and has been shown to be superior to classical approaches (Blanchin et al. 2011). "
    ABSTRACT: Item response theory models are often applied when a number of items are used to measure a unidimensional latent variable. Originally proposed and used within educational research, they are also used when the focus is on physical functioning or psychological wellbeing. Modern applications often need more general models, typically models for multidimensional latent variables or longitudinal models for repeated measurements. This paper describes a SAS macro that fits two-dimensional polytomous Rasch models using a specification of the model that is sufficiently flexible to accommodate longitudinal Rasch models. The macro estimates item parameters using marginal maximum likelihood estimation. A graphical presentation of item characteristic curves is included.
    Journal of statistical software 10/2015; 67(Code Snippet 2). DOI:10.18637/jss.v067.c02 · 3.80 Impact Factor
  • Source
    • "Comparative Fit Index (CFI) > 0.95, Tucker-Lewis index (TLI) > 0.95, and Root Mean Square Error of Approximation (RMSEA) < 0.08 (Reeve et al., 2007 "
    ABSTRACT: Objective: Previous studies have identified differential item functioning (DIF) in depressive symptoms measures, but the impact of DIF has rarely been reported. Given the critical importance of depressive symptoms assessment among older adults, we examined whether DIF due to demographic characteristics resulted in salient score changes in commonly used measures. Methods: Four longitudinal studies of cognitive aging provided a sample size of 3754 older adults and included individuals both with and without a clinical diagnosis of major depression. Each study administered at least one of the following measures: the Center for Epidemiologic Studies Depression scale (20-item ordinal response or 10-item dichotomous response versions), the Geriatric Depression Scale, and the Montgomery–Åsberg Depression Rating Scale. Hybrid logistic regression-item response theory methods were used to examine the presence and impact of DIF due to age, sex, race/ethnicity, and years of education on the depressive symptoms items. Results: Although statistically significant DIF due to demographic factors was present on several items, its cumulative impact on depressive symptoms scores was practically negligible. Conclusions: The findings support substantive meaningfulness of previously reported demographic differences in depressive symptoms among older adults, showing that these individual differences were unlikely to have resulted from item bias attributable to the demographic characteristics we examined. Copyright © 2014 John Wiley & Sons, Ltd.
    International Journal of Geriatric Psychiatry 01/2015; 30(1):88-96. DOI:10.1002/gps.4121 · 2.87 Impact Factor
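
The fit-index cutoffs quoted in the excerpt above (CFI > 0.95, TLI > 0.95, RMSEA < 0.08) can be applied as a simple screening step once a factor model has been fitted elsewhere. The Python sketch below only compares already-computed indices with those cutoffs. The function name `meets_fit_criteria` and the example values are hypothetical.

```python
def meets_fit_criteria(cfi, tli, rmsea):
    """Compare fit indices with the cutoffs quoted in the excerpt.

    The indices themselves must come from a fitted confirmatory factor
    model; this helper only applies the thresholds.
    """
    checks = {
        "CFI": cfi > 0.95,     # Comparative Fit Index
        "TLI": tli > 0.95,     # Tucker-Lewis index
        "RMSEA": rmsea < 0.08, # Root Mean Square Error of Approximation
    }
    checks["all_met"] = all(checks.values())
    return checks


# Example with made-up index values.
print(meets_fit_criteria(cfi=0.97, tli=0.96, rmsea=0.05))
```

Such cutoffs are conventions rather than strict tests, so a model that misses one of them may still warrant a closer look before unidimensionality is rejected.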