Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

National Cancer Institute, NIH, Bethesda, Maryland 20892, USA.
Medical Care (Impact Factor: 3.23). 06/2007; 45(5 Suppl 1):S22-31. DOI: 10.1097/01.mlr.0000250483.85507.04
Source: PubMed


The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project.
Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks.
Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking.
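The data-quality and descriptive steps named above (logic and range checking, spread of the response distribution within an item, frequencies, means) can be sketched for a single item as follows. This is a minimal illustration, not the PROMIS procedure. It assumes a hypothetical 5-point item coded 1 to 5 with `None` for a missing response. The helper names and the sparse-category cut-off `min_count` are also hypothetical and are not values taken from the report.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding: a 5-point ordinal item scored 1..5; None marks a missing response.
VALID_CATEGORIES = {1, 2, 3, 4, 5}


def range_check(responses, valid=VALID_CATEGORIES):
    """Return (row index, value) pairs for non-missing values outside the valid codes."""
    return [(i, r) for i, r in enumerate(responses) if r is not None and r not in valid]


def category_frequencies(responses, valid=VALID_CATEGORIES):
    """Count use of each valid category; missing and out-of-range values are ignored."""
    counts = Counter(r for r in responses if r in valid)
    return {c: counts.get(c, 0) for c in sorted(valid)}


def sparse_categories(freqs, min_count=10):
    """Flag categories chosen fewer than min_count times (min_count is an assumed cut-off)."""
    return [c for c, n in freqs.items() if n < min_count]


def item_summary(responses, valid=VALID_CATEGORIES, min_count=10):
    """Data-quality flags plus simple descriptive statistics for one item."""
    clean = [r for r in responses if r in valid]
    freqs = category_frequencies(responses, valid)
    return {
        "n_valid": len(clean),
        "n_missing": sum(r is None for r in responses),
        "out_of_range": range_check(responses, valid),
        "frequencies": freqs,
        "mean": mean(clean) if clean else None,
        "sparse": sparse_categories(freqs, min_count),
    }


# Hypothetical responses: one missing value and one out-of-range code (7).
item = [1, 2, 2, 3, 5, None, 7, 2, 1, 3]
summary = item_summary(item, min_count=2)
```

In this toy example, the out-of-range code at row 6 is reported rather than silently dropped. Categories 4 and 5 are flagged as sparse, which is the kind of thin response spread that can make category thresholds hard to estimate later.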
Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.
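The abstract lists item calibration for banking but does not name a model in this excerpt. For ordinal HRQOL items, one widely used item response theory model is Samejima's graded response model (GRM), and one of the citing snippets below names it for ordinal variables. Under the GRM, an item with slope a and ordered thresholds b_1 < ... < b_(m-1) has cumulative curves P*(X >= k | θ) = 1 / (1 + exp(-a(θ - b_k))). Each category probability is the difference between adjacent cumulative curves. The sketch below is a minimal illustration under these assumptions. The function names and parameter values are hypothetical, and estimating a and b (the calibration itself) is not shown.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    a: discrimination (slope); thresholds: strictly increasing b_1 < ... < b_(m-1)
    for an item with m response categories, scored 0..m-1 here.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative boundary curves P*(X >= k | theta), padded with 1.0 (k = 0) and 0.0 (k = m).
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Probability of category k is the gap between adjacent boundary curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score (categories 0..m-1) at trait level theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 4-category item with slope 1.5 and symmetric thresholds.
probs = grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])
score = expected_score(0.0, 1.5, [-1.0, 0.0, 1.0])
```

Because the thresholds in this example are symmetric around θ = 0, the two middle categories are equally likely at θ = 0, and the expected score sits at the midpoint of the 0 to 3 range. The expected score rises as θ increases, which reflects the monotonicity assumption listed above.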



Available from: Karon F Cook, Sep 30, 2015
  • "Comparative Fit Index (CFI) > 0.95, Tucker-Lewis index (TLI) > 0.95, and Root Mean Square Error of Approximation (RMSEA) < 0.08 (Reeve et al., 2007)"
    ABSTRACT: Objective: Previous studies have identified differential item functioning (DIF) in depressive symptoms measures, but the impact of DIF has rarely been reported. Given the critical importance of depressive symptoms assessment among older adults, we examined whether DIF due to demographic characteristics resulted in salient score changes in commonly used measures. Methods: Four longitudinal studies of cognitive aging provided a sample of 3754 older adults and included individuals both with and without a clinical diagnosis of major depression. Each study administered at least one of the following measures: the Center for Epidemiologic Studies Depression scale (20-item ordinal response or 10-item dichotomous response versions), the Geriatric Depression Scale, and the Montgomery–Åsberg Depression Rating Scale. Hybrid logistic regression-item response theory methods were used to examine the presence and impact of DIF due to age, sex, race/ethnicity, and years of education on the depressive symptoms items. Results: Although statistically significant DIF due to demographic factors was present on several items, its cumulative impact on depressive symptoms scores was practically negligible. Conclusions: The findings support the substantive meaningfulness of previously reported demographic differences in depressive symptoms among older adults, showing that these individual differences were unlikely to have resulted from item bias attributable to the demographic characteristics we examined. Copyright © 2014 John Wiley & Sons, Ltd.
    International Journal of Geriatric Psychiatry 01/2015; 30(1):88-96. DOI:10.1002/gps.4121 · 2.87 Impact Factor
  • "Samejima's Graded Response Model was used for ordinal variables (Samejima, 1969). The unidimensionality assumption for each depression scale was assessed with a single-factor model in Mplus 6.11 (Muthén and Muthén, 1998–2007) using conventional criteria for acceptable model fit: comparative fit index (CFI) > 0.95, Tucker-Lewis Index (TLI) > 0.95, and root mean square error of approximation (RMSEA) < 0.08 (Reeve et al., 2007). When the assumption of unidimensionality was questionable, we accounted for any residual correlations among symptom items in Multiple Indicators Multiple Causes (MIMIC) modeling."
    ABSTRACT: The objective of this study is to determine whether differential item functioning (DIF) due to cognitive status impacted three depressive symptoms measures commonly used with older adults. Differential item functioning in depressive symptoms was assessed among participants (N = 3558) taking part in four longitudinal studies of cognitive aging, using the Geriatric Depression Scale, the Montgomery-Åsberg Depression Rating Scale, and the Center for Epidemiologic Studies Depression Scale. Participants were grouped by cognitive status using a general cognitive performance score derived from each study's neuropsychological battery and linked to a national average using a population-based survey representative of the US population. The Clinical Dementia Rating score was used as an alternate grouping variable in three of the studies. Although statistically significant DIF based on cognitive status was found for some depressive symptom items (e.g., items related to memory complaints, appetite loss, lack of energy, and mood), the effect of item bias on the total score for each scale was negligible. The depressive symptoms scales in these four studies measured depression in the same way, regardless of cognitive status. This may reduce concerns about using these depression measures in cognitive aging research, as relationships between depression and cognitive decline are unlikely to have been due to item bias, at least in the ways that were measured in the datasets we considered. Copyright © 2014 John Wiley & Sons, Ltd.
    International Journal of Geriatric Psychiatry 12/2014; 30(9). DOI:10.1002/gps.4234 · 2.87 Impact Factor
  • "The ordinal items were regressed on the scale-factor by probit regressions estimated by a robust weighted least squares estimator with mean and variance adjustment (WLSMV) [28,29]. Appropriateness of the initial one-factor model for each scale was assessed by: 1) overall goodness-of-fit statistics including the comparative fit index (CFI) and the root mean square error of approximation (RMSEA), where CFI >0.95 and RMSEA < 0.08 were regarded as appropriate fit [30-34]; 2) magnitude of factor loadings; 3) model residual correlations (RC) and 4) modification indices (MI) [28,35]. For the latter three criteria, their magnitude was evaluated in comparison to other items in the scale and in an integrative manner, taking all three under consideration at once, so no strict thresholds were applied for each criterion."
    ABSTRACT: Background and aim: Thyroid diseases are prevalent and chronic. With treatment, quality of life is restored in most, but not all, patients. Construct validity of the thyroid-related quality of life questionnaire, ThyPRO, has been established by multi-trait scaling, but not evaluated with more elaborate methods. The purpose of the present study was to evaluate dimensionality of the ThyPRO scales and to attempt to understand possible item misfit through structural equation modeling for categorical data. Methods: The current 84-item version of ThyPRO consists of 13 scales, covering domains of physical (4 scales) and mental (2 scales) symptoms, function and well-being (3 scales) and participation/social function (4 scales). The data were collected from a cross-sectional sample of 907 thyroid patients. One-factor confirmatory models were fitted to each scale, and evaluated by model fit statistics (comparative fit index >0.95, root mean square error of approximation <0.08), magnitude of factor loadings, model residual correlations and modification indices (MI). Indications of multi-dimensionality were tested in bi-factor models. Possible item misfit was evaluated in a combined, investigational model. Results: Each ThyPRO scale was adequately represented by a unidimensional model after minor revisions. Eleven items were identified in the unidimensional models as potentially misfitting and were investigated further by multidimensional modeling. Conclusion: Elaborate psychometric modeling supported the construct validity of the ThyPRO. However, 11 potentially misfitting items and 18 items with local dependence to other items are candidates for removal in future item reduction processes.
    Health and Quality of Life Outcomes 09/2014; 12(1):126. DOI:10.1186/s12955-014-0126-z · 2.12 Impact Factor
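The citing snippets above judge one-factor (unidimensional) models against the same cut-offs: CFI > 0.95, TLI > 0.95, and RMSEA < 0.08. Two of the snippets attribute these cut-offs to Reeve et al., 2007. Programs such as Mplus report these indices directly. The sketch below only shows the usual arithmetic from a model chi-square, a baseline (independence) model chi-square, and the sample size N. All input values are hypothetical. The RMSEA uses the common N - 1 form, and some software divides by N instead. With a WLSMV estimator, the adjusted chi-square the program reports would be the input.

```python
import math
from dataclasses import dataclass


@dataclass
class ChiSquare:
    stat: float  # chi-square statistic (e.g., the WLSMV-adjusted value a program reports)
    df: int      # degrees of freedom


def rmsea(model: ChiSquare, n: int) -> float:
    """RMSEA in the common N - 1 form; some programs divide by N instead."""
    return math.sqrt(max(model.stat - model.df, 0.0) / (model.df * (n - 1)))


def cfi(model: ChiSquare, baseline: ChiSquare) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(model.stat - model.df, 0.0)
    d_base = max(baseline.stat - baseline.df, d_model)  # max of both noncentralities and 0
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(model: ChiSquare, baseline: ChiSquare) -> float:
    """Tucker-Lewis index; unlike CFI it is not truncated to [0, 1]."""
    ratio_base = baseline.stat / baseline.df
    ratio_model = model.stat / model.df
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def meets_criteria(model, baseline, n, cfi_min=0.95, tli_min=0.95, rmsea_max=0.08):
    """Apply the cut-offs quoted in the snippets: CFI > 0.95, TLI > 0.95, RMSEA < 0.08."""
    indices = {
        "CFI": cfi(model, baseline),
        "TLI": tli(model, baseline),
        "RMSEA": rmsea(model, n),
    }
    ok = (
        indices["CFI"] > cfi_min
        and indices["TLI"] > tli_min
        and indices["RMSEA"] < rmsea_max
    )
    return indices, ok


# Hypothetical one-factor model vs. baseline model, N = 500 respondents.
indices, ok = meets_criteria(ChiSquare(30.0, 20), ChiSquare(1000.0, 28), n=500)
```

Passing all three cut-offs is only one piece of evidence for unidimensionality. The ThyPRO snippet, for example, also weighs factor loadings, residual correlations, and modification indices without fixed thresholds.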