Special Issues for Building Computerized-Adaptive Tests for Measuring Patient-Reported Outcomes: The National Institute of Health's Investment in New Technology

National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-7344, USA.
Medical Care (Impact Factor: 3.23). 12/2006; 44(11 Suppl 3):S198-204. DOI: 10.1097/01.mlr.0000245146.77104.50
Source: PubMed
  • Source
    • "Moreover, the ADLib-cardio is free of DIF with regard to six further socio-demographic (gender, age, educational level, employment status) and medical variables (intensity of pain, subjective limitations due to CVD). The test-fairness of the ADLib-cardio is of particular importance, as an unfair item can heavily impact results of instruments with a low item number [39]. "
    ABSTRACT: To develop and calibrate the activities of daily living item bank (ADLib-cardio) as a prerequisite for a computer-adaptive test (CAT) for the assessment of ADL in patients with cardiovascular diseases (CVD). After pre-testing for relevance and comprehension, a pool of 181 ADL items was answered on a five-point Likert scale by 720 CVD patients recruited in fourteen German cardiac rehabilitation centers. To verify that the relationship between the items is due to one factor, a confirmatory factor analysis (CFA) was conducted. A Mokken analysis was computed to examine double monotonicity (i.e., every item generates an equivalent order of person traits, and every person generates an equivalent order of item difficulties). Finally, a Rasch analysis based on the partial credit model was conducted to test for unidimensionality and to calibrate the item bank. Results of the CFA and Mokken analysis confirmed a one-factor structure and double monotonicity. In the Rasch analysis, merging response categories and removing items with misfit, differential item functioning, or local response dependency reduced the ADLib-cardio to 33 items. The ADLib-cardio fit the Rasch model with a nonsignificant item-trait interaction (chi-square=105.42, df=99; p=0.31). Person-separation reliability was 0.81, and unidimensionality could be verified. The ADLib-cardio is the first calibrated, unidimensional item bank that allows for the assessment of ADL in rehabilitation patients with CVD. As such, it provides the basis for the development of a CAT for the assessment of ADL in patients with cardiovascular diseases. Calibrating the ADLib-cardio in cardiovascular patient settings other than rehabilitation would further increase its generalizability.
    Full-text · Article · Aug 2013 · Health and Quality of Life Outcomes
  • Source
    • "Computerized adaptive testing (CAT) is widely used in education and has gained acceptance as a mode for administering health outcomes measures [1,2]. CAT offers several potential advantages over conventional (e.g., paper-and-pencil) administration, including automated scoring and storage of questionnaire data, and reduction of respondent burden. "
    ABSTRACT: Background: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected. Method: Two methods for detecting item-level mode effects are proposed using Bayesian estimation of posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty. A simulation study was conducted under the following conditions: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%); and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This resulted in a total of 16 conditions with 10 generated datasets per condition. Results: Both methods evidenced good to excellent false positive control, with RZ providing better control of false positives and with slightly higher power for CrI, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. RZ outperformed CrI, due to better control of false positive DIF. Conclusions: Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters and when data fail to conform to prior assumptions. False identification of DIF when items were very easy to endorse is a problem warranting additional investigation.
    Full-text · Article · Aug 2012 · BMC Medical Research Methodology
  • Source
    • "Such conflicting findings of performance between the newer item selection methods versus the classical MFI inspired us to undertake this study. Furthermore, interest in polytomous items is growing with the recent use of CAT technology in patient-reported outcomes (PROs) such as mental health, pain, fatigue, and physical functioning (Reeve, 2006). Most PRO measures are constructed using Likert-type items more befitting of polytomous models. "
    ABSTRACT: Item selection is a core component in computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to the use of polytomous items have reported conflicting results. To clarify these discrepancies and further investigate selection method properties, six different selection methods are compared systematically. The results showed no clear benefit from more sophisticated selection criteria and showed one method previously believed to be superior, the maximum expected posterior weighted information (MEPWI), to be mathematically equivalent to a simpler method, the maximum posterior weighted information (MPWI).
    Full-text · Article · Sep 2009 · Applied Psychological Measurement