IRT health outcomes data analysis project: An overview and summary

Department of Rehabilitation Medicine, University of Washington School of Medicine, Seattle, Washington, USA.
Quality of Life Research (Impact Factor: 2.49). 02/2007; 16 Suppl 1(S1):121-32. DOI: 10.1007/s11136-007-9177-5
Source: PubMed


In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset.
A thorough psychometric and content analysis was conducted on two primary domains within a cancer health-related quality of life (HRQOL) dataset.
HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared.
The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites.
Items from 4 HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System-Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey.
Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed.

Available from: Karon F Cook,
    • "A value of 2.0 or greater for this statistic is comparable to reliability of 0.80 and is acceptable. To detect the presence of differential item functioning (DIF), which occurs when different groups within the sample respond in a different manner to an individual item [34], we compared the different levels of the trait by gender. A Welch t statistically significant at P< 0.05, and a difference in difficulty of at least 0.5 logit was considered as noticeable DIF [33]. "
    ABSTRACT: The aims of this study were to propose a Spanish Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) short form based on previously shortened versions and to study its validity, reliability, and responsiveness for patients with hip osteoarthritis undergoing total hip replacement (THR). This was a prospective observational study of two independent cohorts (788 and 445 patients, respectively). Patients completed the WOMAC and the Short Form (SF)-36 questionnaires before THR and 6 months afterward. Patients received the questionnaires by mail, and two reminder letters were sent to patients who had not replied to the questionnaire. Based on two studies from the literature, we selected the two shortened domains: the pain domain, composed of three items, and the function domain, composed of eight items. Thus, we proposed an 11-item WOMAC short form. A complete validation process was performed, including confirmatory factor analysis (CFA) and Rasch analysis, and a study of reliability, responsiveness, and agreement measured by the Bland-Altman approach. The mean age was about 69 years and about 49% were women. CFA analyses confirmed the two-factor model. The pain and function domains fit the Rasch model. Stability was supported with similar results in both cohorts. Cronbach's alpha coefficients were high, 0.74 and 0.88. The highest correlations in convergent validity were found with the bodily pain and physical function SF-36 domains. Significant differences were found according to different pain and function severity scales, supporting known-groups validity. Responsiveness parameters showed large changes (effect sizes, 2.11 and 2.29). Agreement between the WOMAC long and short forms was adequate. Since short questionnaires result in improved patient compliance and response rates, it is very useful to have a shortened WOMAC version with the same good psychometric properties as the original version.
The Spanish WOMAC short form is valid, reliable, and responsive for patients undergoing THR, and most importantly, the first WOMAC short version proposed in Spanish. Because of its simplicity and ease of application, the short form is a good alternative to the original WOMAC questionnaire and it would further enhance its acceptability and usefulness in clinical research, clinical trials, and in routine practice within the orthopaedic community.
    Health and Quality of Life Outcomes 09/2011; 9(75):75. DOI:10.1186/1477-7525-9-75 · 2.12 Impact Factor
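The excerpt quoted with this entry states two numeric rules: a statistic of 2.0 or greater is treated as comparable to a reliability of 0.80, and DIF is flagged when a Welch t is significant at P < 0.05 and the difficulty difference is at least 0.5 logit. The sketch below illustrates both under stated assumptions. It assumes the unnamed statistic is a Rasch separation index G, for which the usual conversion is R = G²/(1 + G²); the excerpt does not name it. The function names and arguments are hypothetical, and the p-value is taken as an input rather than computed.

```python
def separation_to_reliability(g):
    """Convert a separation index G to reliability, R = G^2 / (1 + G^2).

    Assumption: the statistic in the excerpt is a Rasch separation index.
    """
    return g * g / (1.0 + g * g)


def flag_dif(difficulty_diff_logits, p_value, alpha=0.05, min_diff=0.5):
    """Apply the two-part DIF rule from the excerpt.

    difficulty_diff_logits: item difficulty in group 1 minus group 2 (logits).
    p_value: p-value of the Welch t test for that difference (computed elsewhere).
    An item is flagged only when BOTH conditions hold.
    """
    significant = p_value < alpha
    large_enough = abs(difficulty_diff_logits) >= min_diff
    return significant and large_enough


if __name__ == "__main__":
    print(round(separation_to_reliability(2.0), 2))
    print(flag_dif(0.62, 0.01))   # significant and large
    print(flag_dif(0.62, 0.20))   # large but not significant
```

Requiring both conditions keeps statistically significant but trivially small differences, which are common in large samples, from being reported as DIF.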
    • "Model fit was evaluated using the goodness of fit index (GFI), root mean square error of approximation, and the residual correlation matrix was examined for possible violations of local independence. Absolute residual correlations greater than 0.20 were considered to be possible indicators of local dependence [35]. "
    ABSTRACT: Past studies have used various methods to assess perceived risk of HIV infection; however, few have included multiple items covering different dimensions of risk perception or have examined the characteristics of individual items. This study describes the use of Item Response Theory (IRT) to develop a short measure of perceived risk of HIV infection scale (PRHS). An item pool was administered by trained interviewers to 771 participants. Participants also completed the risk behavior assessment (RBA) which includes items measuring risky sexual behaviors, and 652 participants completed HIV testing. The final measure consisted of 8 items, including items assessing likelihood estimates, intuitive judgments and salience of risk. Higher scores on the PRHS were positively associated with a greater number of sex partners, episodes of unprotected sex and having sex while high. Participants who tested positive for HIV reported higher perceived risk. The PRHS demonstrated good reliability and concurrent criterion-related validity. Compared to single item measures of risk perception, the PRHS is more robust by examining multiple dimensions of perceived risk. Possible uses of the measure and directions for future research are discussed.
    AIDS and Behavior 07/2011; 16(4):1075-83. DOI:10.1007/s10461-011-0003-2 · 3.49 Impact Factor
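The excerpt quoted with this entry treats absolute residual correlations greater than 0.20 as possible indicators of local dependence. A minimal sketch of that screening step follows, assuming the residual correlation matrix has already been obtained from a fitted factor model; the function name and the example matrix are hypothetical and not taken from the study.

```python
def flag_local_dependence(resid_corr, threshold=0.20):
    """Return item pairs whose absolute residual correlation exceeds threshold.

    resid_corr: square, symmetric matrix (list of lists) of residual
    correlations between items. Only the upper triangle (i < j) is scanned,
    so each pair is reported once and the diagonal is ignored.
    """
    n = len(resid_corr)
    flagged = []
    for i in range(n):
        for j in range(i + 1, n):
            r = resid_corr[i][j]
            if abs(r) > threshold:
                flagged.append((i, j, r))
    return flagged


if __name__ == "__main__":
    # Made-up residual correlations for three items, for illustration only.
    example = [
        [0.00, 0.25, -0.05],
        [0.25, 0.00, -0.31],
        [-0.05, -0.31, 0.00],
    ]
    for i, j, r in flag_local_dependence(example):
        print(f"items {i} and {j}: residual r = {r:+.2f}")
```

Flagged pairs are only candidates for local dependence; the excerpt itself calls them "possible indicators", so item content would still need to be reviewed.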
    • "An early version of SIMPOLYCAT was used for an investigation of the effect of item bank on using various θ estimators and the graded response model (Chen, 2007). A CAT simulation using a cancer-related health related quality of life dataset and a simulated CAT of the Modified Rolland-Morris Back Disability Questionnaire also used an early version of SIMPOLYCAT (Cook et al., 2007; Cook, Crane, & Amtmann, 2006). "
    ABSTRACT: A real-data simulation of computerized adaptive testing (CAT) is an important step in real-life CAT applications. Such a simulation allows CAT developers to evaluate important features of the CAT system, such as item selection and stopping rules, before live testing. SIMPOLYCAT, an SAS macro program, was created by the authors to conduct real-data CAT simulations based on polytomous item response theory (IRT) models. In SIMPOLYCAT, item responses can be input from an external file or generated internally on the basis of item parameters provided by users. The program allows users to choose among methods of setting initial θ, approaches to item selection, trait estimators, CAT stopping criteria, polytomous IRT models, and other CAT parameters. In addition, CAT simulation results can be saved easily and used for further study. The purpose of this article is to introduce SIMPOLYCAT, briefly describe the program algorithm and parameters, and provide examples of CAT simulations, using generated and real data. Visual comparisons of the results obtained from the CAT simulations are presented.
    Behavior Research Methods 06/2009; 41(2):499-506. DOI:10.3758/BRM.41.2.499 · 2.93 Impact Factor
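To make the idea of a real-data CAT simulation concrete, here is a minimal Python sketch. It is not SIMPOLYCAT (which is an SAS macro) and does not reproduce its algorithm. It assumes one particular combination of the options the abstract lists: the graded response model, maximum-information item selection, an EAP trait estimator with a standard normal prior, and a stopping rule based on a standard error threshold or a maximum test length. All names, default values, and item parameters are hypothetical.

```python
import math

# Quadrature grid for theta, from -4 to 4 in steps of 0.1.
GRID = [-4 + 0.1 * i for i in range(81)]


def _cumulative(theta, a, bs):
    """P(X >= k) for k = 0..K under the graded response model, padded with 1 and 0."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def grm_probs(theta, a, bs):
    """Category probabilities; a is discrimination, bs are ordered thresholds."""
    cum = _cumulative(theta, a, bs)
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


def item_information(theta, a, bs):
    """Fisher information of one item: sum over categories of P'^2 / P."""
    cum = _cumulative(theta, a, bs)
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info


def eap(responses, items):
    """EAP estimate and posterior SD; responses maps item index -> category."""
    post = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, x in responses.items():
            a, bs = items[idx]
            w *= grm_probs(t, a, bs)[x]
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post)) / total
    return mean, math.sqrt(var)


def simulate_cat(full_responses, items, se_stop=0.3, max_items=10, theta0=0.0):
    """Replay one respondent's recorded answers as if given adaptively."""
    administered = {}
    theta, se = theta0, float("inf")
    while len(administered) < min(max_items, len(items)) and se > se_stop:
        remaining = [i for i in range(len(items)) if i not in administered]
        # Pick the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *items[i]))
        administered[nxt] = full_responses[nxt]  # look up the recorded answer
        theta, se = eap(administered, items)
    return theta, se, list(administered)


if __name__ == "__main__":
    # Made-up item bank: (discrimination, thresholds) for four 4-category items.
    bank = [(1.5, [-1, 0, 1]), (2.0, [-0.5, 0.5, 1.5]),
            (1.0, [-2, -1, 0]), (1.8, [0, 1, 2])]
    theta, se, used = simulate_cat([3, 2, 3, 1], bank, max_items=3)
    print(f"theta = {theta:.2f}, SE = {se:.2f}, items used = {used}")
```

Because the responses are looked up from a complete, already-collected response vector rather than generated, the sketch corresponds to the "real-data" case; replacing the lookup with a draw from `grm_probs` at a known θ would give the generated-data case.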