Comparing Alternative Rasch-Based Methods vs Raw Scores in Measuring Change in Health

Oxford Brookes University, Oxford, England, United Kingdom
Medical Care (Impact Factor: 3.23). 02/2004; 42(1 Suppl):I25-36. DOI: 10.1097/01.mlr.0000103530.13056.88
Source: PubMed


To compare alternative Rasch-based approaches to the assessment of change over time through the example of an outcome measure used in total hip replacement surgery.
Preoperative data were collected on 1424 patients receiving total hip replacement surgery; 1221 (86%) were sent follow-up questionnaires 1 year after surgery.
The 12-item Oxford Hip Score (OHS) questionnaire was administered preoperatively and 1 year postoperatively.
Subscales of the OHS for pain and functional impairment were examined for unidimensionality and item invariance. Two criteria were used to examine Rasch-based measurement of the 2 subscales. The advantages of Rasch measurement were assessed by whether it better discriminated the outcomes of patients (1) undergoing surgery of differing complexity; and (2) reporting different retrospective judgments of the success of their surgery. Using the method of relative precision for groups of patients distinguished in these 2 ways, change scores derived from Likert scoring were compared with those from 2 Rasch scoring methods: (1) separate analyses of the 2 time points; and (2) a common-scale analysis obtained by stacking the patient records from the 2 time points.
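The method of relative precision compares scoring methods by how sharply each one separates groups of patients that are known to differ. A minimal sketch, assuming the common definition of relative precision as the ratio of one-way ANOVA F statistics (alternative scoring over reference scoring), so that a value above 1 favours the alternative. The function names `one_way_anova_f` and `relative_precision` are hypothetical, and this is not the authors' code.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square.

    `groups` is a list of lists of (change) scores, one list per patient group.
    Hypothetical helper for illustration.
    """
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    k, n = len(groups), len(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_precision(groups_alternative, groups_reference):
    """Relative precision: F under the alternative scoring / F under the reference.

    Both arguments should hold the same patients in the same groups, scored two
    ways (e.g. Rasch change scores vs Likert change scores). Hypothetical helper.
    """
    return one_way_anova_f(groups_alternative) / one_way_anova_f(groups_reference)
```

Because F is unchanged when every score is multiplied by a constant or shifted by one, a relative precision above 1 reflects a genuine difference in how the scoring spreads the groups apart rather than a change of units.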
Less evidence for item invariance over time was found for the pain subscale. Other evidence supported treating the subscales as unidimensional. Whichever Rasch scoring method was used, some gains in precision over standard Likert scoring were obtained in discriminating between groups of patients.
The evidence from the current study suggests that there may be some gains in sensitivity to change of outcome measures from different Rasch-based scoring approaches.
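Both Rasch scoring approaches rest on converting an ordinal raw score into an interval-scaled person measure in logits. The sketch below assumes the dichotomous Rasch model and item difficulties that have already been estimated; the OHS items actually have five ordered response categories, so in practice a polytomous Rasch model (such as the rating scale or partial credit model) would be needed. In the common-scale approach, one set of difficulties (estimated from the stacked pre- and postoperative records) is applied at both time points; in the separate-analysis approach, each time point would have its own set. All function names are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """P(positive response) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def raw_score_to_logit(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood person measure (logits) for a raw score.

    Hypothetical helper: assumes dichotomous items with known difficulties.
    Under the Rasch model the raw score is sufficient for the person measure,
    so every respondent with the same raw score receives the same logit.
    """
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("extreme raw score: no finite estimate")
    # Start from the log-odds of the raw score, shifted by the mean difficulty.
    theta = math.log(raw_score / (n_items - raw_score)) + sum(difficulties) / n_items
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step on the likelihood equation: raw score = expected score.
        step = (raw_score - expected_score) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def change_on_common_scale(pre_raw, post_raw, difficulties):
    """Change in logits when both time points share one set of item difficulties."""
    return raw_score_to_logit(post_raw, difficulties) - raw_score_to_logit(pre_raw, difficulties)
```

With ten equally difficult items, a one-point gain from raw score 1 to 2 spans about 0.81 logits, whereas a gain from 5 to 6 spans about 0.41 logits: equal raw-score differences are not equal differences on the measured trait, which is the motivation for scoring change in logits.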



Available from: Jill Dawson, Nov 17, 2014
    • "…and shown that the problem can be resolved by providing a linear interval transformation of the ordinal raw score, thereby permitting the use of parametric statistical techniques on the questionnaire data (Norquist et al. 2004; Garamendi et al. 2006; Pesudovs 2006). Other unique evaluations available in Rasch analysis include assessment of how well the difficulty of an item targets the level of person ability in the population and measurement of scale validity, in particular item and person fit to the overall construct (Fisher et al. 1997; Mallinson et al. 2004; Pesudovs et al. 2007)."
    ABSTRACT: The purpose of this study was to use the Catquest-9SF to measure cataract surgery outcomes, and to use Rasch analysis to test the psychometric properties of this questionnaire, including its validity and responsiveness. Patients were recruited as consecutive cataract surgery patients during 1 month at six surgical units in Sweden (via the National Cataract Register). The patients completed the questionnaire before surgery and 3 months after. The Catquest-9SF data were assessed for fit to the Rasch model using version 3.63.2 of the WINSTEPS software (, Beaverton, OR, USA). Both preoperative and postoperative questionnaires were included in the analysis. The responsiveness to cataract surgery was calculated as the effect size. Completed questionnaires before and after surgery were received from 846 patients. The Rasch analysis showed that the category thresholds were ordered. All items fit a single overall construct (infit range 0.79-1.40; outfit range 0.74-1.40). The ability to discriminate different strata of person ability was good, with a real patient separation of 2.58 and patient separation reliability of 0.87. The questionnaire showed unidimensionality and was largely free from differential item functioning. The item difficulty was reasonably well targeted to both preoperative and postoperative patient ability. The Catquest-9SF Rasch score correlated significantly with visual acuity, and cataract surgery resulted in a significant improvement with an effect size of 1.8. The Catquest-9SF shows excellent psychometric properties, as demonstrated by Rasch analysis. It is highly responsive to cataract surgery, and its brevity (nine items) makes it well suited for use in daily clinical practice.
    Acta ophthalmologica 12/2011; 89(8):718-23. DOI:10.1111/j.1755-3768.2009.01801.x · 2.84 Impact Factor
    • "…visual disability). Over the last few years, the limitations of CTT have been widely acknowledged, and the use of item response theory (IRT), specifically Rasch analysis, has been advocated (Raczek et al. 1998; Massof 2002; Norquist et al. 2004). Among its numerous properties, Rasch analysis also enables insight into content validity and the targeting of item difficulty to patient ability, which is classically not possible with CTT."
    ABSTRACT: The visual functioning index (VFI) was one of the first questionnaires developed using classical test theory to assess outcomes of cataract surgery. However, it was not Rasch-validated. The objective of this study was to examine the psychometric properties of the VFI using Rasch analysis in patients with cataract. The 11-item VFI was self-administered to 243 patients (mean age 73.9 years) drawn from a cataract surgery waiting list. We examined the response category thresholds, item fit statistics, differential item functioning and unidimensionality for the VFI and its three subscales. Category thresholds were ordered. The person separation and reliability were low, indicating poor discriminatory ability of the VFI. No items misfit, but targeting of item difficulty to patient ability was suboptimal: on the whole, the items in the VFI were too easy for the sample. Only one item showed moderate differential item functioning. The VFI does not meet the stringent requirements of the Rasch model. However, adding more items suited to more able patients with cataract, as well as to those awaiting second-eye cataract surgery, could optimize the VFI.
    Acta ophthalmologica 07/2009; 88(7):797-803. DOI:10.1111/j.1755-3768.2009.01562.x · 2.84 Impact Factor
    • "A demonstration of this nature is rare. It is much more common for scales developed using traditional methods to be tested post hoc using new approaches [35]. Nevertheless, direct comparisons of new and traditional psychometric methods of any nature in the medical literature are sparse and, at best, superficial [36,37]. In part, this may be because the two approaches cannot be compared easily, as they use different methods, produce different information, and apply different criteria for success and failure."
    ABSTRACT: The United States Food and Drug Administration (FDA) is currently producing guidelines for the scientific adequacy of patient-reported outcome measures (PROMs) in clinical trials, which will have implications for the selection of scales used in future clinical trials. In this study, we examine how the Cervical Dystonia Impact Profile (CDIP-58), a neurologic PROM developed rigorously with Rasch measurement methods, stands up to traditional psychometric criteria, for three reasons: 1) to provide traditional psychometric evidence for the CDIP-58 in line with proposed FDA guidelines; 2) to enable researchers and clinicians to compare it with existing dystonia PROMs; and 3) to help researchers and clinicians bridge the knowledge gap between old and new methods of reliability and validity testing. We evaluated traditional psychometric properties of data quality, scaling assumptions, targeting, reliability and validity in a group of 391 people with cervical dystonia (CD). The main outcome measures used were the CDIP-58, the Medical Outcomes Study Short Form-36, the 28-item General Health Questionnaire, and the Hospital Anxiety and Depression Scale. A total of 391 people returned completed questionnaires (corrected response rate 87%). Analyses showed: 1) data quality was high (missing data ≤ 4%; subscale scores could be computed for > 96% of the sample); 2) item groupings passed tests for scaling assumptions; 3) targeting was good (except for the Sleep subscale, ceiling effect = 27%); 4) reliability was good (Cronbach's alpha ≥ 0.92, test-retest intraclass correlations ≥ 0.83); and 5) validity was supported. This study has shown that new psychometric methods can produce a PROM that stands up to traditional criteria and supports the clinical advantages of Rasch analysis.
    Health and Quality of Life Outcomes 09/2008; 6(1):58. DOI:10.1186/1477-7525-6-58 · 2.12 Impact Factor
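Several statistics quoted in the citing abstracts above (effect size, Rasch separation reliability, Cronbach's alpha) can be computed directly. A minimal sketch, assuming that effect size means the mean pre-to-post change divided by the standard deviation of baseline scores (one common definition; the Catquest-9SF abstract does not state which it used), that separation reliability is G^2 / (1 + G^2) for a Rasch separation index G, and that Cronbach's alpha takes its standard form. All function names are hypothetical.

```python
from statistics import mean, stdev, variance


def effect_size(pre, post):
    """Mean pre-to-post change divided by the SD of the baseline scores.

    Hypothetical helper. Assumes paired lists (same patients, same order); this
    is one common definition, not necessarily the one used in the cited study.
    """
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)


def separation_reliability(separation):
    """Convert a Rasch separation index G into reliability G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, same respondents.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - sum_item_variances / variance(totals))
```

As a consistency check, `separation_reliability(2.58)` gives about 0.869, which rounds to the patient separation reliability of 0.87 reported alongside the real patient separation of 2.58 in the Catquest-9SF abstract.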