Comparison of CAT Item Selection Criteria for Polytomous Items.

Northwestern University Feinberg School of Medicine.
Applied Psychological Measurement (Impact Factor: 1.49). 09/2009; 33(6):419-440. DOI: 10.1177/0146621608327801
Source: PubMed

ABSTRACT Item selection is a core component of computerized adaptive testing (CAT). Several studies have evaluated new and classical selection methods; however, the few that have applied such methods to polytomous items have reported conflicting results. To clarify these discrepancies and further investigate the properties of the selection methods, six selection methods were compared systematically. The results showed no clear benefit from more sophisticated selection criteria. They also showed that one method previously believed to be superior, the maximum expected posterior weighted information (MEPWI), is mathematically equivalent to a simpler method, the maximum posterior weighted information (MPWI).
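
The equivalence of MEPWI and MPWI stated in the abstract can be checked numerically. The sketch below is not the authors' code. It assumes a graded response model, a standard normal prior, a fixed quadrature grid, and a small hypothetical item pool; all function names and item parameters are illustrative. Under one common formulation, MPWI averages a candidate item's information over the current posterior, while MEPWI averages that same quantity over the posteriors that would result from each possible response, weighted by the predictive probability of that response. By the law of total probability the predictive-weighted posteriors sum back to the current posterior, so the two criteria should agree up to rounding error.

```python
import math

def grm_cumulative(theta, a, bs):
    """Boundary probabilities P(X >= k) under the graded response model,
    padded with 1.0 (lowest boundary) and 0.0 (beyond the top category)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]

def grm_probs(theta, a, bs):
    """Category response probabilities for one item."""
    cum = grm_cumulative(theta, a, bs)
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def grm_info(theta, a, bs):
    """Fisher information of one item at theta: sum of (dP_k)^2 / P_k."""
    cum = grm_cumulative(theta, a, bs)
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        info += dp * dp / p
    return info

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

# Assumed quadrature grid and standard normal prior (illustrative choices).
GRID = [-4.0 + 0.1 * i for i in range(81)]
PRIOR = normalize([math.exp(-0.5 * t * t) for t in GRID])

def update(post, item, response):
    """Posterior on the grid after observing `response` to `item`."""
    a, bs = item
    return normalize([w * grm_probs(t, a, bs)[response] for w, t in zip(post, GRID)])

def mpwi(post, item):
    """Posterior weighted information of a candidate item."""
    a, bs = item
    return sum(w * grm_info(t, a, bs) for w, t in zip(post, GRID))

def mepwi(post, item):
    """Expected posterior weighted information over the candidate's responses."""
    a, bs = item
    total = 0.0
    for k in range(len(bs) + 1):
        # Predictive probability of response k under the current posterior.
        pred = sum(w * grm_probs(t, a, bs)[k] for w, t in zip(post, GRID))
        total += pred * mpwi(update(post, item, k), item)
    return total

# Hypothetical pool: (discrimination, ordered category thresholds).
pool = [(1.2, [-1.0, 0.0, 1.0]), (2.0, [-0.5, 0.5, 1.5]), (0.8, [-2.0, -1.0, 0.5])]
post = update(PRIOR, pool[0], 2)  # one item already administered
scores = [(mpwi(post, item), mepwi(post, item)) for item in pool[1:]]
for s_mpwi, s_mepwi in scores:
    print(f"MPWI={s_mpwi:.6f}  MEPWI={s_mepwi:.6f}")
```

Because the two scores coincide for every candidate, both criteria rank the remaining items identically, and MEPWI only adds the cost of looping over response categories.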

  • ABSTRACT: This study simulated computer-adaptive testing (CAT) based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to the CAT simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% women). Unidimensionality of the item bank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. The CAT simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (absolute item fit residuals < 2.5) and no DIF or LD. The CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 to +2 logits when terminating at SE≤0.32, and 4 items when terminating at SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded, and the simulation studies suggest that it has good screening performance in the samples investigated and may reasonably contribute to improving depression assessment.
    Journal of psychosomatic research 11/2013; 75(5):437-43. · 2.91 Impact Factor
  • ABSTRACT: The objective of this study was to report on the measurement properties of the Pediatric Quality of Life Inventory™ (PedsQL™) Gastrointestinal Symptoms Module for patients with functional gastrointestinal (GI) disorders (FGIDs) and organic GI diseases, hereafter referred to as "GI disorders", using patient self-report for ages 5-18 years and parent proxy-report for ages 2-18 years. The 74-item PedsQL™ GI Module and 23-item PedsQL™ Generic Core Scales were completed in a 9-site study by 584 patients and 682 parents. Patients had physician-diagnosed GI disorders (Chronic Constipation, Functional Abdominal Pain, Irritable Bowel Syndrome, Functional Dyspepsia, Crohn's Disease, Ulcerative Colitis, Gastroesophageal Reflux Disease). Fourteen unidimensional scales were derived measuring stomach pain, stomach discomfort when eating, food and drink limits, trouble swallowing, heartburn and reflux, nausea and vomiting, gas and bloating, constipation, blood, diarrhea, worry, medicines, and communication. The PedsQL™ GI Module Scales evidenced excellent feasibility, excellent reliability for the Total Scale Scores (patient self-report α = 0.97; parent proxy-report α = 0.97), and good to excellent reliability for the 14 individual scales (patient self-report α = 0.67-0.94; parent proxy-report α = 0.77-0.95). Intercorrelations with the Generic Core Scales supported construct validity. Known-groups validity of the individual Symptoms Scales across the 7 GI disorders was generally supported. Factor analysis supported the unidimensionality of the individual scales. The PedsQL™ GI Module Scales demonstrated acceptable to excellent measurement properties and may be used as common metrics to compare GI-specific symptoms in clinical research and practice, both within and across patient groups, for FGIDs and organic GI diseases.
    Journal of pediatric gastroenterology and nutrition 05/2014; · 2.18 Impact Factor
  • ABSTRACT: In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler information, and maximum expected posterior weighted Kullback-Leibler information. The ability estimation methods investigated were maximum likelihood estimation (MLE), weighted likelihood estimation (WLE), and expected a posteriori (EAP). Results suggested that all item selection procedures, regardless of the information functions on which they were based, performed equally well across ability estimation methods. The principal conclusions about the ability estimation methods are that MLE is a practical choice and that WLE should be considered when there is a mismatch between pool information and the population ability distribution. EAP can serve as a viable alternative when an appropriate prior ability distribution is specified. Several implications of the findings for applied measurement are discussed.
    Applied Measurement in Education 10/2012; 25(4). · 0.37 Impact Factor
