Modifying Measures Based on Differential Item Functioning (DIF) Impact Analyses
ABSTRACT OBJECTIVE: Measure modification can impact comparability of scores across groups and settings. Changes in items can affect the percent admitting to a symptom. METHODS: Using item response theory (IRT) methods, well-calibrated items can be used interchangeably, and the exact same item does not have to be administered to each respondent, theoretically permitting wider latitude in terms of modification. RESULTS: Recommendations regarding modifications vary, depending on the use of the measure. In the context of research, adjustments can be made at the analytic level by freeing and fixing parameters based on findings of differential item functioning (DIF). The consequences of DIF for clinical decision making depend on whether or not the patient's performance level approaches the scale decision cutpoint. High-stakes testing may require item removal or separate calibrations to ensure accurate assessment. DISCUSSION: Guidelines for modification based on DIF analyses and illustrations of the impact of adjustments are presented.
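The point about decision cutpoints can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) IRT model and a four-item scale; the item parameters, the size of the DIF shift, and the cutpoint are all hypothetical values chosen for illustration, not estimates from any study. One item is given uniform DIF (a shifted difficulty parameter in the focal group), and the sketch compares expected scale scores for the two groups at the same trait level.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected sum score: total of endorsement probabilities over items."""
    return sum(p_endorse(theta, a, b) for a, b in items)

# Hypothetical (discrimination a, difficulty b) pairs for a 4-item scale.
REFERENCE_ITEMS = [(1.2, -0.5), (1.0, 0.0), (1.5, 0.5), (0.8, 1.0)]
# Focal group: the third item shows uniform DIF (b shifted by +0.6).
FOCAL_ITEMS = [(1.2, -0.5), (1.0, 0.0), (1.5, 1.1), (0.8, 1.0)]

# Hypothetical decision cutpoint on the expected-score metric.
CUTPOINT = 2.0

def dif_impact(theta):
    """Expected-score difference (reference minus focal) at the same theta."""
    return expected_score(theta, REFERENCE_ITEMS) - expected_score(theta, FOCAL_ITEMS)

def classification_differs(theta, cutpoint=CUTPOINT):
    """True if the two groups fall on opposite sides of the cutpoint."""
    ref_above = expected_score(theta, REFERENCE_ITEMS) >= cutpoint
    foc_above = expected_score(theta, FOCAL_ITEMS) >= cutpoint
    return ref_above != foc_above

for theta in (-2.0, 0.3, 3.0):
    print(
        f"theta={theta:+.1f}  "
        f"impact={dif_impact(theta):.3f}  "
        f"classification differs={classification_differs(theta)}"
    )
```

Under these assumed values, the score difference is largest for trait levels near the DIF item's difficulty. The classification changes only when the expected score lies close to the cutpoint; far from it, the same DIF leaves the decision unchanged.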
- "The presence of statistically significant item bias leads earlier researchers to recommend dropping items with DIF (e.g., Stommel et al., 1993). As explained in a more recent work (e.g., Teresi et al., 2012), dropping items with DIF is unnecessary. "
ABSTRACT: Objective: Previous studies have identified differential item function (DIF) in depressive symptoms measures, but the impact of DIF has been rarely reported. Given the critical importance of depressive symptoms assessment among older adults, we examined whether DIF due to demographic characteristics resulted in salient score changes in commonly used measures. Methods: Four longitudinal studies of cognitive aging provided a sample size of 3754 older adults and included individuals both with and without a clinical diagnosis of major depression. Each study administered at least one of the following measures: the Center for Epidemiologic Studies Depression scale (20-item ordinal response or 10-item dichotomous response versions), the Geriatric Depression Scale, and the Montgomery–Åsberg Depression Rating Scale. Hybrid logistic regression-item response theory methods were used to examine the presence and impact of DIF due to age, sex, race/ethnicity, and years of education on the depressive symptoms items. Results: Although statistically significant DIF due to demographic factors was present on several items, its cumulative impact on depressive symptoms scores was practically negligible. Conclusions: The findings support substantive meaningfulness of previously reported demographic differences in depressive symptoms among older adults, showing that these individual differences were unlikely to have resulted from item bias attributable to demographic characteristics we examined. Copyright © 2014 John Wiley & Sons, Ltd. International Journal of Geriatric Psychiatry 01/2015; 30(1):88-96. DOI:10.1002/gps.4121 · 3.09 Impact Factor
- "So far, most work has mainly focused on detection of DIF rather than on its actual practical impact. The few studies that have investigated practical impact show mixed results. Some found low practical impact, whereas others found substantial impact."
ABSTRACT: Objective: To investigate the impact of differences in depressive symptom reporting across clinical groups (healthcare setting, chronic illness, depression diagnosis and anxiety diagnosis) on clinical interpretability and comparability of depression scores. Methods: Participants from the Netherlands Study of Depression and Anxiety (n = 2981) completed the self-report Inventory of Depressive Symptomatology (IDS-SR). Differences in depressive symptom reporting between distinct clinical subpopulations were assessed using a Differential Item Functioning (DIF) analysis. The effects of DIF on symptom level were evaluated by examining whether DIF-adjustment had clinically relevant effects. Results: Significant DIF was detected across all tested clinical subpopulation groupings. Clinically relevant DIF was found on the symptom level for 13 IDS-SR items. However, impact of DIF on the aggregate level ranged from small to negligible: adjustment for DIF only led to salient changes in aggregate scores for 0.2-12.7% of individuals across tested sources of DIF. Conclusion: Differences in endorsement patterns of depressive symptoms were observed across clinical populations, challenging the assumptions regarding the measurement properties of self-reported depression. However, effects of DIF on the aggregate level of IDS-SR total scores were found to be minimal and not clinically important. The IDS-SR thus seems robust against DIF across clinical populations. Journal of Psychosomatic Research 08/2014; 78(2). DOI:10.1016/j.jpsychores.2014.08.014 · 2.84 Impact Factor
- Journal of Aging and Health 09/2012; 24(6):985-91. DOI:10.1177/0898264312457750 · 1.56 Impact Factor