
The effect of sample size on the item differential functioning in the context of item response theory

Abstract

The current study examined the effect of sample size on the detection of Item Differential Functioning (DIF). Three sample sizes (300, 500, and 1000) were used with a test of twenty polytomous items, each with five response categories. The Graded Response Model, one of the polytomous item response theory models, was used to estimate item and person parameters. DIF was detected with the Mantel-Haenszel (MH) method in each sample-size condition. The results showed an inverse relationship between sample size and the number of items exhibiting DIF.







https://doi.org/10.52839/0111-000-072-004







         




The Effect of Sample Size on the Item Differential Functioning in
the Context of Item Response Theory
Ziad Abdullah
Theology College, Gaziantep University, Turkey
E-mail: Ziadsy@gmail.com, Ziadsy@gantep.edu.tr
Abstract
The current study examined the effect of sample size on the detection of Item Differential
Functioning (DIF). Three sample sizes (300, 500, and 1000) were used with a test of twenty
polytomous items, each with five response categories. The Graded Response Model, one of
the polytomous item response theory models, was used to estimate item and person
parameters. DIF was detected with the Mantel-Haenszel (MH) method in each sample-size
condition. The results showed an inverse relationship between sample size and the number
of items exhibiting DIF.
Keywords: Item Differential Functioning (DIF), sample size, polytomous item response
theory models, Graded Response Model (GRM).






   

Classical Test Theory (CTT)





Item Response Theory (IRT)




Item Bias
 
 
Item Differential Functioning (DIF)        






              










Graded Response Model
Polytomous
Dichotomous
Item Differential Functioning (DIF)
     

              
Reference group, Focal group

Dichotomous
Polytomous
(Hidalgo & Gómez-Benito, 2010; Millsap & Everson, 1993; Potenza & Dorans, 1995)
DIF
Uniform DIF
Non-Uniform DIF
(Finch, 2005)
(Mellenbergh, 1982)
(Raju & Ellis, 2002)



              


DIF Detection Methods


   



     DIF



(Hidalgo & Gómez-Benito, 2010; Millsap & Everson, 1993; Potenza & Dorans, 1995)
(Penfield & Lam, 2000; Penfield & Camilli, 2007)
Mantel-Haenszel (MH)

Educational Testing Service (ETS)
(Guilera, Gómez-Benito & Hidalgo, 2009; Ackerman & Evans, 1992; Allen & Donoghue, 1996;
Clauser, Mazor, & Hambleton, 1991; Clauser, 1993; Mazor, Clauser, & Hambleton, 1992;
Mazor et al., 1994; Uttaro & Millsap, 1994)
             



MH

Mantel & Haenszel, 1959

          Reference
Focal2x2


  2 x C x K2
CK

2x2

iDIF





















ci

cijr

cijf





c

i

r

f


j



i

r


j



i

f


j



MH =󰇣
 
 󰇤

 󰇛󰇜

     F    k   

 Fk
 
c
GR
Ci

k

     MH       
c-1c
β
common odds ratio





……………..(2)

ij

ij

ij

ij




ij


Holland and Thayer, 1988
 󰇛󰇜 ………… (3)




Zwick & Ercikan, 1989
A



B



 C
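As an illustration of the dichotomous MH procedure referred to above (Mantel & Haenszel, 1959; Holland and Thayer, 1988; Zwick & Ercikan, 1989), the following minimal sketch computes the common odds ratio across matched score levels, converts it to the delta scale, and assigns an A, B, or C label. The 2 x 2 table layout, the function names, and the magnitude-only cut-offs of 1.0 and 1.5 are assumptions taken from the commonly cited ETS convention; the significance tests that normally accompany those rules are omitted, and the notation may differ from the one used in this study.

```python
import math

def mh_common_odds_ratio(tables):
    """Common odds ratio over matched score levels.

    tables: list of (a, b, c, d) counts per level, assumed layout:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    return num / den

def mh_d_dif(alpha):
    """Delta-scale transformation of the common odds ratio."""
    return -2.35 * math.log(alpha)

def ets_category(delta):
    """Magnitude-only A/B/C label (significance tests omitted, an assumption)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size >= 1.5:
        return "C"
    return "B"

# Hypothetical counts for two score levels with identical odds in both groups.
tables = [(20, 10, 20, 10), (30, 10, 30, 10)]
alpha = mh_common_odds_ratio(tables)
print(alpha, mh_d_dif(alpha), ets_category(mh_d_dif(alpha)))
```

A common odds ratio of 1 corresponds to a delta of 0, which is the no-DIF case.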



  

EasyDIF Kumagai, 2012 K


K
k=󰇛󰇜󰇛󰇜

 󰇛󰇜󰇛󰇜 󰇛󰇜 󰇛󰇜 󰇛󰇜
 ………(4)

󰇛󰇜

󰇛󰇜


.
.
N





K



$W = (C - 1) \times 0.1$ ……….(5)
C

$W = (5 - 1) \times 0.1 = 0.4$
$K > W$
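The decision rule above can be sketched in a few lines. This is a minimal illustration, assuming that an item is flagged whenever its K index exceeds W; the function names are hypothetical and the K values in the usage line are only examples.

```python
def dif_threshold(n_categories):
    """Cut-off W = (C - 1) x 0.1 for an item with C response categories."""
    return (n_categories - 1) * 0.1

def flag_dif(k_values, n_categories):
    """Return the 1-based positions of items whose K index exceeds W."""
    w = dif_threshold(n_categories)
    return [i for i, k in enumerate(k_values, start=1) if k > w]

# Example K values for four five-category items (illustrative only).
print(dif_threshold(5), flag_dif([0.000, 0.506, 0.555, 0.127], 5))
```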

EasyDIF≠≠≠
K

EasyDIF
Graded Response Model (GRM)

Samejima, 1972
Likert-type scale

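$$P^{*}_{ik}(\theta) = \frac{\exp\left[D a_i (\theta - b_{ik})\right]}{1 + \exp\left[D a_i (\theta - b_{ik})\right]}$$ ……………….(6)
where $P^{*}_{ik}(\theta)$ is the probability of responding to item $i$ in category $k$ or higher, $D$ is the scaling constant, $a_i$ is the discrimination parameter of item $i$, and $b_{ik}$ is the threshold parameter of category $k$ of item $i$.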





Thurstone's approach 

Anderson, 2003  



$b_{i2}$: 1 vs. 2, 3, 4
$b_{i3}$: 1, 2 vs. 3, 4
$b_{i4}$: 1, 2, 3 vs. 4
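The cumulative splits listed above determine the category probabilities under the GRM: each threshold defines a boundary curve for responding in that category or higher, and the probability of a single category is the difference between two adjacent boundary curves. The sketch below is a minimal illustration of that computation. The function name is hypothetical, the scaling constant D = 1.0 is an assumption (the value used in the study is not recoverable here), and the parameter values are those listed for item 5 in the generated item-parameter table, assuming the column order a, b1, b2, b3, b4.

```python
import math

def grm_category_probs(theta, a, thresholds, D=1.0):
    """Return P(X = k) for k = 1..len(thresholds)+1 under the GRM."""
    # Boundary curves P*(X >= k); P*(X >= 1) = 1 and P*(X > top category) = 0.
    star = [1.0]
    star += [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds]
    star.append(0.0)
    # Category probability = difference between adjacent boundary curves.
    return [star[k] - star[k + 1] for k in range(len(thresholds) + 1)]

# Item 5 values (assumed order: a, then b1..b4); theta = 0 is an arbitrary choice.
probs = grm_category_probs(theta=0.0, a=2.229,
                           thresholds=[-0.831, -0.255, -0.142, 0.807])
print([round(p, 3) for p in probs], round(sum(probs), 6))
```

With a positive discrimination and ascending thresholds the boundary curves are ordered, so every category probability is non-negative and the five probabilities sum to one.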



González-Romá et al., 2006

DIF

Kim, Cohen & Kim, 1994

3PLMRoussos & Stout, 1996

SIBTEST




Simulation study
WinGen (Han & Hambleton, 2007)

(Ching, 2002; Daniel & Joshua, 2009; Shudong, 1999; Fitzpatrick & Wendy, 2001;
Kinsey, 2003, 2012; Williams, 2003)
(Harwell et al., 1996)
Monte Carlo studies











GRM

N(0 , 1)
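The study generated its data with WinGen; the sketch below is not that program but a minimal stand-in showing how GRM responses could be simulated for the three sample sizes. It assumes that N(0, 1) refers to the ability distribution, that D = 1.0, and that discriminations are positive with ascending thresholds. The function name and the two items are hypothetical and invented for illustration only.

```python
import math
import random

def simulate_grm(n_persons, items, D=1.0, seed=0):
    """Simulate graded responses.

    items: list of (a, [b1, ..., b_{C-1}]) with a > 0 and ascending thresholds.
    Returns one list of category scores (1..C) per simulated person.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # ability drawn from N(0, 1)
        row = []
        for a, thresholds in items:
            u = rng.random()
            # Score = 1 + number of boundary curves P*(X >= k) that exceed u.
            score = 1 + sum(
                u < 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
                for b in thresholds
            )
            row.append(score)
        data.append(row)
    return data

# Two hypothetical five-category items (values invented for illustration).
items = [(1.2, [-1.0, -0.3, 0.4, 1.1]), (0.8, [-1.5, -0.5, 0.2, 0.9])]
samples = {n: simulate_grm(n, items, seed=n) for n in (300, 500, 1000)}
print({n: len(rows) for n, rows in samples.items()})
```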































Item  Model  Categories        a       b1       b2       b3       b4
   1  GRM             5   -0.361   -0.992    0.160    0.862    1.322
   2  GRM             5    0.417   -1.134   -1.080    0.349    1.282
   3  GRM             5   -0.839    0.235    0.376    0.881    1.393
   4  GRM             5   -0.199   -1.505   -0.729   -0.451    0.234
   5  GRM             5    2.229   -0.831   -0.255   -0.142    0.807
   6  GRM             5    0.701   -1.916   -0.805   -0.127    0.639
   7  GRM             5   -0.270   -0.196    0.516    0.996    1.679
   8  GRM             5   -0.174   -0.875   -0.396    0.745    1.545
   9  GRM             5   -0.692   -1.061   -0.562    0.464    0.780
  10  GRM             5   -0.282    0.274    1.254    2.040    2.655
  11  GRM             5   -0.356   -0.065    0.854    0.884    1.043
  12  GRM             5   -1.478   -0.766   -0.092    0.188    1.453
  13  GRM             5    0.000   -1.882    0.303    0.392    0.987
  14  GRM             5    0.178   -1.261   -0.569   -0.412    0.373
  15  GRM             5    1.411   -1.012   -0.249   -0.146    1.212
  16  GRM             5   -0.549   -0.402   -0.142    0.237    0.404
  17  GRM             5   -0.764   -1.029   -1.013   -0.971    0.312
  18  GRM             5    1.881    0.351    0.395    0.878    1.401
  19  GRM             5   -0.117   -1.091   -0.487   -0.007    1.309
  20  GRM             5   -1.078   -0.739   -0.260   -0.192    0.220


      
       








   
DIF



EasyDIF





p








All USE
 NonDIF     
Study
All not USE
   
All USE




K



           


N=300

Kx0,1=0.45-1




             








IGF)RDIF
b1b2b3b4
aK


N=300
N=500

K0.4


         

             






N = 300
Category: b1, b2, b3, b4; G: R, F; I: item
Item    b1 R    b1 F    b2 R    b2 F    b3 R    b3 F    b4 R    b4 F     a R     a F       K  DIF
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2  -2.398  -0.572  -2.120  -0.540  -0.245   0.299   1.742   0.937   0.119   0.614   0.506  ###
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.119   0.614   0.555  ###
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.119   0.614   0.555  ###
   5  -0.843  -1.257  -0.391  -0.374  -0.277  -0.252   0.532   0.744   1.445   1.285   0.127
   6  -2.195  -3.175  -0.810  -1.718  -0.253  -0.900   0.416   0.490   0.455   0.264   0.202
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  14  -0.828   0.000  -0.554   0.000  -0.282   0.000   -0.15   0.000   0.115   0.264   0.204
  15  -1.906  -1.455  -0.666  -0.434  -0.574  -0.296   1.073   1.108   0.712   0.805   0.167
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.712   0.805   0.071
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.712   0.805   0.071
  18   0.442  0.0543   0.468   0.150   0.949   0.804   1.701   1.414   0.878   1.048   0.219
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.878   1.048   0.105
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.878   1.048   0.105

N=300
N=500

 K0.4  
             


 







N = 500
Item    b1 R    b1 F    b2 R    b2 F    b3 R    b3 F    b4 R    b4 F     a R     a F       K  DIF
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   5  -0.430  -0.880  -0.117  -0.212  -0.068  -0.111   0.571   0.780   3.064   1.154   0.463  ###
   6  -1.998  -2.503  -0.889  -0.967  -0.275   0.237   0.749   1.197   0.370   0.327   0.103
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.000
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  14   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  15  -1.103  -0.823  -0.242   0.019  -0.184   0.143   1.824   1.457   0.580   0.705   0.162
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.580   0.705   0.112
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.580   0.705   0.112
  18   0.423   0.337   0.505   0.379   1.015   0.340   1.476   1.325   1.193   1.233   0.097
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.193   1.233   0.020
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.193   1.233   0.020





N=500
N=1000

K0.4











N = 1000

Item          b1              b2              b3              b4               a     DIF (K)
           R       F       R       F       R       F       R       F       R       F
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2  -0.861  -1.074  -0.808  -1.061   0.363   0.370   1.279   1.138   0.429   0.461   0.095
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.429   0.461   0.034
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.429   0.461   0.034
   5  -0.729  -0.640  -0.166  -0.170  -0.092  -0.096   0.754   0.676   2.299   3.160   0.109
   6  -1.660  -2.289  -0.697  -0.929  -0.226  -0.230   0.356   0.503   0.843   0.578   0.194
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  14   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  15  -0.953  -0.884  -0.252  -0.202  -0.139  -0.102   1.057   1.075   1.458   1.628   0.047
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.458   1.628   0.062
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.458   1.628   0.062
  18   0.382   0.368   0.392   0.383   0.859   0.828   1.270   1.313   2.068   2.230   0.017
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   2.068   2.230   0.037
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   2.068   2.230   0.037
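To make the tabulated parameters for N = 1000 more concrete, the following minimal sketch computes graded response model category probabilities and the expected item score for item 2, once with the R-column parameters and once with the F-column parameters. It assumes the plain logistic form of the model with no scaling constant (D = 1) and five categories scored 0 to 4; the function names `grm_category_probs` and `expected_score` are hypothetical and are not taken from the software used in the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    Assumes the logistic form with scaling constant D = 1 (an assumption,
    not stated in the source). `thresholds` holds b1..b4 in increasing order.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Probability of category k is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score, with categories scored 0, 1, ..., m."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Item 2 at N = 1000, copied from the table (R and F columns).
ref = dict(a=0.429, thresholds=[-0.861, -0.808, 0.363, 1.279])
foc = dict(a=0.461, thresholds=[-1.074, -1.061, 0.370, 1.138])

for theta in (-2.0, 0.0, 2.0):
    r = expected_score(theta, **ref)
    f = expected_score(theta, **foc)
    print(f"theta={theta:+.1f}  R={r:.3f}  F={f:.3f}  F-R={f - r:+.3f}")
```

A nonzero gap between the two expected scores at the same ability level is what differential functioning means at the item level. The sketch only illustrates that idea and does not reproduce the K index reported in the table.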





 














               















Tan & Gierl, 2005












Computerized Adaptive Testing (CAT)



















               

Recommendations
The current study recommends careful attention to the factors that affect an item's
specifications and its differential functioning between groups, whether defined by
gender, race, or culture, before the item is stored in an item bank and used to
measure ability. This screening is especially important in computerized adaptive
testing, which relies on a pre-test to estimate the individual's ability and then
administers a test suited to that ability. If an item used in the pre-assessment
exhibits differential functioning, the individual will be given a test that does not
match his or her ability, and this will undoubtedly lead to a wrong decision that
undermines the objectivity and fairness of the measurement process.



Suggestions for future research
Based on the results of the study, the researcher suggests the following studies as
future research:
1. Conducting a study using real data and comparing its results with those from
simulated data to determine the accuracy of the results.
2. Examining the effect of sample size on differential item functioning using other
item response models, such as the partial credit model, the generalized partial
credit model, and other models in this field.
3. Carrying out a study comparing differential item functioning under polytomous
and dichotomous item response models, since polytomous models typically
require larger samples than dichotomous models.
4. Studying the effect of test length on differential item functioning, since it may
influence judgments about item performance.




      

References
1. Abdullah, Z. (2012). Effect of some Estimation Methods on Accuracy of
Estimating Parameters in Polytomous Item Response Models. Unpublished
Doctoral Dissertation, Institute of Educational Studies, Cairo University.
2. Ackerman, T. A., & Evans, J. A. (1992). An investigation of the relationship
between reliability, power, and the Type I error rate of the Mantel-Haenszel and
simultaneous item bias detection procedures. Annual Meeting of the National
Council on Measurement in Education, San Francisco.
3. Allen, N. L., & Donoghue, J. R. (1996). Applying the Mantel-Haenszel
procedure to complex samples of items. Journal of Educational Measurement,
33(2), 231-251.
4. Ching-Fung B. Si. (2002). Ability Estimation Under Different Item
Parameterization and Scoring Models. Unpublished Doctoral Dissertation,
University of North Texas.
5. Clauser, B. E., Mazor, K., & Hambleton, R. K. (1991). An examination of item
characteristics on Mantel-Haenszel detection rates. Annual Meeting of the
National Council on Measurement in Education, Chicago.
6. Daniel, J., & Joshua, G. (2009). A Comparison of IRT Parameter Recovery in
Mixed Format Examinations Using PARSCALE and ICL. Poster presented at the
Annual Meeting of the Northeastern Educational Research Association.
7. Finch, W. H. (2005). The MIMIC model as a method for detecting DIF:
Comparison with Mantel-Haenszel, SIBTEST and IRT likelihood ratio. Applied
Psychological Measurement, 29, 278-295.
8. Fitzpatrick, A. R., & Yen, W. M. (2001). The Effects of Test Length and Sample
Size on the Reliability and Equating of Tests Composed of Constructed
Response Items. Applied Measurement in Education, 14(1), 31-57.
9. González-Romá, V., Hernández, A., & Gómez-Benito, J. (2006). Power and
Type I error of the mean and covariance structure analysis model for detecting
differential item functioning in graded response items. Multivariate Behavioral
Research, 41(1), 29-53.
10. Guilera, G., Gómez-Benito, J., & Hidalgo, M. D. (2009). Scientific production on
the Mantel-Haenszel procedure as a way of detecting DIF. Psicothema, 21(3),
492-498.
11. Han, K. T., & Hambleton, R. K. (2007). User's Manual: WinGen (Center for
Educational Assessment Report No. 642). Amherst, MA: University of
Massachusetts, School of Education.
12. Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies
in item response theory. Applied Psychological Measurement, 20(2), 101-125.
13. Hidalgo, M. D., & Gómez-Benito, J. (2010). Education measurement:
Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.),
International Encyclopedia of Education (3rd edition). USA: Elsevier - Science &
Technology.
14. Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the
Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity
(pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
15. Kim, S. H., Cohen, A. S., & Kim, H. O. (1994). An investigation of Lord's
procedure for the detection of differential item functioning. Applied Psychological
Measurement, 18(3), 217-228.
16. Kinsey, Tari L. (2003). A Comparison of IRT and Rasch Procedures in a Mixed
Item Format Test. Unpublished Doctoral Dissertation, University of North Texas.
17. Kumagai, R. (2012). A new method for estimating differential item functioning
(DIF) for multiple groups and polytomous items: Development of index K and the
computer program "EasyDIF". Japanese Journal of Psychology, 83(1), 35-43.
(in Japanese)
18. Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data
from retrospective studies of disease. Journal of the National Cancer Institute,
22, 719-748.
19. Mazor, K., Clauser, B. E., & Hambleton, R. K. (1994). Identification of
nonuniform differential item functioning using a variation of the Mantel-Haenszel
procedure. Educational and Psychological Measurement, 54, 284-291.
20. Mellenbergh, G. J. (1982). Contingency table models for assessing item bias.
Journal of Educational Statistics, 7, 105-118.
21. Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical
approaches for assessing measurement bias. Applied Psychological
Measurement, 17, 297-334.
22. Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias.
In C. R. Rao, & S. Sinharay (Eds.), Psychometrics (pp. 125-168; 5).
Amsterdam: Elsevier.
23. Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning
in performance assessment: Review and recommendations. Educational
Measurement: Issues and Practice, 19(3), 5-15.
24. Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously
scored items: A framework for classification and evaluation. Applied
Psychological Measurement, 19, 23-37.
25. Raju, N. S., & Ellis, B. B. (2002). Differential item and test functioning. In F.
Drasgow & N. Schmitt (Eds.), Measuring and Analyzing Behavior in
Organizations: Advances in Measurement and Data Analysis (pp. 156-188).
San Francisco, CA: Jossey-Bass.
26. Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small
sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type
I error performance. Journal of Educational Measurement, 33(2), 215-230.
27. Shudong, W. (1999). The Accuracy of Ability Estimation Methods for
Computerized Adaptive Testing Using The Generalized Partial Credit Model.
Unpublished Doctoral Dissertation, University of Pittsburgh.
28. Tan, X., & Gierl, M. J. (2005). Using local DIF analyses to assess group
differences on multilingual examinations. Poster presented at the annual meeting
of the National Council on Measurement in Education.
29. Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel
procedure in the detection of differential item functioning. Applied Psychological
Measurement, 18, 15-25.
30. Williams, N. J. (2003). Item and Person Parameter Estimation using Hierarchical
Generalized Linear Models and Polytomous Item Response Theory Models.
Unpublished Doctoral Dissertation, University of Texas at Austin.
31. Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the
NAEP history assessment. Journal of Educational Measurement, 26, 55-66.