https://doi.org/10.52839/0111-000-072-004
The Effect of Sample Size on the Item Differential Functioning in
the Context of Item Response Theory
Ziad Abdullah
Theology College – Gaziantep University – Turkey
E-mail
Ziadsy@gmail.com
Ziadsy@gantep.edu.tr
Abstract
The current study examined the effect of sample size on the detection of
differential item functioning (DIF). Three sample sizes were used (300, 500, and
1000) with a test of twenty polytomous items, each with five response categories.
The Graded Response Model (GRM), a polytomous item response theory model, was used
to estimate item and person parameters. The Mantel-Haenszel (MH) method was used
to detect DIF in each sample-size condition. The results showed an inverse
relationship between sample size and the number of items flagged as exhibiting DIF.
Keywords: Differential Item Functioning (DIF), sample size, polytomous item
response theory models, Graded Response Model (GRM).
[Introductory passage not recovered from the source. Surviving terms and
citations, in order of appearance: Classical Test Theory (CTT); Item Response
Theory (IRT); item bias; differential item functioning (DIF); Graded Response
Model; polytomous; dichotomous; reference group; focal group; Hidalgo &
Gómez-Benito, 2010; Millsap & Everson, 1993; Potenza & Dorans, 1995; uniform DIF;
non-uniform DIF; Finch, 2005; Mellenbergh, 1982; Raju & Ellis, 2002.]
DIF Detection Methods
[Passage not recovered from the source. Surviving terms and citations, in order
of appearance: DIF; Hidalgo & Gómez-Benito, 2010; Millsap & Everson, 1993;
Potenza & Dorans, 1995; Penfield & Lam, 2000; Penfield & Camilli, 2007;
Mantel-Haenszel (MH); Educational Testing Service (ETS); Guilera, Gómez-Benito &
Hidalgo, 2009; Ackerman & Evans, 1992; Allen & Donoghue, 1996; Clauser, Mazor, &
Hambleton, 1991; Clauser, 1993; Mazor, Clauser, & Hambleton, 1992; Mazor et al.,
1994; Uttaro & Millsap, 1994; MH; Mantel & Haenszel, 1959.]
[Derivation of the MH statistic not recovered from the source. Surviving
elements, in order of appearance: reference and focal groups; 2 x 2 and
2 x C x K contingency tables; the MH statistic; the common odds ratio,
equation (2); equation (3) (Holland and Thayer, 1988); categories A, B, and C
(Zwick & Ercikan, 1989).]
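As a rough illustration of the quantities named above, the sketch below computes the Mantel-Haenszel common odds ratio for the dichotomous 2 x 2 case and converts it to the delta metric attributed to Holland and Thayer (1988), delta = -2.35 ln(alpha). The function names and the counts are hypothetical and are not taken from the study, which worked with polytomous items. The A/B/C classification also depends on significance tests that are not implemented here.

```python
import math

def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over a list of 2x2 tables.

    Each stratum (matched score level) is a tuple (a, b, c, d):
      a = reference group, correct    b = reference group, incorrect
      c = focal group, correct        d = focal group, incorrect
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    return num / den

def mh_d_dif(alpha):
    """Delta-metric transformation of the common odds ratio."""
    return -2.35 * math.log(alpha)

# Hypothetical counts for three score levels (illustration only).
strata = [(20, 10, 15, 15), (30, 10, 25, 15), (40, 5, 35, 10)]
alpha = mh_common_odds_ratio(strata)
delta = mh_d_dif(alpha)
print(round(alpha, 3), round(delta, 3))
```

An odds ratio above 1 (negative delta) means that, at matched score levels, the reference group has higher odds of a correct response than the focal group.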
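EasyDIF (Kumagai, 2012) reports the index K for each item.
[Definition of K, equation (4), not recovered from the source. Surviving
symbols: K, k, N.]
The DIF threshold W depends on the number of response categories C:
$$W = (C - 1) \times 0.1 \tag{5}$$
For the five-category items used here, W = (5 - 1) x 0.1 = 0.4. An item is
flagged in EasyDIF when K > W.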
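The threshold rule in equation (5) can be sketched in a few lines. The function names are hypothetical; only the rule W = (C - 1) x 0.1 and the comparison K > W come from the text. The two K values in the usage lines are the ones reported for items 2 and 5 in the N = 300 results.

```python
def dif_threshold(num_categories, step=0.1):
    """Threshold W = (C - 1) * 0.1 for an item with C response categories."""
    return (num_categories - 1) * step

def is_flagged(k_index, num_categories):
    """An item is flagged when its K index exceeds the threshold W."""
    return k_index > dif_threshold(num_categories)

# Five-category items give W = 0.4.
print(dif_threshold(5))
print(is_flagged(0.506, 5), is_flagged(0.127, 5))
```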
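Graded Response Model (GRM)
[Passage not recovered from the source. Surviving citation and term: Samejima,
1972; Likert-type scale.]
$$P^{*}_{ki}(\theta) = \frac{\exp[D a_i (\theta - b_{ki})]}{1 + \exp[D a_i (\theta - b_{ki})]} \tag{6}$$
where D is a scaling constant, a_i is the discrimination parameter of item i,
and b_ki is the threshold parameter of category k of item i.
Thurstone's approach (Anderson, 2003), shown for a four-category item:
  Threshold 2 (b_i): category 1 vs. categories 2, 3, 4
  Threshold 3 (b_i): categories 1, 2 vs. categories 3, 4
  Threshold 4 (b_i): categories 1, 2, 3 vs. category 4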
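To make the role of the thresholds concrete, the following sketch computes GRM category probabilities as differences between adjacent boundary probabilities. It assumes the usual logistic boundary function and a scaling constant D = 1.7; both are assumptions, since the source's own equation did not survive intact. The function name is hypothetical, and the parameters are the generating values listed for item 5 in the item parameter table.

```python
import math

def grm_category_probs(theta, a, thresholds, D=1.7):
    """Category probabilities under the Graded Response Model.

    thresholds: ordered b values (one fewer than the number of categories).
    D = 1.7 is an assumed scaling constant.
    """
    # Boundary probabilities P*(X >= k) for k = 2..m.
    star = [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds]
    # P*(X >= 1) = 1 and P*(X >= m + 1) = 0 close the sequence.
    bounds = [1.0] + star + [0.0]
    # P(X = k) is the difference between adjacent boundary probabilities.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Item 5 generating values: a = 2.229, b = (-0.831, -0.255, -0.142, 0.807).
probs = grm_category_probs(0.0, 2.229, [-0.831, -0.255, -0.142, 0.807])
print([round(p, 3) for p in probs])
```

Each boundary probability corresponds to one of the category splits listed above (for example, category 1 versus all higher categories), which is why a five-category item needs four thresholds.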
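[Passage not recovered from the source. Surviving terms and citations, in order
of appearance: González-Romá et al., 2006; DIF; Kim, Cohen & Kim, 1994; 3PLM;
Roussos & Stout, 1996; SIBTEST.]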
Simulation study
[Passage not recovered from the source. Surviving terms and citations, in order
of appearance: WinGen (Han & Hambleton, 2007); Ching, 2002; Daniel & Joshua,
2009; Shudong, 1999; Fitzpatrick & Wendy, 2001; Kinsey, 2003, 2012; Williams,
2003; Harwell et al., 1996; Monte Carlo studies; GRM; N(0, 1).]
Item  Model  Categories        a       b1       b2       b3       b4
   1  GRM             5   -0.361   -0.992    0.160    0.862    1.322
   2  GRM             5    0.417   -1.134   -1.080    0.349    1.282
   3  GRM             5   -0.839    0.235    0.376    0.881    1.393
   4  GRM             5   -0.199   -1.505   -0.729   -0.451    0.234
   5  GRM             5    2.229   -0.831   -0.255   -0.142    0.807
   6  GRM             5    0.701   -1.916   -0.805   -0.127    0.639
   7  GRM             5   -0.270   -0.196    0.516    0.996    1.679
   8  GRM             5   -0.174   -0.875   -0.396    0.745    1.545
   9  GRM             5   -0.692   -1.061   -0.562    0.464    0.780
  10  GRM             5   -0.282    0.274    1.254    2.040    2.655
  11  GRM             5   -0.356   -0.065    0.854    0.884    1.043
  12  GRM             5   -1.478   -0.766   -0.092    0.188    1.453
  13  GRM             5    0.000   -1.882    0.303    0.392    0.987
  14  GRM             5    0.178   -1.261   -0.569   -0.412    0.373
  15  GRM             5    1.411   -1.012   -0.249   -0.146    1.212
  16  GRM             5   -0.549   -0.402   -0.142    0.237    0.404
  17  GRM             5   -0.764   -1.029   -1.013   -0.971    0.312
  18  GRM             5    1.881    0.351    0.395    0.878    1.401
  19  GRM             5   -0.117   -1.091   -0.487   -0.007    1.309
  20  GRM             5   -1.078   -0.739   -0.260   -0.192    0.220
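The data-generation step can be illustrated with a minimal sketch that draws graded responses for one item at each of the three sample sizes. This is not the WinGen procedure itself. It assumes abilities drawn from N(0, 1), a logistic boundary function with D = 1.7, and a positive discrimination with ordered thresholds. The function name and the seed are hypothetical, and the parameters are the generating values listed for item 5.

```python
import math
import random

def simulate_grm_item(thetas, a, thresholds, rng, D=1.7):
    """Draw one graded response (1..m) per examinee.

    Assumes a > 0 and ordered thresholds, so that the boundary
    probabilities decrease from the first threshold to the last.
    """
    responses = []
    for theta in thetas:
        u = rng.random()
        # Count the boundaries whose probability P*(X >= k) exceeds u.
        passed = sum(
            1 for b in thresholds
            if u < 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
        )
        responses.append(1 + passed)
    return responses

rng = random.Random(2023)  # arbitrary seed for reproducibility
item5_a, item5_b = 2.229, [-0.831, -0.255, -0.142, 0.807]

samples = {}
for n in (300, 500, 1000):
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed N(0, 1) abilities
    samples[n] = simulate_grm_item(thetas, item5_a, item5_b, rng)

for n, resp in samples.items():
    print(n, [resp.count(k) for k in range(1, 6)])
```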
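[Passage not recovered from the source. Surviving terms, in order of appearance:
DIF; EasyDIF; p; All USE; NonDIF; Study; All not USE; All USE; K; N = 300;
(5 - 1) x 0.1 = 0.4; table columns I, G, R, F, DIF, b1, b2, b3, b4, a, K;
N = 300; N = 500; K, 0.4.]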
N = 300 (R = reference group, F = focal group; ### marks an item flagged for DIF, K > 0.4)
           b1              b2              b3              b4               a
Item       R       F       R       F       R       F       R       F       R       F       K  DIF
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2  -2.398  -0.572  -2.120  -0.540  -0.245   0.299   1.742   0.937   0.119   0.614   0.506  ###
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.119   0.614   0.555  ###
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.119   0.614   0.555  ###
   5  -0.843  -1.257  -0.391  -0.374  -0.277  -0.252   0.532   0.744   1.445   1.285   0.127
   6  -2.195  -3.175  -0.810  -1.718  -0.253  -0.900   0.416   0.490   0.455   0.264   0.202
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.455   0.264   0.217
  14  -0.828   0.000  -0.554   0.000  -0.282   0.000   -0.15   0.000   0.115   0.264   0.204
  15  -1.906  -1.455  -0.666  -0.434  -0.574  -0.296   1.073   1.108   0.712   0.805   0.167
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.712   0.805   0.071
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.712   0.805   0.071
  18   0.442  0.0543   0.468   0.150   0.949   0.804   1.701   1.414   0.878   1.048   0.219
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.878   1.048   0.105
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.878   1.048   0.105
N = 500 (R = reference group, F = focal group; ### marks an item flagged for DIF, K > 0.4)
           b1              b2              b3              b4               a
Item       R       F       R       F       R       F       R       F       R       F       K  DIF
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   5  -0.430  -0.880  -0.117  -0.212  -0.068  -0.111   0.571   0.780   3.064   1.154   0.463  ###
   6  -1.998  -2.503  -0.889  -0.967  -0.275   0.237   0.749   1.197   0.370   0.327   0.103
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.000
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  14   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.370   0.327   0.052
  15  -1.103  -0.823  -0.242   0.019  -0.184   0.143   1.824   1.457   0.580   0.705   0.162
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.580   0.705   0.112
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.580   0.705   0.112
  18   0.423   0.337   0.505   0.379   1.015   0.340   1.476   1.325   1.193   1.233   0.097
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.193   1.233   0.020
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.193   1.233   0.020
N = 1000 (R = reference group, F = focal group; ### marks an item flagged for DIF, K > 0.4)
           b1              b2              b3              b4               a
Item       R       F       R       F       R       F       R       F       R       F       K  DIF
   1   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
   2  -0.861  -1.074  -0.808  -1.061   0.363   0.370   1.279   1.138   0.429   0.461   0.095
   3   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.429   0.461   0.034
   4   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.429   0.461   0.034
   5  -0.729  -0.640  -0.166  -0.170  -0.092  -0.096   0.754   0.676   2.299   3.160   0.109
   6  -1.660  -2.289  -0.697  -0.929  -0.226  -0.230   0.356   0.503   0.843   0.578   0.194
   7   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
   8   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
   9   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  10   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  11   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  12   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  13   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  14   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.843   0.578   0.217
  15  -0.953  -0.884  -0.252  -0.202  -0.139  -0.102   1.057   1.075   1.458   1.628   0.047
  16   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.458   1.628   0.062
  17   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   1.458   1.628   0.062
  18   0.382   0.368   0.392   0.383   0.859   0.828   1.270   1.313   2.068   2.230   0.017
  19   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   2.068   2.230   0.037
  20   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   2.068   2.230   0.037
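The relationship between sample size and the number of flagged items can be checked directly from the K values reported above. The sketch below copies the K value of each of the twenty items at each sample size and applies the K > 0.4 rule. The variable names are hypothetical.

```python
# K index per item (items 1..20), copied from the three results tables.
K_VALUES = {
    300: [0.000, 0.506, 0.555, 0.555, 0.127, 0.202, 0.217, 0.217, 0.217, 0.217,
          0.217, 0.217, 0.217, 0.204, 0.167, 0.071, 0.071, 0.219, 0.105, 0.105],
    500: [0.000, 0.000, 0.000, 0.000, 0.463, 0.103, 0.000, 0.052, 0.052, 0.052,
          0.052, 0.052, 0.052, 0.052, 0.162, 0.112, 0.112, 0.097, 0.020, 0.020],
    1000: [0.000, 0.095, 0.034, 0.034, 0.109, 0.194, 0.217, 0.217, 0.217, 0.217,
           0.217, 0.217, 0.217, 0.217, 0.047, 0.062, 0.062, 0.017, 0.037, 0.037],
}

THRESHOLD = 0.4  # W = (5 - 1) * 0.1 for five-category items

# Item numbers whose K index exceeds the threshold, per sample size.
flagged = {
    n: [item for item, k in enumerate(ks, start=1) if k > THRESHOLD]
    for n, ks in K_VALUES.items()
}

for n in sorted(flagged):
    print(n, len(flagged[n]), flagged[n])
# 300 3 [2, 3, 4]
# 500 1 [5]
# 1000 0 []
```

The counts fall from three flagged items at N = 300 to one at N = 500 and none at N = 1000, which is the inverse relationship reported in the abstract.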
[Discussion passage not recovered from the source. Surviving citation and term:
Tan & Gierl, 2005; Computerized Adaptive Testing (CAT).]
Recommendations
The current study recommends careful attention to the factors that affect an
item's characteristics and its differential functioning across groups, whether
defined by gender, race, or culture, before the item is stored in an item bank
and used to measure ability. This step is especially important in computerized
adaptive testing, which relies on a preliminary estimate of the examinee's
ability and then administers a test matched to that ability. If an item used in
the preliminary estimate exhibits differential functioning, the examinee will
receive a test that does not match his or her ability, which will lead to wrong
decisions and undermine the objectivity and fairness of the measurement process.
Suggestions for future research
Based on the results of the study, the researcher suggests the following studies
as future research:
1. Conducting a study with real data and comparing its results with those from
simulated data to assess the accuracy of the results.
2. Examining the effect of sample size on differential item functioning under
other polytomous item response models, such as the partial credit model and
the generalized partial credit model.
3. Comparing differential item functioning under polytomous and dichotomous item
response models, since polytomous models typically require larger samples
than dichotomous models.
4. Studying the effect of test length on differential item functioning, since it
may influence judgments about item performance.
References
1. Abdullah, Z. (2012). Effect of some Estimation Methods on Accuracy of
Estimating Parameters in Polytomous Item Response Models. Unpublished
Doctoral Dissertation, Institute of Educational Studies, Cairo University.
2. Ackerman, T. A., & Evans, J. A. (1992). An investigation of the relationship
between reliability, power, and the Type I error rate of the Mantel-Haenszel and
simultaneous item bias detection procedures. Annual Meeting of the National
Council on Measurement in Education, San Francisco.
3. Allen, N. L., & Donoghue, J. R. (1996). Applying the Mantel-Haenszel
procedure to complex samples of items. Journal of Educational Measurement,
33(2), 231-251.
4. Ching-Fung B. Si. (2002). Ability Estimation Under Different Item
Parameterization and Scoring Models. Unpublished Doctoral Dissertation,
University of North Texas.
5. Clauser, B. E., Mazor, K., & Hambleton, R. K. (1991). An examination of item
characteristics on Mantel-Haenszel detection rates. Annual Meeting of the
National Council on Measurement in Education, Chicago.
6. Daniel, J., & Joshua, G. (2009). A Comparison of IRT Parameter Recovery in
Mixed Format Examinations Using PARSCALE and ICL. Poster presented at the
Annual Meeting of the Northeastern Educational Research Association.
7. Finch, W. H. (2005). The MIMIC model as a method for detecting DIF:
Comparison with Mantel-Haenszel, SIBTEST and IRT likelihood ratio. Applied
Psychological Measurement, 29, 278-295.
8. Fitzpatrick, A. R., & Wendy, M. Y. (2001). The Effects of Test Length and Sample
Size on the Reliability and Equating of Tests Composed of Constructed
Response Items. Applied Measurement in Education, 14(1), 31-57.
9. González-Romá, V., Hernández, A., & Gómez-Benito, J. (2006). Power and
Type I error of the mean and covariance structure analysis model for detecting
differential item functioning in graded response items. Multivariate Behavioral
Research, 41(1), 29-53.
10. Guilera, G., Gómez-Benito, J., & Hidalgo, M. D. (2009). Scientific production on
the Mantel-Haenszel procedure as a way of detecting DIF. Psicothema, 21(3),
492-498.
11. Han, K. T., & Hambleton, R. K. (2007). User's Manual: WinGen (Center for
Educational Assessment Report No. 642). Amherst, MA: University of
Massachusetts, School of Education.
12. Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies
in item response theory. Applied Psychological Measurement, 20(2), 101-125.
13. Hidalgo, M. D., & Gómez-Benito, J. (2010). Education measurement:
Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.),
International Encyclopedia of Education (3rd edition). USA: Elsevier - Science &
Technology.
14. Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the
Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity
(pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum.
15. Kim, S. H., Cohen, A. S., & Kim, H. O. (1994). An investigation of Lord's
procedure for the detection of differential item functioning. Applied Psychological
Measurement, 18(3), 217-228.
16. Kinsey, Tari L. (2003). A Comparison of IRT and Rasch Procedures in a Mixed
Item Format Test. Unpublished Doctoral Dissertation, University of North Texas.
17. Kumagai, R. (2012). A new method for estimating differential item functioning
(DIF) for multiple groups and polytomous items: Development of index K and the
computer program "EasyDIF". Japanese Journal of Psychology, 83(1), 35-43.
(in Japanese)
18. Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data
from retrospective studies of disease. Journal of the National Cancer Institute,
22, 719-748.
19. Mazor, K., Clauser, B. E., & Hambleton, R. K. (1994). Identification of
nonuniform differential item functioning using a variation of the Mantel-Haenszel
procedure. Educational and Psychological Measurement, 54, 284-291.
20. Mellenbergh, G. J. (1982). Contingency table models for assessing item bias.
Journal of Educational Statistics, 7, 105-118.
21. Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical
approaches for assessing measurement bias. Applied Psychological
Measurement, 17, 297-334.
22. Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias.
In C. R. Rao, & S. Sinharay (Eds.), Psychometrics (pp. 125-168; 5).
Amsterdam: Elsevier.
23. Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning
in performance assessment: Review and recommendations. Educational
Measurement: Issues and Practice, 19(3), 5-15.
24. Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously
scored items: A framework for classification and evaluation. Applied
Psychological Measurement, 19, 23-37.
25. Raju, N. S., & Ellis, B. B. (2002). Differential item and test functioning. In F.
Drasgow & N. Schmitt (Eds.), Measuring and Analyzing Behavior in
Organizations: Advances in Measurement and Data Analysis (pp. 156-188).
San Francisco, CA: Jossey-Bass.
26. Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small
sample size and studied item parameters on SIBTEST and Mantel-Haenszel type
I error performance. Journal of Educational Measurement, 33(2), 215-230.
27. Shudong, W. (1999). The Accuracy of Ability Estimation Methods for
Computerized Adaptive Testing Using The Generalized Partial Credit Model.
Unpublished Doctoral Dissertation, University of Pittsburgh.
28. Tan, X., & Gierl, M. J. (2005). Using local DIF analyses to assess group
differences on multilingual examinations. Poster presented at the annual meeting
of the National Council on Measurement in Education.
29. Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel
procedure in the detection of differential item functioning. Applied Psychological
Measurement, 18, 15-25.
30. Williams, N. J. (2003). Item and Person Parameter Estimation using Hierarchical
Generalized Linear Models and Polytomous Item Response Theory Models.
Unpublished Doctoral Dissertation, University of Texas at Austin.
31. Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the
NAEP history assessment. Journal of Educational Measurement, 26, 55-66.