Original Article
Sex Differences in 32,347 Jordanian 4th Graders on the National Exam of Mathematics
Ismael S. Al-Bursan¹, Emil O. W. Kirkegaard², John Fuerst², Salaheldin Farah Attallah Bakhiet³, Mohammad F. Al Qudah¹, Elsayed Mohammed Abu Hashem Hassan¹, and Adel S. Abduljabbar¹
¹ King Saud University, Department of Psychology, College of Education, Riyadh, Saudi Arabia
² Ulster Institute for Social Research, London, UK
³ King Saud University, Department of Special Education, College of Education, Riyadh, Saudi Arabia
Abstract: Sex differences in mathematical ability were examined in a nationwide sample of 32,346 Jordanian 4th graders (age 9-10 years) on a 40-item mathematics test. Overall, boys were found to perform slightly worse (d = −0.12) but had slightly more variation in scores (SD = 1.02 and SD = 0.98 for boys and girls, respectively). However, when results were disaggregated by school type, single-sex versus coed (i.e., coeducational), boys were found to perform better than girls in coed schools (d = 0.27) but worse across single-sex schools (d = −0.37). Two-parameter item response theory analysis showed that item difficulty was similar across sexes in the full sample. Item loadings exhibited substantial departure from measurement invariance with respect to boys and girls at single-sex schools, though. For boys and girls at coed schools, both the item difficulty and item loading correlations were highly similar, evincing that measurement invariance largely held in this case. Partially consistent with findings from other countries, a correlation between item difficulty and male advantage was observed, r = .57, such that the relative male advantage increased with increased item difficulty. Complicating interpretation, this association did not replicate within coed schools. Item content, Bloom's cognitive taxonomy category, and item position showed no relation to sex differences.
Keywords: sex differences, mathematics, cognitive ability, item response theory, Jordan
Sex differences in cognitive and mathematical abilities have
been widely studied and discussed over the last few dec-
ades (Halpern et al., 2007; Else-Quest, Hyde, & Linn,
2010). Much of the interest stems from concern about fre-
quently found social outcome differences, such as different
participation rates in STEM fields, and their possible rela-
tion to developed ability, education-related opportunities,
and social perceptions such as stereotypes (Nosek et al.,
2009). Previous research on mathematical abilities, mea-
sured by scholastic achievement tests, has found male
advantages which are very small to small, given conven-
tional effect size interpretative standards (Sawilowsky,
2009). The effect sizes, though, depend on the knowledge
domain, item characteristics, and the age of the sample
(Else-Quest et al., 2010). They also depend on the country
of assessment. For example, in Middle Eastern Islamic
countries, girls often perform better than boys on math
and science exams (Fryer & Levitt, 2010; Stoet & Geary,
2015), while in Sub-Saharan Africa, boys have been found
to have a substantial advantage (Dickerson, McIntosh, &
Valente, 2015). As a result, the magnitudes of differences
depend on the set of countries examined (Fryer & Levitt,
2010).
As noted, magnitudes of sex differences depend on
knowledge domains and item characteristics. Regarding
the latter, it has been found that girls do worse on more dif-
ficult mathematical items (Beller & Gafni, 2000; Bielinski
& Davison, 2001; Penner, 2003). However, it has also been
found that girls perform no worse on more cognitively com-
plex items, as indexed using Bloom's cognitive taxonomy
(Bloom, 1984), according to which questions are catego-
rized as knowledge, application, and reasoning based
(Lan, 2014; Else-Quest et al., 2010). Given that, on average,
reasoning questions are more difficult than knowledge and
application ones (Liou & Bulut, 2017), these two sets of
findings seem to be at odds with one another.
Another item characteristic frequently investigated in
the context of group differences is item position (Debeer &
Janssen, 2013). Some research has found item posi-
tion effects related to sex differences (Mäkitalo, 1996;
Borgonovi & Biecek, 2016). Item position effects refer to
how previously completed items affect performance
© 2018 Hogrefe Publishing. Journal of Individual Differences (2018)
https://doi.org/10.1027/1614-0001/a000278
on a given item. Item position effects have been found to
arise from an association between item position and item
difficulty and also from an association between item posi-
tion and participant motivation (Borgonovi & Biecek,
2016; Zeller, Reiß, & Schweizer, 2017).
Regarding knowledge domains, in general, certain math-
ematical subtests have been found to show very small to
small male advantages (e.g., Program for International Stu-
dent Assessment [PISA] space and shape, PISA change/re-
lationships, and Trends in International Mathematics and
Science Study [TIMSS] measurement), while others have
been found to show very small to small female advantages
(e.g., TIMSS algebra), and still others have been found
to show practically no differences at all (e.g., TIMSS data,
TIMSS geometry, and TIMSS number; Else-Quest et al.,
2010). Generally, as noted by Schroeders, Wilhelm, and
Olaru (2016), test composition can affect the magnitude
of sex differences.
As for data analyses, most prior studies of sex differences
have been based on data at the subtest and test level, not at
the item level. Nonetheless, there has been an increasing
tendency toward analyzing item data instead, as this can
provide more information regarding sources of observed
differences. Yet, studies that have analyzed item-level data
mostly involve Western samples (Henrich, Heine, & Noren-
zayan, 2010). Data collected from other regions of the
world have less often been studied in this way.
Jordan Specific Context
Jordan, formally known as the Hashemite Kingdom of Jor-
dan, is a small Middle Eastern country located on the Arabian
Peninsula. Jordan is currently undergoing a project of
modernization, both with respect to educational policy
(Ababneh, Al-Tweissi, & Abulibdeh, 2016) and with respect
to societal sex roles (Shteiwi, 2015). In the wake of poor TIMSS
results, Jordan's Ministry of Education has begun to revamp
its educational curriculum. To monitor the effectiveness of
the changes, national tests have been developed (Ababneh
et al., 2016).
As in the case of other Middle Eastern educational sys-
tems, a substantial portion of the primary and secondary stu-
dent population attend single-sex schools. In light of the
segregated schooling and the common views about tradi-
tional sex roles and the possible impacts both may have
on educational outcomes, the Ministry of Education is inter-
ested in better understanding sex differences in scholastic
performance and their possible causes (Tweissi, Ababneh,
& Lebdih, 2014). As such, researchers associated with the
Jordan Ministry of Education contacted the present authors
and requested an analysis of the new 4th-grade math test in
relation to sex differences, school type, and measurement
bias.
As noted previously, studies of international test results
have found that in Jordan, girls perform better than boys
in math, science, and reading. The same has been found
in other Middle Eastern countries in which a large portion
of students attend single-sex schools (Innabi & Dodeen,
2006; Tweissi et al., 2014). This phenomenon has led some
researchers to posit the Muslim culture hypothesis and the
single-gender classroom hypothesis, according to which Mus-
lim culture and single-sex classrooms, respectively, protect
against poor female performance. The international data,
however, do not consistently support either of these
hypotheses (Kane & Mertz, 2012; Dickerson et al., 2015).
Regarding performance differences in Jordan, AlSindi
(2013, Figure 5) found, based on 8th-grade TIMSS data,
that girls in single-sex schools performed best, followed
by girls in coeducational (i.e., coed) schools, by boys in sin-
gle-sex schools, and, finally, by boys in coed schools. These
results were interpreted in light of the hypothesis that girls
benefit from segregated schooling as a result of reduced sex
stereotypes and increased opportunity for risk taking, while
simultaneously being disadvantaged due to differences
related to school funding and infrastructure. Since sex dif-
ferences, in general, have been found to be sensitive to
pupil age, it is important to determine if these patterns
replicate across different age cohorts before attempting to
theorize about their etiology.
Regarding item characteristics, previous results from Jor-
dan have also shown differential item functioning (DIF)
which favors girls on familiar items with fixed answers,
and which favors males on non-familiar items involving
judgment or estimation (Innabi & Dodeen, 2006). Whether
and how this relates to the aforementioned results
regarding item difficulty and cognitive complexity is not
clear. Additionally, Bursan (2013) reports sex-related DIF
for the 10th-grade Jordanian national test. The author notes
that the proportion of DIF decreases as the question
level decreases (advanced to beginner). Relatedly, Alomari
and Shatnawi (2016) found, in an analysis of the 10th-grade
Jordanian math exam, that the majority of items showed
DIF.
An important concern with any analysis of group differ-
ences is measurement invariance (Lan, 2014). Generally,
if a test does not function the same way across groups,
score (non)differences may be difficult or impossible to
interpret in terms of latent factor (non)differences. The
issue of measurement invariance is particularly relevant
to Middle Eastern countries, given the proportion of stu-
dents in single-sex schools. Sex segregation is linked to fac-
tors, such as classroom dynamics and sex-related pedagogy
(AlSindi, 2013), which plausibly could differentially affect
students of different sexes. While there have been a
number of studies of DIF, we were unable to locate prior
ones investigating sex-related measurement invariance,
with respect to overall scores, on Jordanian achievement
tests.
In this paper, we analyze results from the recently devel-
oped 4th-grade national mathematical test. We start with
an item parameter analysis for the full sample and for the
full sample decomposed by sex. Next, following the method
of Fan (1998) and OECD (2015, p. 153), we conduct an
assessment of measurement invariance across sexes and
school types using item parameter data; as we had limited
information about the 40 test items, we were limited in the
methods available to examine this issue. Next, in line with
previous research, we examine: (1) the magnitude of overall
sex differences, (2) the magnitude of sex differences
decomposed by school type, and (3) the relation between
sex differences and item difficulty, knowledge domain,
Bloom's cognitive taxonomy, and item position. Based on
previous literature, we predict that 4th-grade girls will per-
form better than 4th-grade boys both across single-sex and
in coed schools. As we have no firm grounds to make pre-
dictions regarding measurement invariance, we make none.
Based on previous results, we expect boys to outperform
girls on difficult items, despite no difference by Bloom's
cognitive taxonomy. In line with cross-national results
(e.g., Else-Quest et al., 2010), we expect there to be no dif-
ferences by knowledge domain, given the domains in the
current sample (i.e., number, geometry, and data). Previous
research does not lend itself to ready predictions regarding
item position effects, so we make none in this case.
Methods
Test Design and Data Collection
The test was prepared at the Jordanian Ministry of Educa-
tion, Directorate of Tests, by a committee composed of
three educational supervisors of mathematics, an educa-
tional supervisor of measurement and evaluation, and a
4th-grade mathematics teacher. Content validity was veri-
fied by a committee composed of 3 mathematics teachers,
3 math educational supervisors, 2 specialists in measure-
ment and evaluation, and 2 university professors specialized
in the mathematics curriculum. The first version of the test
contained 48 items; the item count was reduced to 43 as a
result of content revisions. Before the main study, a pilot
study was done to screen for problematic items. This was
done on a sample of 157 pupils (83 male; 53%). Based on
this, 3 items were removed due to low discrimination (item
discrimination < 0.20). After this, the remaining 40 items
were administered to 4th-graders (age 9-10 years) in all
schools in Jordan in the academic year of 2014/2015, with
50% of students in each school participating. The resulting
sample size was 32,346 (16,400 male; 51%). The test items
were grouped by mathematical domain (in this case: num-
ber, geometry, and data analysis) and by cognitive taxon-
omy (in this case: knowledge, application, and reasoning)
by the test developers. Table 1 shows the distribution of
items by domain and by taxonomy. The test questions were
based on the TIMSS 2015 exam. Table 2 shows examples of
the TIMSS Grade 4 questions and the derived Grade 4 Jor-
dan national test questions. It can be seen that the TIMSS
and national test questions are quite similar.
Description of Analyses
All analyses are conducted using R (3.4.0; R Core Team,
2017). The item data are analyzed with item response theory
(IRT; DeMars, 2010). IRT analysis is done using the psych
package (Revelle, 2017) using the two-parameter normal
(2PN) model which provides item discrimination and diffi-
culty levels. This package also performs factor analysis on
the latent correlation matrix (tetrachoric; see Uebersax,
2015). This allows for the calculation of item loadings that
are on the same scale as regular test-level loadings and which
are not affected by item pass rates, unlike with Pearson cor-
relations (Wicherts, 2017; Wicherts & Johnson, 2009).
We report item parameters for the 40 math questions for
the full sample and for the full sample disaggregated by sex.
Item pass rate refers to the raw percentage of students who
answered the test item correctly; item difficulty is a transfor-
mation of item pass rate, where the item is given a stan-
dardized interval scale score. The correlation between
these two variables is r = −1 in this sample. The correlation
is negative because a higher item pass rate is indicative of
lower item difficulty. Item discrimination refers to the extent
to which students with high total exam scores correctly
answered the item; and item loading represents how
strongly an item is associated with the underlying factor.
Mathematically, item discrimination, in item response the-
ory, is a transformation of the item loading parameter and
so the two parameters are colinear, with a correlation of
r = .99 in this sample. As an additional statistic, we calcu-
late item-level d-value equivalents from pass rates on the
Table 1. Item classification counts
Type              Count   Percent
Content
  Number            23     57.50
  Geometry          12     30.00
  Data analysis      5     12.50
Level
  Knowledge         15     37.50
  Application       13     32.50
  Reasoning         12     30.00
assumption of normality. This is done using the qnorm
command in R, which gives the inverse cumulative distribution
function; accordingly, item d-value equals:
qnorm(male pass rate) − qnorm(female pass rate).
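The item statistics described above can be illustrated with a short sketch. It is written in Python rather than R, using the standard library's `NormalDist().inv_cdf` as the counterpart of R's `qnorm`. The function names are hypothetical. The loading-to-discrimination conversion is the usual normal-ogive relation; the difficulty scaling is an assumption inferred from the values in Table 3, not a formula stated in the text.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def qnorm(p: float) -> float:
    """Inverse standard-normal CDF, the counterpart of R's qnorm()."""
    return _STD_NORMAL.inv_cdf(p)


def discrimination_from_loading(loading: float) -> float:
    """Normal-ogive discrimination implied by a factor loading."""
    return loading / sqrt(1.0 - loading ** 2)


def difficulty_from_pass_rate(pass_rate: float, loading: float) -> float:
    """Assumed difficulty scaling: negative when most pupils pass the item."""
    return -qnorm(pass_rate) / sqrt(1.0 - loading ** 2)


def item_d(male_pass_rate: float, female_pass_rate: float) -> float:
    """Item-level d-value equivalent; positive values favour males."""
    return qnorm(male_pass_rate) - qnorm(female_pass_rate)


# Item 1 in Table 3: pass rate .60, loading .52.
print(round(discrimination_from_loading(0.52), 2))
print(round(difficulty_from_pass_rate(0.60, 0.52), 2))

# Hypothetical pass rates, chosen only for illustration.
print(round(item_d(0.55, 0.60), 2))
```

For item 1, the sketch gives a discrimination of about .61, which agrees with the value reported in Table 3; a higher female than male pass rate yields a negative item d-value.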
To examine measurement invariance, we employ an analytic
method similar to that used by Fan (1998); see also
OECD (2015). We correlate item difficulty and item loading
parameters across random subsets of the data to determine
the correlation expected given sampling error alone. We then
correlate the same two item parameters across sex. A shortfall
of the cross-sex correlation relative to this expected correlation
cannot be attributed to sampling error and is therefore taken
to reflect a departure from measurement invariance. To
determine if any departure found is related to school type
(single-sex vs. coed), we repeat the analysis with the sex
groups decomposed by school type.
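The logic of this comparison can be sketched with simulated data. The example below is an illustration under stated assumptions, not the authors' procedure: it uses a simplified pass-rate-based difficulty (the negative inverse-normal transform of the pass rate) instead of fitted 2PN parameters, it ignores item loadings, and the item thresholds, sample sizes, and function names are all hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_responses(n_pupils, thresholds, rng):
    """Simulate 0/1 item responses from a normal-ogive model."""
    rows = []
    for _ in range(n_pupils):
        ability = rng.gauss(0.0, 1.0)
        rows.append([int(rng.random() < STD_NORMAL.cdf(ability - b))
                     for b in thresholds])
    return rows


def pass_rate_difficulties(rows):
    """Simplified difficulty per item: -qnorm(pass rate)."""
    n = len(rows)
    difficulties = []
    for j in range(len(rows[0])):
        p = sum(row[j] for row in rows) / n
        p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)  # guard against 0 or 1
        difficulties.append(-STD_NORMAL.inv_cdf(p))
    return difficulties


def split_half_r(rows, rng):
    """Correlate item difficulties across two random halves of the data."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return pearson_r(pass_rate_difficulties(shuffled[:half]),
                     pass_rate_difficulties(shuffled[half:]))


rng = random.Random(2018)
thresholds = [-1.0 + 2.0 * j / 39 for j in range(40)]  # 40 hypothetical items

# Baseline: correlation expected from sampling error alone.
pooled = simulate_responses(4000, thresholds, rng)
baseline = [split_half_r(pooled, rng) for _ in range(5)]

# Two groups where the second group's item thresholds are perturbed,
# i.e. the test does not function identically in both groups.
group_a = simulate_responses(2000, thresholds, rng)
perturbed = [b + rng.gauss(0.0, 0.5) for b in thresholds]
group_b = simulate_responses(2000, perturbed, rng)
cross_r = pearson_r(pass_rate_difficulties(group_a),
                    pass_rate_difficulties(group_b))

print("random-half correlations:", [round(r, 3) for r in baseline])
print("cross-group correlation:", round(cross_r, 3))
```

In this sketch, the random-half correlations stay close to 1, while the cross-group correlation falls well below them, which is the pattern the method treats as evidence against measurement invariance.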
Next, we examine the distribution of overall scores by
school type and by sex. For the comparison of group scores,
we compute Cohen's d-scores. A d-score is a standardized
mean difference, with the effect size calculated using the
following equation:
\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s} = \frac{\mu_1 - \mu_2}{s}. \]
Cohen's d is calculated as the difference between the two
group means divided by the pooled standard deviation
(i.e., the square root of the weighted average of the groups'
variances). In this analysis, we assign males to group 1; thus, a
positive d indicates a male advantage. Conventional inter-
pretations for d-values have been provided by Cohen
(1988) and Sawilowsky (2009): very small (0.01), small
(0.2), medium (0.5), large (0.8), very large (1.2), and huge
(2.0). As Cohen (1988) notes, these are only interpretative
guidelines. For the comparisons for which the assumption
of measurement invariance is warranted, we additionally con-
duct independent-samples t-tests to determine if the differ-
ences are statistically significant.
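As a worked illustration of the effect size calculation, the following Python sketch computes Cohen's d from summary statistics, using the pooled standard deviation with degrees-of-freedom weights. The function names are hypothetical; the input values are the coed means, standard deviations, and group sizes reported in Table 5, with males as group 1.

```python
from math import sqrt


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Square root of the df-weighted average of the two group variances."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference; positive when group 1 scores higher."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Coed schools (Table 5): males as group 1, females as group 2.
d_coed = cohens_d(0.340, 0.97, 6531, 0.077, 0.97, 10408)
print(round(d_coed, 2))  # 0.27
```

The result matches the male advantage of d = 0.27 reported for coed schools.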
Finally, we examine the association between sex differ-
ences and item characteristics. First, we look at the simple
Pearson correlation coefficient between item difficulty and
item d-value both for the full sample and for the coed sam-
ple. Next, we run two multiple regression models, one
for the full sample and one for the coed sample, with the
following predictors: item position, item difficulty, item
loading, item content, and item level. In both multiple
regression models, we use dummy coded variables for level
(with application assigned zero) and content (with number
assigned zero).
Results
Main IRT Analyses
Table 3 shows the item pass rates, the three item parameters,
and the item d-values for the 40 math questions for
the full sample, along with the item difficulties and item
loadings disaggregated by sex. For the full sample, item
difficulty estimates ranged from 0.64 to −1.02, item
discrimination ranged from 0.93 to 0.30, and item loadings ranged
from .68 to .28. The loadings were not substantively different
by sex, with means of M = 0.56 and M = 0.55 and standard
deviations of SD = 0.11 and SD = 0.09, for males and
females, respectively. The item difficulties were also not
substantively different by sex, with means of M = −0.03
and M = −0.11, and standard deviations of SD = 0.34
and SD = 0.39, for males and females, respectively. Item-level
d-values ranged from a male advantage of d = 0.17
to a male disadvantage of d = −0.26, with an average item-level
sex gap of d = −0.07 (in favor of females).
Measurement Invariance
Five random subsets of half the data (to mirror the subsam-
pling by sex) were created and analyzed. For these, the
Table 2. TIMSS questions and corresponding national test questions
TIMSS questions (Grade 4)                         National test questions (Grade 4)
E1a. What unit would be best to use to measure    E1b. What is the most appropriate measure unit for
     the weight (mass) of an egg?                      an eyedropper capacity?
  a) Centimeters                                    a) Milliliters
  b) Milliliters                                    b) Liters
  c) Grams                                          c) Grams
  d) Kilograms                                      d) Kilograms
E2a. Which of these is the name of 20507?         E2b. Which of these is the name of 9740?
  a) Twenty thousand five hundred seventy           a) Nine thousand seventy four
  b) Two hundred five thousand and seven            b) Nine thousand seven hundred forty
  c) Two hundred five thousand and seventy          c) Nine thousand seventy-four hundred
  d) Twenty thousand five hundred and seven         d) Nine hundred seventy-four thousand
cross-subset correlations for both the item difficulty and item
loading parameters were always r = 1. It can be concluded that
observed deviations from r = 1 are not attributable to sampling
error. Across sexes, item difficulty was very similar,
with a correlation of r = .97, 95% CI [.95, .99]. The item
loading correlation was lower at r = .90, 95% CI [.82,
.95]. Figures 1 and 2 show the plots by item number and
sex for, respectively, item difficulty and item loading. While
there are no conventional quantitative criteria which allow
us to affirm that measurement invariance holds, we can
compare the results to those presented by OECD (2015).
OECD (2015) interprets r = .94 for item difficulty and
r = .91 for item discrimination as suggesting that "the same
constructs are being measured under both modes" (p. 153)
in the context of students taking the computer-based test and
those taking the written version. Given the magnitude of
the cross-sex correlations, we can reasonably infer the
same.
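The text does not say how the confidence intervals for these correlations were obtained. A common analytic choice is the Fisher z transformation, and the Python sketch below assumes that method; the function name is hypothetical, and n = 40 is the number of items being correlated.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95):
    """Confidence interval for a Pearson r via the Fisher z transformation."""
    z = atanh(r)                      # transform r to the z scale
    se = 1.0 / sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tanh(z - crit * se), tanh(z + crit * se)


# Cross-sex item loading correlation: r = .90 across n = 40 items.
lower, upper = fisher_ci(0.90, 40)
print(round(lower, 2), round(upper, 2))  # 0.82 0.95
```

Under this assumption, the interval for the item loading correlation reproduces the reported 95% CI of [.82, .95].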
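Table 3. Item parameters from the main item response theory model
Column groups: full sample (Pass rate, Difficulty, Discrimination, Loading, Male d), male (Difficulty, Loading), female (Difficulty, Loading)
Item  Pass rate  Difficulty  Discrimination  Loading  Male d  Difficulty  Loading  Difficulty  Loading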
1 .60 .30 .61 .52 .01 .26 .52 .25 .51
2 .58 .27 .78 .62 .01 .21 .62 .21 .61
3 .77 .84 .58 .50 .20 .63 .52 .83 .47
4 .52 .05 .70 .58 .14 .03 .55 .12 .60
5 .48 .06 .66 .55 .00 .05 .54 .05 .56
6 .35 .46 .58 .50 .10 .35 .51 .45 .51
7 .48 .07 .55 .48 .07 .09 .48 .03 .48
8 .64 .47 .83 .64 .17 .44 .67 .28 .63
9 .79 1.02 .78 .61 .21 .70 .64 .91 .57
10 .28 .61 .30 .28 .00 .59 .27 .58 .30
11 .40 .29 .48 .43 .15 .34 .42 .19 .44
12 .59 .26 .52 .46 .06 .20 .51 .26 .41
13 .63 .43 .86 .65 .15 .25 .65 .40 .65
14 .71 .76 .93 .68 .23 .45 .69 .68 .66
15 .39 .35 .79 .62 .01 .28 .62 .27 .62
16 .40 .27 .53 .47 .04 .22 .43 .26 .52
17 .49 .04 .82 .64 .11 .09 .65 .02 .62
18 .57 .22 .88 .66 .07 .13 .67 .20 .65
19 .48 .07 .64 .54 .11 .11 .53 .01 .55
20 .28 .64 .49 .44 .02 .57 .39 .58 .50
21 .55 .17 .80 .63 .04 .15 .63 .11 .63
22 .48 .04 .44 .41 .04 .06 .44 .02 .37
23 .63 .44 .85 .65 .04 .32 .66 .35 .63
24 .58 .26 .89 .66 .15 .12 .68 .27 .65
25 .48 .07 .75 .60 .15 .13 .59 .02 .60
26 .33 .48 .47 .42 .00 .43 .39 .43 .46
27 .31 .54 .44 .41 .00 .49 .39 .50 .42
28 .53 .09 .77 .61 .01 .07 .61 .07 .61
29 .45 .14 .63 .53 .11 .07 .55 .17 .52
30 .38 .38 .72 .58 .00 .31 .60 .31 .57
31 .52 .07 .74 .60 .09 .01 .60 .10 .59
32 .67 .47 .43 .39 .24 .31 .42 .55 .35
33 .61 .36 .80 .62 .16 .20 .64 .37 .60
34 .68 .63 .91 .67 .13 .40 .72 .53 .62
35 .67 .55 .75 .60 .10 .39 .63 .49 .57
36 .63 .43 .76 .61 .18 .26 .63 .43 .57
37 .73 .83 .92 .68 .26 .49 .68 .75 .66
38 .67 .54 .74 .59 .26 .31 .61 .56 .57
39 .38 .37 .60 .52 .11 .37 .52 .26 .51
40 .35 .46 .56 .49 .05 .42 .47 .37 .50
To determine if the found departure from measurement
invariance was related to school type (single-sex vs. coed),
we repeated the above analysis with the sex groups decom-
posed by school type. The results for the item parameter
correlation analysis are shown in Table 4. It can be seen
that the cross-sex correlations for item difficulty and item
loading are very strong (greater than or equal to r = .95)
for males and females in coed schools. In contrast, item
loadings showed more non-invariance both across single-
sex schools and between males at single-sex schools and
females at coed schools.
Sex Differences by School Type
For the following analyses, IRT-based scores were used
instead of summed scores; as it was, IRT-based scores cor-
related at r = 1 with summed scores. Figure 3 shows the
Figure 1. Item difficulties by item number and sex. Item difficulty is on the y-axis and item number is on the x-axis.
Figure 2. Item loadings by item number and sex. Item loading is on the y-axis and item number is on the x-axis.
fitted distribution of scores by sex for the full sample. The
test had a slight ceiling effect; 0.50% of the sample
obtained perfect scores. There was almost no floor effect,
as only 0.05% of subjects obtained raw scores of zero.
Overall, males (M = −0.06, SD = 1.02) obtained lower
scores than females (M = 0.06, SD = 0.98), but the difference
(d = −0.12) was small by conventional standards.
Welch's t-test showed that this effect was statistically
significant, t(32,194) = 10.78, p < .01. Consistent with findings
from other countries (Lehre, Lehre, Laake, & Danbolt,
2009), males exhibited slightly greater variability.
Table 5 shows the mean scores by sex and school type. In
the data, one individual reported being female at a male-only
school (M = 0.17), 47 reported being male at female-only
schools (M = 0.10, SD = 0.70), and one reported being
female with no school type identified (M = 0.33); data for
these individuals are not shown. When decomposed by
school type, there were no sex differences in variance.
There was a medium to large difference between boys in
coed and in single-sex schools (d = 0.69), while there was
only a very small to small difference between girls in coed
and in single-sex schools (d = 0.05). In coed schools, boys
outperformed girls by a small amount (d = 0.27), while boys
underperformed girls across single-sex schools by a small to
medium amount (d = −0.37).
Of the group comparisons, only for the female coed versus
female single-sex and the female coed versus male coed comparisons
were the parameter correlations strong enough to warrant
confidence in measurement invariance. For these two comparisons,
Welch's t-tests were conducted. There was a significant
but very small to small difference between females
in coed schools (M = 0.077, SD = 0.97) and females in
single-sex schools (M = 0.033, SD = 0.98), t(11,190) = 2.71,
p < .01. There was also a significant but small difference
between males in coed schools (M = 0.340, SD = 0.97)
and females in coed schools (M = 0.077, SD = 0.97),
t(13,869) = 17.18, p < .01. In the other cases, the differences
were also statistically significant, as expected given the
large sample sizes.
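The Welch's t-test for the coed comparison can be checked from the summary statistics alone. The Python sketch below uses a hypothetical function name and takes the coed means, standard deviations, and group sizes from Table 5; because those inputs are rounded, the result should be read as approximate.

```python
from math import sqrt


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Coed schools (Table 5): males versus females.
t_coed, df_coed = welch_t(0.340, 0.97, 6531, 0.077, 0.97, 10408)
print(round(t_coed, 2), round(df_coed))
```

With these rounded inputs, the statistic comes out at roughly t = 17.2 on roughly 13,870 degrees of freedom, in line with the reported t(13,869) = 17.18.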
Sex Differences and Item Characteristics
Next, we explored the relation between sex differences and
item characteristics. Figure 4 shows the scatterplot for item
difficulty and item d-value. A clear relationship can be seen
despite girls having an overall advantage. Item 8, with a
residual z-score of z = 3.45, p < .01, two-tailed, appears to
be an outlier. Excluding it increases the correlation to
r = .68, 95% CI [.47, .82]. To ease interpretation, a simple
Table 4. Parameter correlations for student sex by school type with item difficulties above the diagonal and item loadings below the diagonal
Sex, school type      Female, single-sex   Female, coed   Male, single-sex   Male, coed
Female, single-sex            -                1.0              .96              .96
Female, coed                 .96                -               .95              .97
Male, single-sex             .81               .83               -               .96
Male, coed                   .90               .96              .83               -
Figure 3. Distribution of mathematical ability by sex. Density fitting done using the Gaussian kernel; vertical lines show means.
linear model was fit to compute the unstandardized effect.
The unstandardized betas were B = 0.14 and B = 0.15 with
and without the outlier removed, respectively. For items
with difficulties of −1, 0, and 1, the predicted male advantages
were d = −0.22, −0.07, and 0.09 (based on the
model with the outlier removed). The relative lack of difficult
items means that the observed female advantage is larger
than it would have been if more difficult items had been
included. Interestingly, though, there was no appreciable
association between male advantage and item difficulty in
coed schools (r = .02, n = 40, 95% CI [−.33, .33]).
Since the differences within coed schools showed rela-
tively little departure from measurement invariance, this
null finding might better represent the true association
between sex differences and item difficulty. Put another
way, for the cross-sex comparison which showed the least
psychometric bias, there was no substantial univariate asso-
ciation between item difficulty and male advantage. Thus,
we were unable to confirm previous results. To note, item
difficulty was largely unassociated with cognitive complex-
ity. The zero-order correlation between item difficulty and
cognitive complexity was r = .17 (with both application
and knowledge, which had similar average difficulties,
coded as 0 and reasoning coded as 1). Therefore, it is
problematic to take Bloom's cognitive complexity as an
index of item difficulty, as some authors do.
To examine whether mathematical domain (number,
geometry, and data analysis) and cognitive taxonomy
(knowledge, application, and reasoning) were related to
sex differences, a regression model was fitted with the
outcome being item d-value and the predictors being item
position, item difficulty, item loading, item domain, and
item cognitive taxonomy. Item position was included in
the models as sex-item position effects, associated with both
differences in pupil motivation and item difficulty, have
been reported. Results for all students and for students at
coed schools are shown in Table 6, in Model 1 and Model
2, respectively. Multiple regression analysis was used to test
if the predictors significantly predicted male item-level
advantage in the full sample (Model 1). The results of the
regression indicated that the nine predictors explained
48% of the variance, R² = .48, F(9, 40) = 3.08, p < .01. It
was found that difficulty significantly predicted male
advantage (B = .16, p < .01). For Model 2, the results of
Figure 4. Scatterplot of item difficulty
and male advantage. 95% CI = analytic
95% confidence interval.
Table 5.Scores by sex and school type
Student sex School type MSDN
Female Female only 0.033 0.98 5,536
Female Coed 0.077 0.97 10,408
Male Male only 0.327 0.97 9,822
Male Coed 0.340 0.97 6,531
Journal of Individual Differences (2018) Ó2018 Hogrefe Publishing
8 I. S. Al-Bursan et al.,Sex Differences in 32,347 Jordanian 4th Graders
${protocol}://econtent.hogrefe.com/doi/pdf/10.1027/1614-0001/a000278 - John Fuerst <j122177@hotmail.com> - Monday, November 26, 2018 12:23:41 PM - IP Address:104.183.60.80
the regression indicated that the nine predictors explained
37% of the variance, R
2
=.37,F(9,40)=1.96,p=.08.How-
ever, the overall model was not significant and difficulty was
not significantly related to male item advantage.
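The standardized sex differences by school type can be recovered from the group summaries in Table 5. The sketch below computes Cohen's d with a pooled standard deviation; the function name `cohens_d` is hypothetical, the pooled-SD formula is an assumption about how d was computed, and the male-only mean is entered as negative (an assumption consistent with the d = −0.37 reported in the abstract for single-sex schools).

```python
from math import sqrt


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for group 1 minus group 2, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# Table 5 summaries (mean, SD, N); boys are group 1, girls are group 2.
# Assumption: the male-only mean is negative (-0.327).
d_coed = cohens_d(0.340, 0.97, 6531, 0.077, 0.97, 10408)
d_single_sex = cohens_d(-0.327, 0.97, 9822, 0.033, 0.98, 5536)

print(round(d_coed, 2), round(d_single_sex, 2))  # prints 0.27 -0.37
```

Both values agree with the effect sizes given in the abstract, which shows that the overall female advantage arises between school types rather than within coed schools.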
Consistent with previously reported results (Else-Quest
et al., 2010; Lan, 2014), girls and boys did not substantially
differ by domain or by cognitive taxonomy. Also consistent
with the results above, in this multiple regression model,
the confidence interval for the item difficulty coefficient
excluded zero for all students but included zero in coed schools.
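The relation between the R² values above and the adjusted R² values given in the notes to Table 6 can be sketched as follows. The function name `adjusted_r2` is hypothetical, and the choice of p = 7 estimated coefficients (the non-reference rows of Table 6) with n = 40 items is an assumption made for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumptions: n = 40 items, p = 7 estimated (non-reference) coefficients.
model_1 = adjusted_r2(0.48, n=40, p=7)  # about 0.366
model_2 = adjusted_r2(0.37, n=40, p=7)  # about 0.232

print(round(model_1, 2), round(model_2, 2))  # prints 0.37 0.23
```

Under these assumptions the results round to the adjusted values of .37 and .23 reported for Model 1 and Model 2. The gap between raw and adjusted R² reflects how much of the explained variance could be expected from fitting several predictors to only 40 items.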
Discussion
The present study found multiple results of interest. On this
newly developed 4th-grade mathematical test, girls per-
formed slightly better than boys, overall. However, this
advantage only showed up between school types. Within
coed schools, boys performed better than girls. These
results were inconsistent with those presented by AlSindi
(2013) who found that, for 8th-grade students in Jordan,
girls performed slightly better on TIMSS tests in coed
schools. We cannot at this point explain the discrepancy.
It could represent an age effect, a cohort effect, or a sam-
pling effect.
The cause of the differences both within coed schools
and between single-sex schools in this study is not clear
and the dataset did not allow for an exploration of different
causal models, such as student composition. We speculate
that the advantage of girls in Jordan is due to a single-sex
schooling effect that does not raise latent mathematical
ability. Examples of causes would include differences in
student motivation or differences in teaching styles at male
versus female schools. Verifying that this is the case may
be relevant to the ongoing debate about the relation
between sex differences in math and social equality. This
conjecture is consistent with the current findings: 4th-grade
boys had a small to medium size advantage in coed schools,
where the item parameter correlations were highly similar
between sexes; while girls performed better across single-
sex schools, in which case the relatively low loading param-
eter correlation suggested measurement non-invariance.
As for other results, in this study we were only partially
able to replicate results from other countries showing that
boys have an advantage on more difficult test questions.
This relation did not replicate within coed schools. Why this
is the case and what general relevance it has for sex differ-
ences is not clear. This issue deserves further investigation.
Consistent with previous results, we found no relation
between sex differences and Bloom's cognitive taxonomy.
Interpretations that this indicates no sex differences by item
difficulty are questionable, however, given the weak corre-
lation between item difficulty and cognitive taxonomy.
We found no effect of item position in the multiple
regression model. As item position effects are often taken
as indexes of motivation effects, this might argue against
a motivational hypothesis for differences. Consistent with
prior research, we found no substantial effects of item
domain (number, data analysis, and geometry). As we did
not have relevant information, we could not determine if
boys performed better on unfamiliar items involving esti-
mation and approximation as found by Innabi and Dodeen
(2006).
An advantage of this study was that the sample size was
very large and representative (involving 50% of the total
4th-grade population). As such, the present results can be
said to be extremely reliable with regard to sampling error.
The study had several limitations, though. First, data for
only 40 items were available, a moderate number which
results in somewhat low power to detect small but perhaps
important item-level relationships. Second, related to this,
the item distribution among content types and levels was
imbalanced, reducing the statistical certainty for item
content analyses. Third, there was very limited information
available about the students. For instance, student social
class data were unavailable, making it impossible to test
if school type selection, and thus student composition,
explained sex differences in coed schools and between
single-sex schools. Fourth, the sample only contained students
from a single grade, which consequently makes it impossible
to analyze between grade effects. Further analyses will
have to take advantage of more comprehensive national
datasets involving school, teacher, and student level
indicators to further explore the issues noted.

Table 6. Regression results for sex differences by item characteristics

                         Model 1: All students                Model 2: Students in coed schools
Predictor                B       SE     CI lower  CI upper    B       SE     CI lower  CI upper
Position                  0.00   0.00   −0.01      0.00        0.00   0.00   −0.01      0.00
Difficulty                0.16   0.04    0.08      0.24        0.05   0.03   −0.01      0.11
Loading                   0.20   0.18   −0.16      0.56        0.59   0.15    0.28      0.90
Content: number           0.00                                 0.00
Content: data analysis   −0.06   0.06   −0.18      0.06       −0.01   0.06   −0.12      0.10
Content: geometry         0.01   0.04   −0.06      0.08        0.02   0.03   −0.05      0.09
Level: application        0.00                                 0.00
Level: knowledge         −0.02   0.03   −0.09      0.05       −0.02   0.03   −0.08      0.05
Level: reasoning          0.00   0.04   −0.07      0.07        0.00   0.03   −0.07      0.07

Notes. Outcome: item d-value. N = 40. Reference classes (content: number; level: application) were set to the modes. B = unstandardized betas;
SE = standard error; CI = 95% confidence intervals. Model 1: R² = .48, adjusted R² = .37; Model 2: R² = .37, adjusted R² = .23.
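The first limitation above (only 40 items) can be made concrete with a rough power calculation. The sketch below estimates the smallest item-level correlation detectable with a given power, using the Fisher z approximation; the function name `min_detectable_r`, the two-sided alpha of .05, and the 80% power target are illustrative assumptions, not values taken from the study.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable with the given power (Fisher z approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = nd.inv_cdf(power)              # quantile for the target power
    return tanh((z_alpha + z_power) / sqrt(n - 3))


# Assumption: n = 40 items, alpha = .05 (two-sided), power = .80.
print(round(min_detectable_r(40), 2))
```

Under these assumptions, item-level correlations need to be of roughly moderate size (about .4) before they are reliably detected, regardless of how many students were tested.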
Acknowledgments
The authors extend their appreciation to the Deanship of
Scientific Research at King Saud University for funding this
work through Research Group No. RG-1438-064.
References
Ababneh, E., Al-Tweissi, A., & Abulibdeh, K. (2016). TIMSS and PISA impact – The case of Jordan. Research Papers in Education, 31, 542–555. https://doi.org/10.1080/02671522.2016.1225350
AlSindi, N. A. (2013). Single-sex schooling and mathematics achievement in the Middle East: The case of Iran, Syria, Jordan, and Oman (Master's thesis). Retrieved from http://hdl.handle.net/10822/558666
Alomari, H., & Shatnawi, A. (2016). Differential Item Functioning of the National Educational Quality Control Test in Mathematics for 10th Grade According to Gender [in Arabic]. Alnajah University Journal for Researches (Human Sciences), 30, 1530–1554. Retrieved from https://journals.najah.edu/article/1306/
Beller, M., & Gafni, N. (2000). Can item format (multiple choice vs. open-ended) account for gender differences in mathematics achievement? Sex Roles, 42, 1–21.
Bielinski, J., & Davison, M. L. (2001). A sex difference by item difficulty interaction in multiple-choice mathematics items administered to national probability samples. Journal of Educational Measurement, 38, 51–77. https://doi.org/10.1111/j.1745-3984.2001.tb01116.x
Bloom, B. S. (1984). Taxonomy of educational objectives book 1: Cognitive domain (2nd ed.). New York, NY: Addison Wesley.
Borgonovi, F., & Biecek, P. (2016). An international comparison of students' ability to endure fatigue and maintain motivation during a low-stakes test. Learning and Individual Differences, 49, 128–137. https://doi.org/10.1016/j.lindif.2016.06.001
Bursan, I. (2013). Gender related differential item functioning for Jordanian national test items for mathematics learning quality control for tenth grade [in Arabic]. Journal of Education in Zagazig University, 79, 15–35.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185. https://doi.org/10.1111/jedm.12009
DeMars, C. (2010). Item Response Theory. Oxford, UK/New York, NY: Oxford University Press.
Dickerson, A., McIntosh, S., & Valente, C. (2015). Do the maths: An analysis of the gender gap in mathematics in Africa. Economics of Education Review, 46, 1–22. https://doi.org/10.1016/j.econedurev.2015.02.005
Else-Quest, N. M., Hyde, J. S., & Linn, M. C. (2010). Cross-national patterns of gender differences in mathematics: A meta-analysis. Psychological Bulletin, 136, 103–127. https://doi.org/10.1037/a0018053
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58, 357–381. https://doi.org/10.1177/0013164498058003001
Fryer, R. G. Jr., & Levitt, S. D. (2010). An empirical analysis of the gender gap in mathematics. American Economic Journal: Applied Economics, 2, 210–240.
Halpern, D. F., Benbow, C. P., Geary, D. C., Gur, R. C., Hyde, J. S., & Gernsbacher, M. A. (2007). The science of sex differences in science and mathematics. Psychological Science in the Public Interest: A Journal of the American Psychological Society, 8, 1–51. https://doi.org/10.1111/j.1529-1006.2007.00032.x
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33, 61–83. Discussion 83–135. https://doi.org/10.1017/S0140525X0999152X
Innabi, H., & Dodeen, H. (2006). Content analysis of gender-related differential item functioning TIMSS items in mathematics in Jordan. School Science and Mathematics, 106, 328–337. https://doi.org/10.1111/j.1949-8594.2006.tb17753.x
Kane, J. M., & Mertz, J. E. (2012). Debunking myths about gender and mathematics performance. Notices of the AMS, 59, 10–21. https://doi.org/10.1090/noti790
Lan, M. C. (2014). Exploring gender differential item functioning (DIF) on eighth grade mathematics items for the United States and Taiwan (Doctoral dissertation). Retrieved from http://hdl.handle.net/1773/26209
Lehre, A. C., Lehre, K. P., Laake, P., & Danbolt, N. C. (2009). Greater intrasex phenotype variability in males than in females is a fundamental aspect of the gender differences in humans. Developmental Psychobiology, 51, 198–206. https://doi.org/10.1002/dev.20358
Liou, P. Y., & Bulut, O. (2017). The effects of item format and cognitive domain on students' science performance in TIMSS 2011. Research in Science Education. Advance online publication. https://doi.org/10.1007/s11165-017-9682-7
Mäkitalo, Å. (1996). Gender differences in performance on the DTM subtest in the Swedish scholastic aptitude test as a function of item position and cognitive demands. Scandinavian Journal of Educational Research, 40, 189–201.
Nosek, B. A., Smyth, F. L., Sriram, N., Lindner, N. M., Devos, T., Ayala, A., ... Kesebir, S. (2009). National differences in gender science stereotypes predict national sex differences in science and math achievement. Proceedings of the National Academy of Sciences, 106, 10593–10597. https://doi.org/10.1073/pnas.0809921106
OECD. (2015). PISA 2015 Technical Report. Paris, France: OECD. Retrieved from http://www.oecd.org/pisa/data/2015-technical-report/
Penner, A. M. (2003). International gender item difficulty interactions in mathematics and science achievement tests. Journal of Educational Psychology, 95, 650–655. https://doi.org/10.1037/0022-0663.95.3.650
R Core Team. (2017). R: A language and environment for statistical computing (Version 3.4.0). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research (Version 1.7.8). Retrieved from https://cran.r-project.org/web/packages/psych/index.html
Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied Statistical Methods, 8, 597–599. https://doi.org/10.22237/jmasm/1257035100
Schroeders, U., Wilhelm, O., & Olaru, G. (2016). The influence of item sampling on sex differences in knowledge tests. Intelligence, 58, 22–32. https://doi.org/10.1016/j.intell.2016.06.003
Shteiwi, M. (2015). Attitudes towards gender roles in Jordan. British Journal of Humanities and Social Sciences, 12, 15–27.
Stoet, G., & Geary, D. C. (2015). Sex differences in academic achievement are not related to political, economic, or social equality. Intelligence, 48, 137–151. https://doi.org/10.1016/j.intell.2014.11.006
Tweissi, A., Ababneh, I., & Lebdih, K. A. (2014). Gender gap in student achievement in Jordan study report. Retrieved from http://www.nchrd.gov.jo/assets/PDF/Studies/En/Gender%20Gap%20Report%20%2008_25_14.pdf
Uebersax, J. S. (2015, September 8). Introduction to the tetrachoric and polychoric correlation coefficients. Retrieved from http://john-uebersax.com/stat/tetra.htm
Wicherts, J. M. (2017). Psychometric problems with the method of correlated vectors applied to item scores (including some nonsensical results). Intelligence, 60, 26–38. https://doi.org/10.1016/j.intell.2016.11.002
Wicherts, J. M., & Johnson, W. (2009). Group differences in the heritability of items and test scores. Proceedings of the Royal Society B: Biological Sciences, 276, 2675–2683. https://doi.org/10.1098/rspb.2009.0238
Zeller, F., Reiß, S., & Schweizer, K. (2017). Is the item-position effect in achievement measures induced by increasing item difficulty? Structural Equation Modeling: A Multidisciplinary Journal, 24, 745–754. https://doi.org/10.1080/10705511.2017.1306706
Received November 13, 2017
Revision received May 20, 2018
Accepted May 29, 2018
Published online November 26, 2018
John Fuerst
Ulster Institute for Social Research
London
UK
j122177@hotmail.com