In many societies, mathematics is often considered a male domain (Nosek et al., 2009). Males tend to have higher motivation and self-efficacy for mathematics than females (e.g. Skaalvik & Skaalvik, 2004). Gender differences in mathematics achievement, however, are less clear-cut. While girls usually earn higher school grades throughout their education (Duckworth & Seligman, 2006; Hicks, Johnson, Iacono & McGue, 2008), males score somewhat higher on standardized mathematics tests (e.g. De Fruyt et al., 2008).
However, meta-analyses indicate that the male advantage in mathematics test achievement has decreased or even disappeared in recent decades (De Lisi & McGillicuddy-De Lisi, 2002; Lindberg, Hyde, Petersen & Linn, 2010). A male advantage in test results is more often found in adolescence and in highly selected samples. Furthermore, male results show somewhat larger variability than female results (Hyde, Fennema & Lamon, 1990; Lindberg, Hyde, Petersen & Linn, 2010).
The main objective of this study was to explore the taxonomy of mathematics items’ characteristics that are expected to yield gender differences in performance on these items (Gallagher et al., 2000). This taxonomy was based on a psychobiosocial model (Halpern, 1997; 2000), which emphasizes the reciprocal relationships among different types of variables (psychological, biological and social) in the process of learning.
According to this taxonomy, males will have higher results on items with multiple solution paths and items that require spatial skills. Females will have higher results on items that require verbal skills and on items that require the application of routine mathematical solutions. The latter group comprises items that require the application of routine solutions to a new, unfamiliar situation, items that require their application to a familiar situation, items that require memorization and items that require the use of symbolic processes (Gierl, Bisanz, Bisanz & Boughton, 2003).
In exploring this taxonomy, researchers have usually compared the average scores of males and females on mathematics items (Gallagher, Levin & Cahalan, 2002). This strategy is problematic because it ignores Simpson’s paradox (Simpson, 1951): a difference in aggregate scores can shrink, vanish or reverse once examinees of comparable ability are compared. In this study, a confirmatory approach to testing the taxonomy hypotheses was used in the context of the Croatian State Matura exam, based on a combination of methodologies that are better suited to this purpose.
The study had four aims:
1. To examine gender differences in mathematical problem-solving performance.
2. To examine the congruence of two classifications of items into taxonomy categories: one made by mathematics teachers and the other based on students’ remarks while solving individual items.
3. To compare gender differences in mathematical problem-solving performance between general grammar school students and science and mathematics grammar school students.
4. To examine whether gender differences in mathematical problem-solving performance can be reduced by manipulating item characteristics.
The data used in this study were obtained from final-year students of general grammar schools and of science and mathematics grammar schools who took the higher-level Mathematics examination in the 2010 and 2011 administrations of the Croatian secondary school final examinations. In 2010, the examination was taken by 3425 students from general grammar schools (1361 males and 2064 females) and 1577 students from science and mathematics grammar schools (954 males and 623 females). In 2011, it was taken by 3650 students from general grammar schools (1419 males and 2231 females) and 1531 students from science and mathematics grammar schools (923 males and 608 females).
Gender differences in the results on items from every taxonomy category were analysed using three approaches: analysis of mean gender differences, differential item functioning (DIF) analysis and differential bundle functioning (DBF) analysis. DBF analysis has higher statistical power than analyses conducted on individual items. It is therefore more suitable for testing hypotheses about the item characteristics responsible for gender differences in performance, although it is still relatively rarely used in educational contexts. In this research, DIF and DBF analyses were performed using the Mantel-Haenszel test, the SIBTEST / Poly-SIBTEST methodology and empirical curves.
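The contrast between comparing raw means and conditioning on ability can be illustrated with a minimal sketch of the Mantel-Haenszel DIF statistic for a single dichotomous item. This is not the study's analysis code: the function names and the toy counts below are hypothetical, the strata stand in for total-score levels, and only the standard common odds ratio and its transformation to the ETS delta scale are computed.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (common odds ratio, ETS delta) for one dichotomous item.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    counted among examinees at the same total-score level.
    Assumes at least one stratum has nonzero b * c, otherwise the
    denominator is zero.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den                # Mantel-Haenszel common odds ratio
    delta = -2.35 * math.log(alpha)  # negative values favour the reference group
    return alpha, delta


def raw_proportions(strata):
    """Proportion correct per group, ignoring the score levels."""
    ref_correct = sum(a for a, _, _, _ in strata)
    ref_total = sum(a + b for a, b, _, _ in strata)
    focal_correct = sum(c for _, _, c, _ in strata)
    focal_total = sum(c + d for _, _, c, d in strata)
    return ref_correct / ref_total, focal_correct / focal_total


# Hypothetical counts: within each score level both groups solve the item
# equally often (20 % and 80 %), but the groups are distributed differently
# across the two levels.
toy_strata = [
    (10, 40, 30, 120),   # low total score
    (120, 30, 40, 10),   # high total score
]

p_ref, p_focal = raw_proportions(toy_strata)
alpha, delta = mantel_haenszel_dif(toy_strata)
print(f"raw proportion correct: reference={p_ref:.2f}, focal={p_focal:.2f}")
print(f"MH odds ratio={alpha:.2f}, ETS delta={delta:.2f}")
```

In this toy data the raw proportions differ markedly between the groups, while the stratified odds ratio equals 1, that is, the item shows no DIF once examinees are matched on total score. This is the kind of discrepancy that a comparison of average scores alone cannot detect.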
Two classifications of items into taxonomy categories were compared: the first was made by mathematics teachers, and the second was based on students’ remarks recorded after they had tried to solve individual items. More specifically, in a small sample of final-year general grammar school students (N = 16; 8 males and 8 females), think-aloud protocols were used to inspect students' ways of understanding and solving items. Transcripts were coded according to the descriptions of the taxonomy categories. Congruence of the two classifications is necessary for drawing conclusions about the relationship between the size and direction of gender differences on items and the taxonomy categories to which those items belong.
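One common way to quantify the congruence of two such classifications is Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The text does not state which agreement index was used in the study, so the sketch below is an assumption made for illustration only; the category labels and the eight example items are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two classifications of the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both classifications must cover the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items placed in the same category.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal category proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical classifications of eight items.
teacher_labels = ["spatial", "verbal", "routine", "routine",
                  "memorization", "verbal", "spatial", "routine"]
student_labels = ["spatial", "verbal", "routine", "memorization",
                  "memorization", "verbal", "routine", "routine"]

kappa = cohens_kappa(teacher_labels, student_labels)
print(f"kappa = {kappa:.3f}")
```

Here six of the eight items are classified identically, and kappa is lower than the raw agreement of 0.75 because part of that agreement would be expected by chance.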
Furthermore, gender differences among general grammar school students were compared with those among science and mathematics grammar school students. Students who attend science and mathematics grammar schools chose to enroll in these schools at least partially because of their motivation for mathematics. On average, these students achieve higher results on the final examinations in Mathematics than students from general grammar schools. The comparison of gender differences within these two groups of students was used to test the hypothesis of a larger male advantage in highly selected samples.
To check whether gender differences can be reduced by manipulating item characteristics, items that require verbal skills (the taxonomy category in which the largest gender differences were found) were modified, and different versions of the same items were administered to a sample of university students (N = 205; 81 males and 124 females).
Results and discussion
Gender differences in average total scores on the examinations were negligible, which is in accordance with the results of meta-analyses (De Lisi & McGillicuddy-De Lisi, 2002; Lindberg, Hyde, Petersen & Linn, 2010). The taxonomy hypotheses were only partially confirmed. The results confirmed the male advantage on items with multiple solution paths and items that require spatial skills. Females had higher results on items that require the application of routine mathematical solutions to a familiar situation. These findings were in accordance with the hypotheses. However, gender differences were ambiguous on items that require memorization and items that require the use of symbolic processes. Furthermore, males had somewhat higher results on items that require the application of routine mathematical solutions to a new, unfamiliar situation. The largest difference in favour of male students was found on items requiring verbal skills, which contradicted the hypothesis. This finding was supported by all types of analyses (analysis of mean gender differences, DIF and DBF). These items also measured the content domain of mathematical modelling, and other items from this content domain yielded gender differences in the same direction. Gender differences in the other taxonomy categories were rather small and of no practical importance. Based on these findings, further investigation of gender differences in verbal problems was conducted.
The gender difference on problems requiring verbal skills was replicated on a sample of university students, but only when high school grades in Mathematics were used as a covariate. The size of the gender difference did not change significantly when the rule or algorithm for solving the problem was explicitly added to the verbal problem. No gender differences were found in the additional variables used in this part of the research (results on the test of verbal series and the inventory of use of mathematics in everyday life).
The teachers' classification of items matched the classification based on students' statements to a great extent. The comparison of gender differences between general grammar school students and science and mathematics grammar school students did not yield clear-cut results. In other words, the hypothesis of a larger male advantage in highly selected samples was not confirmed.
The different methodologies used in this study led to similar findings: the largest gender differences occur on mathematical verbal problems. Although this research did not yield clear conclusions regarding the reasons behind these differences, the results indicate that more attention should be given to girls' acquisition of the strategies involved in solving mathematical verbal problems.
To the best of the author's knowledge, this is the first study that combines the aforementioned methodological approaches in research on gender differences in performance on mathematics tasks. The confirmatory approach to testing hypotheses about group differences in item performance used in this research can be applied in various contexts, e.g. different school subjects.
Furthermore, this is the first comprehensive study of gender differences in the context of the Croatian State Matura Mathematics examinations. These examinations are gender-neutral to a great extent.