Female Chess Players Do Underperform When Playing Against Men: Commentary on Stafford (2018)

One real-life domain in which sex differences in cognition are studied is chess, an intellectual sport in which men and women compete head-to-head. This commentary on Stafford (2018) demonstrates the importance of including other factors, namely players’ ages, in analyses of differences between men and women. Because female chess players are, on average, younger than male players, not taking into account the ages of both players could have consequential effects on the analyses. In my study of official chess data from around the world, I found that to indeed be the case.
Female Chess Players Do Underperform When Playing Against
Men: Commentary on Stafford (2018) *
Uri Zak
The Hebrew University of Jerusalem
February 16, 2020
Stafford (2018) found that female chess players outperform expectations when playing against
men, in a study of data from over 5.5 million official games around the world. I examined whether
that result could stem from not controlling for the ages of both players, as female players tend to
be much younger than male players. Using the same data as Stafford, I was able to replicate his
main result only when the opponent’s age was ignored. When the ages of both players were
included in the analysis, the gender-composition effect was reversed. Further analyses using other
data demonstrated the robustness of this pattern, re-establishing that female chess players
underperform when playing against men. Prior to Stafford’s paper, the leading premise was that
women encounter psychological obstacles that prevent them from performing at their normal
capacity against men. My commentary continues that line of evidence and is consistent with the
stereotype-threat explanation.
*The Sonas-FIDE 200-Month Dataset used here was assembled by Jeff Sonas,
and his work made this commentary possible. I am also grateful to
Judith Avrahami, Igor Bitensky, and Yaakov Kareev for their thoughtful suggestions.
This commentary was submitted to Psychological Science but was not considered for
publication because another commentary on Stafford (2018) was in process.
Discussion Paper # 734 (March 2020)
The Hebrew University of Jerusalem
The Federmann Center for the Study of Rationality
Feldman Building, Edmond J. Safra Campus,
Jerusalem 91904, Israel
PHONE: [972]-2-6584135 FAX: [972]-2-6513681
One real-life domain in which sex differences in cognition are studied is chess, an intellectual sport
in which men and women compete head-to-head. While male dominance in chess is indisputable,
its origins and implications remain unclear. Recently, Stafford (2018) compared the performance
of female players when playing male versus female opponents, using data from over 5.5 million
official chess games from around the world. That straightforward comparison captured a subtle
notion: If male dominance creates expectations that cause female players to perform poorly
(stereotype threat), female players will obtain inferior results when playing against men, but not
when playing against women.
Whereas previous findings have been consistent with the concept of stereotype threat
(Backus, Cubel, Guid, Sánchez-Pages, & Mañas, 2016; de Sousa & Hollard, 2015; Maass,
D’Ettole, & Cadinu, 2008; Rothgerber & Wolsiefer, 2014), Stafford concluded that the opposite
was the case: “Female players, far from suffering a stereotype threat, display a boost in
performance when playing men compared with playing women.” (p. 5). To reconcile that disparity,
my commentary considers whether Stafford’s main result may have stemmed from not controlling
for the ages of both players.
There is good a priori reason to assume that gender and age may be confounded in this
context. Among active players, the women are, on average, considerably younger than the men
(e.g., Blanch, 2016; Gerdes & Gränsmark, 2010).¹ Stafford did control for the focal female players’
birth years, but not for their opponents’ birth years. That is, the fact that a male opponent is likely
to be older than a female opponent was ignored. That omission could be consequential because
younger chess players are typically in a skill-acquisition period (e.g., Vaci & Bilalić, 2017), their
productivity is usually higher (Bertoni, Brunello, & Rocco, 2015), and such effects are not
necessarily reflected in their chess ratings (Viswanath, 2016).
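The direction of the suspected bias follows from the standard omitted-variable argument. The sketch below uses symbols introduced here for illustration only (they are not the commentary's notation): S is the focal player's score, M is a dummy for a male opponent, and A_o is the opponent's age. Other controls are left out for brevity.

```latex
\begin{align*}
S   &= \beta_0 + \beta_M M + \beta_A A_o + \varepsilon
      && \text{(opponent's age included)} \\
A_o &= \delta_0 + \delta_M M + \nu
      && \text{(male opponents are older: } \delta_M > 0\text{)} \\
S   &= (\beta_0 + \beta_A \delta_0) + (\beta_M + \beta_A \delta_M)\, M
       + (\varepsilon + \beta_A \nu)
      && \text{(opponent's age omitted)}
\end{align*}
```

If older opponents are easier to score against (\beta_A > 0) and male opponents are older (\delta_M > 0), the coefficient on M in the regression without opponent's age exceeds \beta_M by \beta_A \delta_M. It can therefore be positive even when \beta_M itself is negative.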
Would controlling for the opponent’s age attenuate or even reverse Stafford’s findings?
Are the data truly inconsistent with a stereotype-threat effect? To answer these questions, I used
the same data as Stafford, with the same principal sample in which both players were rated and at
least one of them was a woman (i.e., 886,697 games from 104,824 male players and 16,156 female
players, played from January 2008 through August 2015). As expected, the female players (M age
= 22.24) were much younger than the male players (M age = 34.34). I fitted two regression models
to predict the outcome of a match for the focal female player: with and without a control for the
opponent’s age.
The first model corresponded to Stafford’s analyses. It included a dummy independent
variable indicating whether the opponent was a man or a woman, the popular predictor of
difference in players’ Elo ratings (Elo, 1978), and a control for the focal player’s age.² Not
surprisingly, Stafford’s main finding was replicated. As shown in Column 1 of Table 1, the
coefficient of playing a male opponent was significantly positive (p < .001). However, as shown
in Column 2, this result was reversed when I controlled for the opponent’s age. This finding
supports the hypothesized confounding effect and demonstrates the importance of including both
players’ ages in the analyses. Predictably, the player’s age coefficient was negative, whereas the
opponent’s age coefficient was positive, both implying that performance declines with age.³
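The sign reversal described above can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not a reanalysis of the chess data: every parameter (the 12-year age gap, the -0.02 penalty against men, the +0.003 effect per year of opponent's age, the noise levels) is an assumption chosen for illustration, the score is a continuous stand-in for the 1 / 0.5 / 0 outcome, and the function names are hypothetical.

```python
import numpy as np


def simulate_games(n=200_000, seed=0):
    """Simulate games for a focal female player (all parameters are hypothetical)."""
    rng = np.random.default_rng(seed)
    male_opp = rng.integers(0, 2, size=n).astype(float)  # 1.0 if the opponent is a man
    player_age = rng.normal(22.0, 8.0, size=n)
    # The confound: male opponents are, on average, 12 years older.
    opp_age = rng.normal(22.0, 8.0, size=n) + 12.0 * male_opp
    elo_diff = rng.normal(0.0, 150.0, size=n)  # player's Elo minus opponent's Elo
    # Assumed data-generating process: a small penalty against men (-0.02)
    # and a small advantage per year of opponent's age (+0.003).
    score = (0.5 + 0.001 * elo_diff - 0.02 * male_opp
             - 0.003 * (player_age - 22.0) + 0.003 * (opp_age - 22.0)
             + rng.normal(0.0, 0.4, size=n))
    return male_opp, elo_diff, player_age, opp_age, score


def fit_ols(columns, y):
    """OLS via least squares; an intercept column is prepended."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def compare_models(seed=0):
    """Return the male-opponent coefficient without and with opponent's age."""
    male_opp, elo_diff, player_age, opp_age, score = simulate_games(seed=seed)
    without_opp_age = fit_ols([male_opp, elo_diff, player_age], score)
    with_opp_age = fit_ols([male_opp, elo_diff, player_age, opp_age], score)
    # Index 1 is the male-opponent coefficient in both fits.
    return without_opp_age[1], with_opp_age[1]


if __name__ == "__main__":
    short, full = compare_models()
    print(f"male-opponent coefficient, opponent's age omitted:  {short:+.4f}")
    print(f"male-opponent coefficient, opponent's age included: {full:+.4f}")
```

Under these assumed parameters, the coefficient is positive when the opponent's age is omitted and negative once it is included. The simulated penalty is the same in both fits, so the only thing that changes is whether the age gap is absorbed by the male-opponent dummy.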
When I controlled for the opponent’s age, whether the focal player played white or black,
whether or not the two players were listed in the same country, and how many games were recorded
in the sample for each of the two players (a proxy for level of practice; Column 3 of Table 1), the
pattern of results remained unchanged. Finally, I repeated all of these analyses using other data
from official chess games played worldwide from January 2000 through December 2007.⁴ Once
again, controlling for the opponent’s age altered the gender-composition coefficient (Columns 4,
5, and 6 of Table 1). Overall, these results re-establish that female chess players underperform
when playing men, all else being equal.
Why should gender-composition matter after removing the variation due to other factors,
including players’ ages and skill differences? Prior to Stafford’s paper, the leading premise was
that women encounter psychological obstacles that prevent them from performing at their normal
capacity against men. Gerdes and Gränsmark (2010) considered whether players adjust their
strategies according to their opponent’s gender and found that, if anything, such behavior would
improve female players’ results. Backus et al. (2016) found that women played worse against men,
while men did not play better against women. Other explanations based on culture and chess
experience were considered and rejected by de Sousa and Hollard (2015). The current commentary
continues that earlier line of evidence and is consistent with the stereotype-threat explanation.
On a final note, it is important to bear in mind that the data used here were collected among
competitive chess players who had reached a certain level of expertise. As female players who do
compete are probably those least vulnerable to stereotype threat (Rothgerber & Wolsiefer, 2014),
stereotype threat in chess may be even more pervasive than one might conclude based only on the
results presented here.
Table 1
Results of Female Players Regressed With OLS
Columns 1–3: Data from 2008 through 2015. Columns 4–6: Data from 2000 through 2007.
Rows: Male Opponent; Elo (Player) − Elo (Opponent); Opponent's Age.
[Coefficient values are not preserved in this copy.]
Note. The dependent variable is 1 (win), 0.5 (draw), or 0 (loss). The opponent is either male or female. Robust standard errors are
shown in parentheses. Standard errors are clustered at the player level.
*** p < .001.
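The table note refers to robust standard errors clustered at the player level. As a rough illustration of what that involves, the sketch below computes the basic cluster-robust "sandwich" estimator for OLS with NumPy. It is an assumption-laden sketch: it applies no small-sample correction (statistical packages usually do), the function name is hypothetical, and it is not necessarily the exact estimator behind Table 1.

```python
import numpy as np


def ols_cluster_se(X, y, groups):
    """OLS coefficients with cluster-robust standard errors (no small-sample correction).

    X      : (n, k) design matrix, including an intercept column if wanted
    y      : (n,) outcome, e.g. 1 (win), 0.5 (draw), or 0 (loss)
    groups : (n,) cluster labels, e.g. the focal player's ID
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of the outer product of
    # each cluster's summed score vector X_g' u_g.
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

Clustering by player allows the residuals of games played by the same person to be correlated. When every game is its own cluster, the formula reduces to the usual heteroskedasticity-robust (HC0) estimator.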
1. As noted by de Sousa and Hollard (2015), the age difference can be explained by the
disproportionate dropping out of young women and the presence of older male newcomers.
2. The models were estimated with OLS and the focal player in female-only games was randomly
selected, as is common in analyses of chess data (e.g., Gränsmark, 2012). Fitting a fractional-
response model instead of OLS yielded similar results.
3. To verify these age effects, I examined 4,593,695 games involving male-only competitors. The
results resembled those presented in Column 2.
4. During those years, the World Chess Federation did not track game-by-game results. However,
some of these results were extracted by Jeff Sonas.
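Footnote 2 mentions that the focal player in female-only games was selected at random. A minimal sketch of that orientation step is given below. The record layout (dicts with a "sex" key, a score recorded from White's point of view) and the function name are assumptions made for illustration, not the structure of the actual dataset.

```python
import random


def orient_game(white, black, white_score, rng):
    """Return (focal, opponent, focal_score, focal_is_white) for one game.

    `white` and `black` are dicts with at least a "sex" key ("F" or "M");
    `white_score` is 1, 0.5, or 0 from White's point of view.
    Returns None for male-only games, which have no female focal player.
    """
    women = [side for side, player in (("white", white), ("black", black))
             if player["sex"] == "F"]
    if not women:
        return None
    # The random pick only matters when both players are women.
    side = rng.choice(women)
    if side == "white":
        return white, black, white_score, True
    # Flip the score so it is expressed from the focal player's point of view.
    return black, white, 1 - white_score, False


if __name__ == "__main__":
    rng = random.Random(0)
    anna = {"id": 1, "sex": "F"}
    boris = {"id": 2, "sex": "M"}
    print(orient_game(boris, anna, 1, rng))  # anna is focal, with a score of 0
```

Choosing the focal player at random in female-only games means each game enters the sample once, and the outcome is not systematically tied to either of the two women.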
Backus, P., Cubel, M., Guid, M., Sánchez-Pages, S., & Mañas, E. (2016). Gender, competition
and performance: Evidence from real tournaments. SSRN Papers. Retrieved from
Bertoni, M., Brunello, G., & Rocco, L. (2015). Selection and the age–productivity profile.
Evidence from chess players. Journal of Economic Behavior & Organization, 110, 45
Blanch, A. (2016). Expert performance of men and women: A cross-cultural study in the chess
domain. Personality and Individual Differences, 101, 90–97.
de Sousa, J., & Hollard, G. (2015). Gender differences: Evidence from field tournaments.
CEPREMAP Papers. Retrieved from
Elo, A. E. (1978). The rating of chess players, past and present. New York, NY: Arco
Gerdes, C., & Gränsmark, P. (2010). Strategic behavior across gender: A comparison of female
and male expert chess players. Labour Economics, 17(5), 766–775.
Gränsmark, P. (2012). Masters of our time: Impatience and self-control in high-level chess
games. Journal of Economic Behavior & Organization, 82(1), 179–191.
Maass, A., D’Ettole, C., & Cadinu, M. (2008). Checkmate? The role of gender stereotypes in the
ultimate intellectual sport. European Journal of Social Psychology, 38(2), 231–245.
Rothgerber, H., & Wolsiefer, K. (2014). A naturalistic study of stereotype threat in young female
chess players. Group Processes & Intergroup Relations, 17(1), 79–90.
Stafford, T. (2018). Female chess players outperform expectations when playing men.
Psychological Science, 29(3), 429–436.
Vaci, N., & Bilalić, M. (2017). Chess databases as a research vehicle in psychology: Modeling
large data. Behavior Research Methods, 49(4), 1227–1240.
Viswanath, G. (2016). Age and the Elo rating system: How underrated are the kids? Retrieved