Which Gender Plays More Beautiful Chess?
A. Iqbal
College of Information Technology, Universiti Tenaga Nasional, Malaysia.
ABSTRACT: Chess is typically a male-dominated sport. Women play the game as well, but usually
against each other. The reasons for this are debatable. Aesthetics is also an important part of the game and the
reason why many people play. It is an essential component of the (even more so) male-dominated world of
chess composition. In this research, our goal was to determine if games between men and games between
women showed any statistically significant difference in terms of aesthetics. Using a computational
aesthetics model, we analyzed two sets of games (one small, one large) between males and between females
irrespective of playing strength and age. We found no statistically significant difference in the smaller
set, but in the larger set the games between men were, on average, more beautiful than those between women.
This suggests that men are more likely to have a better artistic sense in the game and therefore appreciate it more.
It might also help to explain the relative non-existence of master female chess problem composers. It follows
that, similarly, women may have better artistic senses in other games or domains as compared to men.
1 INTRODUCTION & REVIEW
Chess is a science, sport and art all at the same time.
It has been studied extensively in artificial intelli-
gence (AI), psychology and other fields over the
last 60 years, is regulated by the World Chess Fed-
eration (FIDE) and recognized by the International
Olympic Committee (IOC), and is also featured in
specialized magazines and newspapers worldwide
as puzzles with creative and artistic solutions.
Chess is dominated by men in the sense that the
greatest players (i.e. having the highest Elo rating)
have consistently been men. The highest rated chess
player of all time, for instance, is Magnus Carlsen
of Norway with a peak rating of 2882 in May 2014.
The highest rated female player of all time is Judit
Polgar of Hungary, with a peak rating of 2735 in
2005.
The difference in ratings between the aforementioned
players may not seem like very much, but at
the grandmaster level even a few rating points can
be difficult to earn and are easily lost. In the
top 100 players list for May 2015, Judit Polgar’s
rating mentioned above would sit at 23rd place. The
highest rated woman on that list is Hou Yifan of
China at 55th place, rated 2686 (World Chess
Federation, 2015). Viswanathan Anand of India, notably,
at the ‘advanced’ age of 45 years, is world
number 2 (rated 2804). This is just 5 years away
from being eligible as a ‘senior’ category player in
chess. Fig. 1 shows photographs of some of these
chess icons for the curious reader (Autopilot, 2012;
Stefan64, 2008; d'Andorra, 2012).
To our knowledge, there has been no documented
research looking specifically at aesthetics or beauty
in the game in relation to gender. However, previous
work on chess-playing ability and gender has
found, for instance, that women are more risk-averse
in their play (Gerdes & Gränsmark, 2010), and that
more males enter the sport at the lower levels, which
translates to more men at the highest levels
(Chabris & Glickman, 2006).
Fig. 1. Top male and female players: Magnus Carlsen, Judit Polgar and Hou Yifan.
Some research suggests that women play as well as
men only when they are playing against other women
but show a performance drop when playing
against men (Maass, D'Ettole & Cadinu, 2008). Yet
other research suggests that the lower performance
of women in chess may be due to statistical sampling,
i.e. far fewer women in the game (Bilalić et al.,
2009). However, when a veteran male grandmaster
in the modern day claims the difference is genetic,
there can be a media storm and much controversy
(Friedel, 2015).
Our intention in this research is not to go into detail
about the dynamics and interplay of gender roles
in contrast to other areas where there are such
discrepancies. We are simply interested in determining
whether the data and experimental results show,
quantitatively, that there is indeed such a difference
in this particular domain. The results of this research
can therefore be used in a larger tapestry discussing
ideas related to gender differences in aesthetics.
Our background is not in gender studies, so
such an exploration of ideas would be better handled
by those who work in that field; they are free to use
our data and results.
In section 2, we explain the experimental materi-
als and assumptions. In section 3, we present the
results and a brief discussion. We conclude the pa-
per with section 4, summarizing the main points and
suggesting some directions for further work.
2 EXPERIMENTAL SETUP
For our experimental work, we used the updated
ChessBase Big Database 2015 (6,251,221 games)
as the primary resource for games (ChessBase
Shop, 2015a). We also used an experimentally-
validated computational chess aesthetics model that
is able to assess beauty in the game using various
domain-related aesthetic principles and themes. Its
evaluations with regard to three-movers and studies
(a longer type of chess problem) correlate positively
and well with domain-competent and expert human
assessment.
The model combines well-known aesthetic principles
in the game (e.g. economy, sacrifice) and
themes (e.g. pin, fork) with stochastic components
in order to produce an aesthetics score for a given
three-move mate sequence or endgame study. The
second time a sequence or study is evaluated its score
may be slightly different, much as a human judge
assessing a sequence a second time may change
his mind slightly. So typically, one cycle is used for
all evaluations. This is not only faster but also more
akin to how human judgements are made. The aesthetics
model is incorporated into the prototype
computer program, Chesthetica, and this software
was used to perform the evaluations. Further details
about the aesthetics model are available in (Iqbal et
al., 2012).
The first task was to filter the 6+ million games in
the database to those that ended with the white
pieces checkmating black. This could be done
automatically in a few hours using the ChessBase 13
software (ChessBase Shop, 2015b). This left
157,358 games. Separating the games based on
gender was not relevant at this point because this
collection needed to be filtered again for mate-in-3
‘exclusivity’ (using the Chesthetica software). This
means ensuring that the last three moves of the line
played in a game are actually a forced mate-in-3 line
in that position, and not something that occurred
because the winner got lucky or the opponent
overlooked a possible defense. Doing so increases the
likelihood that more thought and skill went into the
actual game and the final winning sequence. A
forced line is also typically considered more beautiful
than one where the opponent could have defended
longer or escaped checkmate. Even though there
may be more than one way to checkmate, if the line
played was not forced, it suggests a lack of skill or
attention on the part of the opponent.
Filtering for mate-in-3 exclusivity took about 5
days of continuous processing on a single desktop
computer as each of the 150,000+ games had to be
tested for the existence of forced mates and whether
the line actually played in the game was one of
them. We could not go back further than three
moves or test games that did not end in mate, even
though the aesthetics model theoretically supports
it, because the number of moves one would need to
go back cannot be the same for each game and
would therefore be arbitrary and lack consistency.
Exclusivity filtering left 34,868 games.
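The exclusivity test described above can be sketched in a few lines. This is an illustration only, not Chesthetica's code: it works on a small hypothetical game tree (the names `TREE`, `MATED`, `can_force_mate` and `played_line_is_forced` are ours) rather than on real chess positions, and it assumes that a line counts as forced only if every attacking move keeps the mate unavoidable against all replies.

```python
from typing import Dict, Sequence, Set

# Toy game tree (hypothetical): position -> {move: resulting position}.
Tree = Dict[str, Dict[str, str]]


def can_force_mate(pos: str, moves_left: int, tree: Tree, mated: Set[str]) -> bool:
    """True if the side to move can mate within `moves_left` of its own moves
    against every possible defence."""
    if moves_left == 0:
        return False
    for after_attack in tree.get(pos, {}).values():
        if after_attack in mated:
            return True
        replies = tree.get(after_attack, {})
        # No replies and no mate is treated as a non-win (e.g. stalemate).
        if replies and all(
            can_force_mate(after_reply, moves_left - 1, tree, mated)
            for after_reply in replies.values()
        ):
            return True
    return False


def played_line_is_forced(start: str, line: Sequence[str], tree: Tree,
                          mated: Set[str]) -> bool:
    """True if `line` (attacker, defender, attacker, ...) ends in mate and every
    attacking move kept the mate forced against all defences."""
    if len(line) % 2 == 0:
        return False  # a mating line must end on an attacking move
    attacker_moves = (len(line) + 1) // 2
    pos = start
    for ply, move in enumerate(line):
        if move not in tree.get(pos, {}):
            return False  # move not available in the toy tree
        pos = tree[pos][move]
        if ply % 2 == 0:  # the attacker has just moved
            remaining = attacker_moves - ply // 2 - 1
            if remaining == 0:
                return pos in mated
            replies = tree.get(pos, {})
            if not replies or not all(
                can_force_mate(r, remaining, tree, mated) for r in replies.values()
            ):
                return False
    return False


# After "a1" every defence loses in time; after "a2" the defender could escape via "e2".
TREE: Tree = {
    "P0": {"a1": "P1", "a2": "Q1"},
    "P1": {"d1": "P2", "d2": "P2b"},
    "P2": {"a3": "P3"}, "P3": {"d3": "P4"}, "P4": {"a5": "M1"},
    "P2b": {"a4": "P3b"}, "P3b": {"d4": "P4b"}, "P4b": {"a6": "M2"},
    "Q1": {"e1": "Q2", "e2": "Q2esc"},
    "Q2": {"a7": "Q3"}, "Q3": {"e3": "Q4"}, "Q4": {"a8": "M3"},
    "Q2esc": {"a9": "Q5"}, "Q5": {},
}
MATED = {"M1", "M2", "M3"}

forced_line = ["a1", "d1", "a3", "d3", "a5"]  # kept by the filter
lucky_line = ["a2", "e1", "a7", "e3", "a8"]   # ends in mate, but was not forced
```

Both example lines end in checkmate, but only the first would survive the filter, because in the second the defender overlooked an escape. On real positions the same recursion needs a legal-move generator, and its cost is presumably part of why the filtering took days.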
From these we were left with the difficult task of
isolating games between men and games between
women. Curiously, there is no easy way to do this
using the aforementioned world-standard chess database
management software (ChessBase 13). Furthermore,
chess player names often do not clearly
reflect gender, especially to those not accustomed
to them. So we decided to run a search
for the terms “(Women)” and “(Men)” in ‘any
field’, which returned the tournaments that were
sensible enough to include those terms in their titles.
We also got a handful of hits with the term
“girls”. Unfortunately, the term “boys” returned
nothing as tournament names tend not to feature
that word.
The result was 1,069 games between women (or
girls) and only 115 games between men (or boys).
There was no filtering based on age or playing
strength as this study is concerned more with gender
differences and aesthetic quality of play. We
also did not have too many games left to work with.
We managed, however, to identify enough additional
games between males to bring the 115 set to
1,069 as well. We also created a random subset of
the 1,069 games between females consisting of just
115 games. The result was two pairs of sets, games
between men and games between women, with matching
sample sizes of 115 and 1,069 games. We evaluated
the smaller sample first using Chesthetica.
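The labelling and subsampling steps can be illustrated with a minimal sketch. It assumes each game carries an event title as a plain string; the function names, the example titles and the fixed seed are hypothetical, and the actual search was run inside ChessBase 13 rather than in code like this.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def classify_event(event_title: str) -> Optional[str]:
    """Label a game by the search terms reported in the text, or None if no term matches."""
    title = event_title.lower()
    if "(women)" in title or "girls" in title:
        return "female"
    if "(men)" in title:
        return "male"
    return None


def random_subset(games: Sequence[T], size: int, seed: int = 0) -> List[T]:
    """Draw `size` games without replacement; the seed only makes the draw repeatable."""
    return random.Random(seed).sample(list(games), size)


# Hypothetical event titles, for illustration only.
labels = [classify_event(t) for t in ("Example Open (Women)", "Example Ch (Men)", "Example Open")]

# Reduce a 1,069-game set to 115 games to match the smaller set.
female_games = list(range(1069))  # stand-ins for game records
female_small = random_subset(female_games, 115)
```

Sampling without replacement keeps each game at most once in the smaller set, which is what a "random subset" requires.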
It is worth noting that the aesthetic evaluation of
thousands of chess positions like these by human
experts would not be cost-effective, feasible in a
reasonable amount of time, or even consistent and
reliable. This sort of experiment can only be carried
out using a computational aesthetics model. For the
statistical testing of means, an F-test was first performed
on each pair of samples to determine whether
the two-tailed T-test should assume equal (TTEV) or
unequal variances (TTUV). A significance level of
5% was used for all tests.
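The two-step procedure (F-test on the variances, then the matching T-test) can be sketched using only the Python standard library. This is our own illustrative implementation, not the software used in the study. The p-values come from the regularized incomplete beta function evaluated by a standard continued fraction, and the two-tailed F-test is assumed to double the upper tail of the larger-over-smaller variance ratio. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-30
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Two-tailed p-value of Student's t with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def f_upper_tail_p(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def compare_means(a, b, alpha=0.05):
    """F-test the variances, then run TTEV (pooled) or TTUV (Welch) accordingly."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances, assumed non-zero
    if va >= vb:
        f, d1, d2 = va / vb, na - 1, nb - 1
    else:
        f, d1, d2 = vb / va, nb - 1, na - 1
    p_f = min(1.0, 2.0 * f_upper_tail_p(f, d1, d2))
    if p_f >= alpha:  # variances not shown to differ: pooled test
        kind, df = "TTEV", na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / df
        se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    else:  # Welch test with Welch-Satterthwaite degrees of freedom
        kind = "TTUV"
        sa, sb = va / na, vb / nb
        se = math.sqrt(sa + sb)
        df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    t = (mean(a) - mean(b)) / se
    return kind, t, df, t_two_tailed_p(t, df)
```

Calling `compare_means(scores_a, scores_b)` on two lists of aesthetic scores returns the test used, the t statistic, the degrees of freedom and the two-tailed p-value. Under TTUV the degrees of freedom are generally fractional and smaller than n1 + n2 - 2.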
3 RESULTS & DISCUSSION
The experimental results we obtained were interest-
ing. The first set of 115 games between men and
115 games between women were aesthetically ana-
lyzed using Chesthetica and we found the mean for
the former was 1.847 and the latter, 1.810. The dif-
ference (TTUV) was not statistically significant.
The aesthetics score is typically used for ranking
purposes so even a numerically small difference
would rank one composition or game ahead of an-
other (or the average for a set ahead of another set).
The second set of 1,069 games was analyzed and
the mean for the games between men was 1.769 and
the mean for the games between women was 1.720.
The difference (TTEV) this time was statistically
significant: t(2136) = -2.094, P = 0.036.
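The reported degrees of freedom and test statistic can be sanity-checked with a short calculation. The sketch below assumes the pooled-variance (TTEV) form with n1 = n2 = 1,069. The symbol s_p for the pooled standard deviation is introduced here for illustration, and its value is inferred from the rounded means rather than reported in the source.

```latex
\begin{align*}
\mathrm{df} &= n_1 + n_2 - 2 = 1069 + 1069 - 2 = 2136,\\[4pt]
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}
\quad\Longrightarrow\quad
s_p \approx \frac{\lvert 1.720 - 1.769 \rvert}{2.094\,\sqrt{2/1069}} \approx 0.54 .
\end{align*}
```

The negative sign of t would be consistent with the women's mean having been entered first, although the source does not state the order. A pooled standard deviation of roughly 0.54 against a mean difference of 0.049 indicates that the two score distributions overlap heavily.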
So the larger set exposed a significant difference
between the average aesthetic quality of games
between females and games between males or, to be
precise, of the critical last three forced winning
moves. What does this mean? In order to put these
results in perspective, consider the data in Table 1.
The first two columns show the results just ex-
plained. The third column is a collection of 1,069
games of the same kind between strong players (not
exclusively men, but largely) with an Elo rating
above 2500 taken from Big Database 2011. The
fourth column is a collection of 1,069 published
chess compositions (three-movers) taken from the
Meson Database (The BDS Website, 2015). Chess
compositions are usually composed with aesthetics
in mind.
Table 1. Average aesthetic scores for various sets.
Women vs. Women    Men vs. Men    Elo > 2500    #3 Compositions
1.720              1.769          1.789         2.281
A single factor analysis of variance (ANOVA) test
was performed across all four sets comparing
the aesthetic means and the differences were found
to be statistically significant: F(3, 4272) = 254.95,
p < 0.001. This suggests that games between
stronger players rank higher aesthetically than those
between weaker players. It also shows, as expected,
that published chess problems rank the highest. We
used the ANOVA test because it is suitable for including
all available data and comparing group
means. If there were just two sets to compare, for
example, we might have used a TTEV, TTUV or
even the Mann-Whitney U test.
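For readers who want to see where the F statistic and its degrees of freedom come from, a single factor ANOVA can be sketched with the standard library alone. The function name is ours and the data in the example are made up. The p-value is omitted because it requires the F distribution, which statistical packages provide.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three small groups.
example_groups: List[List[float]] = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(example_groups)

# Four groups of 1,069 made-up values reproduce the degrees of freedom in the text.
four_sets = [[(i % 7) + j for i in range(1069)] for j in range(4)]
_, df_b4, df_w4 = one_way_anova(four_sets)
```

With four sets of 1,069 scores, the degrees of freedom are 4 - 1 = 3 between groups and 4 x 1,069 - 4 = 4,272 within groups, matching the F(3, 4272) reported above.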
Consider the data in Table 2 which shows the re-
sults of two sets of 1,069 games taken from Big Da-
tabase 2011. The column to the right is a set of
games where the mates were exclusive and forced
whereas the column to the left is a set where there
was no forced mate in the position yet white won
anyway. It is generally accepted that such positions,
i.e. where there was no forced win, are considered
less beautiful, even in real games. We could only
test using the games of weak players (irrespective
of gender, by the way) because unforced wins sel-
dom occur at high levels of play; especially with
games that end in mate.
Table 2. Average aesthetic scores of forced vs. unforced wins.
Elo < 1500 (Unforced)    Elo < 1500 (Forced)
1.610                    1.702
The difference (TTUV) in mean aesthetics here was
also statistically significant: t(2114) = 3.956, P <
0.001. This suggests, as expected, that games where
the checkmate did not stem from a forced position
should rank lower aesthetically than those where the
checkmate was forced. Given the results in Table 1,
the idea that aesthetics tends to improve with playing
quality is also supported. Even very strong
players can only wish they played games where the
win actually resembled a chess composition. Finally,
consider the data in Table 3. This shows the aesthetic
scores of 1,069 games randomly selected
from a collection of games between two chess engines
(Rybka 3 and Fritz 8) at 10 minutes + 10
seconds time controls and 1 minute + 1 second time
controls.
Table 3. Average aesthetic scores of games between chess engines.
Rybka 3 vs. Fritz 8 (10+10)    Rybka 3 vs. Fritz 8 (1+1)
1.979                          1.992
The assumption might have been that, given less
time to make a move (i.e. 1+1), the quality of the
wins would be lower, like it is with human players.
However, the difference in means (TTEV) was not
significant. This suggests that computers, regardless
of their playing strength and ‘experience’ (if any),
have the same style of play or perhaps just no con-
scious or unconscious appreciation of art and how
to win in a more appealing way, if possible. This
style and appreciation for beauty is something that
usually comes with time and experience in human
players.
It is still interesting, however, that on average, the
aesthetic quality of engine vs. engine games is
higher than even between strong players (1.789) but,
as expected, lower than chess compositions by
experienced humans (2.281). We confirmed this using
an ANOVA test as before on the four relevant sets:
F(3, 4272) = 190.1, p < 0.001. Games between
chess engines may be more aesthetic or beautiful
because they probably stem from more precise and
logical moves which tend to lead to better economy
of pieces in the checkmate (a known principle of
beauty in the game). Both games between chess engines
and between human experts, however, tend to
lack the artistry and paradoxical nature of chess
problems, which is what makes them so appealing
and an art form.
Returning to the question of the aesthetic quality
of games between women as compared to between
men, the experimental evidence would suggest that
games between men do indeed rank higher than
games between women in terms of beauty. This
may not be evident, however, using relatively small
sets of games (i.e. around 100). Using a set of at
least 1,000 games is recommended, depending on
availability. Playing strength may be regarded as a
separate issue.
Do the results then imply that women have less
artistic appreciation of the game or play in a less
artistic fashion even though they may be good play-
ers? Perhaps. It would help explain the relative non-
existence of master female composers of chess
problems and lack of notable award-winning ‘bril-
liant’ games between women, such as Adolf An-
derssen’s, ‘Immortal Game’ and Bobby Fischer’s,
‘Game of the Century’.
Are there or can there be exceptions to the rule?
Certainly, as with most things. Logically, it would
also follow that there are likely domains where
women fare better aesthetically than men. Under-
standing these domains and differences better
would add to our body of knowledge about the hu-
man brain and gender differences. It may even help
optimize human performance in domains where
equal gender distribution is not a requirement.
4 CONCLUSIONS
In this research we have shown experimentally that
games between female players tend to be of lower
aesthetic quality than games between male players.
This is likely true, at least, in games that end deci-
sively in checkmate and where the win can be
traced back to three forced moves (as explained in
section 2). We are unable to draw any real conclu-
sions about playing strength from the experiments.
How might this finding be useful? For one thing, it
may be of interest to psychologists who study aesthetic
perception in males and females. Chess would be
a suitable domain in which to investigate why
females tend to lack in artistry, or at least rank
lower in artistry, compared to males.
Likewise, there are probably domains where females
fare better artistically. Neuroscientists may be
able to provide more insight into that question after
designing and conducting the right experiments. In
terms of aesthetics in chess, the difference discovered
in this research may or may not extend to
games that do not end in mate, although there is
little reason to believe it does not. As mentioned
earlier, however, the longer the sequence that needs
to be investigated, the less reliable the experiment.
The availability of data is also a significant issue. For
instance, there are simply not enough published
compositions by females (of the same level) to
compare against those by men, so chess aesthetics in
one of its finer forms cannot, at this time, be investigated
for consistency with the finding of this research.
In general, what we have demonstrated should not
be taken too seriously as it merely opens a point of
inquiry related to the game that may not have been
adequately considered before. The technology to
investigate such a question has only relatively re-
cently become available and computers today can
already compose interesting chess problems of
some aesthetic merit on their own by integrating
information from various domains. Hundreds of ex-
amples of computer-generated chess problem com-
positions are available at (Iqbal, 2015), for instance.
So we are optimistic about further technological
progress in this area and related ones in the years to
come.
Whether or not the difference between the genders
identified in this research can be confirmed or
stands the test of time remains to be seen, however.
It should also be interesting to explore if the more
established question of a difference in playing
strength between males and females relates in any
way to the aesthetics of gameplay. This is because it
is often assumed that stronger players play more
effectively and effectiveness tends to correlate with
aesthetics.
ACKNOWLEDGEMENTS
We would like to thank Frederic Friedel (co-
founder of ChessBase) and Woman Grandmaster
Jana Krivec for their assistance. This research was
sponsored, in part, by the Universiti Tenaga Na-
sional grant, J510050547.
REFERENCES
1. Autopilot (2012). http://upload.wikimedia.org/wikipedia/commons/0/04/Magnus_Carlsen_cropped.jpg, 27 April.
2. Bilalić, M., Smallbone, K., McLeod, P., & Gobet, F. (2009). Why are (the Best) Women So Good at Chess? Participation Rates and Gender Differences in Intellectual Domains. Proceedings of the Royal Society B: Biological Sciences, 276(1659), 1161-1165.
3. Chabris, C. F., & Glickman, M. E. (2006). Sex Differences in Intellectual Performance Analysis of a Large Cohort of Competitive Chess Players. Psychological Science, 17(12), 1040-1046.
4. ChessBase Shop (2015a). Big Database 2015, https://shop.chessbase.com/en/products/big_database_2015, ChessBase, Hamburg, Germany.
5. ChessBase Shop (2015b). ChessBase 13, http://shop.chessbase.com/en/products/chessbase13_starter_package_engl, ChessBase, Hamburg, Germany.
6. d'Andorra, Federació d'Escacs Valls (2012). http://www.flickr.com/photos/feva/7930695086, 4 September.
7. Friedel, F. (2015). Chess Gender Debate in the International Press, ChessBase News, 21 April, http://en.chessbase.com/post/chess-gender-debate-in-the-international-press, Hamburg, Germany.
8. Gerdes, C., & Gränsmark, P. (2010). Strategic Behavior Across Gender: A Comparison of Female and Male Expert Chess Players. Labour Economics, 17(5), 766-775.
9. Iqbal, A. (2015). YouTube. https://www.youtube.com/c/AzlanIqbal
10. Iqbal, A., van der Heijden, H., Guid, M., & Makhmali, A. (2012). Evaluating the Aesthetics of Endgame Studies: A Computational Model of Human Aesthetic Perception. Computational Intelligence and AI in Games, IEEE Transactions on, 4(3), 178-191.
11. Maass, A., D'Ettole, C., & Cadinu, M. (2008). Checkmate? The Role of Gender Stereotypes in the Ultimate Intellectual Sport. European Journal of Social Psychology, 38(2), 231-245.
12. Stefan64 (2008). http://upload.wikimedia.org/wikipedia/commons/c/ce/Judit_The_Look_Polgar.jpg.
13. The BDS Website (2015). Meson – Introduction, http://www.bstephen.me.uk/index.php/meson
14. World Chess Federation (2015). Standard Top 100 Players. https://ratings.fide.com/top.phtml?list=men