Author info: Correspondence should be sent to Rena A. Kirkland, Department of
Psychology, U. of Northern Colorado, 501 20th
Street, Campus Box 94, Greeley,
CO 80639; email@example.com
North American Journal of Psychology, 2013, Vol. 15, No. 1, 121-146.
Meta-analysis Reveals Adult Female Superiority
in “Reading the Mind in the Eyes Test”
Rena A. Kirkland, Eric Peterson, Crystal A. Baker,
Stephanie Miller, & Steven Pulos
University of Northern Colorado
We examined gender differences in healthy adults on the revised version
of the Reading the Mind in the Eyes Test (henceforth the Eyes Test),
developed by Baron-Cohen, Wheelwright, Hill, Raste, and Plumb (2001).
In this task, participants examine photographs of pairs of eyes and choose
among four descriptors (e.g., playful, comforting, irritating, bored).
Samples of healthy adults from ten countries (Australia, United States,
United Kingdom, Germany, Argentina, Canada, Ireland, Italy, Chile, and
Hungary) were included in the analysis. Consistent with previous
evidence of a small female advantage in decoding nonverbal behavior,
we found a small, statistically significant effect for female superiority
over males on the Eyes Test (g = .177, k = 42). Together, the test for
heterogeneity and I² indicate that the female advantage on the Eyes Test
is homogeneous across studies, suggesting that the variability of effect
sizes across studies is due only to what would be expected from random
subject-level error. We examined the following moderators: (a)
language of Eyes Test administration; (b) country; (c) group of
researchers; and (d) data reported in articles versus data we requested
from authors. The moderator analyses yielded no significant
differences. The small effect in favor of females suggests that women
tend to be better than men at judging emotions or mental states
represented by eye stimuli.
Across the past two decades, a growing interest in measuring
individual differences in mental state understanding among adults has
given rise to the development of new instruments (e.g., Abell, Happé, &
Frith, 2000; Dziobek et al., 2006). Toward this effort, Baron-Cohen,
Jolliffe, Mortimore, and Robertson (1997) developed the Reading the
Mind in the Eyes Test (henceforth the Eyes Test). In this task, participants
examine photographs of the eye region and make a forced choice among
four descriptor words to match the eyes. Since its development, the Eyes
Test has been used in over 250 studies across at least fifteen countries.
Some studies report a female advantage (e.g., Carroll & Yung, 2006),
others report no significant gender differences (e.g., Mar, Oatley, Hirsh,
dela Paz, & Peterson, 2006) and yet other studies report gender effects
favoring males (e.g., Nettle & Liddle, 2008). The majority of the studies
do not report gender results (e.g., Spek, Scholte, & Van Berckelaer-Onnes,
2010). In these cases, it is unclear whether gender was never examined
or whether it was examined and no gender differences were found. The purpose of
the current meta-analysis is to investigate the gender effect on the Eyes
Test in healthy adults. Meta-analytic investigations provide a statistically
appropriate method of combining studies that capitalizes on the large total
sample size (N = 4290 in the present investigation) to reduce sampling
error and provide a precise estimate of the true effect size
(Borenstein, Hedges, Higgins, & Rothstein, 2009). In the review below,
we first examine the general evidence for female superiority in
decoding nonverbal behavior and then discuss the development of
the Eyes Test to clarify its contribution to the current
literature and the value of the present meta-analysis.
From birth, females are more likely than males to attend to social
stimuli (i.e., faces versus inanimate objects; Connellan, Baron-Cohen,
Wheelwright, Bátki, & Ahluwalia, 2001). Female infants maintain eye
contact more frequently and for longer durations than males (Argyle &
Ingham, 1972; Hittelman & Dickes, 1979; Leeb & Rejskind, 2004;
Lutchmaya, Baron-Cohen, & Raggatt, 2002; Podrouzek & Furrow,
1988). Between 9 and 12 months, females are more likely to initiate and
respond to joint attention (Mundy et al., 2007; Olafsen et al., 2006). With
regard to facial expression processing, McClure (2000) conducted two
meta-analyses and found small effects in favor of females in both infant
and child samples (d = .18 and .13, respectively).
The first unequivocal evidence of enhanced social cognition in
adult females came from Hall’s (1978) meta-analysis on decoding
non-verbal behaviors. The effect was demonstrated in both visual and
auditory modalities (d= .32 and .18, respectively), with a very large
effect when both modalities were combined (d= 1.02). In a second,
larger meta-analysis, Hall (1984) found similar effects in favor of
females over males in decoding nonverbal behavior in nine countries. In
the decades since the earliest nonverbal behavior studies began, the
evidence for female superiority in decoding nonverbal behavior remains
robust (Rosip & Hall, 2004; Schmid, Schmid Mast, Bombari, & Mast, 2011).
Analogous to the infant and child findings, gender differences have been
observed in adults in both the amount of eye contact and the ability
to glean information from eye gaze (Alwall, Johansson, &
Hansen, 2010); males do not process, orient to, or utilize eye gaze as
efficiently as females (Deaner, Shepherd, & Platt, 2007). Hall, Hutton,
and Morgan (2010) used eye-tracking technology to investigate the
relationship between fixations to the eye region and emotion expression
recognition. Females demonstrated significantly more fixations as well
as longer dwell time to the eye region than males and were more accurate
in emotion expression recognition. In a gaze cueing paradigm, Bayliss,
Pellegrino, and Tipper (2005) demonstrated stronger cueing effects for
women for both a social cue (eye stimuli) and a nonsocial cue (arrow).
Using a similar gaze-cueing paradigm, Alwall, Johansson, and Hansen
(2010) randomly varied stimulus presentation time (100, 300, or 700
milliseconds [ms]). Females showed an enhanced gaze-cueing effect
at all three viewing times, with the strongest gender effect at 300 ms.
With respect to face emotion perception, women tend to be more
accurate (Golan, Baron-Cohen, & Hill, 2006; Suzuki, Hoshino, &
Shigemasu, 2006) and faster (Montagne, Kessels, Frigerio, De Haan, &
Perrett, 2005) than men at detecting emotional expressions. This gender
effect appears to be amplified in conditions when degraded stimuli are
used (i.e., when stimuli are 50% of the full emotion intensity; Hoffmann,
Kessler, Eppel, Rukavina, & Traue, 2010) and when stimuli are
presented quickly. For instance, in a face emotion judgment task (Hall &
Matsumoto, 2004), women were significantly more accurate than men for
face emotion judgments at brief (200 ms) stimulus presentations but not
at longer durations (10 seconds). Women have also been shown to
outperform men on basic face recognition tasks (Lewin & Herlitz, 2002).
For example, McBain, Norton, and Chen (2009) found females were
better than males at face detection and face identity discrimination when
emotion and gender cues were minimal to nonexistent.
Research examining adult performance on tasks that clearly require
an explicit mental state representation (e.g., representing the content of a
character’s mind) has reported mixed gender differences. For example,
Ahmed and Miller (2011) found women did significantly better than men
on the Faux Pas test (participants read a short vignette and identify if an
error in social etiquette occurred; Baron-Cohen, O’Riordan, Stone, Jones,
& Plaisted, 1999). In contrast, Russell, Tchanturia, Rahman, and Schmidt
(2007) found that males outperformed females on Happé, Brownell, and
Winner’s (1999) Cartoon Task (single-frame cartoons that require the
participant to infer the mental state of a character).
In summary, the predominant evidence for enhanced social cognition
in women is for reading nonverbal behavior, including emotional
expressions (e.g., Hall, 1978, 1984). Developmental reviews (e.g., Leeb
& Rejskind, 2004; McClure, 2000) suggest that the female advantage begins
in infancy. In particular, female superiority was most apparent for
identifying relatively impoverished or rapidly presented face stimuli
(Hall & Matsumoto, 2004; McBain, Norton, & Chen, 2009), indicating
that women are more sensitive than men to subtle facial cues. To date,
the evidence for a female advantage in more explicit theory of mind tasks
remains equivocal. Of course, failure to observe gender differences in
mental state understanding among adults may reflect the constraint of
instrument sensitivity. The demand for instruments designed for the
study of individual differences in explicit and implicit mental state
understanding among healthy adults provided an important motivation
for the development of the Eyes Test.
The Reading the Mind in the Eyes Test (Eyes Test)
The Eyes Test requires participants to examine photographs of pairs
of eyes cut out from the face and choose among four possible descriptors
(e.g., playful, comforting, irritating, bored). The revised version includes
36 items, and participants receive one point for each correct item
(Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001). In one of the first Eyes Test
studies, Baron-Cohen and colleagues demonstrated that parents of
children with Asperger syndrome performed significantly worse than
age- and IQ-matched control participants (Baron-Cohen & Hammer,
1997); in both the control and Asperger groups, females outperformed
males. Thus, this early study demonstrated the instrument’s usefulness for
discriminating among high-functioning, normally developing adults, and
it yielded the gender effect (female superiority) predicted by previous
studies (e.g., Hall, 1978). Since then, the Eyes Test has become a
standard for exploring the relationship between adult individual
differences in social cognition and overall phenotype (Bailey & Henry,
2008; Billington, Baron-Cohen, & Wheelwright, 2007; Ferguson &
Austin, 2010; Losh & Piven, 2007; Strong, Russell, Germine, & Wilmer,
2011; Sylwester, Lyons, Buchanan, Nettle, & Roberts, 2012).
Given the existence of other instruments for measuring individual
differences in adult social cognitive skills, it is important to appreciate
the unique contribution of the Eyes Test. To date, we are not aware of
another instrument that examines mental state understanding based only
on “reading the eyes.” The ability to glean social information specifically
from the eye region plays a critical role in typical development (e.g.,
Frischen, Bayliss, & Tipper, 2007), just as poor use of the eye region has
been implicated in autism (e.g., Pelphrey et al., 2002). In both typical
and atypical development, the role of attention to the eye region is
evident in the first year of life (Farroni, Csibra, Simion, & Johnson,
2002). In the case of autism, which is characterized by difficulty using
the eye region (Baron-Cohen, Campbell, Karmiloff-Smith, Grant, &
Walker, 1995), reduced attention to the eye region has also been
identified among siblings of individuals with autism (i.e., an aspect of the
broader autism phenotype and a potential endophenotype, Dalton,
Nacewicz, Alexander, & Davidson, 2006). The eyes are particularly
important targets for the social perceiver as they convey both emotion (as
in anger or sadness) as well as information about desire and intention
(i.e., direction of gaze). Indeed, apart from language, the eye region
plays a unique and central role in conveying social cognitive information
(Frischen, Bayliss, & Tipper, 2007).
The present investigation asks whether the female superiority in the
Eyes Test, obtained in the original Baron-Cohen studies (1997, 2001),
holds as a reliable effect in a meta-analysis of studies
involving neurotypical adult samples. Our review above suggests that the
clearest evidence for female superiority has been observed in relatively
more implicit tasks (e.g., face emotion reading). On the other hand,
results from relatively more explicit tasks (e.g., Faux Pas) have been
equivocal. We hypothesize that females will show a real advantage on
the Eyes Test, given its emphasis on a relatively implicit social-
perceptual analysis. We do wish to emphasize that the degree to which
this instrument or others relies on relatively more or less implicit or
explicit processes remains hypothetical. The important point for our
hypothesis is that this task clearly requires a perceptual analysis of facial
expression which differentiates it from the more explicit tasks that do not
involve any face emotion perception.
Four moderator variables were examined in the current study testing
the gender effect across (a) language of test administration; (b) country;
(c) group of researchers; and (d) reported versus unreported data. The
first moderator variable examined gender differences on the Eyes Test by
language of administration. We compared English (the original language of
the Eyes Test) with all other languages in order to investigate whether the
female advantage on the Eyes Test is an artifact of the English version or
a reliable effect that holds in translation. For our second
moderator variable, we examined studies from the United Kingdom (UK)
versus studies conducted in all other countries (Argentina, Australia,
Canada, Chile, Germany, Hungary, Ireland, Italy, and United States) in
order to test if the gender effect is homogeneous outside of the UK where
the Eyes Test was developed. Our third moderator variable examined the
impact of the group of researchers (Baron-Cohen versus all others) in
order to test for bias (i.e., overinflated gender effect in favor of females)
in Baron-Cohen’s studies. While we have no reason to suspect
methodological bias has been introduced into Baron-Cohen’s studies,
experimenter effects (i.e., the outcome of a study is an artifact of the
researcher’s expectation) have been documented and may be introduced
during the research process (Rosenthal, 1966). Lastly, we tested data
source as a moderator: data reported in articles versus unreported data
(i.e., data sent by authors at our request). The traditional notion of publication bias (i.e., studies are more
likely to be published when results are statistically significant; Rothstein,
Sutton, & Borenstein, 2005) is a less serious problem in meta-analyses
examining gender differences (Eagly & Wood, 1991; Hall, 1984). That
is, articles are published regardless of whether or not a gender
effect is found. Instead, an important issue in meta-analyses
examining gender differences is the lack of available gender
data. In the majority of the Eyes Test studies, gender differences are
not the primary research question of interest. In fact, in many studies it
may be the case that Eyes Test performance by gender was not examined
at all (e.g., Bodden et al., 2010). In order to examine whether articles
reporting gender data on the Eyes Test are overestimating the gender
effect, we conducted a moderator analysis with reported versus
unreported data as a variable.
Several steps were taken to obtain published and unpublished studies
utilizing the Eyes Test. First, an exhaustive literature search was
conducted using the following databases: Academic Search Premier,
ERIC, PsycINFO, Medline, CINAHL (EBSCO Host), PAIS International
(CSA), ProQuest Dissertations & Theses, Scirus, and Sherpa. The
following keywords were used: Reading the Mind in the Eyes Test, Eyes
Test, and Eyes Task. Second, using the Social Sciences Citation Index, we
examined each study that cited the revised or the original
version of the Eyes Test (403 and 279 studies, respectively). Third, we
examined references of major publications studying theory of mind,
including several meta-analyses. The cutoff date for obtaining studies
was November 1
In order to develop a coding system, we first examined forty articles
that included the Eyes Test. Decisions regarding which variables to code
were made based on this initial literature review. Four researchers served
as coders. The final studies included in the analysis were coded a second
time to ensure no errors were made. When discrepancies between the two
coding forms were found, the original study was examined and the
correct information was recorded. Thus, perfect agreement was achieved.
Information was coded for two purposes: to calculate effect sizes and to
identify moderator variables. To calculate effect sizes, we recorded (a)
sample size for each gender, (b) means and standard deviations for each
gender, and (c) the values and degrees of freedom for the t or F test, if
reported. Effect sizes were calculated from the means and standard
deviations of the total number of correct items whenever possible. When
means and standard deviations were not available, effect size was
calculated from the t or F test.
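The effect-size computation described above can be sketched in Python. This is a minimal sketch assuming the standard two-independent-groups formulas given by Borenstein, Hedges, Higgins, and Rothstein (2009), with women entered as the first group so that positive values indicate female superiority. The function names are hypothetical, and the code is not the Comprehensive Meta-analysis implementation. Because an F statistic carries no sign, the direction must be supplied from the reported means.

```python
import math


def _correction(df):
    """Hedges' small-sample correction factor J (common approximation)."""
    return 1 - 3 / (4 * df - 1)


def _var_d(d, n_f, n_m):
    """Sampling variance of Cohen's d for two independent groups."""
    return (n_f + n_m) / (n_f * n_m) + d ** 2 / (2 * (n_f + n_m))


def g_from_means(m_f, sd_f, n_f, m_m, sd_m, n_m):
    """Hedges' g (female minus male) and its variance from means and SDs."""
    df = n_f + n_m - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / df)
    d = (m_f - m_m) / sd_pooled
    j = _correction(df)
    return j * d, j ** 2 * _var_d(d, n_f, n_m)


def g_from_t(t, n_f, n_m):
    """Hedges' g from an independent-samples t (positive t = women higher)."""
    d = t * math.sqrt((n_f + n_m) / (n_f * n_m))
    j = _correction(n_f + n_m - 2)
    return j * d, j ** 2 * _var_d(d, n_f, n_m)


def g_from_f(f, n_f, n_m, females_higher):
    """Hedges' g from a two-group F test (1 numerator df); sign supplied separately."""
    t = math.sqrt(f)
    return g_from_t(t if females_higher else -t, n_f, n_m)


# Hypothetical study: 50 women and 50 men, Eyes Test totals out of 36
g, var_g = g_from_means(27.0, 4.0, 50, 26.0, 4.0, 50)
print(round(g, 4), round(var_g, 4))
```

The example study is hypothetical. With 98 degrees of freedom, the correction J shrinks d by less than 1%; the adjustment matters more for the small samples common in this literature.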
The following variables were coded for use as moderator variables:
(a) language of Eyes Test administration; (b) country; (c) group of
researchers; (d) reported versus non-reported data. Other potential
moderator variables were not included in the study due to insufficient
reporting or variability.
Criteria for Inclusion
The current meta-analysis applied the following inclusion
criteria: (a) studies published in English; (b) studies using the revised
version of the Eyes Test (Baron-Cohen et al., 2001); (c) adult
participants free from psychiatric or developmental disorders; and (d)
independent groups with no overlapping samples.
Only the revised version of the Eyes Test was included in the current
meta-analysis because it includes a balance of male and female stimuli
while the original version had more female than male faces, which may
have biased the test. Furthermore, the revised version is sensitive
enough to detect individual differences in healthy adults
(Baron-Cohen & Hammer, 1997) because of its broader range of scores (36
items versus 25, and four choices versus two), and thus provides an
opportunity to examine differential performance by gender. We included
only adult participants because few studies included child samples
(six studies had child samples). Finally, only healthy
adult samples were included in order to examine gender differences
without the confounding effects of male and female differences in
developmental and psychological disorders.
The literature search revealed 259 studies using the Eyes Test. The
majority of the studies did not report descriptives by gender. In each of
these cases, a minimum of two emails was sent asking for relevant data.
Responses from 49 authors produced 114 independent effect sizes from
70 studies. Sixty-four of these effect sizes (from 42 studies) did not meet
inclusion criteria. The final 40 studies produced 50 effect sizes included
in the current meta-analysis (Table 1).
Main effects and moderator analyses were calculated with Hedges’ g
as the effect size (Hedges, 1981). All analyses were calculated using
Comprehensive Meta-analysis version 2 (Borenstein, Hedges, Higgins, &
Rothstein, 2005). Main effects were examined using both fixed and
random effects models. A fixed effect model assumes that effect sizes
vary across studies only because of sampling error. The random effects model
allows for variation due to both sampling error and true between-study differences in effect size.
We used the Q statistic to examine heterogeneity. The Q test assesses
whether the dispersion of effect sizes across studies exceeds what would
be expected from within-study random error alone (Borenstein, Hedges,
Higgins, & Rothstein, 2009). The magnitude of heterogeneity was quantified
using the I² statistic, which estimates the proportion of observed variance
that reflects true effect size variation across studies (Higgins &
Thompson, 2002). The moderator variables were investigated using a
fixed effect model.
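The fixed and random effects models and the heterogeneity statistics described above can be illustrated with a short sketch. It assumes inverse-variance weights and the DerSimonian-Laird estimator of the between-study variance τ², a common choice that is assumed here rather than stated in the text; the function name and dictionary keys are hypothetical.

```python
import math


def pool(effects, variances, z=1.959964):
    """Fixed effect and DerSimonian-Laird random effects pooling with Q and I^2."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se_fixed = math.sqrt(1 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2: share of the observed dispersion beyond what sampling error predicts
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se_random = math.sqrt(1 / sum_w_re)

    return {
        "fixed": fixed,
        "fixed_ci": (fixed - z * se_fixed, fixed + z * se_fixed),
        "random": random,
        "random_ci": (random - z * se_random, random + z * se_random),
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }
```

Whenever Q does not exceed its degrees of freedom, both I² and τ² are truncated to zero, and the random effects estimate then equals the fixed effect estimate. This is consistent with the identical g = .177 reported under both models alongside I² = .000.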
Four methods were used to investigate potential publication bias.
First, we produced a funnel plot (see Figure 1), with effect size plotted on
the horizontal axis and sampling error on the vertical axis (Light &
Pillemer, 1984). As sampling error decreases, observed effect sizes
should cluster more tightly around the mean; as sampling error
increases, observed effect sizes should spread out symmetrically (this is
based on the assumption that sampling error decreases as sample size increases).
Second, the correlation between standardized effect sizes and their
variances (standard errors) was calculated using Kendall’s tau rank
correlation coefficient (Begg & Berlin, 1988). Tau is interpreted like an
ordinary correlation: large correlations indicate a relationship between
effect sizes and their variances, whereas correlations close to zero
indicate that no such relationship exists. Nonsignificant correlations
therefore suggest that publication bias is not present.
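The rank correlation test can be sketched as follows. The sketch assumes the standardization usually used for this test (each effect’s deviation from the fixed effect mean, divided by the standard deviation of that deviation) and computes Kendall’s tau by counting concordant and discordant pairs. Tie corrections to the variance are omitted, the function name is hypothetical, and Comprehensive Meta-analysis may differ in these details.

```python
import math


def begg_rank_test(effects, variances):
    """Rank correlation between standardized effects and their variances."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Var(y_i - mean) = v_i - 1/sum(w) under the fixed effect model
    std = [(yi - mean) / math.sqrt(vi - 1 / sum_w)
           for yi, vi in zip(effects, variances)]

    n = len(effects)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (std[i] - std[j]) * (variances[i] - variances[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1  # tied pairs (s == 0) count as neither
    diff = concordant - discordant
    tau = diff / (n * (n - 1) / 2)
    sigma = math.sqrt(n * (n - 1) * (2 * n + 5) / 18)  # SD of C - D, no ties
    z = diff / sigma
    z_cc = max(abs(diff) - 1, 0) / sigma  # continuity-corrected |z|
    # One-tailed p values: 1 - Phi(|z|) = erfc(|z| / sqrt(2)) / 2
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    p_one_cc = 0.5 * math.erfc(z_cc / math.sqrt(2))
    return tau, z, p_one, z_cc, p_one_cc
```

A tau near zero with a large one-tailed p, as reported for the present data, means that larger (noisier) studies do not systematically show larger or smaller standardized effects.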
Third, we used ‘trim and fill’ (Duval, 2005) with the L0 estimator to
estimate the number of ‘unfound’ studies based on asymmetry of a
funnel plot of the data. If asymmetry is found, an adjusted mean effect
size and variance may be calculated as if the ‘unfound’ studies were
included. ‘Trim and fill’ may also be seen as a form of sensitivity
analysis.
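A simplified version of trim and fill with the L0 estimator can be sketched as below. The sketch assumes that any missing studies lie on the left (below the mean), uses a fixed effect mean for trimming and centring, and iterates until the estimated number of missing studies (k0) stops changing. These choices and all names are assumptions of the sketch, not details given in the text.

```python
def _fixed_mean(effects, variances):
    w = [1 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)


def _average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def trim_and_fill_l0(effects, variances, max_iter=100):
    """Return (k0, filled_mean), assuming any missing studies lie on the LEFT."""
    pairs = sorted(zip(effects, variances))  # largest effects last
    y = [p[0] for p in pairs]
    v = [p[1] for p in pairs]
    n = len(y)
    k0 = 0
    for _ in range(max_iter):
        # Re-centre on the mean of the studies left after trimming k0 from the right
        centre = _fixed_mean(y[: n - k0], v[: n - k0])
        dev = [yi - centre for yi in y]
        ranks = _average_ranks([abs(d) for d in dev])
        t_n = sum(r for r, d in zip(ranks, dev) if d > 0)  # rank sum, right side
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
        new_k0 = min(max(0, round(l0)), n - 1)
        if new_k0 == k0:
            break
        k0 = new_k0
    centre = _fixed_mean(y[: n - k0], v[: n - k0])
    # Fill: mirror each trimmed study about the trimmed mean, keeping its variance
    y_filled = y + [2 * centre - yi for yi in y[n - k0:]]
    v_filled = v + v[n - k0:]
    return k0, _fixed_mean(y_filled, v_filled)
```

Because each filled study mirrors a trimmed one, a k0 of zero leaves the pooled estimate unchanged, which corresponds to the outcome reported for these data (no asymmetry found).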
Fourth, we calculated Rosenberg’s fail-safe number (Rosenberg,
2005), a weighted extension of Rosenthal’s file-drawer number
(Rosenthal, 1979) that is more consistent with modern meta-analyses.
Because it is calculated using study variances, Rosenberg’s fail-safe
number is a weighted estimate of the number of additional studies with
an effect of zero that would reduce the reported mean difference to a
nonsignificant level (p > .05).
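Under the fixed effect model, Rosenberg’s fail-safe number can be sketched as the number of added zero-effect studies, each given the average weight of the observed studies, that would bring the weighted mean’s z statistic down to the critical value. This sketch assumes that fixed effect form and a two-tailed α = .05. Because the pooled z equals Σwᵢyᵢ / √Σwᵢ, the formula reduces to n(Z²/z_α² − 1), so it can also be approximated from a reported estimate and confidence interval. All function names are hypothetical.

```python
import math

Z_05_TWO_TAILED = 1.959964  # critical z for p = .05, two-tailed


def rosenberg_failsafe(effects, variances, z_alpha=Z_05_TWO_TAILED):
    """Zero-effect studies of average weight needed to push p above alpha."""
    n = len(effects)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    sum_wy = sum(wi * yi for wi, yi in zip(w, effects))
    n_fs = n * sum_wy ** 2 / (z_alpha ** 2 * sum_w) - n
    return max(0.0, n_fs)


def failsafe_from_summary(mean, ci_low, ci_high, n, z_alpha=Z_05_TWO_TAILED):
    """Same quantity recovered from a pooled estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * z_alpha)
    z = mean / se
    return max(0.0, n * (z ** 2 / z_alpha ** 2 - 1))


def rosenthal_robustness_threshold(k):
    """Rosenthal's rule of thumb: a fail-safe N above 5k + 10 is robust."""
    return 5 * k + 10
```

Applied to the rounded summary values in this article (g = .177, 95% CI .115 to .242, 50 effect sizes), failsafe_from_summary gives about 338. This is in the same range as the reported 348, and the gap plausibly reflects rounding in the published values. rosenthal_robustness_threshold(40) returns the 210 criterion used in the results.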
A small but significant effect in favor of females was found on the
Eyes Test. Under both fixed and random effects models, the mean
weighted effect size was g = .177, p < .001, with a 95% confidence
interval of .115 to .242. Using a fixed effect model, the test of
homogeneity was not significant (Q = 42.449, df = 49, p = .734, I² = .000),
indicating no evidence that the effect sizes were heterogeneous across studies.
We tested four moderator variables: (a) language of test
administration; (b) country; (c) group of researchers; and (d) reported
versus unreported data. None of the moderator analyses were significant
(Table 1).

TABLE 1: Moderator Analyses
Moderator (levels compared)                                   Q_B    df   p
Language of Eyes Test (English vs. all other languages)       .024   1    .876
Country of study (United Kingdom vs. all other countries)     .123   1    .726
Group of researchers (Baron-Cohen vs. all others)             .270   1    .603
Data source (reported in article vs. sent from authors)       .436   1    .509
Note: Q_B = between-groups heterogeneity statistic (fixed effect model).

Assessing Publication Bias
The funnel plot (Figure 1) displayed a fairly symmetrical scatterplot,
with studies clustering more tightly around the mean effect size as
sampling error decreased. Smaller studies (which have more sampling
error) were spread across the lower portion of the graph, suggesting
that no studies were missing from the analysis. A trim-and-fill analysis
was attempted, but no asymmetry was found.
Kendall’s tau with and without continuity correction was not
significant (τ = -.049, z = .501, p = .307; and τ = -.050, z = .510,
p = .305, respectively; one-tailed tests). Rosenberg’s fail-safe N
(Rosenberg, 2005) was 348, suggesting that an additional 348 studies with
an effect of zero would be needed to reduce the results of the current
meta-analysis to a nonsignificant level. This result is well above 210,
the fail-safe number considered robust (i.e., greater than 5N + 10, where
N is the original number of studies; Rosenthal, 1991). In other words,
these results suggest that it is highly unlikely that enough studies exist
with no significant male/female differences to cancel out the small female
advantage seen in the current study. Note, however, that fail-safe
statistics do not address the situation in which a few large studies show
male superiority. Collectively, the methods investigating publication bias
failed to provide any evidence that publication bias was present in the
current meta-analysis.
FIGURE 1: Funnel Plot (horizontal axis: Standard Difference Between the Means)
The current meta-analysis found a small mean effect size (g = .177) in
favor of females over males on the Eyes Test. This effect is considered
small (Cohen, 1988): it implies that 57% of males would fall below the
female mean and that the male and female distributions would overlap by
87%. It should be noted, however, that even with the small effect sizes
seen in this study, substantial gender differences may appear in the
upper and lower ends of the distribution of scores (Hedges & Nowell,
1995; Martell, Lane, & Emrich, 1996). The confidence interval
(.114 to .240) suggests that the true gender effect falls within a relatively
narrow range, indicating that the effect size estimate is precise.
Notably, the confidence interval does not contain zero, indicating that
the female advantage on the Eyes Test is statistically reliable.
It is important to emphasize that previous studies reported mixed gender
results, with some reporting either no gender effects (e.g., Mar,
Oatley, Hirsh, dela Paz, & Peterson, 2006) or a male advantage (e.g.,
Couture, Penn, Addington, Woods, & Perkins, 2008) on the Eyes Test;
thus, prior to the current study, it had not been established that the female
advantage reported in Baron-Cohen’s studies was reliable. The results
of the current meta-analysis indicate that the female advantage is a
reliable effect: on average, women perform slightly better than men.
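The percentages in the discussion of the mean effect can be reproduced from g under the usual assumption of two normal distributions with equal variances: 57% is Cohen’s U3 (the standard normal CDF evaluated at g), and 87% is one minus Cohen’s U1 nonoverlap measure. The sketch also computes a female-to-male ratio above an upper cutoff, illustrating how a small mean shift can produce larger differences in the tails. Because it assumes equal variances, it ignores the variance-ratio effects discussed by Hedges and Nowell (1995). Function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def cohen_u3(d):
    """Proportion of the lower-scoring group below the higher group's mean."""
    return STD_NORMAL.cdf(d)


def cohen_overlap(d):
    """Overlap of two equal-variance normal distributions, as 1 - Cohen's U1."""
    p = STD_NORMAL.cdf(abs(d) / 2)
    u1 = (2 * p - 1) / p  # Cohen's U1: proportion of non-overlap
    return 1 - u1


def upper_tail_ratio(d, cutoff):
    """Female-to-male ratio above `cutoff` (male SD units above the male mean),
    assuming equal variances and a mean shift of d."""
    return (1 - STD_NORMAL.cdf(cutoff - d)) / (1 - STD_NORMAL.cdf(cutoff))


g = 0.177
print(f"U3 = {cohen_u3(g):.1%}, overlap = {cohen_overlap(g):.1%}")
print(f"ratio above +2 SD = {upper_tail_ratio(g, 2.0):.2f}")
```

Under these assumptions, the ratio above +2 SD is about 1.5, so women would outnumber men roughly three to two in that region even though the distributions overlap by about 87%.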
The Q statistic for heterogeneity was not significant, suggesting that
gender differences on the Eyes Test are homogeneous across studies and
that a single population effect size can be estimated from the data.
Furthermore, the I² value of zero indicates no evidence of variability due
to true heterogeneity among studies (i.e., the variance in effect sizes
across studies is attributable to sample-level error).
We explored four moderators, none of which produced
significant effects. The first two moderator variables, language of
Eyes Test administration and country, were homogeneous, suggesting that
the female advantage on the Eyes Test is stable across language and
country. The magnitude of gender differences has been found to vary
across cultures: research in individualistic cultures (which place
greater value on personal independence) often produces larger gender
differences than research in collectivistic cultures (which place greater
value on groups; Fernández, Cerrera, Sánchez, Paez, & Cania, 2000; Fischer &
Manstead, 2000; Guimond, 2008). It should be noted that the current
meta-analysis included primarily individualistic cultures (Australia,
United Kingdom, United States, Germany, Canada and Ireland) that
share a common cultural context.
No evidence suggests that Baron-Cohen’s studies have produced
inflated effect sizes. There was no significant difference between articles
that reported gender data on the Eyes Test and articles that did not
(i.e., the studies for which authors sent us data), suggesting
that studies with unreported gender data are not biasing the results. This
is an important finding since the majority of studies did not report gender
data. In summary, the current meta-analysis failed to display patterns of
moderating effects regarding the female advantage on the Eyes Test. The
nonsignificant moderator analyses suggest a fixed effect model is
appropriate (i.e., variation among studies is due to sampling error).
One important issue that cannot be addressed in the current study is
whether the small effect size reflects the magnitude of differences in the
true latent trait or whether the Eyes Test underestimates the true female
superiority. The effect size reported in this meta-analysis is most likely
an underestimate of the true effect because of the imperfect reliability of
the instrument. That is, imperfect instrument reliability attenuates the
observed effect relative to the effect in the latent trait (Hunter &
Schmidt, 2004). Only two studies included in the current meta-analysis
reported internal reliability of the Eyes Test: Cronbach’s alpha was .60
(Mar, Oatley, Hirsh, dela Paz, & Peterson, 2006) and .48 (Meyer & Shean,
2006), values considered poor and unacceptable, respectively (Cronbach,
1951). These low reliability coefficients suggest that considerable
measurement error is present. Unfortunately, we could not correct for
measurement error because little is known about the psychometric
properties of the Eyes Test.
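For illustration only, the standard correction for attenuation due to unreliability in the outcome (observed effect divided by the square root of the reliability; Hunter & Schmidt, 2004) can be sketched as follows. The sketch assumes that the two alphas reported above could stand in for the reliability of the Eyes Test, an assumption the text explicitly declines to make. Its output therefore indicates the possible size of the underestimation rather than a corrected estimate. The function name is hypothetical.

```python
import math


def disattenuate(effect, reliability):
    """Correct a standardized mean difference for outcome unreliability:
    observed effect / sqrt(reliability)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return effect / math.sqrt(reliability)


# Illustration only: single-study alphas, not established reliability estimates
for alpha in (0.60, 0.48):
    print(alpha, round(disattenuate(0.177, alpha), 3))
```

With alphas of .60 and .48, the correction would move g = .177 to roughly .23 and .26, respectively.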
The female superiority on the Eyes Test is consistent with the superior
performance of females over males in decoding nonverbal behavior (e.g., Hall,
1978; Schmid, Schmid Mast, Bombari, & Mast, 2011). In a meta-analysis
of gender differences in decoding nonverbal cues (Hall, 1978),
the effect size was .32 for visual decoding and .40 across all
modalities (Cohen’s d). Compared to these early findings, the effect size
indicating female superiority on the Eyes Test is small. As mentioned above,
poor reliability has likely contributed to the underestimation of the true
effect. The relatively smaller effect size obtained with the Eyes Test may
also reflect the reduced information available to the perceiver (i.e., a
static display of the eyes only). In a meta-analysis of gender differences
in facial expression processing across development, McClure (2000)
found a statistically significant weighted mean effect size of .13
(Cohen’s d; p < .05) in favor of female children and adolescents, an
effect size comparable to ours, albeit in a developmental sample. The
findings from the present meta-analysis are interesting to consider in
light of two different but related literatures: (a) gender differences in
the neuropsychological and physiological bases of social cognition in
normal development, and (b) gender differences in the development of
disorders involving social cognition. We discuss these literatures in turn.
The results of the current meta-analysis correspond with the literature
that women are more accurate and faster at facial emotion recognition.
For instance, Hall, Hutton, and Morgan (2010) found a positive
relationship between eye contact and both accuracy and speed of facial
expression recognition using eye-tracking technology. Hall et al.
discussed these findings as support for the hypothesis that females have
an increased tendency to pay attention to the eyes, which is associated
with enhanced facial expression recognition. It may be that the female
superiority at reading eyes contributes to women’s advantage in reading
emotion expressions (Biehl, Matsumoto, Ekman, Hearn, Heider, Kudoh,
& Ton, 1997; Hall & Matsumoto, 2004; Hoffmann, Kessler, Eppel,
Rukavina, & Traue, 2010) as well as to women’s enhanced empathizing
(Baron-Cohen & Wheelwright, 2004; Eisenberg & Fabes, 1998;
Lawrence, Shaw, Baker, Baron-Cohen, & David, 2004; Rueckert &
Naybar, 2008; Toussaint & Webb, 2005).
The underlying mechanisms mediating gender differences in social
cognitive abilities are unknown, though at least two hormones have been
implicated: oxytocin and testosterone. In a review of studies examining
the effects of oxytocin on social behavior, MacDonald and MacDonald
(2010) refer to this neuropeptide as a “social peptide” (p. 1) because of its
ameliorating effects on social behavior and its potential as a therapeutic
intervention for disorders with social deficits. Domes, Heinrichs, Michel,
Berger, and Herpertz (2006) gave 30 healthy men a single dose of
oxytocin or a placebo intranasally 45 minutes before they took the Eyes
Test. Participants in the oxytocin condition demonstrated significantly
improved results compared to participants in the placebo condition. In
contrast to oxytocin, administration of testosterone has been shown to
reduce empathic behavior in women (Hermans, Putman, & van Honk,
2006). In a double-blind experimental study, van Honk and Schutter
(2007) found that a single dose of testosterone significantly reduced
women’s ability to recognize angry facial expressions. Chapman et al.
(2006) demonstrated in a developmental sample that prenatal
testosterone, presumed to influence brain development, was
negatively associated with both empathy and performance on the
child version of the Eyes Test.
A number of researchers have examined the degree to which basic
social cognitive processes such as face emotion reading may be
associated with the etiology of developmental disorders characterized by
poor social cognition (e.g., autism spectrum disorders and conduct
disorder). Although we can only speculate as to when in development a
female advantage on the Eyes Test emerges, our review of the literature
suggests that gender differences in relatively implicit social perceptual
processes, in particular attention to the eye region, emerge very early in
development. In the case of the autism spectrum disorders, Schultz
(2005) has suggested that reduced attention to the eye region brings about
a cascade of social cognitive deficits that contribute to the development
of autism. Baron-Cohen's extreme male-brain hypothesis of autism
(Baron-Cohen, 2002) provides a framework for considering our findings
both with respect to the development of autism and the emergence of
subclinical gender differences in social cognition. In this framework,
autism is an extreme manifestation of a cognitive phenotype consisting of
a deficit in empathizing and enhanced systemizing (the drive
to construct and analyze nonhuman systems). Baron-Cohen (2002) has
demonstrated the predicted gender asymmetry in empathizing and
systemizing in both typical and atypical samples. Since then, at least two
investigations (Billington, Baron-Cohen, & Wheelwright, 2007;
Wakabayashi, Sasaki, & Ogawa, 2012) have yielded support for a
relationship between cognitive style (i.e., empathizing versus
systemizing) and Eyes Test performance. Billington and colleagues
(2007) have shown that an empathizing or systemizing style combined
with Eyes Test performance predicts one’s propensity to choose a
science- or humanities-oriented field of study. That is, students who scored high
on the Empathy Quotient (Baron-Cohen & Wheelwright, 2004) and
the Eyes Test were more likely to be enrolled in humanities than in
science-related majors. In a large sample, Strong, Russell,
Germine, and Wilmer (2011) found that Eyes Test scores were
significantly higher than average for individuals in arts and entertainment
(e.g., artists, actors, writers), whereas individuals in computer/IT careers
did not score higher than average. Perhaps, in line with Schultz (2005),
decreased attention to the eye region gives rise to decreased efficiency at
gleaning social information, which contributes to gender differences in
normal and atypical development. Within Baron-Cohen’s framework
(2002), it may be that testosterone influences the developmental
propensity to attend to the eye region, part of a cascade that influences
the emergence of gender differences in both typical and atypical social
development.
Psychopathy, antisocial personality disorder, and conduct disorder are
often referred to as disorders of empathy due to their shared characteristic
of an impaired empathic response (Blair, 1995). Given the gender
asymmetry in these disorders (Bao & Swaab, 2010; Eme & Kavanaugh,
1995), it is intriguing to consider the long-term developmental
consequences of the relatively greater difficulty men experience in
reading social information in the eyes, particularly for men who fall in
the left tail of the distribution. It is possible that falling in the left tail of
the distribution of the social cognitive processes that mediate Eyes Test
performance confers risk for a range of developmental disorders.
There are at least three limitations to the current meta-analysis. First,
the psychometric properties of the Eyes Test are poor and, as discussed earlier,
imperfect instrument reliability attenuates observed effects relative to the
true effect on the latent trait (Hunter & Schmidt, 2004). Given these low
reliability coefficients, it is likely that the Eyes Test underestimates the
gender effect. A second limitation of the current meta-analysis is the small
number of studies in some of the cells for the moderator analyses, especially
for the first and third moderators: language of the Eyes Test (38 English versus 12
other-language effect sizes) and research group (eight Baron-Cohen
versus 42 other-author effect sizes). Finally, to make appropriate
comparisons (i.e., to avoid confounding results with characteristics of
developmental or mental disorders), only 50 of the Eyes
Test studies were retained in the current meta-analysis (42 studies did not
meet inclusion criteria).
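The attenuation argument in the first limitation can be illustrated with the classical correction for measurement error in the dependent variable described by Hunter and Schmidt (2004): an observed standardized difference equals the true difference multiplied by the square root of the measure's reliability, so dividing by that square root estimates the disattenuated effect. The Python sketch below applies this correction to the pooled g = .177, treating g as a standardized mean difference. The reliability values are hypothetical placeholders rather than coefficients reported for the Eyes Test, and the function name is illustrative.

```python
import math


def disattenuate_d(d_observed: float, reliability: float) -> float:
    """Correct a standardized mean difference for unreliability in the
    outcome measure: d_true = d_observed / sqrt(r_yy).

    Classical test theory is assumed, and measurement error in the outcome
    is the only artifact corrected (no range restriction or other artifacts).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return d_observed / math.sqrt(reliability)


# Pooled female advantage on the Eyes Test reported in this meta-analysis.
g_observed = 0.177

# HYPOTHETICAL reliability coefficients, for illustration only.
for r_yy in (0.9, 0.6, 0.4):
    corrected = disattenuate_d(g_observed, r_yy)
    print(f"r_yy = {r_yy:.1f}: corrected d = {corrected:.3f}")
```

The correction divides by the square root of reliability, so the estimated true effect grows as reliability falls. This is why low reliability coefficients suggest that the observed gender difference is probably an underestimate.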
In the future, it would be interesting to explore this gender finding
from two perspectives. Emotion recognition accuracy increases when
there is a match between the culture of a participant and the person
expressing the emotion (Elfenbein & Ambady, 2002). For example,
Adams and colleagues (2009) used the Eyes Test to examine the accuracy
of intra- versus other-culture accuracy of decoding mental states from
photographs of eyes and found an advantage for decoding mental states
from intra-culture eye stimuli (i.e., photos of eyes from the same culture).
We suggest that future research examines performance by gender of both
in-group and out-group members in order to investigate if an interaction
between gender and group (in-group versus out-group) exists.
Furthermore, to investigate the hypothesis regarding
gender differences in individualistic cultures compared to collectivistic
cultures (Fischer & Manstead, 2000; Oyserman, Coon, & Kemmelmeier,
2002), we recommend examining how men and women perform on the
Eyes Test in countries that score high in collectivism.
In an earlier section, we considered the possibility that being in the
left tail of the distribution, where a gender performance asymmetry is
more pronounced, may confer etiologic risk for disorders involving
social cognition. However, at present, we can only speculate as to
whether the absolute magnitude of one's deficit, as indexed by Eyes Test
performance, is a more important predictor than one's relative performance
within gender. It is interesting to consider the possibility that being in the
left tail of the distribution confers more etiologic risk for
women than for men. In other words, while we assume a deficit in
aspects of social cognition measured by the Eyes Test confers risk for
both males and females, it may be that the risk is greater for females, for
whom such a deficit is rarer. Alternatively, being in the left
tail of the distribution may confer equal risk for males and females, and
the gender asymmetry of disorders of social cognition may simply reflect
the fact that fewer females are in the left tail. Presently, we can only
speculate as to which alternative is correct. Future research comparing
outcomes of males and females could address the question of whether the
Eyes Test should be normed by gender.
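The left-tail reasoning above rests on a property of shifted distributions: even a small mean difference produces a sex ratio that grows as the cutoff moves further into the tail. The Python sketch below is a minimal illustration under simplifying assumptions. It assumes that male and female Eyes Test scores are normally distributed with equal variances and that the female mean exceeds the male mean by the pooled g = .177. The cutoffs and the function name are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def male_to_female_tail_ratio(d: float, cutoff: float) -> float:
    """Ratio of the proportion of males to the proportion of females
    scoring below `cutoff` (in SD units), assuming both groups are
    normal with SD 1, the male mean is 0, and the female mean is d."""
    p_male = STD_NORMAL.cdf(cutoff)
    p_female = STD_NORMAL.cdf(cutoff - d)
    return p_male / p_female


d = 0.177  # pooled female advantage reported in this meta-analysis
for cutoff in (-1.0, -2.0, -3.0):  # HYPOTHETICAL cutoffs
    ratio = male_to_female_tail_ratio(d, cutoff)
    print(f"below {cutoff:+.1f} SD: males/females = {ratio:.2f}")
```

Under these assumptions the ratio exceeds 1 at every cutoff and increases as the cutoff becomes more extreme. That pattern is what the second alternative above relies on. The sketch does not address whether real Eyes Test distributions also differ in variance or shape.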
In sum, our findings are consistent with a female advantage in social
cognitive processes. It may be the case that, beginning in infancy and
continuing across development, the additional time females spend
attending to the eye region of the face (Lutchmaya, Baron-Cohen, &
Raggatt, 2002), as well as the female advantage for face processing, gives
rise to the superior ability females demonstrate on the Eyes Test. A
parsimonious interpretation of the current meta-analysis suggests that
females show a small advantage in gleaning mental state information
from the facial expressions conveyed by the eyes.
Abell, F., Happé, F., & Frith, U. (2000). Do triangles play tricks? Attribution of
mental states to animated shapes in normal and abnormal development.
Cognitive Development, 15, 1-16. doi: 10.1016/S0885-2014(00)00014-9
Adams, R. B., Rule, N. O., Franklin, R. G., Wang, E., Stevenson, M. T.,
Yoshikawa, S., . . .Ambady, N. (2009). Cross-cultural Reading the Mind in
the Eyes: An fMRI investigation. Journal of Cognitive Neuroscience, 22(1),
Ahmed, F. S., & Miller, S. (2011). Executive function mechanisms of theory of
mind. Journal of Autism and Developmental Disorders, 41, 667-678.
Alwall, N., Johansson, D., & Hansen, S. (2010). The gender difference in gaze-
cueing: Associations with empathizing and systemizing. Personality and
Individual Differences, 49, 729-732. doi:10.1016/j.paid.2010.06.016
Argyle, M., & Ingham, R. (1972). Gaze, mutual gaze, and proximity. Semiotica,
1, 32-49. doi:10.1515/semi.19184.108.40.206
Bailey, P. E., & Henry, J. D. (2008). Growing less empathetic with age:
Disinhibition of the self-perspective. The Journal of Gerontology, 63B(4),
Bailey, P. E., Henry, J. D., & von Hippel, W. (2008). Empathy and social
functioning in late adulthood. Aging & Mental Health, 12(4), 499-503.
Bao, A., & Swaab, D. F. (2010). Sex differences in the brain, behavior, and
neuropsychiatric disorders. The Neuroscientist, 16(5), 550-565. doi:10.1177/
Baron-Cohen, S. (2002). The extreme-male-brain theory of autism. Trends in
Cognitive Sciences, 6, 248-254. doi:10.1016/S1364-6613(02)01904-6
Baron-Cohen, S., Campbell, R., Karmiloff-Smith, A., Grant, J., & Walker, J.
(1995). Are children with autism blind to the mentalistic significance of the
eyes? British Journal of Developmental Psychology, 13, 379–398.
doi:10.1111/j.2044-835X.1995.tb00687.x
Baron-Cohen, S., & Hammer, J. (1997). Is autism an extreme form of the male
brain? Advances in Infancy Research, 11, 193-217.
Baron-Cohen, S., Jolliffe, T., Mortimore, C., & Robertson, M. (1997). Another
advanced test of theory of mind: Evidence from very high functioning adults
with autism or Asperger syndrome. Journal of Child Psychology and
Psychiatry, 38(7), 813-822.
Baron-Cohen, S., O’Riordan, M., Jones, R., Stone, V., & Plaisted, K. (1999). A
new test of social sensitivity: Detection of faux pas in normal children and
children with Asperger Syndrome. Journal of Autism and Developmental
Disorders, 29, 407–418. doi:10.1023/A:1023035012436
Baron-Cohen, S., Ring, H., Chitnis, X., Wheelwright, S., Gregory, L., Williams,
S., . . . Bullmore, E. (2006). fMRI of parents of children with Asperger
syndrome: A pilot study. Brain and Cognition, 61, 122-130. doi:10.1016/
Baron-Cohen, S., & Wheelwright, S. (2004). The Empathy Quotient: An
investigation of adults with Asperger syndrome or high functioning autism,
and normal sex differences. Journal of Autism and Developmental Disorders,
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The
“reading the mind in the eyes” test revised version: A study with normal
adults, and adults with Asperger syndrome or high functioning autism.
Journal of Child Psychology and Psychiatry, 42, 241-251. doi:10.1111/1469-
Bayliss, A. P., Pellegrino, G., & Tipper, S. P. (2005). Sex differences in eye gaze
and symbolic cueing of attention. The Quarterly Journal of Experimental
Psychology, 58A(4), 631-650.
Begg, C. B., & Berlin, J. A. (1988). Publication bias: A problem in interpreting
medical data. Journal of the Royal Statistical Society, Series A, 151, 419-463.
Biehl, M., Matsumoto, D., Ekman, P., Hearn, V., Heider, K., Kudoh, T., & Ton,
V. (1997). Japanese and Caucasian facial expressions of emotions: Reliability
data and cross-national differences. Journal of Nonverbal Behavior, 21(1), 3-
Billington, J., Baron-Cohen, S., & Wheelwright, S. (2007). Cognitive style
predicts entry into physical sciences and humanities: Questionnaire and
performance tests of empathy and systemizing. Learning and Individual
Differences, 17, 260-268. doi:10.1016/j.lindif.2007.02.004
Blair, R. J. (1995). A cognitive developmental approach to morality:
Investigating the psychopath. Cognition, 57, 1-29. doi:10.1016/0010-
Bodden, M. E., Mollenhauer, B., Trenkwalder, C., Cabanel, N., Eggert, K. M.,
Unger, M. M., . . . Kalbe, E. (2010). Affective and cognitive theory of mind
in patients with Parkinson’s disease. Parkinsonism & Related Disorders,
16(7), 466-470. doi:10.1016/j.parkreldis.2010.04.014
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive
meta-analysis (Version 2) [Computer software]. Englewood, NJ: Biostat.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to
meta-analysis. West Sussex, UK: John Wiley & Sons, Ltd.
Camargo, M. A. (2007). Hypothesized fitness indicators and mating success.
(Thesis, State University of New York at New Paltz).
Carroll, J. M., & Yung, C. K. (2006). Sex and discipline differences in
empathising, systemising and autistic symptomatology: Evidence from a
student population. Journal of Autism and Developmental Disorders, 36(7),
Chakrabarti, B., Dudbridge, F., Kent, L., Wheelwright, S., Hill-Cawthorne, G.,
Allison, C., . . . Baron-Cohen, S. (2009). Genes related to sex steroids, neural
growth, and social-emotional behavior are associated with autistic traits,
empathy, and Asperger syndrome. Autism Research, 2(3), 157-177.
Chapman, E., Baron-Cohen, S., Auyeung, B., Knickmeyer, R., Taylor, K., &
Hackett, G. (2006). Fetal testosterone and empathy: Evidence from the
Empathy Quotient (EQ) and the “Reading the Mind in the Eyes” test. Social
Neuroscience, 1(2), 135-148. doi:10.1080/17470910600992239
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd
ed.). Hillsdale, NJ: Erlbaum.
Connellan, J., Baron-Cohen, S., & Wheelwright, S. (2001). Sex differences in
human neonatal social perception. Infant Behavior and Development, 23,
Couture, S. M., Penn, D. L., Addington, J., Woods, S. W., & Perkins, D. O.
(2008). Assessment of social judgements and complex mental states in the
early phases of psychosis. Schizophrenia Research, 100, 237-241.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests.
Psychometrika, 16(3), 297-334. doi:10.1007/BF02310555
Dalton, K. M., Nacewicz, B. M., Alexander, A. L., & Davidson, R. J. (2007).
Gaze-fixation, brain activation, and amygdala volume in unaffected siblings
of individuals with autism. Biological Psychiatry, 61, 512-520.
de Achaval, D., Costanzo, E. Y., Vilarreal, M., Jauregui, I. O., Chiodi, A., Castro,
M. N., . . . Guinjoan, S. M. (2010). Emotion processing and theory of mind in
schizophrenia patients and their unaffected first-degree relatives.
Neuropsychologia, 48, 1209-1215. doi:10.1016/j.neuropsychologia.2009.12.
Deaner, R. O., Shepherd, S. V., & Platt, M. L. (2007). Familiarity accentuates
gaze cueing in women but not men. Biology Letters, 3, 64-67.
Domes, G., Heinrichs, M., Michel, A., Berger, C., & Herpertz, S. C. (2006).
Oxytocin improves “Mind-Reading” in humans. Biological Psychiatry, 61(6),
Duval, S. J. (2005). The Trim and Fill Method. In H. R. Rothstein, A. J. Sutton &
M. Borenstein (Eds.) Publication bias in meta-analysis: Prevention,
assessment, and adjustments (pp. 127–144). Chichester, England: Wiley.
Dziobek, I., Fleck, S., Kalbe, E., Rogers, K., Hassenstab, J., Brand, . . . Convit,
A. (2006). Introducing the MASC: A movie for the Assessment of Social
Cognition. Journal of Autism and Developmental Disorders, 36, 623-
Eagly, A. H., & Wood, W. (1991). Explaining sex differences in social behavior:
A meta-analytic perspective. Personality and Social Psychology Bulletin, 17,
Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural
specificity of emotion recognition: A meta-analysis. Psychological
Bulletin, 128(2), 203-235. doi:10.1037/0033-2909.128.2.203
Eme, R. F., & Kavanaugh, L. (1995). Sex differences in conduct disorder.
Journal of Clinical Child Psychology, 24, 406–426. doi:10.1207/
Euteneuer, F., Schaefer, F., Stuermer, R., Boucsein, W., Timmermann, L., Barbe,
M. T.,…Kalbe, E. (2009). Dissociation of decision-making under ambiguity
and decision-making under risk in patients with Parkinson’s disease:
A neuropsychological and psychophysiological study. Neuropsychologia, 47,
Farroni, T., Csibra, G., Simion, F., & Johnson, M. H. (2002). Eye contact
detection in humans from birth. Proceedings of the National Academy of
Sciences, 99(14), 9602-9605. doi:10.1073/pnas.152159999
Ferguson, F. J., & Austin, E. J. (2010). Associations of trait and ability emotional
intelligence with performance on theory of mind tasks in an adult sample.
Personality and Individual Differences, 49(5), 414-418. doi:10.1016/
Fernández, I., Cerrera, P., Sánchez, F., Paez, D., & Candia, L. (2000).
Differences between cultures in emotional verbal and non-verbal reactions.
Psicothema, 12(Supplement), 83-92.
Fischer, A. H., & Manstead, A. S. R. (2000). The relation between gender and
emotions in different cultures. In A. H. Fischer (Ed.), Gender and emotion:
Social psychological perspectives (pp. 71-94). Paris: Cambridge University Press.
Franklin, R. G., & Adams, R. B. (2010). What makes a face memorable? The
relationship between face memory and emotional state reasoning. Personality
and Individual Differences, 49(1), 8-12. doi:10.1016/j.paid.2010.02.031
Frischen, A., Bayliss, A. P., & Tipper, S. P. (2007). Gaze cueing of attention:
Visual attention, social cognition, and individual differences. Psychological
Bulletin, 133(4), 694-724. doi:10.1037/0033-2909.133.4.694
Garrido, L., Furl, N., Draganski, B., Weiskopf, N., Stevens, J., Tan, C. G., . . .
Duchaine, B. (2009). Voxel-based morphometry reveals reduced grey matter
volume in the temporal cortex of developmental prosopagnosics. Brain, 132,
Golan, O., Baron-Cohen, S., & Hill, J. (2006). The Cambridge Mindreading (CAM)
Face-Voice Battery: Testing complex emotion recognition in
adults with and without Asperger Syndrome. Journal of Autism and
Developmental Disorders, 36(2), 169-183. doi:10.1007/s10803-005-0057-y
Golan, O., Baron-Cohen, S., Hill, J. J., & Rutherford, M. D. (2007). The ‘reading
the mind in the voice’ test-revised: a study of complex emotion recognition in
adults with and without autism spectrum conditions. Journal of Autism and
Developmental Disorders, 37, 1096-1106. doi:10.1007/s10803-006-0252-5
Gooding, D. C., Johnson, M., & Peterman, J. S. (2010). Schizotypy and altered
digit ratios: A second look. Psychiatry Research, 178, 73-78. doi:10.1016/
Grisham, J. R., Henry, J. D., Williams, A. D., & Bailey, P. E. (2010).
Socioemotional deficits associated with obsessive-compulsive
symptomatology. Psychiatry Research, 175, 256-259. doi:10.1016/j.psychres.
Guimond, S. (2008). Psychological similarities and differences between women
and men across cultures. Social and Personality Psychology Compass, 2(1),
Hall, J. A. (1978). Gender effects in decoding nonverbal cues. Psychological
Bulletin, 85(4), 845-857. doi:10.1037/0033-2909.85.4.845
Hall, J. A. (1984). Nonverbal sex differences: Communication accuracy and
expressive style. Baltimore, MD: The Johns Hopkins University Press.
Hall, J. K., Hutton, S. B., & Morgan, M. J. (2010). Sex differences in scanning
faces: Does attention to the eyes explain female superiority in facial
expression recognition? Cognition and Emotion, 24(4), 629-637.
Hall, J. A., & Matsumoto, D. (2004). Gender differences in judgments of multiple
emotions from facial expressions. Emotion, 4, 201-206. doi:10.1037/1528-
Happé, F. G., Brownell, H., & Winner, E. (1999). Acquired ‘theory of mind’
impairments following stroke. Cognition, 70, 211-240. doi:10.1016/S0010-
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and
related estimates. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V., & Nowell, A. (1995, July 7). Sex differences in mental test scores,
variability, and numbers of high-scoring individuals. Science, 269, 41–45.
Hermans, E. J., Putman, P., & van Honk, J. (2006). Testosterone administration
reduces empathetic behavior: A facial mimicry study. Psychoneuroendocrinology,
31(7), 859-866. doi:10.1016/j.psyneuen.2006.04.002
Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-
analysis. Statistics in Medicine, 21, 1539-1558.
Hittelman, J. H., & Dickes, R. (1979). Sex differences in neonatal eye contact
time. Merrill Palmer Quarterly, 25, 171-184.
Hoffmann, H., Kessler, H., Eppel, T., Rukavina, S., & Traue, H. C. (2010).
Expression intensity, gender and facial emotion recognition: Women
recognize only subtle facial emotions better than men. Acta
Psychologica, 135, 278-283. doi:10.1037/0033-2909.84.4.712
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting
error and bias in research findings (2nd ed.). Newbury Park, CA: Sage.
Irani, F., Platek, S. M., Panyavin, I. S., Calkins, M. E., Kohler, C., Siegle, . . .
Gur, R. C. (2006). Self-face recognition and theory of mind in patients with
schizophrenia and first-degree relatives. Schizophrenia Research, 88, 151-
Krych-Appelbaum, M., Law, J. B., Jones, D., Barnacz, A., Johnson, A., &
Keenan, J. P. (2007). “I think I know what you mean”: The role of theory of
mind in collaborative communication. Interaction Studies, 8(2), 267-280.
Lee, S. A., Guajardo, N. R., & Short, S. D. (2010). Individual differences in
ocular level empathetic accuracy ability: The predictive power of fantasy
empathy. Personality and Individual Differences, 49(1), 68-71.
Leeb, R. T., & Rejskind, F. G. (2004). Here’s looking at you, kid! A longitudinal
study of perceived gender differences in mutual gaze behavior in young
infants. Sex Roles, 50(1/2), 1-14. doi:10.1023/B:SERS.
Lewin, C., & Herlitz, A. (2002). Sex differences in face recognition: Women’s
faces make the difference. Brain and Cognition, 50, 121-128. doi:10.1016/
Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing
research. Cambridge, MA: Harvard University Press.
Losh, M., & Piven, J. (2007). Social-cognition and the broad autism phenotype:
Identifying genetically meaningful phenotypes. Journal of Child Psychology
and Psychiatry, 48(1), 105-112. doi:10.1111/j.1469-7610.2006.01594.x
Lutchmaya, S., Baron-Cohen, S., & Raggatt, P. (2002). Fetal testosterone and eye
contact at 12 months. Infant Behavior and Development, 25, 327–335. doi:10.
MacDonald, K., & MacDonald, T. M. (2010). The peptide that binds: A
systematic review of oxytocin and its prosocial effects in humans. Harvard
Review of Psychiatry, 18(1), 1-21. doi:10.3109/10673220903523615
Mar, R. A., Oatley, K., Hirsh, J., dela Paz, J., & Peterson, J. B. (2006).
Bookworms versus nerds: Exposure to fiction versus non-fiction, divergent
associations with social ability, and the simulation of fictional social worlds.
Journal of Research in Personality, 40, 694-712. doi:10.1016/
Martell, R. F., Lane, D. M., & Emrich, C. E. (1996). Male-female differences: A
computer simulation. American Psychologist, 51, 157-158. doi:10.1037/
McBain, R., Norton, D., & Chen, Y. (2009). Females excel at basic face
perception. Acta Psychologica, 130, 168-173. doi:10.1016/j.actpsy.
McClure, E. B. (2000). Meta-analytic review of sex differences in facial
expression processing and their development in infants, children and
adolescents. Psychological Bulletin, 126(3), 424-453. doi:10.1037/0033-
McGlade, N., Behan, C., Hayden, J., O’Donoghue, T., Peel, R., Haq, F., . . .
Donohoe, G. (2008). Mental state decoding v. mental state reasoning as a
mediator between cognitive and social function in psychosis. The British
Journal of Psychiatry, 193, 77-78. doi:10.1192/bjp.bp.107.044198
Meyer, J. K. (2009). In the eye of the beholder: attachment style differences in
emotion perception. The Penn State McNair Journal, 16, 74-87.
Meyer, J., & Shean, G. (2006). Social-cognitive functioning and schizotypical
characteristics. The Journal of Psychology, 140(3), 199-207. doi:10.3200/
Montagne, B., Kessels, R. P. C., Frigerio, E., de Haan, E. H. F., & Perrett, D. I.
(2005). Sex differences in the perception of affective facial expressions: Do
men really lack emotional sensitivity? Cognitive Processing, 6, 136-141. doi:
Mundy, P., Block, J., A., Delgado, C., Pomeras, Y., Van Hecke, H., & Parlade,
M. (2007). Individual differences in the development of joint attention in
infancy. Child Development, 78, 938-954. doi:10.1111/j.1467-8624.2007.
Nettle, D., & Liddle, B. (2008). Agreeableness is related to social-cognitive, but
not social-perceptual, theory of mind. European Journal of Personality, 22,
Olafsen, K. S., Ronning, J. A., Kaaresen, P. I., Ulvund, S. E., Handegard, B. H.,
& Dahl, L. B. (2006). Joint attention in term and preterm infants at 12 months
of age: The significance of gender and intervention based on a randomized
controlled trial. Infant Behavior & Development, 29, 554-563. doi:10.1016/
Oyserman, D., Coon, H. M., & Kemmelmeier, M. (2002). Rethinking
individualism and collectivism: Evaluation of theoretical assumptions and
meta-analyses. Psychological Bulletin, 128(1), 3-72. doi: 10.1037//0033-
Pardini, M., & Nichelli, P. F. (2009). Age-related decline in mentalizing skills
across adult life span. Experimental Aging Research, 35, 98-106. doi:10.1080/
Pelphrey, K. A., Sasson, N. J., Reznick, J. S., Paul, G., Goldman, B. D., & Piven,
J. (2002). Visual scanning of faces in autism. Journal of Autism and
Developmental Disorders, 32(4), 249-261. doi:10.1023/A:1016374617369
Podrouzek, W., & Furrow, D. (1988). Preschoolers’ use of eye contact while
speaking: The influence of sex, age, and conversational partner. Journal of
Psycholinguistic Research, 17, 89–98. doi:10.1007/BF01067066
Ribeiro, L. A., & Fearon, P. (2010). Theory of mind and attentional bias to facial
emotional expressions: A preliminary study. Scandinavian Journal of
Psychology, 51, 285-289.
Riveros, R., Hurtado, E., Escobar, M., & Ibanez, A. (2010). Context-sensitive
social cognition is impaired in schizophrenic patients and their healthy
relatives. Schizophrenia Research, 116(2-3), 297-298. doi:10.1016/
Rosenberg, M. S. (2005). The file-drawer problem revisited: A general weighted
method for calculating fail-safe numbers in meta-analysis. Evolution, 59(2),
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York,
Rosenthal, R. (1979). The file drawer problem and tolerance for null results.
Psychological Bulletin, 86(3), 638-641.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.).
Newbury Park, CA: Sage Publications.
Rosip, J. C., & Hall, J. A. (2004). Knowledge of nonverbal cues, gender, and
nonverbal decoding accuracy. Journal of Nonverbal Behavior, 28(4), 276-
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in
meta-analysis: Prevention, assessment and adjustments. Chichester, UK:
Wiley & Sons, Ltd. doi:10.1002/0470870168
Russell, T. A., Tchanturia, K., Rahman, Q., & Schmidt, U. (2007). Sex
differences in theory of mind: A male advantage on Happé's "cartoon" task.
Cognition and Emotion, 21(7), 1554-1564. doi:10.1080/02699930601117096
Sapienza, P., Zingales, L., & Maestripieri, D. (2009). Gender differences in
financial risk aversion and career choices are affected by testosterone.
Proceedings of the National Academy of Sciences of the United States of
America, 106(36), 15268-15273. doi:10.1073/pnas.0907352106
Schmid, P. C., Schmid Mast, M., Bombari, D., & Mast, F. W. (2011). Gender
effects in information processing on a nonverbal decoding task. Sex Roles, 65,
Schultz, R. T. (2005). Developmental deficits in social perception in autism: The
role of the amygdala and fusiform face area. International Journal of
Developmental Neuroscience, 23, 125-141. doi:10.1016/j.ijdevneu.2004.
Smeets, T., Dziobek, I., & Wolf, O. T. (2009). Social cognition under stress:
differential effects of stress-induced cortisol elevations in healthy young men
and women. Hormones and Behavior, 55, 507-513. doi:10.1016/
Spek, A. A., Scholte, E. M., & Berckelaer-Onnes, I. A. (2010). Theory of mind
in adults with HFA and Autism. Journal of Autism and Developmental
Disorders, 40, 280-289. doi:10.1007/s10803-009-0860-y
Spreng, R. N., McKinnon, M. C., Mar, R. A., & Levine, B. (2009). The Toronto
empathy questionnaire: Scale development and initial validation of a factor-
analytic solution to multiple empathy measures. Journal of Personality
Assessment, 91, 62-71.
Strong, E., Russell, R., Germine, L., & Wilmer, J. (2011). Face processing
abilities relate to career choice. Journal of Vision, 11(11), 621.
Suzuki, A., Hoshino, T., & Shigemasu, K. (2006). Measuring individual
differences in sensitivities to basic emotions in faces. Cognition, 99, 327-353.
Sylwester, K., Lyons, M., Buchanan, C., Nettle, D., & Roberts, G. (2012). The
role of theory of mind in assessing cooperative intentions. Personality and
Individual Differences, 52, 113-117. doi:10.1016/j.paid.2011.09.005
Szily, E., & Kéri, S. (2009). Anomalous subjective experience and psychosis risk
in young depressed patients. Psychopathology, 42, 229-235. doi:10.1159/
Tso, I. F., Grove, T. B., & Taylor, S. F. (2010). Emotion experience predicts
social adjustment of neurocognition and social cognition in schizophrenia.
Schizophrenia Research, 122(1-3), 156-163. doi:10.1016/j.schres.2009.
van Honk, J., & Schutter, D. J. (2007). Testosterone reduces conscious detection
of signals serving correction: Implications for antisocial behavior.
Psychological Science, 18(8), 663-667. doi:10.1111/j.1467-
Valla, J. M., Ganzel, B. L., Yoder, K. J., Chen, G. M., Lyman, L. T., Sidari, A.
P., . . . Belmonte, M. K. (2010). More than math and mindreading: Sex
differences in empathizing/systemizing covariance. Autism Research, 3, 174-
Voracek, M., & Dressler, S. (2006). Lack of correlation between digit ratio
(2D:4D) and Baron-Cohen’s ‘Reading the Mind in the Eyes’ test, empathy,
systemizing, and autism-spectrum quotients in a general population sample.
Personality and Individual Differences, 41, 1481-1491. doi:10.1016/j.paid.
Wakabayashi, A., Sasaki, J., & Ogawa, Y. (2012). Sex differences in two mental
cognitive domains: Empathizing and systemizing in children and adults.
Journal of Individual Differences, 33(1), 24-34.
Wigan, E. (2007). Is general intelligence or social intelligence related to social
network size? (Undergraduate thesis, University of Edinburgh, 2007).
Retrieved from http://www.era.lib.ed.ac.uk/handle/1842/2546
Studies Included in the Current Meta-analysis.
Language of Eyes
Bailey & Henry
AU English S
Bailey & Henry 2008 21 12 33 -.260 .132 AU English
Bailey, Henry &
VonHippel 2008 57 23 80 .054 .061 AU English S
Bailey, Henry &
VonHippel 2008 33 16 49 .374 .094 AU English S
Ring, et al. 2006 6 6 12 .264 .336 UK English R
Ring et al. 2006 6 6 12 .470 .342 UK English R
al. 2001 67 55 122 .108 .033 UK English R
al. 2001 50 53 103 .372 .040 UK English R
Baron-Cohen, et 2007 108 160 268 .147 .015 UK English R
al. 2007 104 43 147 .260 .033 UK English R
Mollenhauer, et al. 2010 6 15 21 .432 .238 DE
Carroll & Yung 2006 12 12 24 .564 .173 UK English R
Carroll & Yung 2006 12 12 24 .896 .183 UK English R
Dudbridge, et al. 2009 59 37 96 .125 .044 UK English S
et al. 2008 3 38 41 -.653 .365 US English R
Costanzo, et al. 2010 11 9 20 .471 .226 AR
Costanzo, et al. 2010 7 13 20 -.172 .203 AR
Costanzo, et al. 2010 11 9 20 .493 .208 AR
Schaefer, et al. 2009 12 11 23 -.199 .175 DE
Austin 2010 67 30 97 .132 .048 UK English S
Adams 2010 31 30 61 -.129 .066 US English S
Garrido, Furl, et al. 2009 11 7 18 -.493 .240 UK English S
Cohen, et al.
Johnson, et al. 2010 62 45 107 .266 .039 US English S
et al. 2010 118 86 204 .046 .020 AU English S
Hall, Hutton, &
Irani, Platek, et al. 2006 5 5 10 -.168 .402 US English S
Irani, Platek, et al. 2006 5 5 10 .021 .399 US English S
Appelbaum et al. 2007 63 15 78 .500 .084 US English S
Short, & King 2010 79 17 96 .364 .072 US English S
Mar, Oatley, et
Behan, et al. 2008 33 45 78 -.282 .053 IE Irish S
Meyer 2009 160 82 242 .485 .019 US English S
Meyer & Shean 2006 47 95 142 .352 .031 US English R
Nettle & Liddle 2008 48 48 96 -.287 .042 UK English R
Nichelli 2009 61 59 120 .192 .033 IT Italian R
Fearon 2010 27 19 46 .185 .090 UK English S
Hurtado, et al. 2010 7 11 18 -.209 .235 CL Spanish S
Hurtado, et al. 2010 8 6 14 .041 .292 CL Spanish S
Zingales, et al. 2009 140 317 457 .209 .010 US English S
Bente, et al. 2010 9 11 20 .136 .203 DE German S
Wolf 2009 16 16 32 .000 .125 DE German R
McKinnon, et al. 2009 55 24 79 .312 .061 CA English S
Szily & Keri
Tso, Grove, et al. 2010 10 23 33 .748 .152 US English S
Turkstra 2008 9 10 19 -.048 .212 US English S
et al. 2010 79 65 144 .057 .028 US English R
Dressler 2006 217 206 423 .219 .010 AT German R
2007 33 27 60 .483 .069 UK English R
Note: AR = Argentina; AT = Austria; AU = Australia; CA = Canada; CL = Chile;
DE = Germany (Deutschland); HU = Hungary; HUN = Hungarian; IE = Ireland; IT = Italy;
R = data reported in the article; S = data sent by the authors on request;
UK = United Kingdom; US = United States; * = Thesis.
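The headings for the numeric columns of this table did not survive extraction. On our reading (an assumption, not stated in the surviving text), each row gives two group sizes, the total N, Hedges' g, and the sampling variance of g; for most rows we spot-checked, the variance column agrees with the usual large-sample formula v = (n1 + n2)/(n1 n2) + g²/(2(n1 + n2)). The following Python sketch recomputes that variance for four rows and then combines them with generic inverse-variance (fixed-effect) weights, Cochran's Q, and I². The function names are hypothetical, the fixed-effect model is our assumption rather than a description of the authors' procedure, and four rows will not reproduce the overall estimate reported for all 42 effect sizes.

```python
import math


def var_d(n1: int, n2: int, g: float) -> float:
    """Large-sample sampling variance of a standardized mean difference.

    Assumption: the table's variance column was computed with this formula,
    without the small-sample J**2 correction factor.
    """
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))


def pool_fixed(effects):
    """Inverse-variance fixed-effect pooling of (g, v) pairs.

    Returns the pooled g, its standard error, Cochran's Q, and I^2 in percent.
    """
    weights = [1.0 / v for _, v in effects]
    total_w = sum(weights)
    g_bar = sum(w * g for w, (g, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Q: weighted squared deviations of each effect from the pooled effect.
    q = sum(w * (g - g_bar) ** 2 for w, (g, _) in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability beyond what sampling error predicts.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return g_bar, se, q, i2


# Rows transcribed from the table: (group 1 n, group 2 n, g, reported variance).
# Assumption: which group is female and which is male is not recoverable here.
rows = [
    (21, 12, -0.260, 0.132),   # Bailey & Henry 2008
    (12, 12, 0.896, 0.183),    # Carroll & Yung 2006
    (16, 16, 0.000, 0.125),    # Wolf 2009
    (140, 317, 0.209, 0.010),  # Zingales, et al. 2009
]

for n1, n2, g, v_reported in rows:
    v = var_d(n1, n2, g)
    print(f"g={g:+.3f}  computed v={v:.3f}  reported v={v_reported:.3f}")

g_bar, se, q, i2 = pool_fixed([(g, v) for _, _, g, v in rows])
print(f"pooled g={g_bar:.3f} (SE {se:.3f}), Q={q:.2f}, I^2={i2:.1f}%")
```

Because each weight is 1/v, the largest samples (for example, Zingales, et al., N = 457) dominate the pooled value, and an I² near zero is what a homogeneous set of effect sizes would produce.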