MANKIND QUARTERLY 2020 60:4 525-538
Pupil Size and Intelligence: A Large-Scale Replication
Emil O. W. Kirkegaard*
Ulster Institute for Social Research, London, UK
Helmuth Nyborg
Aarhus University (1968-2007), Denmark
*Corresponding author. Email:
A recent study by Tsukahara et al. (2016) found correlations
between pupil size and measures of intelligence, with r values
around .30. We attempted to replicate this association in a large
dataset of US military personnel (n = 4,462). General intelligence, g,
was extracted from 19 diverse tests. We first confirmed that right and
left eye pupil size measures are strongly correlated (r = .97),
suggesting high measurement reliability for this phenotype.
However, unlike Tsukahara et al., we could establish only small to
nonexistent associations between cognitive ability and pupil size (r’s
-.01 to .06, r with g specifically = .05). Regression analyses,
controlling for multiple covariates, revealed that the association in
this large representative sample was entirely attributable to
confounding with race/ethnicity. Mean pupil size (mm) was 3.56,
3.35, and 3.23 for whites, Hispanics, and blacks, respectively.
Relative to whites, this corresponds to effect sizes of 0.22 and 0.34
d. It is unclear why our results differ from those reported by
Tsukahara et al. (2016), but the ethnic size rank order suggests an
evolutionary explanation in terms of geo-bio-climatic selection for
pupil size.
Key Words: Intelligence, Cognitive ability, Pupil size, Pupil
diameter, Evolution, Replication
Physical correlates of intelligence have long been of interest to researchers.
Galton observed a positive relationship between intelligence and height
as early as the 19th century (Galton, 1869). More than a century later, Jensen and
Sinha (1993), in a seminal book chapter, presented a 100-page review of links
between intelligence and a wide variety of physical traits. Of main interest to
researchers have been the relationships of intelligence to height and measures
of brain size and its proxies (Rushton & Ankney, 2009). Less obvious
relationships have also been reported by many others (e.g., blood groups,
compatibility of blood groups between mother and child or between twins, serum
uric acid/gout).
Recently, a study by Tsukahara et al. (2016) attracted considerable attention
(46 citations at the time of writing) by reporting fairly large correlations between
pupil size and intelligence, or rather, selected aspects of intelligence. In the first
sample in their study, they compared 20 subjects with high working memory
capacity (WMC), as measured by 3 tests, to 20 subjects with low WMC. They
measured the pupil size at baseline and found a 1.10 d (0.97 mm) difference in
size. In the second sample, they recruited 114 subjects (roughly half each with
high and low WMC) and measured their pupil sizes three times, weeks apart. At
time 1 (first measurement), the high WMC group had 0.62 mm (0.57 d) larger
pupils at baseline than the low WMC group. Measurement reliability or stability
was fairly high: the correlations between measurements were approximately .80.
In the third sample, they examined 337 subjects with 6 mental tests as well as
pupil size, and split their cognitive tests into two measures, WMC and fluid ability.
Both correlated with pupil size at baseline (r = .24 and .35 respectively).
Controlling for various demographic characteristics (ethnicity, age, and sex) did
not eliminate these relationships.
This potentially interesting pupil size-intelligence relationship seems fairly
robust, but sample sizes were small and the cognitive testing was limited in
scope. Accordingly, we think that their findings deserve a large-scale replication
study. To this end, we analyzed the relationship between pupil size and intelligence
in a much larger dataset with an extensive battery of very diverse cognitive tests.
Subjects, Methods, and Tests
Archival data were taken from the Vietnam Experience Study (VES, dataset
website The VES is a large
US military study in which a large sample of men were examined at enlistment
between 1965 and 1971 and followed up with an intensive physical,
psychological, psychiatric, and socio-economic examination between 1985 and
1986 (Centers for Disease Control Vietnam Experience Study, 1988a,b,c).
Though the dataset is legally in the public domain (not protected by copyright or
confidentiality agreements), there is no public repository for it. However, various
intelligence researchers have obtained copies of it and used it to test numerous
hypotheses, primarily in the area of cognitive epidemiology (relationships
between health and intelligence; Batty et al., 2008; Gale et al., 2008; Phillips et
al., 2009), but other uses include examining Spearman’s hypothesis (Nyborg &
Jensen, 2000), the longitudinal stability of intelligence (Larsen et al., 2008), and
the predictive validity of intelligence for income and education in different
racial/ethnic groups (Nyborg & Jensen, 2001).
Race/ethnicity in the VES breaks down as follows: 3,654 whites, 525 blacks,
200 Hispanics, 49 Native Americans, and 34 Asians. Mean age at the follow-up
examination was 38 (SD 2.5). Due to the modest sample size of some of these
groups, we concentrated on analyzing only data for the white, black and Hispanic
subsamples. Subjects were given a total of 19 cognitive tests:
1. Grooved Pegboard Test (GPT), right hand: A measure of manual dexterity
and fine motor speed (Ruff & Parker, 1993). The speed score is the reciprocal
of the number of seconds taken to place a set of pegs in a grooved hole as
quickly as possible.
2. GPT, left hand.
3. Paced Auditory Serial Addition Test (PASAT): A measure of mental control,
speed, and computational and attentional abilities (Tombaugh, 2006). The
subject mentally adds a sequence of numbers in rapid succession. Score is
the total number of correct responses.
4. Rey-Osterrieth Complex Figure Drawing (CFD): A measure of visuospatial
ability and memory (Shin et al., 2006). The direct copy score (CFDD) is given
from a subject reproducing a complex spatial figure while the figure is in full
view.
5. CFD, copy from immediate recall: The immediate recall score (CFDI) is given
from a subject reproducing a complex spatial figure immediately after being
shown it.
6. CFD, copy from delayed recall: The delayed recall score (CFDL) is given
from a subject being exposed to a complex spatial figure and, after 20
minutes of other activities, drawing it.
7. Wechsler Adult Intelligence Scale-Revised (WAIS-R), general information: A
test of general knowledge (Leckliter et al., 1986).
8. WAIS-R, block design: A test of spatial ability.
9. Word List Generation Test (WLGT): A measure of verbal fluency. The
subject generates as many words as possible which begin with the letters F,
A, and S for 60 seconds. The score is the total number of words generated.
10. Wisconsin Card Sort Test (WCST): A measure of executive function (Greve
et al., 2005). The score is the ratio of correct responses to countable
11. Wide Range Achievement Test (WRAT): Measures ability to read aloud a
list of single words (untimed) (Witt, 1986).
12. California Verbal Learning Test (CVLT): A measure of verbal learning and
memory (Elwood, 1995). The subject recalls a list of 16 words over 5
repeated learning trials. The score is the total correct over 5 trials.
13. Army Classification Battery (ACB): A verbal test administered at induction
(VE time 1) (Bayroff & Fuchs, 1970).
14. ACB verbal: Administered at the follow-up interview (VE time 2).
15. ACB arithmetic reasoning test: An arithmetic test administered at induction
(AR time 1).
16. ACB arithmetic: Administered at the follow-up interview (AR time 2).
17. Pattern Analysis Test (PAT): A measure of pattern recognition administered
at induction.
18. General Information Test (GIT): A test of general knowledge administered
at induction.
19. Armed Forces Qualification Test (AFQT): A general aptitude battery. This
measure is the total score on four subtests (word knowledge, paragraph
comprehension, arithmetic reasoning, mathematics knowledge)
administered at induction.
Five of the tests (13, 15, 17-19) were given at induction and the remaining 14 at
the follow-up interview.
Pupil size of both eyes was measured as part of further testing for visual
performance and problems during the full examination in 1985-1986. Trained
personnel used the semiautomatic Optec 2000 stereoscopic instrument, in which
all tests are concealed in a closed housing (i.e. no windows to outside or other
rooms), allowing only the tester and the subject being tested to view the targets. This was
a controlled environment such that lighting was the same for all subjects, thus
avoiding any confounding caused by light levels that would otherwise affect pupil
sizes. The illumination inside the apparatus was activated only when the subject
maintained steady forehead pressure during testing. The appendix contains an
excerpt of the manual provided by the CDC about the machine and the visual
testing (Centers for Disease Control, 1989a, p. 331ff). The appendix also provides
a photo of the machine. Finally, during data collection, a small subset of subjects
were remeasured for many variables by different observers. The resulting scores
were then tested for inter-observer variability. No such variability was found for
the vast majority of variables examined, including pupil size (Centers for Disease
Control, 1989b, Table 20).
The correlation between pupil sizes by eye may be taken as an estimate of
measurement reliability though it includes method variance related to the context
and personnel. We found a nearly perfect correlation of .97 between pupil size
across eyes. We then took the average pupil size as the single best measure. It
should be noted that pupil size was measured in whole millimeters, so the data
were only quasi-continuous. Figure 1 shows a histogram of pupil size.
Figure 1. Histogram of pupil size (mm) by eye.
As in prior studies (Nyborg & Jensen, 2000, 2001), we extracted a g factor
from all the tests and scored individuals on it. The subtest g loadings spanned .33
to .85, and the g factor accounted for 42% of the total variance. We similarly
computed gs for the early and later test sessions. The test battery has been found
to be free of racial/ethnic bias with respect to the white, Hispanic, and black
samples (Lasker et al., in prep.). Table 1 provides correlations among cognitive
test g-values and pupil size.
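To illustrate how factor loadings and a "share of total variance" figure arise from a test battery, the following minimal sketch extracts the first principal component of a correlation matrix by power iteration. This is only a stand-in: the extraction method actually used for g is not specified in this section, the 4 x 4 correlation matrix is hypothetical (it is not VES data), and the function name is illustrative.

```python
import math

def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, unit eigenvector) of the largest component.

    Uses plain power iteration, which converges here because all
    correlations are positive (a positive manifold).
    """
    k = len(corr)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, vec

# Hypothetical correlations among four tests (illustrative values only).
corr = [
    [1.00, 0.60, 0.50, 0.30],
    [0.60, 1.00, 0.45, 0.25],
    [0.50, 0.45, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]

eigenvalue, vec = first_principal_component(corr)
loadings = [math.sqrt(eigenvalue) * v for v in vec]  # test-component correlations
variance_share = eigenvalue / len(corr)              # proportion of total variance

print("loadings:", [round(x, 2) for x in loadings])
print("share of total variance:", round(variance_share, 2))
```

With real data, the loadings would correspond to the reported range (.33 to .85) and the variance share to the reported 42%, although a proper factor-analytic method would generally give somewhat lower loadings than principal components.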
Table 1. Correlation matrix for cognitive tests, g, and pupil size. Time 1 =
enlistment (1965-1971), time 2 = follow-up interview (1985-1986).
Variables listed: Pupil size right, Pupil size left, Pupil size mean, VE time1, AR time1, VE time2, AR time2, Copy direct, Copy immediate, Copy delayed, GPT left, GPT right, g time1, g time2.
The overall pattern of correlations in Table 1 indicates that pupil size is
positively but very weakly related to cognitive ability, no matter how it is estimated.
The standard error is approximately 0.015, so any correlation with an absolute value above about .03 has p<.05.
We also carried out regression models to see whether confounding from multiple
diverse variables was an issue. Table 2 presents the results.
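The significance threshold mentioned above follows from the large-sample standard error of a correlation that is close to zero, which is roughly 1/sqrt(n - 1). The sketch below is a minimal illustration under two assumptions: that the full n = 4,462 applies to every correlation (the per-test n may be smaller), and that a normal approximation is adequate. The function names are illustrative.

```python
import math

def se_of_r_near_zero(n: int) -> float:
    """Approximate standard error of Pearson's r when the true r is near 0."""
    return 1.0 / math.sqrt(n - 1)

def critical_r(n: int, z: float = 1.96) -> float:
    """Smallest |r| reaching two-sided p < .05 under a normal approximation."""
    return z * se_of_r_near_zero(n)

n = 4462  # sample size from the abstract; assumed to hold for each test
se = se_of_r_near_zero(n)
r_crit = critical_r(n)
print(f"SE = {se:.3f}, critical |r| = {r_crit:.3f}")
```

Under these assumptions the result agrees with the values quoted in the text: a standard error of about 0.015 and a threshold of about .03.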
Table 2. Regression model results. Dependent variable = pupil size mean
(across eyes, in mm). Standard errors in parentheses. * p<.01; ** p<.005; ***
p<.001. Betas are not standardized. Nonlinear effects modeled with a spline.
Row labels: race = 0 (ref) and four further race categories; past year; per month; per day; pupil hour (nonlinear); R2 adj.
Our regressions revealed that the association between pupil size and
intelligence was entirely due to confounding with race/ethnicity. There was no
association within groups, whether or not covariates were included. Figure 2
shows the scatterplots of pupil size by race/ethnicity.
Table 3 provides descriptive statistics (mean ± SD) of key variables by
race/ethnicity. The standardized effect sizes for pupil size are 0.22 and 0.34 for
Hispanics and blacks, respectively, compared with whites.
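The standardized effect sizes can be reproduced from the group means and standard deviations of mean pupil size in Table 3. The sketch below assumes a pooled standard deviation as the denominator and uses the full VES race/ethnicity counts as group sizes. Neither choice is stated in the text, so both are assumptions, and the function names are illustrative.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

# (mean pupil size in mm, SD, n); means and SDs from Table 3,
# n values assumed equal to the VES race/ethnicity breakdown.
white = (3.56, 0.96, 3654)
hispanic = (3.35, 0.85, 200)
black = (3.23, 0.96, 525)

d_hispanic = cohens_d(*white, *hispanic)
d_black = cohens_d(*white, *black)
print(f"white vs Hispanic d = {d_hispanic:.2f}")
print(f"white vs black d = {d_black:.2f}")
```

Using the white standard deviation alone as the denominator gives nearly the same values, because the white group dominates the pooled estimate.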
Figure 2. Scatterplots of relations between pupil size (average across eyes) and
g by race/ethnicity. There are no statistically significant relationships in the plots.
Table 3. Mean ± standard deviation of pupil size (mm) and g by race.
                   Whites         Hispanics      Blacks
Pupil size left    3.56 ± 0.97    3.34 ± 0.84    3.23 ± 0.98
Pupil size right   3.56 ± 0.96    3.36 ± 0.87    3.23 ± 0.95
Pupil size mean    3.56 ± 0.96    3.35 ± 0.85    3.23 ± 0.96
g                  0.00 ± 1.00    -0.78 ± 0.89   -1.24 ± 0.87
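The claim that group differences alone can produce the small pooled correlation can be checked with the law of total covariance: if pupil size and g are uncorrelated within each group, the pooled covariance comes entirely from the differences between group means. The sketch below is an illustration under stated assumptions, not the analysis used in the paper. It takes the means and standard deviations from Table 3, assumes the full VES race/ethnicity counts apply to the pupil data, and assumes a within-group correlation of exactly zero. All names are illustrative.

```python
import math

# Per group: (n, mean pupil size in mm, SD pupil size, mean g, SD g).
# Means and SDs are from Table 3; the n values are assumed.
GROUPS = {
    "white": (3654, 3.56, 0.96, 0.00, 1.00),
    "hispanic": (200, 3.35, 0.85, -0.78, 0.89),
    "black": (525, 3.23, 0.96, -1.24, 0.87),
}

def pooled_correlation(groups, within_r=0.0):
    """Correlation in the pooled sample, given a common within-group r."""
    rows = list(groups.values())
    total_n = sum(n for n, *_ in rows)
    mean_x = sum(n * mx for n, mx, _, _, _ in rows) / total_n
    mean_y = sum(n * my for n, _, _, my, _ in rows) / total_n
    var_x = var_y = cov_xy = 0.0
    for n, mx, sx, my, sy in rows:
        w = n / total_n
        # Total variance = within-group variance + between-group variance.
        var_x += w * (sx**2 + (mx - mean_x) ** 2)
        var_y += w * (sy**2 + (my - mean_y) ** 2)
        # Total covariance = within-group part + between-group part.
        cov_xy += w * (within_r * sx * sy + (mx - mean_x) * (my - mean_y))
    return cov_xy / math.sqrt(var_x * var_y)

r_pooled = pooled_correlation(GROUPS)
print(f"pooled r with zero within-group correlation: {r_pooled:.3f}")
```

Under these assumptions the pooled correlation is small and positive, similar in size to the observed r of .05 between pupil size and g. This is consistent with the regression result that the association is attributable to race/ethnicity.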
A recent study by Tsukahara et al. (2016) reported that pupil size is relatively
strongly related to intelligence at baseline, correlating about .30. We carried out
a replication study of this potentially interesting observation in a large sample, but
were unable to replicate this relationship. In fact, we find correlations hovering
around zero, with small standard errors. We see no obvious reasons for the
discrepancy.
One might suspect that it is due to the relatively crude measure of pupil size
in the VES study when compared to modern equipment, but the high correlation
across eyes (0.97) speaks against this interpretation. Moreover, other analyses of the
same data did not find null results (Silva et al., 2012; Vanderploeg et al., 2005,
2007), nor did studies using the same equipment (e.g. Liou & Chiu, 2001).
The rounding of the pupil size to whole millimeters in our study does not explain
the lack of relationship, as this discretization of the data would be expected to
reduce the correlation only slightly, as can be demonstrated by the Interactive
Statistics Simulator (
discretization). The control for race/ethnicity does not explain the discrepancy,
because Tsukahara et al. also conducted a regression analysis with control for
race/ethnicity, and still found a significant, sizeable association with measures of
intelligence. They found that race/ethnicity was related to pupil size, but reported
only ANOVA results, so the direction of the effect cannot be determined.
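The argument that rounding to whole millimeters would attenuate a correlation only slightly can be illustrated with a small simulation. The sketch below is not the simulator referred to above. It assumes bivariate normal data, a true correlation of .30 (the size reported by Tsukahara et al.), and a pupil size distribution with a mean of 3.5 mm and an SD of 0.96 mm. All names and parameter values are illustrative.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(true_r=0.30, n=50_000, mean_mm=3.5, sd_mm=0.96, seed=1):
    """Return (r with continuous pupil size, r with pupil size rounded to mm)."""
    rng = random.Random(seed)
    pupil, ability = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pupil.append(mean_mm + sd_mm * z1)
        # Ability shares variance with pupil size so that corr = true_r.
        ability.append(true_r * z1 + math.sqrt(1.0 - true_r**2) * z2)
    rounded = [round(p) for p in pupil]  # whole millimeters, as in the VES
    return pearson_r(pupil, ability), pearson_r(rounded, ability)

r_continuous, r_rounded = simulate()
print(f"continuous: {r_continuous:.3f}, rounded to whole mm: {r_rounded:.3f}")
```

Under these assumptions the rounded measure retains most of the correlation, so discretization alone could not reduce an r of about .30 to values near zero.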
We observed a negative association with age across all models (beta = -0.03
in the full model and in the white subsample). This was also observed by
Tsukahara et al., though their slope was about twice as strong (-0.07). Because
we compared the unstandardized slope of age on pupil size (in mm), the age
distribution difference between the samples should not matter, so the source of
the difference is still unclear. We included current smoking status as a covariate
because Tsukahara et al. found a sizeable correlation (-.21) with nicotine use,
but we could not establish any relationship with this variable either.
In conclusion, we used large-scale measurements of pupil size and related
the outcome to a large battery of diverse cognitive tests, but were still unable to
establish any meaningful relationship between intelligence and pupil size,
whether or not covariates were adjusted for. We did replicate negative
associations with age.
Interestingly, pupil size and general intelligence, g, showed the same rank
order across race/ethnicity: whites > Hispanics > blacks. This particular
order of race/ethnicity suggests an evolutionary explanation. Populations which
migrated farther from the equator would spend relatively more of their waking time
during periods of darkness or dim light than populations living closer to the
equator, and so would be gradually selected for optimizing light intake (see also
Christopher et al., 2013; Pearce & Dunbar, 2012). A test of this climatic
hypothesis is the subject for a forthcoming paper.
We would like to thank the Centers for Disease Control, USA, for collecting
and releasing the Vietnam Experience Study dataset, which continues to offer
much insight for science. Materials and full statistical output from the study can
be found at
Excerpt from test manual for visual examination
The following text is quoted from the medical examination manual for the
visual testing (Health Status of Vietnam Veterans: Supplement C, Medical and
Psychological Procedure Manuals and Forms, p. 331ff). A photo of an identical
machine is reproduced below.
N. Vision Testing
1. Introduction
The procedures outlined in this manual are to be used in conjunction with the physical
assessment to detect abnormalities in vision. Those abnormalities to be evaluated include
near vision, far vision and peripheral vision.
2. Equipment
a. Optec 2000
(1) The Optec 2000 is a precision designed stereoscopic instrument for
measuring visual performance and thereby detecting visual problems. The instrument
is semiautomatic with an illuminated control panel. It weighs 13.5 lbs and can be used
on a desk since it requires less than 2 sq. ft. of space. All tests are concealed in a
closed housing allowing only the tester and the subject being tested to view the
targets.
(2) For the operator, all switches are located on one panel within easy reach.
Each switch is illuminated for quick, easy identification. The dial which controls the
slides is located on the side of the instrument.
(3) Some interface features of the Optec 2000 include an advanced light system
which renders a white light, resulting in high contrast images and truer color
reproduction. A built-in baffle assembly isolates the left and right eyes, thus
eliminating unwanted reflective light. By eliminating crossover, true binocular and
monocular tests are guaranteed. The front surface mirror offers a ghost-free image.
Up to 12 test slides can be mounted on a rotatable drum. We will be using only the
near and far vision letter charts in this study.
(4) External features include a forehead trigger which controls illumination inside
the Vision Tester. It will only activate the illumination when the subject maintains
pressure for testing. When forehead pressure is applied to the bar, the green "Ready"
indicator will illuminate and the subject is ready to be tested. The lens system consists
of two lenses. The upper lens is for FAR POINT testing (simulated distance of 20
feet). The lower lens is for NEAR POINT testing (simulated distance of 14 inches).
FAR and NEAR indicator lights indicate how the instrument is set to test, yellow for
FAR and blue for NEAR. The colors will correspond with the FAR/NEAR switch on
the control panel. The test dial is used to change slides in the viewing area. The
numbers on the dial correspond to the numbers on the record form for identifying the
slide test.
5. Procedure
a. Operation of the OPTEC 2000
(1) The unit should be placed on a flat table top.
(2) The power plug should be inserted into a 110-120 V AC power outlet.
(3) To turn the unit power on, press the red switch located on the rear panel to
the "in" position. If the unit is receiving power from the power outlet the switch light
will turn on.
b. Subject Preparation
(1) The participant should be asked if he wears corrective lenses or contact
lenses. If he answers yes, the initial exam of visual acuity for near and far vision
should be performed without corrective or contact lenses. (Participants who wear
contacts are asked not to put in their contacts on Medical Day morning, but to wear
glasses and bring their contacts with them. This request is made during the
orientation, the evening of Arrival Day.) The participant should then be re-examined
with the corrective lenses.
(2) The subject should be informed that he should keep both eyes open at all
times and should always look straight ahead.
(3) Prior to administering the test, the subject should be seated in front of the unit
and the unit height adjusted to conform to the subject's height. This is done by
pressing the light grey button located on the unit base and moving the upper portion
of the unit up or down.
(4) For hygienic purposes, new tissue inserts should be in place on the forehead
trigger for each subject tested.
(5) Prior to administering the vision tests, the subject should place his forehead
firmly against the forehead trigger located at the middle upper edge of the unit.
OPTEC 2000
A photo of the machine is reproduced below.
