
Sex Differences in Intelligence: Developmental Origin Yes, Jensen Effect No


Richard Lynn’s developmental theory of sex differences in intelligence is evaluated using the administration of the Armed Services Vocational Aptitude Battery in the NLSY79. Score increases between age 15 and age 23 are found to be greater in males than in females, supporting an essential element of the theory. On the other hand, neither the sex differences themselves nor their developmental changes are related in any consistent way to the g loadings of the subtests. Therefore sex differences should not be conceptualized as differences in “general” intelligence (g).
MANKIND QUARTERLY 2017 58:1 101-108
Gerhard Meisenberg*
Ross University Medical School, Dominica
*Address for correspondence:
Key Words: ASVAB, NLSY, Intelligence, Sex differences, g
loadings, Development
The theory outlined by Richard Lynn in his target article makes two important
testable assumptions. First, it proposes that there are cognitive sex differences
that can be conceptualized meaningfully as differences in general intelligence.
The concept is operationalized either as an IQ calculated as the average (or,
more bombastically, a “unit-weighted factor score”) of subtest scores on a
complex test battery such as the Wechsler tests, or as the unrotated first factor
or first principal component from a factor analysis or principal components
analysis on the subtest scores. The second claim is that sex differences are age-
dependent, with minimal and inconsistent differences in childhood and a male
advantage developing gradually from about age 15 or 16. This developmental
trend is assumed to be related to the later timing of puberty in males than females,
which is associated with later and more prolonged male brain maturation as well
as physical maturation. In the following, I will examine these claims using the
1980 administration of the Armed Services Vocational Aptitude Battery (ASVAB)
in the National Longitudinal Survey of Youth 1979 (NLSY79).
1. The NLSY79 sample
The National Longitudinal Survey of Youth was launched as a prospective
longitudinal survey by the US Department of Labor in 1979. Subjects aged 14-22
years were enrolled. The sample is not entirely representative of the US
population because those from lower socioeconomic backgrounds and some
ethnic/racial minorities were oversampled. However, males and females were
sampled in proportion from each group. The Armed Services Vocational Aptitude
Battery (ASVAB) was administered to the entire cohort in 1980. Complete test
results are available for 5975 males and 5939 females.
2. Properties of the ASVAB
The ASVAB is a vocational aptitude test that is used for screening of
prospective recruits and for assignment to diverse military duties and training
programs in the US armed forces. It is composed of 10 subtests:
1. General Science: Knowledge of physical and biological sciences.
2. Arithmetic Reasoning: Word problems that emphasize reasoning rather than
mathematical knowledge.
3. Word Knowledge: Understanding the meaning of words.
4. Paragraph Comprehension: Understanding the meaning of paragraphs.
5. Numerical Operations: A speed test of mental addition, subtraction,
multiplication and division.
6. Coding Speed: A speed test to match words and numbers.
7. Auto and Shop Info: Knowledge of automobiles, shop practices and use of
8. Mathematics Knowledge: Knowledge and skills in algebra, geometry and
9. Mechanical Comprehension: Understanding of mechanical principles such
as gears, pulleys and hydraulics.
10. Electronics Info: Knowledge of electricity, radio principles and electronics.
These descriptions of the subtests are from Maier & Grafton (1981).
Psychometrically, the ASVAB is a test of crystallized intelligence: acquired
knowledge and skills rather than context-free reasoning ability. As such, it is
closely related to tests of literacy (Marks, 2010).
3. Scaling of scores
Because scores on all subtests increased with age in an approximately linear
fashion, the subtest raw scores were residualized for age and scaled to the IQ
metric, with a mean of 100 and standard deviation of 15. Principal components
analysis of these age-residualized scaled scores produced an unrotated first
principal component (g factor) accounting for 66.1% of the total variance. This g
factor, scaled to the IQ metric, was used as a measure of general intelligence.
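The scaling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's actual procedure: the data are synthetic stand-ins (not the NLSY79 file), the age trend is removed with a simple linear fit, and the function names are hypothetical.

```python
import numpy as np

def to_iq_metric(x):
    """Rescale a vector to mean 100 and standard deviation 15."""
    return 100.0 + 15.0 * (x - x.mean()) / x.std(ddof=1)

def residualize_for_age(raw, age):
    """Remove the linear age trend from one subtest via least squares."""
    slope, intercept = np.polyfit(age, raw, deg=1)
    return raw - (slope * age + intercept)

def first_principal_component(scores):
    """Return scores, loadings and variance share of the unrotated first PC."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    if v.sum() < 0:                           # the sign of an eigenvector is arbitrary
        v = -v
    loadings = v * np.sqrt(eigvals[-1])       # correlations of each subtest with PC1
    return z @ v, loadings, eigvals[-1] / eigvals.sum()

# Synthetic stand-in data (hypothetical): 10 subtests sharing one common factor
# plus a linear age trend.
rng = np.random.default_rng(0)
n, k = 2000, 10
age = rng.integers(15, 24, size=n).astype(float)   # ages 15..23
ability = rng.normal(size=n)
raw = 0.8 * ability[:, None] + 0.6 * rng.normal(size=(n, k)) + 0.1 * age[:, None]

scaled = np.column_stack(
    [to_iq_metric(residualize_for_age(raw[:, j], age)) for j in range(k)]
)
g_raw, g_loadings, var_share = first_principal_component(scaled)
g_iq = to_iq_metric(g_raw)                          # g factor on the IQ metric
```

Here `g_loadings` plays the role of the g loadings reported in Table 1, and `var_share` the proportion of total variance accounted for by the first principal component.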
1. The (un)importance of g
Table 1 shows the g loadings (correlations with g) of the scaled subtest
scores, and the male and female means and standard deviations on each subtest
and the g factor. The scaling implies that because of the nearly equal numbers of
males and females, male and female scores average out to 100. Because of the
large sample sizes almost all sex differences (d) are statistically significant.
Therefore interpretation of the results should be based on the magnitude of the
differences rather than their statistical significance.
Table 1. g loadings of ASVAB subtests, and sex differences. N = 5,975 males
and 5,939 females. d = standardized sex difference: (♂ mean - ♀ mean) /
averaged standard deviation; ** p<.01; *** p<.001, two-tailed. Δ age trend is the
extent to which males gain more than females per year, expressed in IQ points.
Columns: g loading | ♂ mean ± SD | ♀ mean ± SD | d | Δ age trend
Rows: 1. Science; 2. Arithmetic; 3. Words; 4. Comprehension; 5. Numerical Ops.;
6. Coding; 7. Auto & Shop; 8. Math knowledge; 9. Mechanical Compr.;
10. Electronics Info
The results confirm that males do indeed have higher g than females.
However, we also see that sex differences on 5 of the 10 subtests are larger, and
in some cases far larger, than the differences in g. Average absolute sex
differences are 0.37d (5.6 IQ points) on the subtests, as opposed to 0.23d (3.5
IQ points) on the general factor. This is not expected if the sex differences are
only or even mainly on g.
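As a brief illustration of the d statistic used here, the following sketch computes a standardized sex difference and converts it to IQ points. It assumes that "averaged standard deviation" means the simple mean of the male and female standard deviations (a pooled SD would be a different choice); the function names and the toy scores are hypothetical.

```python
import statistics

def standardized_difference(male_scores, female_scores):
    """d = (male mean - female mean) / mean of the two sample SDs (assumed definition)."""
    averaged_sd = (statistics.stdev(male_scores) + statistics.stdev(female_scores)) / 2
    return (statistics.mean(male_scores) - statistics.mean(female_scores)) / averaged_sd

def d_to_iq_points(d, sd=15.0):
    """Express a standardized difference on the IQ metric (SD = 15)."""
    return d * sd

# Toy data, for illustration only
males = [105, 110, 95, 100, 115]
females = [100, 105, 90, 95, 110]
d_example = standardized_difference(males, females)

# The averages quoted in the text: 0.37d and 0.23d
subtest_points = d_to_iq_points(0.37)   # about 5.6 IQ points
g_points = d_to_iq_points(0.23)         # about 3.5 IQ points
```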
Another prediction of the hypothesis that sex differences are mainly on
general intelligence is that the sex differences favoring males are larger on those
tests that are the best measures of the general factor, meaning those that
correlate best with g. The actual correlation of the subtest g loadings with the d
values is +.142, which is in the expected direction but not nearly significant:
essentially, a null result. Inspection of the ASVAB subtests shows the nature of
the sex differences. There are five tests with primarily academic content: Science,
Arithmetic, Words, Comprehension, and Math Knowledge; two tests of
psychomotor speed: Numerical Operations, and Coding; and three tests of
vocational knowledge and skills: Auto & Shop Info, Mechanical Comprehension,
and Electronics Info. Sex differences favor males on the vocational tests, females
on the speed tests, and sex differences are small on the academic tests.
Because it can be argued that the vocational subtests are related to specific
experiences and knowledge that men are more exposed to than women, let’s see
what happens to the sex differences when these three tests are omitted. In that
case, the remaining 7 subtests produce a g factor on which females outscore
males by 0.8 IQ points. However, this time the correlation between g loadings and
sex differences is +.693, which comes close to conventional statistical
significance (p=.084). This suggests that males tend to do better on highly g-
loaded tests, and females do better on tests with lower g loadings.
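The significance level quoted for a correlation across only a handful of subtests can be checked from r and the number of subtests alone. The sketch below uses the standard t test for a Pearson correlation, with the t distribution integrated numerically (Simpson's rule) so that only the standard library is needed; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(r, n, steps=2000):
    """Two-tailed p value for a Pearson r computed over n data points."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # P(0 < T < |t|)
    return 1 - 2 * central_area

# r = +.693 across the 7 non-vocational subtests, as reported in the text
p_seven_subtests = two_tailed_p(0.693, 7)
```

With only 7 data points the test has 5 degrees of freedom, which is why even a correlation of about .7 falls short of the conventional .05 threshold.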
However, we can also argue that psychomotor speed is conceptually
different from intelligence. In dual-processing theories of cognition, quick
responses require automatic processing while intelligence is a property of a slow
processing system (Evans, 2008). What happens to sex differences when the two
speed tests are removed but all others are retained? As expected, the male
advantage on the g factor extracted from the remaining eight subtests is
enhanced: from 3.4 points in the complete ASVAB to 5.2 points when the speed
tests are deleted. In addition, the sign of the correlation between g loadings and
sex differences reverses, to -.447. Thus the answer to the question of whether
tests with higher g loadings favor males or females depends very much on the
composition of the test battery.
2. Changes with age
Let us now examine the developmental trajectory that is proposed by Lynn’s
theory. Table 2 shows how sex differences on the general factor, extracted from
all 10 subtests, change with age. There is no sex difference at age 15, but males
pull ahead of females as they get older. At age 20 and beyond they outscore
females by almost one third of a standard deviation, or 5 IQ points.
Table 2. Sex differences on the general factor extracted from the ASVAB
subtests, by age. d = standardized sex difference: (♂ mean - ♀ mean) / averaged
standard deviation.
Columns: ♂ N | ♀ mean ± SD | ♀ N
We saw before that, ignoring age, the pattern of sex differences on the
subtests shows no consistent relationship with the subtests’ g loadings. It is
nevertheless possible that, for example, prenatal androgen action creates the
strengths and weaknesses of the sexes on specific subtests while continued brain
development after the age of 15 years creates an omnibus male advantage that
is strongest on tests with higher g loadings. To test whether the greater male than
female improvement in test performance after age 15 is related to the subtests’ g
loadings, simple regressions were performed predicting subtest score with age,
separately for males and females. The unstandardized B coefficients were
recorded for each regression, and the female B coefficient was subtracted from
the male B coefficient. This difference score is taken as the difference in score
gains between males and females, expressed as IQ points gained or lost per year.
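The slope comparison just described can be sketched as follows. This is an illustration with noise-free synthetic data, not the NLSY79 analysis; the helper names are hypothetical, and `numpy.polyfit` stands in for whatever regression software was actually used.

```python
import numpy as np

def age_slope(age, score):
    """Unstandardized B coefficient from a simple regression of score on age."""
    slope, _intercept = np.polyfit(age, score, deg=1)
    return slope

def delta_age_trend(age_m, score_m, age_f, score_f):
    """Male B minus female B: extra IQ points gained by males per year of age."""
    return age_slope(age_m, score_m) - age_slope(age_f, score_f)

# Synthetic example (hypothetical numbers): males gain 1.0 point per year,
# females 0.4 points per year, so the difference is 0.6 points per year.
ages = np.arange(15, 24, dtype=float)
male_scores = 100.0 + 1.0 * (ages - 19.0)
female_scores = 100.0 + 0.4 * (ages - 19.0)
delta = delta_age_trend(ages, male_scores, ages, female_scores)
```

A positive value means that performance rises faster with age in males than in females on that subtest.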
The last column in Table 1 shows the results. On each subtest and the
common factor, the signs are positive, indicating that the increase in performance
with rising age is greater in males than in females. The extent of this sex
difference is smallest for Word Knowledge, where it amounts to 0.112 IQ points
per year. This means that between the ages of 15 and 23 years, males gain 0.112
* 8 = 0.896 points relative to females. At the other extreme, male gains on Auto &
Shop Info exceed female gains by as much as 8.232 points over the same eight years. In other words,
between these ages males and females acquire new word knowledge at similar
rates, but males acquire auto and shop knowledge at much higher rates than do
females. Gains on the other subtests are in between, and between the ages of
15 and 23 years males gain 4.39 IQ points relative to females on the common factor.
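The conversion between yearly and cumulative differences used in this paragraph is simple arithmetic over the eight years from age 15 to age 23. A small sketch, with hypothetical helper names, reproduces the figures quoted in the text:

```python
AGE_START, AGE_END = 15, 23   # age range considered in the text

def cumulative_gain(per_year, start=AGE_START, end=AGE_END):
    """IQ points gained by males relative to females over the whole age range."""
    return per_year * (end - start)

def per_year_rate(cumulative, start=AGE_START, end=AGE_END):
    """Yearly rate implied by a cumulative difference over the age range."""
    return cumulative / (end - start)

word_knowledge_total = cumulative_gain(0.112)   # 0.112 * 8 = 0.896 points
auto_shop_rate = per_year_rate(8.232)           # 8.232 / 8 = 1.029 points per year
common_factor_rate = per_year_rate(4.39)        # 4.39 / 8, roughly 0.55 points per year
```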
When the sex difference in score gains in the last column of Table 1 is
correlated with the g loadings of the subtests, we obtain a Pearson’s r of -.492,
which is non-significant. As before, we can exclude the vocational tests and the
speed tests from the analysis. Without the three vocational tests we obtain
r = -.357, and without the two speeded tests we obtain r = -.785. The last of these
correlations is statistically significant at p = .021 with a sample size of 8 tests. The
negative signs of these correlations show that, if anything, the extent to which
score gains of males outpace those of females between the ages of 15 and 23
years tends to be greater on tests with lower g loading. This contradicts the view
that males gain on females in general intelligence during late adolescence.
A look at the first and last data columns in Table 1 shows the reasons for the
negative signs obtained in this exercise. We see that the vocational tests are
those on which males gain much faster than females in late adolescence. These
tests have g loadings that are rather low (Auto & Shop Info) or middling
(Mechanical Comprehension, Electronics Info). After excluding the vocational
tests, the low-g speeded tests show somewhat greater male-versus-female gains
than the academic tests; and when the speeded tests are excluded but the
vocational tests are retained, there is a fairly consistent pattern of vocational tests
having larger male-versus-female gains with age while also having somewhat
lower g loadings.
The results presented in this comment illustrate two aspects of Richard
Lynn’s developmental theory of sex differences in intelligence. The first is that sex
differences are small and/or variable up to the age of about 15 years but that
males tend to pull ahead of females after that age. This part of the theory is
supported, as indicated by the d values in Table 2. Even the final magnitude of
the sex difference, of nearly 5 IQ points, agrees well with results from many other
studies compiled by Lynn. Furthermore, there is some generality to the greater
male than female gains between the ages of 15 and 23 years, in the sense that
these are observed on all subtests (last column of Table 1).
At first sight, the results confirm Lynn’s conclusion that in adulthood, males
outscore females by 4 to 5 points in general intelligence. However, a closer look
at the results shows that the male advantage on the ASVAB is due to the
presence of three subtests that concern vocational skills and knowledge. Without
these three tests, the sex difference is virtually zero. Even in the 20-23 years age
group, where males outscore females by 4.8 points on the complete test, they
score only a negligible 0.3 points higher than females when the vocational tests
are omitted. Furthermore, there is no consistent relationship between the g
loadings of subtests and their sex differences. Sex differences do not show a
Jensen effect. Spearman’s hypothesis, which proposes that score differences
between racial and ethnic groups are largest on the most g-loaded tests (Jensen,
1985), does not apply to sex differences. Therefore sex differences cannot be
explained as differences in a general ability factor, but only as differences in
specialized abilities, at least for the range of abilities that are tested with the ASVAB.
The sex differences in subtest score gains, presented in the last column
of Table 1, also do not show a Jensen effect. This sex difference is general only in the
sense that males gain faster than females on all subtests, but it cannot be
conceptualized as g. Specifically, we observe that the extent to which yearly gains
are greater in males than females is most pronounced on the three vocational
tests and on numerical operations (mental arithmetic). This suggests that
accelerated male development in these domains is not only the result of faster
overall brain maturation, which would presumably affect all abilities in proportion
to their g loadings. It is better explained by content-specific factors such as greater
male than female exposure to or interest in tools, engines, gears, hydraulics and
On the other hand, we observe that male gains with age exceed those of
females also on the other ASVAB subtests. This indicates that there is a general
component to the sex difference in cognitive trajectories during late adolescence,
although this general component cannot be conceptualized as g. We saw that on
the g factor extracted from all 10 subtests, this difference in developmental
progression accounts for 4.39 IQ points between the ages of 15 and 23 years.
When the g factor is extracted from the seven non-vocational subtests only, this
developmental difference is reduced to 3.13 points. These results suggest that
the true developmental component in Lynn’s developmental theory amounts to
approximately 3 IQ points that males gain on females between the ages of 15 and
23 years, at least on a composite of those abilities that are tested with the ASVAB.
References
Evans, J.S.B.T. (2008). Dual-processing accounts of reasoning, judgment, and social
cognition. Annual Review of Psychology 59: 255-278.
Jensen, A.R. (1985). The nature of the black-white difference on various psychometric
tests: Spearman’s hypothesis. Behavioral and Brain Sciences 8: 193-263.
Maier, M.H. & Grafton, F.C. (1981). Aptitude Composites for ASVAB 8, 9, and 10 (No.
ARI-RR-1308). Army Research Institute for the Behavioral and Social Sciences,
Alexandria VA.
Marks, D.F. (2010). IQ variations across time, race, and nationality: An artifact of
differences in literacy skills. Psychological Reports 106: 643-664.